Python Strip Usage
In Python, we usually use functions such as strip, lstrip, and rstrip for stripping characters. In most cases, we call them without arguments to remove leading and trailing whitespace, including newlines.
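s.strip('x') removes every leading and trailing 'x' from a string. More generally, the chars argument is treated as a set of characters to remove, not as a literal prefix or suffix. The related methods are:
s.rstrip([chars]) is right-strip; it removes the trailing characters.
s.lstrip([chars]) is left-strip; it removes the leading characters.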
strip/rstrip/lstrip will remove whitespace characters such as '\n' and '\t' if called without an argument.
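To make the behaviour above concrete, here is a small sketch using only the built-in str methods. The sample strings are made up for illustration.

```python
# Sample strings are arbitrary; only built-in str methods are used.
s = "  \thello world\n"

both = s.strip()     # no argument: strip whitespace on both sides
left = s.lstrip()    # only the leading whitespace is removed
right = s.rstrip()   # only the trailing whitespace is removed

print(repr(both))    # 'hello world'
print(repr(left))    # 'hello world\n'
print(repr(right))   # '  \thello world'

# With an argument, the characters are treated as a set, not as a prefix/suffix.
only_x = "xxhixx".strip("x")
as_set = "abcHELLOcba".strip("abc")

print(only_x)        # hi
print(as_set)        # HELLO
```

Note that "abcHELLOcba".strip("abc") removes the trailing "cba" too, because each character is checked against the set {a, b, c} on its own.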
As I elaborated in the previous post, How to learn data structures and algorithms, meeting a function we are not familiar with is a good opportunity to learn more.
The follow-up questions should be: how is this implemented in Python, and what is the worst-case complexity of this operation?
So let's dig into the code.
First, we need to find the implementation of rstrip. Search for the keyword 'rstrip' in Python's GitHub repo. It should be implemented in C, so we add a filter for the C programming language.
CPython stores strings as sequences of Unicode characters, so we should check the definition in Objects/clinic/unicodeobject.c.h. There, unicode_rstrip is a wrapper function that calls unicode_rstrip_impl to do the actual stripping.
So we continue by searching for unicode_rstrip_impl in the codebase. It is located in Objects/unicodeobject.c, and from there we follow the call flow do_argstrip -> do_strip.
Good coding style would put the implementation of all the strip functions into one logical unit, and that is exactly how it is coded. Have a look at do_strip in Objects/unicodeobject.c.
lstrip, rstrip, and strip all end up calling this do_strip.
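The idea of one shared routine serving all three methods can be sketched in Python. This is not the CPython source: the name do_strip_sketch, the mode constants, and the optional chars parameter are assumptions made for illustration, and the real C code is organized differently. The sketch only shows the two-index scan.

```python
from typing import Optional

# Illustrative mode flags (hypothetical names for this sketch).
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def do_strip_sketch(s: str, striptype: int, chars: Optional[str] = None) -> str:
    """Strip from the left, the right, or both sides with a two-index scan."""

    def should_strip(ch: str) -> bool:
        # No chars given: strip whitespace; otherwise strip members of chars.
        return ch.isspace() if chars is None else ch in chars

    i, j = 0, len(s)

    # Move the left index forward past strippable characters.
    if striptype != RIGHTSTRIP:
        while i < j and should_strip(s[i]):
            i += 1

    # Move the right index backward past strippable characters.
    if striptype != LEFTSTRIP:
        while j > i and should_strip(s[j - 1]):
            j -= 1

    return s[i:j]


print(repr(do_strip_sketch("  hi \n", BOTHSTRIP)))       # 'hi'
print(repr(do_strip_sketch("xxhixx", RIGHTSTRIP, "x")))  # 'xxhi'
```

Each character is examined at most once by one of the two loops, so the scan is linear in the length of the string (ignoring the cost of the membership test on chars).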
It's simple and elegant, and the worst-case complexity is O(N), so now you know! By the way, there are about 15,000 lines of code in unicodeobject.c, and GitHub does not seem to index it by default.
Furthermore, if you have time, you can study the other string operations in Python; they are located in cpython/Objects/stringlib.
Because of all kinds of optimizations, strings in Python are actually quite complex.
For example, whether a new object is allocated depends on many conditions. If a string is longer than 20 characters, it is not subject to constant folding (the exact limit depends on the CPython version). This sometimes leads to inconsistent behavior.
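One way to observe constant folding is to compile an expression and look at the constants the compiler stored. This is a minimal sketch: the helper name folded_constants is made up here, and the result depends on the CPython version and its folding limit, so no output is shown for the folding checks.

```python
def folded_constants(expr: str) -> tuple:
    """Return the constants the compiler stored for a single expression."""
    return compile(expr, "<example>", "eval").co_consts


short_consts = folded_constants("'a' * 5")
long_consts = folded_constants("'a' * 5000")

# If folding happened, the final string itself appears among the constants.
short_folded = "a" * 5 in short_consts
long_folded = any(isinstance(c, str) and len(c) == 5000 for c in long_consts)

print("short expression folded:", short_folded)
print("long expression folded:", long_folded)

# Equality does not depend on folding or object identity.
print(eval("'a' * 5") == "aaaaa")  # True
```

Because identity can differ between versions and code paths, it is safer to compare strings with == rather than is.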
Please have a look at .