Coder's Cat

Python String Strip Explained

2019-09-15

Python Strip Usage

In Python, we usually use functions like strip, lstrip and rstrip for stripping characters. In most cases, we call them without arguments to remove whitespace and newlines.

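s.strip([chars]) removes leading and trailing characters from a string. Note that chars is a set of characters, not a prefix or suffix: every leading and trailing character that appears in chars is removed, in any order.

s.rstrip([chars]) is right-strip: it removes the trailing characters.

s.lstrip([chars]) is left-strip: it removes the leading characters.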
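To make the "set of characters" behavior concrete, here is a small sketch. The URL string is just an illustrative value; the calls are the standard str methods.

```python
# str.strip treats its argument as a *set* of characters, not as a prefix
# or suffix string: every leading/trailing character found in chars is removed.
url = "www.example.com"

print(url.strip("cmowz."))   # "www." and ".com" consist only of chars in the set
print(url.strip(".zwomc"))   # same characters in a different order, same result

# strip(chars) behaves like lstrip(chars) followed by rstrip(chars).
s = "sscoderscatsss"
print(s.strip("st") == s.lstrip("st").rstrip("st"))
```

Both strip calls print example, and the comparison prints True.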

Note:

If called without an argument (or with None), strip/rstrip/lstrip remove whitespace characters, which include ‘ ’, ‘\n’ and ‘\t’.
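"Whitespace" here is broader than spaces and newlines: the default strips every character for which str.isspace() is true, including Unicode spaces. The sketch below contrasts that with passing ' ' explicitly; the sample strings are arbitrary.

```python
# With no argument (or None), strip removes every character for which
# str.isspace() is true, not just ' ' and '\n'.
messy = " \t\n\r\x0b\x0c hi \xa0\u3000"
print(repr(messy.strip()))      # 'hi'
print(repr(messy.strip(None)))  # same as no argument

# Passing ' ' explicitly removes only spaces, so tabs and newlines survive.
print(repr(" \thi\n ".strip(" ")))  # '\thi\n'
```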

Example

string = "coderscat \n\t"
print(string.rstrip()) # no argument provided, remove trailing whitespace
=> "coderscat"

string = "sscoderscatsss"
print(string.strip('s')) # remove leading and trailing 's'
=> "coderscat"

string = "sscoderscatsss"
print(string.strip('st')) # remove leading and trailing 's' and 't' characters
=> "codersca"

string = "coderscatssss"
print(string.rstrip('s')) # remove the trailing 's'
=> "coderscat"

string = "coderscat"
print(string.rstrip('s')) # no trailing 's', original string returned
=> "coderscat"

Explanation

As I explained in a previous post, How to learn data structures and algorithms, meeting a function we are not familiar with is a good opportunity to learn more.

Our follow-up questions should be: how is this implemented in Python, and what is the worst-case complexity of this operation?

So let’s dig into the code.

First, we need to find the implementation of rstrip. Search for the keyword ‘rstrip’ in Python’s GitHub repository; since it should be implemented in C, add a filter for the C language.

In CPython, str objects are implemented as Unicode objects, so we should check the definition in Objects/clinic/unicodeobject.c.h. The unicode_rstrip function there is a wrapper generated by Argument Clinic; it calls unicode_rstrip_impl to do the actual stripping.

So we continue by searching the codebase for unicode_rstrip_impl. It is located in Objects/unicodeobject.c, and from there the call flow is do_argstrip -> do_strip.

Good coding style puts the implementation shared by all the strip functions into one logical unit, and that is exactly how it is coded. Have a look at do_strip:

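static PyObject *
do_strip(PyObject *self, int striptype)
{
    Py_ssize_t len, i, j;

    /* Make sure the string's canonical (PEP 393) representation is ready. */
    if (PyUnicode_READY(self) == -1)
        return NULL;

    len = PyUnicode_GET_LENGTH(self);

    if (PyUnicode_IS_ASCII(self)) {
        /* ASCII fast path omitted in this excerpt. */
    }
    else {
        int kind = PyUnicode_KIND(self);
        void *data = PyUnicode_DATA(self);

        /* Left scan: advance i past leading whitespace (skipped for rstrip). */
        i = 0;
        if (striptype != RIGHTSTRIP) {
            while (i < len) {
                Py_UCS4 ch = PyUnicode_READ(kind, data, i);
                if (!Py_UNICODE_ISSPACE(ch))
                    break;
                i++;
            }
        }

        /* Right scan: move j back over trailing whitespace, never below i
           (skipped for lstrip). Afterwards j is one past the last kept char. */
        j = len;
        if (striptype != LEFTSTRIP) {
            j--;
            while (j >= i) {
                Py_UCS4 ch = PyUnicode_READ(kind, data, j);
                if (!Py_UNICODE_ISSPACE(ch))
                    break;
                j--;
            }
            j++;
        }
    }

    /* Return the substring self[i:j]. */
    return PyUnicode_Substring(self, i, j);
}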
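To make the two scans easier to follow, here is a rough pure-Python model of the non-ASCII branch above. It is a sketch, not CPython code: LEFTSTRIP/RIGHTSTRIP/BOTHSTRIP are redefined as plain Python constants for illustration, str.isspace() stands in for Py_UNICODE_ISSPACE, and slicing stands in for PyUnicode_Substring.

```python
# Hypothetical pure-Python model of CPython's do_strip (whitespace only).
# The constant values are chosen for illustration.
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def do_strip(s: str, striptype: int) -> str:
    length = len(s)

    # Left scan: advance i past leading whitespace (skipped for rstrip).
    i = 0
    if striptype != RIGHTSTRIP:
        while i < length and s[i].isspace():
            i += 1

    # Right scan: move j back over trailing whitespace, never below i
    # (skipped for lstrip). Afterwards j is one past the last kept char.
    j = length
    if striptype != LEFTSTRIP:
        j -= 1
        while j >= i and s[j].isspace():
            j -= 1
        j += 1

    # Equivalent of PyUnicode_Substring(self, i, j).
    return s[i:j]


text = "  coderscat \n\t"
print(repr(do_strip(text, BOTHSTRIP)))   # 'coderscat'
print(repr(do_strip(text, LEFTSTRIP)))   # 'coderscat \n\t'
print(repr(do_strip(text, RIGHTSTRIP)))  # '  coderscat'
```

The condition j >= i in the right scan is what keeps an all-whitespace string from being scanned twice: after the left scan reaches the end, the right scan does no work at all.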

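When called without an argument (or with None), lstrip, rstrip and strip all end up in this do_strip function. With a chars argument, do_argstrip calls _PyUnicode_XStrip instead, which scans from both ends in the same way but checks each character against chars.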
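In CPython the chars argument is inspected in do_argstrip: None falls through to the whitespace scan, while a str takes a separate "strip these characters" path. The sketch below models that dispatch under stated assumptions: argstrip and scan_strip are hypothetical names, and a Python set stands in for whatever character lookup CPython uses internally.

```python
# Hypothetical model of how do_argstrip picks a strip path (not CPython code).
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def scan_strip(s: str, striptype: int, should_strip) -> str:
    """Two scans, as in do_strip, driven by a per-character predicate."""
    i, j = 0, len(s)
    if striptype != RIGHTSTRIP:
        # Advance i past leading characters that should be stripped.
        while i < j and should_strip(s[i]):
            i += 1
    if striptype != LEFTSTRIP:
        # Move j back over trailing characters, never crossing i.
        while j > i and should_strip(s[j - 1]):
            j -= 1
    return s[i:j]


def argstrip(s: str, striptype: int, chars=None) -> str:
    if chars is None:
        # No argument: strip whitespace (the do_strip path).
        return scan_strip(s, striptype, str.isspace)
    if not isinstance(chars, str):
        # Like str.strip, reject anything that is not None or a str.
        raise TypeError("strip arg must be None or str")
    # chars is treated as a set of characters (the _PyUnicode_XStrip path).
    charset = set(chars)
    return scan_strip(s, striptype, charset.__contains__)


print(argstrip("sscoderscatsss", BOTHSTRIP, "st"))  # codersca
print(argstrip("coderscat \n\t", RIGHTSTRIP))        # coderscat
```

Passing a predicate keeps both paths on the same two-scan skeleton, which mirrors the idea that all strip variants share one piece of logic.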

It’s simple and elegant: each scan visits every character at most once, so the worst-case complexity is O(N), and now you have learned it! By the way, unicodeobject.c contains about 15,000 lines of code, and GitHub does not seem to index it by default.
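The O(N) bound can be checked by counting how many characters the two whitespace scans inspect. The function below is a hypothetical instrumented model of do_strip's BOTHSTRIP case, not CPython code. Because the right scan never goes below i, the two scans together inspect at most N + 1 characters for a string of length N (the extra one is the non-whitespace character at which both scans can stop). Building the result substring adds at most another O(N) copy.

```python
def count_checks(s: str) -> int:
    """Count characters inspected by a both-sides whitespace strip,
    following the same two scans as do_strip (a hypothetical model)."""
    checks = 0
    n = len(s)

    # Left scan, stopping at the first non-whitespace character.
    i = 0
    while i < n:
        checks += 1
        if not s[i].isspace():
            break
        i += 1

    # Right scan, never going below i.
    j = n - 1
    while j >= i:
        checks += 1
        if not s[j].isspace():
            break
        j -= 1

    return checks


for sample in ["", "   ", "abc", " a ", " " * 1000 + "x" + " " * 1000]:
    print(len(sample), count_checks(sample))
```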

If you have time, it is also worth studying the other string operations in Python; many of them are located in cpython/Objects/stringlib.

Going deeper

Because of all kinds of optimizations, strings in Python are actually quite complex.

For example, whether a new object is allocated depends on many conditions: in older CPython versions, a constant string expression whose result is longer than 20 characters is not subject to constant folding. This sometimes leads to seemingly inconsistent behavior.

Please have a look at .


Tags: Python