Coder's Cat

Ruby: How to check a string contains another string

2020-01-09

One of the most common operations a programmer use on strings is to check whether a string contains another string.

In this post, I will show you several methods for finishing this operation, with a little bit more analysis of the implementation.

1. Using include? method from string

The method named include? will return true or false as result, sample code is:

str = "coderscat"
if str.include? "code"
puts "Yes, code is included!"
end

Note: This method is case sensitive. If you want to ignore case sensitive, you could do a downcase() operation firstly.

2. Using regular expression

Regex matching in Ruby returns nil when the expression doesn’t match. When it matches, it returns the index of the character where the match happens.

For example

"coderscat" =~ /cat/  # => 6
"coderscat" =~ /code/ # => 0
"coderscat" =~ /zzz/ # => nil

Note: In Ruby, only nil and the boolean false are evaluated to false. Everything else(including an empty array, empty hash, or the integer 0) are evaluated to true.

Another benefit from regular expression: it’s a general pattern match for strings, for example:

"codersCat" =~ /cat/i  # => 6
"codersCat" =~ /c[ao][dt]/i # => 0, matched with 'cod'

3. Using [] in string

You could also use [] in string, the usgage is as below:

str[regexp] → new_str or nil
str[regexp, capture] → new_str or nil
str[match_str] → new_str or nil

Code sample:

str = "CodersCat"
if str["Cat"]
puts 'Yes, Cat is included!'
else
puts 'No....'
end
# => 'Yes, Cat is included!'

Note: You maybe thought [] is actually same with str.index. No, index will return the position of first occurrence, while [] will return a string same with parameter.

"hello".index('e')             #=> 1
"hello".index('lo') #=> 3
"hello"['lo'] #=> "lo"

4. Rails: Using in operator

In Rails, a more idiom practice is like this:

str = "coderscat"
if "code".in? str
puts "Yes, code is contained in!"
end

Note: It’s only available in Rails, and not pure Ruby.

Dive deeper

As an aspiring developer, you may curious about how the string matching is implemented in Ruby’s source code.

A little bit dig will find out the related C implementation in source code of string.c:

file:img/2020_01_09_ruby-string-contains-another.org_20200109_141903.png

Let’s continue to check the function rb_memsearch located at re.c:

By default, if system have defined a function called memmem, Ruby will just call it directly. The memmem() function is not specified in POSIX.1, but normally could be found in a system like Linux. It’s implemented with Glibc.

Otherwise, If a system doesn’t have memmem, Ruby uses an algorithm called Rabin–Karp algorithm to do string pattern matching. It uses a rolling hash to quickly filter out positions of the text that cannot match the pattern and then checks for a match at the remaining positions.

#ifdef HAVE_MEMMEM
static inline long
rb_memsearch_ss(const unsigned char *xs, long m, const unsigned char *ys, long n)
{
const unsigned char *y;

if ((y = memmem(ys, n, xs, m)) != NULL)
return y - ys;
else
return -1;
}
#else
static inline long
rb_memsearch_ss(const unsigned char *xs, long m, const unsigned char *ys, long n)
{
const unsigned char *x = xs, *xe = xs + m;
const unsigned char *y = ys, *ye = ys + n;

#define VALUE_MAX ((VALUE)~(VALUE)0)
VALUE hx, hy, mask = VALUE_MAX >> ((SIZEOF_VALUE - m) * CHAR_BIT);

if (m > SIZEOF_VALUE)
rb_bug("!!too long pattern string!!");

if (!(y = memchr(y, *x, n - m + 1)))
return -1;

/* Prepare hash value */
for (hx = *x++, hy = *y++; x < xe; ++x, ++y) {
hx <<= CHAR_BIT;
hy <<= CHAR_BIT;
hx |= *x;
hy |= *y;
}
/* Searching */
while (hx != hy) {
if (y == ye)
return -1;
hy <<= CHAR_BIT;
hy |= *y;
hy &= mask;
y++;
}
return y - ys - m;
}
#endif

Conclusion

include?, regex, index could all used to do pattern matching of strings.

In the source code of Ruby, we use memmem or Rabin–Karp algorithm for string searching.

Join my Email List for more insights, It's Free!😋

Tags: Misc