LeetCode: Regular Expression Matching

Question

Given an input string (s) and a pattern (p), implement regular expression matching with support for ‘.’ and ‘*’.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:

s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Explanation

This problem is not easy, some corner test cases need to handle carefully.

Please try to write the correct code first and then simplify it.

Depth-first search (DFS)

Let’s compare the current character of input and pattern.

There are two normal cases without consideration of *:

1. cur_s == cur_p
2. cur_s != cur_p, but cur_p == '.' and cur_s is any character

Now let’s consider how to handle the pattern *:

If we have a X* in the pattern(X stands for any character), we can skip zero or more than one character(same as the previous one) from the input.

According to these analyses, we can construct a depth-first search algorithm, it’s a recursive algorithm.

#include <string>
#include <iostream>
using namespace std;

class Solution {
public:
  bool isMatch(string s, string p) {
    if (p.empty()) return s.empty();

    bool first_match = !s.empty() && (s[0] == p[0] || '.' == p[0]);
    if (p.size() >= 1 && '*' == p[1])
      return isMatch(s, p.substr(2)) || (first_match && isMatch(s.substr(1), p));
    else
      return  first_match && isMatch(s.substr(1), p.substr(1));
  }
};

This solution is slow, from the result we can see it only beat 11% submissions. The time complexity is hard to analyze. You may refer to more details here.

file:img/2019_12_02_leetcode-regular-expression-matching.org_20191203_163941.png

DFS with memorization

In the recursive version, there are some overlap subproblems we can optimize.

Naive DFS algorithm could be optimized with a memorization data structure. We could add a memo map to store the computed result, it’s a typical caching technology.

In this solution, I use text + "_" + pattern as a key for memorization, which is not the best.

The result is much better: file:img/2019_12_02_leetcode-regular-expression-matching.org_20191203_164955.png

#include <string>
#include <map>
#include <iostream>
using namespace std;

class Solution {
public:
  bool isMatch(string s, string p) {
    if (p.empty()) return s.empty();
    string key = s + "_" + p;

    map<string, bool>::iterator iter = memo.find(key);
    if(iter != memo.end()) { return iter->second; }

    bool res;
    bool first_match = !s.empty() && (s[0] == p[0] || '.' == p[0]);

    if (p.size() >= 1 && '*' == p[1])
      res = isMatch(s, p.substr(2)) || (first_match && isMatch(s.substr(1), p));
    else
      res = first_match && isMatch(s.substr(1), p.substr(1));
    memo.insert(std::pair<string, bool>(key, res));
    return res;
  }

private:
  map<string, bool> memo;
};

Dynamic programming

Since we could implement the algorithm in a top-down pattern, we could also implement it from bottom to up.

Here is the scenario where the dynamic programming techniques could be applied to.

The time complexity is $O(N*M)$

file:img/2019_12_02_leetcode-regular-expression-matching.org_20191202_235517.png

#include <string>
#include <vector>
#include <iostream>
using namespace std;

class Solution {
public:
    bool isMatch(string s, string p) {
        vector<vector<bool>> dp(s.size()+1, vector<bool>(p.size()+1, false));
        dp[0][0] = true;
        for(int i = 1; i <= p.size(); i++) {
             dp[0][i] = (i>1 && p[i-1]=='*' && dp[0][i-2]);
        }
        for (int i = 1; i <= s.size(); i++) {
                for (int j = 1; j <= p.size(); j++) {
                     if (s[i-1] == p[j-1] || p[j-1] == '.')
                       dp[i][j] = dp[i-1][j-1];
                     else if (p[j-1] == '*') {
                             if (s[i-1] == p[j-2] || p[j-2] == '.')
                               dp[i][j] = dp[i][j-1] || dp[i][j-2] || dp[i-1][j];
                             else
                               dp[i][j] = dp[i][j-2];
                     };
                }
        }
        return dp[s.size()][p.size()];
    }
};

Join my Email List for more insights, It's Free!😋

← Leetcode: Longest Common Prefix Awesome Command-line tips to 10x... →