Description
Problem description
Series.str provides methods for every regular expression matching mode in the re module except re.fullmatch(). fullmatch succeeds only when the pattern matches the entire input string, unlike match, which also succeeds when the pattern matches only a prefix of the string.
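To make the difference concrete, here is a minimal sketch using only the standard library re module. The pattern and input strings are illustrative and simply reuse the values from the examples in this report.

```python
import re

pattern = re.compile("foo")

# match() only anchors at the start of the string, so a prefix hit is enough.
prefix_hit = pattern.match("foobar")

# fullmatch() requires the pattern to consume the entire string.
full_hit = pattern.fullmatch("foobar")
exact_hit = pattern.fullmatch("foo")

print(prefix_hit.span())  # (0, 3)
print(full_hit)           # None
print(exact_hit.span())   # (0, 3)
```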
One can work around the lack of fullmatch by round-tripping through NumPy arrays and using np.vectorize, for example:
>>> import re
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["foo", "bar", "foobar"])
>>> my_regex = "foo"
>>> compiled_regex = re.compile(my_regex)
>>> regex_f = np.vectorize(lambda x: compiled_regex.fullmatch(x) is not None)
>>> matches_array = regex_f(s.values)
>>> matches_series = pd.Series(matches_array, index=s.index)
>>> matches_series
0     True
1    False
2    False
dtype: bool
but it would be more convenient for users if fullmatch were built in.
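Another possible workaround, not part of the original report, is to anchor the pattern so that a prefix match is forced to end at the end of the string. The sketch below checks the idea with the plain re module; the helper name anchored is hypothetical. It assumes a simple pattern without leading inline global flags such as (?i), which may not be wrapped in a group. The same anchored pattern could presumably be passed to Series.str.match, but that is not verified here.

```python
import re

def anchored(pattern: str) -> str:
    # Hypothetical helper: wrap the pattern in a non-capturing group so that
    # any top-level "|" stays inside it, then require the end of the string.
    return rf"(?:{pattern})\Z"

strings = ["foo", "bar", "foobar"]

# Plain match() accepts "foobar" because "foo" is a prefix of it.
prefix_matches = [re.match("foo", s) is not None for s in strings]

# The anchored pattern rejects "foobar", agreeing with fullmatch().
anchored_matches = [re.match(anchored("foo"), s) is not None for s in strings]
full_matches = [re.fullmatch("foo", s) is not None for s in strings]

print(prefix_matches)    # [True, False, True]
print(anchored_matches)  # [True, False, False]
print(full_matches)      # [True, False, False]
```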
The fullmatch function was added to the re module in Python 3.4. I think the reason this method was not in earlier versions of pandas is that older versions of Python lack re.fullmatch. As of pandas 1.0, every supported version of Python has fullmatch.
I have a pull request ready that adds this functionality. After my changes, the Series.str namespace gets a new method fullmatch that evaluates re.fullmatch over the series. For example:
>>> s = pd.Series(["foo", "bar", "foobar"])
>>> s.str.fullmatch("foo")
0     True
1    False
2    False
dtype: bool
[Edit: Simplified the workaround]