ENH: Allow literal (non-regex) replacement using .str.replace #16808 #19584

Merged
merged 13 commits into from
Feb 28, 2018
28 changes: 21 additions & 7 deletions doc/source/text.rst
@@ -118,8 +118,8 @@ i.e., from the end of the string to the beginning of the string:

s2.str.rsplit('_', expand=True, n=1)

Methods like ``replace`` and ``findall`` take `regular expressions
<https://docs.python.org/3/library/re.html>`__, too:
``replace`` by default replaces `regular expressions
<https://docs.python.org/3/library/re.html>`__:

.. ipython:: python

@@ -146,12 +146,25 @@ following code will cause trouble because of the regular expression meaning of
# We need to escape the special character (for >1 len patterns)
dollars.str.replace(r'-\$', '-')

.. versionadded:: 0.23.0

If you do want literal replacement of a string (equivalent to
:meth:`str.replace`), you can set the optional ``regex`` parameter to
``False``, rather than escaping each character. In this case both ``pat``
and ``repl`` must be strings:

.. ipython:: python

# These lines are equivalent
dollars.str.replace(r'-\$', '-')
dollars.str.replace('-$', '-', regex=False)

.. versionadded:: 0.20.0

The ``replace`` method can also take a callable as replacement. It is called
on every ``pat`` using :func:`re.sub`. The callable should expect one
positional argument (a regex object) and return a string.

.. versionadded:: 0.20.0

.. ipython:: python

# Reverse every lowercase alphabetic word
@@ -164,12 +177,12 @@ positional argument (a regex object) and return a string.
repl = lambda m: m.group('two').swapcase()
pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
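
To illustrate the callable form described above, here is a minimal sketch that uses only the standard library's `re.sub`, which the documentation says the callable is passed to. The callable receives the match object for each occurrence and returns the replacement string. The input strings and variable names are made up for illustration; this is not the pandas implementation, only the per-element behaviour it builds on.

```python
import re

# Reverse every lowercase alphabetic word; digits are left alone.
reverse_word = lambda m: m.group(0)[::-1]
reversed_words = re.sub(r'[a-z]+', reverse_word, 'foo 123')

# Named groups are available on the match object passed to the callable.
pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
swap_second = lambda m: m.group('two').swapcase()
swapped = re.sub(pat, swap_second, 'Foo Bar Baz')
```

In the second call the whole three-word match is replaced by the swapped-case middle word, which is why only that word survives.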

.. versionadded:: 0.20.0

The ``replace`` method also accepts a compiled regular expression object
from :func:`re.compile` as a pattern. All flags should be included in the
compiled regular expression object.

.. versionadded:: 0.20.0

.. ipython:: python

import re
@@ -186,6 +199,7 @@ regular expression object will raise a ``ValueError``.
---------------------------------------------------------------------------
ValueError: case and flags cannot be set when pat is a compiled regex
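
The rule above (put all flags in the compiled pattern, do not pass them separately) mirrors how the standard library treats compiled patterns. The following is a small sketch with made-up inputs; it shows plain `re` behaviour, not the pandas code path.

```python
import re

# Flags are baked into the compiled pattern itself.
pattern = re.compile(r'ba', flags=re.IGNORECASE)
result = pattern.sub('X', 'BAr bar')

# The standard library likewise refuses extra flags for a pattern
# that has already been compiled.
try:
    re.compile(pattern, flags=re.MULTILINE)
    flags_rejected = False
except ValueError:
    flags_rejected = True
```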


Indexing with ``.str``
----------------------

@@ -432,7 +446,7 @@ Method Summary
:meth:`~Series.str.join`;Join strings in each element of the Series with passed separator
:meth:`~Series.str.get_dummies`;Split strings on the delimiter returning DataFrame of dummy variables
:meth:`~Series.str.contains`;Return boolean array if each string contains pattern/regex
:meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string or the return value of a callable given the occurrence
:meth:`~Series.str.replace`;Replace occurrences of pattern/regex/string with some other string or the return value of a callable given the occurrence
:meth:`~Series.str.repeat`;Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
:meth:`~Series.str.pad`;"Add whitespace to left, right, or both sides of strings"
:meth:`~Series.str.center`;Equivalent to ``str.center``
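
For readers unsure why the documentation change distinguishes regex from literal replacement, the difference can be reproduced on a single string with the standard library alone. This is an illustrative sketch; the sample value is an assumption chosen to resemble the `dollars` example in the docs.

```python
import re

text = "-$2,000"

# Unescaped, '$' is an end-of-string anchor, so the pattern '-$' only
# matches a '-' at the very end of the string. Nothing is replaced here.
unescaped = re.sub('-$', '-', text)

# Escaping by hand, or with re.escape, matches the literal characters.
escaped = re.sub(r'-\$', '-', text)
auto_escaped = re.sub(re.escape('-$'), '-', text)

# Plain str.replace never interprets the pattern as a regex, which is
# the behaviour that regex=False selects.
literal = text.replace('-$', '-')
```

The last three results agree, which is the equivalence the new documentation example states.
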
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -620,6 +620,7 @@ Other API Changes
- Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
- :class:`DateOffset` objects render more simply, e.g. ``<DateOffset: days=1>`` instead of ``<DateOffset: kwds={'days': 1}>`` (:issue:`19403`)
- ``Categorical.fillna`` now validates its ``value`` and ``method`` keyword arguments. It now raises when both or none are specified, matching the behavior of :meth:`Series.fillna` (:issue:`19682`)
- :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`)

.. _whatsnew_0230.deprecations:

92 changes: 62 additions & 30 deletions pandas/core/strings.py
@@ -306,7 +306,7 @@ def str_endswith(arr, pat, na=np.nan):
return _na_map(f, arr, na, dtype=bool)


def str_replace(arr, pat, repl, n=-1, case=None, flags=0):
def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True):
r"""
Replace occurrences of pattern/regex in the Series/Index with
some other string. Equivalent to :meth:`str.replace` or
@@ -337,25 +337,50 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0):
flags : int, default 0 (no flags)
- re module flags, e.g. re.IGNORECASE
- Cannot be set if `pat` is a compiled regex
regex : boolean, default True
- If True, assumes the passed-in pattern is a regular expression.
- If False, treats the pattern as a literal string
- Cannot be set to False if `pat` is a compiled regex or `repl` is
Contributor

can you add a version added tag here

a callable.

Contributor

can you add a Raises section. (ValueError if compiled with case/flags)

.. versionadded:: 0.23.0

Returns
-------
replaced : Series/Index of objects

Raises
------
ValueError
* if `regex` is False and `repl` is a callable or `pat` is a compiled
regex
* if `pat` is a compiled regex and `case` or `flags` is set

Notes
-----
When `pat` is a compiled regex, all flags should be included in the
compiled regex. Use of `case` or `flags` with a compiled regex will
raise an error.
compiled regex. Use of `case`, `flags`, or `regex=False` with a compiled
regex will raise an error.

Examples
--------
When `repl` is a string, every `pat` is replaced as with
:meth:`str.replace`. NaN value(s) in the Series are left as is.
When `pat` is a string and `regex` is True (the default), the given `pat`
is compiled as a regex. When `repl` is a string, it replaces matching
regex patterns as with :meth:`re.sub`. NaN value(s) in the Series are
left as is:

>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True)
0 bao
1 baz
2 NaN
dtype: object

>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f', 'b')
0 boo
1 buz
When `pat` is a string and `regex` is False, every `pat` is replaced with
`repl` as with :meth:`str.replace`:

>>> pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False)
0 bao
1 fuz
2 NaN
dtype: object

@@ -397,34 +422,41 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0):
1 bar
2 NaN
dtype: object

"""

# Check whether repl is valid (GH 13438, GH 15055)
if not (is_string_like(repl) or callable(repl)):
raise TypeError("repl must be a string or callable")

is_compiled_re = is_re(pat)
if is_compiled_re:
if (case is not None) or (flags != 0):
raise ValueError("case and flags cannot be set"
" when pat is a compiled regex")
else:
# not a compiled regex
# set default case
if case is None:
case = True

# add case flag, if provided
if case is False:
flags |= re.IGNORECASE

use_re = is_compiled_re or len(pat) > 1 or flags or callable(repl)

if use_re:
n = n if n >= 0 else 0
regex = re.compile(pat, flags=flags)
f = lambda x: regex.sub(repl=repl, string=x, count=n)
if regex:
if is_compiled_re:
if (case is not None) or (flags != 0):
raise ValueError("case and flags cannot be set"
" when pat is a compiled regex")
else:
# not a compiled regex
# set default case
if case is None:
case = True

# add case flag, if provided
if case is False:
flags |= re.IGNORECASE
if is_compiled_re or len(pat) > 1 or flags or callable(repl):
n = n if n >= 0 else 0
compiled = re.compile(pat, flags=flags)
f = lambda x: compiled.sub(repl=repl, string=x, count=n)
else:
f = lambda x: x.replace(pat, repl, n)
else:
if is_compiled_re:
raise ValueError("Cannot use a compiled regex as replacement "
"pattern with regex=False")
if callable(repl):
raise ValueError("Cannot use a callable replacement when "
"regex=False")
f = lambda x: x.replace(pat, repl, n)

return _na_map(f, arr)
@@ -1596,9 +1628,9 @@ def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None):
return self._wrap_result(result)

@copy(str_replace)
def replace(self, pat, repl, n=-1, case=None, flags=0):
def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
result = str_replace(self._data, pat, repl, n=n, case=case,
flags=flags)
flags=flags, regex=regex)
return self._wrap_result(result)

@copy(str_repeat)
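
The branching added to `str_replace` can be hard to follow in a flattened diff, so here is a standalone sketch of the same decision logic for a single string. The helper name `make_replacer` is hypothetical and the pandas internals (`is_re`, `is_string_like`, `_na_map`) are replaced with plain standard-library checks, so treat it as an approximation of the merged code rather than the code itself.

```python
import re


def make_replacer(pat, repl, n=-1, case=None, flags=0, regex=True):
    """Hypothetical helper: return the per-element function the diff selects."""
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")

    is_compiled_re = isinstance(pat, re.Pattern)
    if regex:
        if is_compiled_re:
            if case is not None or flags != 0:
                raise ValueError("case and flags cannot be set"
                                 " when pat is a compiled regex")
        else:
            if case is None:
                case = True
            if case is False:
                flags |= re.IGNORECASE
        if is_compiled_re or len(pat) > 1 or flags or callable(repl):
            count = n if n >= 0 else 0  # re uses 0 to mean "replace all"
            compiled = re.compile(pat, flags=flags)
            return lambda x: compiled.sub(repl, x, count=count)
        # Single-character patterns without flags take the literal path.
        return lambda x: x.replace(pat, repl, n)

    # regex=False: literal replacement only.
    if is_compiled_re:
        raise ValueError("Cannot use a compiled regex as replacement "
                         "pattern with regex=False")
    if callable(repl):
        raise ValueError("Cannot use a callable replacement when "
                         "regex=False")
    return lambda x: x.replace(pat, repl, n)


regex_result = make_replacer('f.', 'ba')('foo')
literal_result = make_replacer('f.', 'ba', regex=False)('foo')
single_char_result = make_replacer('.', 'x')('a.b')
```

One detail visible in the diff and reproduced here: with `regex=True`, a one-character pattern with no flags and a string `repl` still goes through `str.replace`, so `'.'` is treated literally in that case.
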
21 changes: 21 additions & 0 deletions pandas/tests/test_strings.py
@@ -530,6 +530,27 @@ def test_replace_compiled_regex(self):
exp = Series(['foObaD__baRbaD', NA])
tm.assert_series_equal(result, exp)

def test_replace_literal(self):
# GH16808 literal replace (regex=False vs regex=True)
values = Series(['f.o', 'foo', NA])
exp = Series(['bao', 'bao', NA])
result = values.str.replace('f.', 'ba')
tm.assert_series_equal(result, exp)

exp = Series(['bao', 'foo', NA])
result = values.str.replace('f.', 'ba', regex=False)
tm.assert_series_equal(result, exp)

# Cannot do a literal replace if given a callable repl or compiled
# pattern
callable_repl = lambda m: m.group(0).swapcase()
compiled_pat = re.compile('[a-z][A-Z]{2}')

pytest.raises(ValueError, values.str.replace, 'abc', callable_repl,
regex=False)
pytest.raises(ValueError, values.str.replace, compiled_pat, '',
regex=False)

def test_repeat(self):
values = Series(['a', 'b', NA, 'c', NA, 'd'])
