error: No overload variant of "replace" of "DataFrame" matches argument types "str", "NAType" [call-overload] #262

randolf-scholz · 2022-09-04T18:13:05Z

import pandas as pd
frame = pd.DataFrame(["N/A", "foo", "bar"])
frame = frame.replace("N/A", pd.NA)

Dr-Irv · 2022-09-04T21:35:42Z

We have to add NAType to the accepted argument types of replace() in the various overloads in core/frame.pyi.

twoertwein · 2022-09-04T23:20:21Z

> We have to add NAType to the accepted argument types of replace() in the various overloads in core/frame.pyi.

Maybe pd.NA could be added to _typing.Scalar?

Dr-Irv · 2022-09-05T01:04:35Z

> Maybe pd.NA could be added to _typing.Scalar?

We'd have to check in all the uses of _typing.Scalar whether that might make the result too wide.
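To make the trade-off concrete, here is a minimal sketch of the two options using only the standard library. Everything in it is a stand-in: `NAType` is a dummy class, `Scalar` is a simplified union (not the real contents of `_typing.Scalar`), and `replace` and `other_method` are hypothetical toy signatures rather than the actual stubs.

```python
from typing import Union, get_args


class NAType:
    """Hypothetical stand-in for the type of pandas.NA."""


# Simplified stand-in for the shared scalar alias (assumption: the real
# alias in the stubs has different members).
Scalar = Union[str, bytes, int, float, complex, bool]

# Option 1: leave Scalar alone and widen only where NA is wanted.
ScalarOrNA = Union[Scalar, NAType]


def replace(to_replace: Scalar, value: ScalarOrNA) -> None:
    """Toy signature: only `value` accepts NA."""


def other_method(bound: Scalar) -> None:
    """Toy signature that should keep rejecting NA."""


# Option 2 would instead add NAType to Scalar itself, which silently
# widens every annotation that uses Scalar, including other_method.
print(NAType in get_args(replace.__annotations__["value"]))
print(NAType in get_args(other_method.__annotations__["bound"]))
```

With option 1, only the signatures that opt in to `ScalarOrNA` change, which is why it needs no audit of the other uses of the shared alias.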

randolf-scholz · 2022-09-05T11:52:59Z

It might make sense to also introduce a NullableScalar for the pandas nullable data types (Int32Dtype, Float32Dtype, etc.). I suppose ideally there would be signatures available of the form

class Series(Generic[ScalarType]):
    def replace(self, to_replace: ScalarType, value: NAType) -> Series[NullableScalarType]: ...

However, I am not quite sure how to implement this correctly without tediously enumerating one overload per dtype:

@overload
def replace(self, to_replace: np.float32, value: NAType) -> Series[Float32Dtype]: ...
@overload
def replace(self, to_replace: np.float64, value: NAType) -> Series[Float64Dtype]: ...
@overload
def replace(self, to_replace: np.int32, value: NAType) -> Series[Int32Dtype]: ...
@overload
def replace(self, to_replace: np.int64, value: NAType) -> Series[Int64Dtype]: ...
@overload
def replace(self, to_replace: np.bool_, value: NAType) -> Series[BooleanDtype]: ...

This is because NullableScalarType is conditional on the type of the Series itself. Regular TypeVars don't seem to be the solution, since there are two types, one of which depends on the other. TypeVarTuples could maybe work (cf. PEP 646).
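One typing pattern that might be relevant here is overloading on the annotated type of `self`, so that the return type is selected from the element type of the Series being called on. The sketch below is a toy under stated assumptions: `Series`, `NAType`, `Float64`, `Int64` and `replace_with_na` are all hypothetical stand-ins, not pandas or pandas-stubs names, and I have not checked it against a specific type checker version. It still needs one overload per dtype, so it moves the dependency onto `self` rather than removing the repetition.

```python
from __future__ import annotations

from typing import Generic, TypeVar, overload

T = TypeVar("T")


class NAType:
    """Hypothetical stand-in for the type of pandas.NA."""


NA = NAType()


class Float64:
    """Hypothetical marker for a nullable float dtype."""


class Int64:
    """Hypothetical marker for a nullable integer dtype."""


class Series(Generic[T]):
    """Toy container used only for illustration; not the pandas class."""

    def __init__(self, data: list[T]) -> None:
        self.data = list(data)

    # Each overload constrains the type of `self`, so the declared
    # return type depends on the element type of the receiver.
    @overload
    def replace_with_na(self: Series[float], to_replace: float) -> Series[Float64]: ...
    @overload
    def replace_with_na(self: Series[int], to_replace: int) -> Series[Int64]: ...

    def replace_with_na(self, to_replace):
        # The runtime behaviour is identical for every overload.
        return Series([NA if item == to_replace else item for item in self.data])


result = Series([1.0, 2.0]).replace_with_na(1.0)
print(result.data[0] is NA)
```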

Besides, it seems that DataFrame.replace currently replaces the dtype with object when imputing pandas.NA; maybe I should open an issue as well for that?

Dr-Irv · 2022-09-05T16:51:05Z

> This is because NullableScalarType is conditional on the type of the Series itself. Regular TypeVars don't seem to be the solution, since there are two types, one of which depends on the other. TypeVarTuples could maybe work (cf. PEP 646).

I'm open to solutions that are incremental, and don't introduce too much complexity, and don't make the corresponding stubs too wide. Making changes to how the generic aspects of Series work is pretty tricky.

> Besides, it seems that DataFrame.replace currently replaces the dtype with object when imputing pandas.NA; maybe I should open an issue as well for that?

Are you saying this is an issue with the stubs or with pandas itself? Always open to having issues created in either project!

randolf-scholz · 2022-09-06T08:32:31Z

@Dr-Irv It's an issue with both pandas and the stubs, I'd say. A user would expect calling Series.replace to widen the dtype only as much as necessary. So, if I replace a value in a float64 Series with pandas.NA, I'd expect to get back a Float64 Series.

import pandas
x = pandas.Series([1.0, 3.1, float('nan')])
print(x.dtype)
y = x.replace(float('nan'), pandas.NA)
print(y.dtype)    # object ✘ (expected: Float64)
z = x.replace(1, 1+1j)
print(z.dtype)    # complex128 ✔

bashtage · 2022-09-06T08:34:26Z

What you are seeing is that promotion is using NumPy rules. Seems like promotion from numpy dtypes to extension dtypes is not implemented, or at least not complete. This isn't a surprise as it is pretty hard.
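For readers who want a nullable result today, one possible workaround (a sketch, not something proposed in this thread) is to opt in to the extension dtype explicitly with `astype` instead of relying on promotion inside `replace`. How the NaN entry is treated during the conversion may depend on the pandas version, so it is worth checking with `isna()`.

```python
import pandas as pd

x = pd.Series([1.0, 3.1, float("nan")])

# Explicit conversion to the nullable extension dtype; no promotion
# rules are involved because the target dtype is named directly.
y = x.astype("Float64")

print(x.dtype)
print(y.dtype)
print(y.isna())  # inspect how the NaN entry was treated
```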

Dr-Irv · 2022-09-06T12:38:23Z

> @Dr-Irv It's an issue with both pandas and the stubs, I'd say. A user would expect calling Series.replace to widen the dtype only as much as necessary. So, if I replace a value in a float64 Series with pandas.NA, I'd expect to get back a Float64 Series.

So given the response to pandas-dev/pandas#48409, I think the best we can support here is allowing NAType as a valid parameter to replace, and making the result Series[object] if a value is replaced with NA.

Dr-Irv added the good first issue label Sep 4, 2022

randolf-scholz mentioned this issue Sep 6, 2022: BUG: Series.replace widens dtype to object when imputing pd.NA pandas-dev/pandas#48409 (Closed, 3 tasks)

Dr-Irv mentioned this issue Sep 17, 2022: Fix replace, fillna allowing pd.NA. Allow list of arguments for .loc #312 (Merged, 4 tasks)

twoertwein closed this as completed in #312 Sep 18, 2022

bersbersbers mentioned this issue May 3, 2023: Argument 1 to "apply" of "Series" has incompatible type "Callable[[Any], Union[int, NAType]]" #681 (Closed)
