error: No overload variant of "replace" of "DataFrame" matches argument types "str", "NAType" [call-overload] #262

Closed
randolf-scholz opened this issue Sep 4, 2022 · 8 comments · Fixed by #312

Comments

@randolf-scholz
Contributor

import pandas as pd
frame = pd.DataFrame(["N/A", "foo", "bar"])
frame = frame.replace("N/A", pd.NA)
@Dr-Irv
Collaborator

Dr-Irv commented Sep 4, 2022

Have to add NAType to the arguments of replace() in various overloads in core/frame.pyi

@twoertwein
Member

Have to add NAType to the arguments of replace() in various overloads in core/frame.pyi

Maybe pd.NA could be added to _typing.Scalar?

@Dr-Irv
Collaborator

Dr-Irv commented Sep 5, 2022

Maybe pd.NA could be added to _typing.Scalar?

We'd have to check in all the uses of _typing.Scalar whether that might make the result too wide.
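To make the widening concern concrete, here is a minimal sketch using only the standard library. The `NAType` class, the simplified `Scalar` alias, and the `accepts` helper are all hypothetical stand-ins for illustration; they are not the definitions in pandas or pandas-stubs. It shows that adding the NA type to a shared alias changes what every user of that alias accepts, not just replace().

```python
from typing import Union, get_args


class NAType:
    """Hypothetical stand-in for the type of pd.NA (not the pandas class)."""


# Simplified, assumed alias; the real _typing.Scalar may have other members.
Scalar = Union[str, bytes, int, float, complex, bool]

# Widening the shared alias: every signature annotated with it now admits NA.
WideScalar = Union[Scalar, NAType]


def accepts(alias, value) -> bool:
    """Runtime approximation of 'value is allowed by this union alias'."""
    return isinstance(value, get_args(alias))


print(accepts(Scalar, NAType()))      # False
print(accepts(WideScalar, NAType()))  # True
```

An alternative that avoids touching the shared alias would be to add the NA type only in the replace() overloads, which is the more incremental of the two options discussed here.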

@randolf-scholz
Contributor Author

randolf-scholz commented Sep 5, 2022

It might make sense to also introduce a NullableScalar for pandas nullable data types (Int32Dtype, Float32Dtype, etc.). I suppose ideally there would be signatures available of the form

class Series(Generic[ScalarType]):
    def replace(self, to_replace: ScalarType, value: NAType) -> Series[NullableScalarType]: ...

However, I am not quite sure how to actually correctly implement this without the tedious

@overload
def replace(self, to_replace: np.float32, value: NAType) -> Series[Float32Dtype]: ...
@overload
def replace(self, to_replace: np.float64, value: NAType) -> Series[Float64Dtype]: ...
@overload
def replace(self, to_replace: np.int32, value: NAType) -> Series[Int32Dtype]: ...
@overload
def replace(self, to_replace: np.int64, value: NAType) -> Series[Int64Dtype]: ...
@overload
def replace(self, to_replace: np.bool_, value: NAType) -> Series[BooleanDtype]: ...

Because NullableScalarType is conditional on the type of the Series itself. Regular TypeVars don't seem to be the solution, since there are two types, one of which depends on the other. TypeVarTuples could maybe work (cf. PEP 646).
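One way the dependency could be written down with existing typing features is to annotate `self` in each overload, so the return type is chosen from the element type of the Series being called. The sketch below uses toy classes only: `Series`, `NAType`, `Float64Dtype`, and `StringDtype` here are hypothetical stand-ins, not the pandas or pandas-stubs definitions, and the runtime body is a plain list comprehension. It still needs one overload per element type, so it does not remove the tedium described above.

```python
from __future__ import annotations

from typing import Generic, TypeVar, overload

T = TypeVar("T")


class NAType:
    """Hypothetical stand-in for the type of pd.NA."""


class Float64Dtype:
    """Hypothetical marker for a nullable float element type."""


class StringDtype:
    """Hypothetical marker for a nullable string element type."""


class Series(Generic[T]):
    """Toy generic container, only here to carry the annotations."""

    def __init__(self, values: list) -> None:
        self.values = list(values)

    # Each overload constrains `self`, so the return type depends on T.
    @overload
    def replace(self: Series[float], to_replace: float, value: NAType) -> Series[Float64Dtype]: ...
    @overload
    def replace(self: Series[str], to_replace: str, value: NAType) -> Series[StringDtype]: ...
    def replace(self, to_replace, value):
        # Runtime behaviour: swap matching elements, keep the rest.
        return Series([value if v == to_replace else v for v in self.values])


NA = NAType()
floats: Series[float] = Series([1.0, 2.0])
result = floats.replace(1.0, NA)
print(result.values[1])  # 2.0
```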

Besides, it seems that DataFrame.replace currently replaces the dtype with object when imputing pandas.NA; maybe I should open an issue as well for that?

@Dr-Irv
Collaborator

Dr-Irv commented Sep 5, 2022

Because NullableScalarType is conditional on the type of the Series itself. Regular TypeVars don't seem to be the solution, since there are two types, one of which depends on the other. TypeVarTuples could maybe work (cf. PEP 646).

I'm open to solutions that are incremental, and don't introduce too much complexity, and don't make the corresponding stubs too wide. Making changes to how the generic aspects of Series work is pretty tricky.

Besides, it seems that DataFrame.replace currently replaces the dtype with object when imputing pandas.NA; maybe I should open an issue as well for that?

Are you saying this is an issue with the stubs or with pandas itself? Always open to having issues created in either project!

@randolf-scholz
Contributor Author

@Dr-Irv It's an issue with both pandas and the stubs, I'd say. A user would expect that calling Series.replace should only widen the datatype as little as necessary. So, if I replace a value in a float64 Series with pandas.NA, I'd expect to get back a Float64 Series.

import pandas
x = pandas.Series([1.0, 3.1, float('nan')])
print(x.dtype)
y = x.replace(float('nan'), pandas.NA)
print(y.dtype)    # object ✘ (expected: Float64)
z = x.replace(1, 1+1j)
print(z.dtype)    # complex128 ✔

@bashtage
Contributor

bashtage commented Sep 6, 2022

What you are seeing is that promotion is using NumPy rules. Seems like promotion from numpy dtypes to extension dtypes is not implemented, or at least not complete. This isn't a surprise as it is pretty hard.

@Dr-Irv
Collaborator

Dr-Irv commented Sep 6, 2022

@Dr-Irv It's an issue with both pandas and the stubs, I'd say. A user would expect that calling Series.replace should only widen the datatype as little as necessary. So, if I replace a value in a float64 Series with pandas.NA, I'd expect to get back a Float64 Series.

So given the response to pandas-dev/pandas#48409, I think the best we can support here is allowing NAType as a valid argument to replace, and making the result Series[object] when a value is replaced with NA.
