BUG: TypeError when printing dataframe with numpy.record dtype column #48526

andrii0yerko · 2022-09-13T11:37:01Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame([["a", "b"], ["c", "d"], ["e", "f"]], columns=["left", "right"])
df["record"] = df[["left", "right"]].to_records()
print(df)

Issue Description

Printing a DataFrame or a Series that has a column with a "numpy.record" dtype raises a TypeError.

For the example above:

>>> df.dtypes
left                                                 object
right                                                object
record    (numpy.record, [('index', '<i8'), ('left', 'O'...
dtype: object

And print raises an exception with the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 1064, in __repr__
    return self.to_string(**repr_params)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 1245, in to_string
    return fmt.DataFrameRenderer(formatter).to_string(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1136, in to_string
    string = string_formatter.to_string()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/string.py", line 30, in to_string
    text = self._get_string_representation()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/string.py", line 45, in _get_string_representation
    strcols = self._get_strcols()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/string.py", line 36, in _get_strcols
    strcols = self.fmt.get_strcols()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 617, in get_strcols
    strcols = self._get_strcols_without_index()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 883, in _get_strcols_without_index
    fmt_values = self.format_col(i)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 897, in format_col
    return format_array(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1328, in format_array
    return fmt_obj.get_result()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1359, in get_result
    fmt_values = self._format_strings()
  File "/usr/local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1413, in _format_strings
    & np.all(notna(vals), axis=tuple(range(1, len(vals.shape))))
  File "/usr/local/lib/python3.10/site-packages/pandas/core/dtypes/missing.py", line 434, in notna
    res = isna(obj)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/dtypes/missing.py", line 185, in isna
    return _isna(obj)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/dtypes/missing.py", line 214, in _isna
    return _isna_array(obj, inf_as_na=inf_as_na)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/dtypes/missing.py", line 304, in _isna_array
    result = np.isnan(values)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Expected Behavior

A DataFrame with a "numpy.record" column should be displayed correctly, as if the column had an object dtype:

>>> #  df["record"] = [x for x in df[["left", "right"]].to_records()]  # this is rendered correctly
>>> print(df)
  left right     record
0    a     b  [0, a, b]
1    c     d  [1, c, d]
2    e     f  [2, e, f]
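
The workaround hinted at in the commented-out line above can be sketched as follows. This is a minimal sketch, assuming that a plain Python list of numpy.record scalars is stored by pandas as an object-dtype column; the variable names `records` and `text` are illustrative only, and the exact rendering of each cell may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([["a", "b"], ["c", "d"], ["e", "f"]], columns=["left", "right"])

# to_records() returns a numpy.recarray; assigning it directly is what
# triggers the TypeError described in this issue.
records = df[["left", "right"]].to_records()

# Workaround: materialise the records as a plain Python list first, so the
# column is stored with object dtype instead of a numpy.record dtype.
df["record"] = list(records)

# Formatting an object-dtype column does not go through np.isnan.
text = df.to_string()
print(text)
```

The trade-off is that the column no longer carries the structured dtype, so field access such as `records["left"]` has to happen before the conversion.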

Installed Versions

Reproduces on both the current latest master commit (below) and the latest PyPI release (ca60aab).

INSTALLED VERSIONS

commit : 5514aa3
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-39-generic
Version : #42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.6.0.dev0+114.g5514aa371
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None


RaphSku · 2022-09-15T14:16:19Z

I looked into this, and I think the list comprehension works while the other approach does not because in the former case we have a list and in the latter case an np.recarray. In pandas/core/dtypes/missing.py, lines 172-173 contain an _isna_array call where obj is passed through as is, whereas a list reaches lines 188-189, where it is cast to a NumPy array with an explicitly specified dtype of object. Then, in _isna_array, because the dtype of the np.recarray is not object, result = np.isnan(values) is executed, which leads to the error you described.
Maybe someone else also wants to verify this.
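
The failing call can be checked in isolation, without pandas. The snippet below is a minimal sketch, not the pandas code path itself: it builds a small record array by hand (the field names mirror the example in this issue, the string field widths are an assumption) and uses a hypothetical helper, `isnan_raises`, to show that np.isnan has no implementation for a record dtype.

```python
import numpy as np

# A hand-built record array resembling what DataFrame.to_records() returns.
# Field names follow the issue's example; "U1" for the strings is an assumption.
rec = np.array(
    [(0, "a", "b"), (1, "c", "d"), (2, "e", "f")],
    dtype=[("index", "i8"), ("left", "U1"), ("right", "U1")],
).view(np.recarray)


def isnan_raises(values):
    """Hypothetical helper: report whether np.isnan rejects `values`."""
    try:
        np.isnan(values)
    except TypeError:
        return True
    return False


# Record arrays have dtype kind "V" (void), not "O" (object), so a check
# that only special-cases object dtype falls through to np.isnan.
print(rec.dtype.kind)
print(isnan_raises(rec))
print(isnan_raises(np.array([1.0, np.nan])))
```

The float array is included only as a contrast: np.isnan is defined for it, so the helper reports no error there.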

RaphSku · 2022-09-19T09:36:19Z

take

andrii0yerko added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 13, 2022

github-actions bot assigned RaphSku Sep 19, 2022

RaphSku mentioned this issue Sep 19, 2022

BUG: Fixed DataFrame.__repr__ Type Error when column dtype is np.record (GH 48526) #48637

Merged

4 tasks

mroeschke added Output-Formatting __repr__ of pandas objects, to_string Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 20, 2022

stefan-jansen mentioned this issue Oct 29, 2022

BUG: .unstack() with recarray column raises TypeError since 1.4.0 #49388

Open

3 tasks

mroeschke closed this as completed in #48637 Apr 21, 2023

rvernica mentioned this issue Sep 5, 2023

BUG: TypeError when printing dataframe with numpy.record #55011

Open

3 tasks
