Problem Description
The row and column indexing mechanism of your DataFrame is inefficient. It leads to errors and costs users unnecessary time. When two DataFrames are merged or concatenated, horizontally or vertically, the result can contain duplicate index labels. If you then iterate over the index in a for loop, any label-based operation inside the loop is applied to every row that shares the current label, so the same work is done twice in one iteration. This is a typical source of calculation errors. For example:
import pandas as pd

df = pd.concat([df1, df2]).drop_duplicates('title')  # df1, df2: existing DataFrames with a 'title' column
df.reset_index(drop=True, inplace=True)  # must be included every time, otherwise duplicate indexes cause loop iteration errors
df['name'] = None
for idx, row in df.iterrows():
    name_list = ['mike', 'jake', 'cook']
    df.at[idx, 'name'] = ",".join(name_list)
Without the df.reset_index(drop=True, inplace=True) line, reading this cell back returns two copies of the joined name_list instead of the single value written in the code, because two rows share the label 1:
(Pdb) p df.at[idx, 'name'].index
Index([1, 1], dtype='int64')
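To make the failure reproducible without the original data, here is a minimal sketch with small made-up frames (the contents of `df1` and `df2` are illustrative assumptions). It shows that `pd.concat` keeps the source row labels, and that, as recent pandas versions behave, `df.at` on a frame with non-unique labels falls back to label-based `.loc`: one assignment writes to every row carrying that label, and a lookup returns a Series rather than a scalar.

```python
import pandas as pd

# Illustrative data: each frame starts with the default labels 0, 1.
df1 = pd.DataFrame({"title": ["a", "b"]})
df2 = pd.DataFrame({"title": ["c", "d"]})

# concat keeps the original row labels, so 0 and 1 each appear twice.
df = pd.concat([df1, df2]).drop_duplicates("title")
print(df.index.tolist())   # [0, 1, 0, 1]
print(df.index.is_unique)  # False

df["name"] = None
for idx, row in df.iterrows():
    # With duplicate labels, this assignment hits every row labelled idx,
    # so each label is written twice over the course of the loop.
    df.at[idx, "name"] = ",".join(["mike", "jake", "cook"])

# Reading label 1 back yields a Series with two entries, not one value.
print(df.at[1, "name"].index.tolist())  # [1, 1]
```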
So I hope that, whenever the rows or columns of a DataFrame change, the index can be maintained automatically as an internal mechanism, as with C++ vectors or arrays: after elements are deleted or removed, the index (or iterator positions) stays a continuous sequence of numbers, and users do not have to manage it. This request is also informed by competitor analysis and benchmarking. I hope this can be improved. Thank you.
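For comparison with the requested behavior, current pandas does not relabel automatically, but some calls can opt in to a fresh 0..n-1 index. The sketch below assumes pandas 1.0 or later (where `drop_duplicates` accepts `ignore_index`) and uses made-up data. It is one way to get continuous labels today, not a statement of how the requested feature would be implemented. It also shows positional addressing with `iloc`, which is unaffected by repeated labels.

```python
import pandas as pd

# Illustrative data: both frames start with labels 0, 1, ...
df1 = pd.DataFrame({"title": ["a", "b", "b"]})
df2 = pd.DataFrame({"title": ["c", "a"]})

# Option 1: ask each step to relabel rows as 0..n-1.
# concat(ignore_index=True) relabels after stacking;
# drop_duplicates(ignore_index=True) relabels after rows are removed.
df = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    "title", ignore_index=True
)
print(df.index.tolist())     # [0, 1, 2]
print(df["title"].tolist())  # ['a', 'b', 'c']

# Option 2: keep whatever labels exist, but address rows by position,
# which always runs 0..len(dup)-1 even when labels repeat.
dup = pd.concat([df1, df2])  # labels [0, 1, 2, 0, 1]
dup["name"] = None
name_pos = dup.columns.get_loc("name")
for pos in range(len(dup)):
    # Each positional write touches exactly one row.
    dup.iloc[pos, name_pos] = ",".join(["mike", "jake", "cook"])
```

The first option keeps label-based code such as `df.at[idx, ...]` safe; the second avoids relying on labels altogether inside the loop.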
Feature Description
n/a
Alternative Solutions
n/a
Additional Context
No response