ENH: The row and column indexing mechanism of your dataframe is inefficient, leading to errors and unnecessary time consumption #61230

Open
1 of 3 tasks
zyy37 opened this issue Apr 4, 2025 · 0 comments
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

zyy37 commented Apr 4, 2025

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The row and column indexing mechanism of the DataFrame is inefficient and error-prone, and it costs users unnecessary time. When two DataFrames are merged or concatenated, horizontally or vertically, the result can contain duplicate index labels. If code then iterates over the index in a for loop, a label-based operation applies to every row that shares the label, so a single iteration can act on two rows. This is a typical source of calculation errors. For example:

import pandas as pd

# df1 and df2 are existing DataFrames that share a 'title' column.
df = pd.concat([df1, df2]).drop_duplicates('title')
# This reset is needed every time; otherwise duplicate index labels
# make df.at[idx, ...] address more than one row inside the loop.
df = df.reset_index(drop=True)
df['name'] = None
name_list = ['mike', 'jake', 'cook']
for idx, row in df.iterrows():
    df.at[idx, 'name'] = ",".join(name_list)

Without the reset_index call, the label idx is not unique, so df.at[idx, 'name'] addresses two rows instead of the single cell the code intends. In the debugger, (Pdb) p df.at[idx, 'name'].index prints Index([1, 1], dtype='int64').
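The behavior described above can be reproduced with a minimal sketch. The contents of df1 and df2 are not shown in this issue, so the two small frames below are hypothetical stand-ins, and .loc is used for the lookup to keep the example unambiguous.

```python
import pandas as pd

# Hypothetical sample data; the issue does not show df1 or df2.
df1 = pd.DataFrame({"title": ["a", "b"]})
df2 = pd.DataFrame({"title": ["c", "d"]})

# pd.concat keeps each frame's own labels, so 0 and 1 each appear twice.
df = pd.concat([df1, df2]).drop_duplicates("title")
index_labels = df.index.tolist()
has_unique_index = df.index.is_unique

# Label-based selection returns every row that carries the label,
# so the result is a Series with two entries rather than one scalar.
rows_for_label_1 = df.loc[1, "title"]
match_count = len(rows_for_label_1)

print(index_labels, has_unique_index, match_count)
```

Checking df.index.is_unique before a label-based loop is one way to detect this situation early.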
So I hope that when the rows or columns of a DataFrame change, pandas can maintain the index automatically as an internal mechanism, the way positions work in a C++ vector or array: after elements are removed, the positions stay contiguous and the user does not have to manage them. This request also comes from comparing pandas with competing tools. I hope this can be improved. Thank you.
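For comparison, pandas already exposes an opt-in form of this renumbering through the ignore_index parameter of pd.concat and DataFrame.drop_duplicates. The sketch below assumes hypothetical sample frames (the originals are not shown here) and is only an illustration of the existing parameters, not the automatic behavior requested above.

```python
import pandas as pd

# Hypothetical sample data; "b" appears in both frames on purpose.
df1 = pd.DataFrame({"title": ["a", "b"]})
df2 = pd.DataFrame({"title": ["b", "c"]})

# ignore_index=True discards the incoming labels and numbers rows 0..n-1.
combined = pd.concat([df1, df2], ignore_index=True)

# drop_duplicates also accepts ignore_index, which closes the gap
# left behind by the dropped row.
deduped = combined.drop_duplicates("title", ignore_index=True)

# With unique labels, a plain column assignment writes one value per row.
deduped["name"] = ",".join(["mike", "jake", "cook"])

print(deduped)
```

With both calls using ignore_index=True, no separate reset_index step is needed in this sketch.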

Feature Description

n/a

Alternative Solutions

n/a

Additional Context

No response

@zyy37 zyy37 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 4, 2025