
Use BaseExecutionEngine for Python and Numba engines #61458


Open
datapythonista opened this issue May 19, 2025 · 2 comments
Labels: Apply (Apply, Aggregate, Transform, Map)

Comments

@datapythonista
Member

In #61032 we created a new base class, BaseExecutionEngine, that engines can subclass to handle apply and map operations. The base class was initially created to allow third-party engines to be passed to DataFrame.apply(..., engine=third_party_engine). But our core engines, Python and Numba, can also be implemented as subclasses of this base class. This will make the code cleaner and more maintainable, and it may make it easy to move the Numba engine out of the pandas code base.

The whole migration to the new interface is quite a big change, so it's recommended to make the transition step by step, in small pull requests.
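To make the idea concrete, here is a minimal, standard-library-only sketch of the subclassing pattern described above. It does not import pandas: the `BaseExecutionEngine` defined here is a local stand-in, and its `map`/`apply` signatures are an assumption that should be checked against #61032. `PurePythonEngine`, `apply_with_engine`, and the list-of-rows data layout are hypothetical names invented for illustration.

```python
import abc


class BaseExecutionEngine(abc.ABC):
    """Local stand-in for the pandas base class (signatures are assumed)."""

    @staticmethod
    @abc.abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        """Apply func to each element of data."""

    @staticmethod
    @abc.abstractmethod
    def apply(data, func, args, kwargs, decorator, axis):
        """Apply func along one axis of data."""


class PurePythonEngine(BaseExecutionEngine):
    """Toy engine operating on a list of rows (each row is a list)."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        if decorator is not None:
            func = decorator(func)
        # With skip_na, None values are passed through untouched.
        return [
            [v if (skip_na and v is None) else func(v, *args, **kwargs) for v in row]
            for row in data
        ]

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if decorator is not None:
            func = decorator(func)
        # axis=1: one call per row; axis=0: one call per column.
        groups = data if axis == 1 else [list(col) for col in zip(*data)]
        return [func(group, *args, **kwargs) for group in groups]


def apply_with_engine(data, func, engine, axis=0, args=(), kwargs=None):
    """Hypothetical dispatcher: any subclass of the base class is accepted."""
    if not (isinstance(engine, type) and issubclass(engine, BaseExecutionEngine)):
        raise TypeError("engine must subclass BaseExecutionEngine")
    return engine.apply(data, func, args, kwargs or {}, None, axis)


rows = [[1, 2, 3], [4, 5, 6]]
column_sums = apply_with_engine(rows, sum, PurePythonEngine, axis=0)  # [5, 7, 9]
row_sums = apply_with_engine(rows, sum, PurePythonEngine, axis=1)     # [6, 15]
```

The point of the sketch is that the dispatcher only depends on the base class, so a core engine and a third-party engine would go through the same code path.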

@arthurlw
Member

Thanks for assigning this to me, @datapythonista! This looks interesting to work on, and I'll start looking into it.

@datapythonista
Member Author

Thanks @arthurlw. A possible approach could be to start with Numba only. The Numba engine is only implemented for DataFrame.apply for now, and only for certain types of parameters. For example, it doesn't work with ufuncs.

I think the whole Numba engine was introduced in two PRs, #54666 and #55104, and hasn't changed much since. So it should be easy to see all the changes implemented for the engine.

The main logic is implemented here: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L1096

I think having the whole Numba engine as a subclass of the base executor would already be quite valuable, and much easier than refactoring all the Python engine code.
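As a rough illustration of that incremental step, the sketch below wraps an existing code path in a thin subclass instead of rewriting it. Everything here is an assumption made for the example: the base class is a local stand-in with assumed signatures, `legacy_numba_apply` stands in for the current logic in apply.py, `fake_jit` stands in for a JIT decorator such as `numba.jit`, and the "callables only" check is just one example of an input restriction, not the real engine's rule set.

```python
import abc


class BaseExecutionEngine(abc.ABC):
    """Local stand-in for the pandas base class (signatures are assumed)."""

    @staticmethod
    @abc.abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na): ...

    @staticmethod
    @abc.abstractmethod
    def apply(data, func, args, kwargs, decorator, axis): ...


def legacy_numba_apply(rows, func, axis):
    """Hypothetical stand-in for the existing engine logic."""
    groups = rows if axis == 1 else [list(col) for col in zip(*rows)]
    return [func(group) for group in groups]


class NumbaLikeEngine(BaseExecutionEngine):
    """Thin wrapper: validate the input, then delegate to the old code."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        # The engine discussed here only covers DataFrame.apply for now.
        raise NotImplementedError("this engine only supports apply")

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if not callable(func):
            # Illustrative restriction, e.g. string function names.
            raise NotImplementedError("only callables are supported")
        compiled = decorator(func) if decorator is not None else func
        return legacy_numba_apply(
            data, lambda group: compiled(group, *args, **kwargs), axis
        )


def fake_jit(func):
    """Stand-in for a JIT decorator; it simply forwards the call."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


rows = [[1.0, 2.0], [3.0, 4.0]]
row_max = NumbaLikeEngine.apply(rows, max, (), {}, fake_jit, axis=1)  # [2.0, 4.0]
```

Because the subclass only validates and delegates, the existing logic can stay where it is in a first pull request and be moved or cleaned up in later ones.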

For reference, you have an implementation of a third-party executor engine in this PR: https://github.com/bodo-ai/Bodo/pull/410/files


2 participants