'plotly' package contains 123MB of autogenerated code #3294

Closed
huonw opened this issue Jul 9, 2021 · 13 comments
Labels: bug (something broken), infrastructure (build process etc.), P2 (considered for next cycle)

Comments

@huonw

huonw commented Jul 9, 2021

Thank you for plotly.py, it's definitely worked well for us in our app!

We're deploying our app's backend to AWS Lambda, packaging dependencies in a "layer", which has a 256MB size limit, and we are hitting that limit. Unfortunately, plotly's Python library is huge: for the version we're using there (4.14.3), it comes to 58MB of Python source plus ~19MB of JavaScript (plotly.min.js and the Jupyter plugin). The Python source seems to be almost entirely the auto-generated (AIUI) graph_objs and validators subdirectories. To reduce the size, we've removed the JavaScript files, because the lambdas don't use any of them, but that still leaves a significant amount of Python code.

To make this more concrete, here are the numbers for the latest version on my Mac:

$ pip install plotly==5.1.0
...
$ pip show plotly
...
Location: /SOME/PATH/.../site-packages
...
$ cd /SOME/PATH/.../site-packages # copied from the command above
$ du -sch plotly/* | sort -h
4.0K	plotly/_version.py
4.0K	plotly/_widget_version.py
4.0K	plotly/animation.py
4.0K	plotly/config.py
4.0K	plotly/conftest.py
4.0K	plotly/dashboard_objs.py
4.0K	plotly/exceptions.py
4.0K	plotly/files.py
4.0K	plotly/grid_objs.py
4.0K	plotly/missing_ipywidgets.py
4.0K	plotly/optional_imports.py
4.0K	plotly/presentation_objs.py
4.0K	plotly/serializers.py
4.0K	plotly/session.py
4.0K	plotly/validator_cache.py
4.0K	plotly/version.py
4.0K	plotly/widgets.py
8.0K	plotly/__init__.py
8.0K	plotly/callbacks.py
8.0K	plotly/colors
8.0K	plotly/utils.py
 12K	plotly/shapeannotation.py
 16K	plotly/data
 16K	plotly/plotly
 24K	plotly/graph_objects
 28K	plotly/tools.py
 36K	plotly/basewidget.py
 52K	plotly/subplots.py
 76K	plotly/offline
220K	plotly/basedatatypes.py
264K	plotly/matplotlylib
340K	plotly/__pycache__
344K	plotly/express
364K	plotly/io
664K	plotly/figure_factory
3.5M	plotly/package_data
 43M	plotly/graph_objs
 80M	plotly/validators
129M	total

That is, 123MiB/129MiB (95%) of the package size is the autogenerated graph_objs and validators submodules.
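The same breakdown can be reproduced without `du`, for example inside a build script. The following is a minimal sketch, assuming only the standard library; the helper names (`tree_size`, `size_by_entry`, `generated_share`) and the tiny fake package are made up for illustration and are not part of plotly.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def tree_size(path: Path) -> int:
    """Return the summed size in bytes of all files at or under path."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += (Path(root) / name).stat().st_size
    return total


def size_by_entry(package_dir: Path) -> dict[str, int]:
    """Map each top-level entry of package_dir to its size, smallest first."""
    sizes = {entry.name: tree_size(entry) for entry in package_dir.iterdir()}
    return dict(sorted(sizes.items(), key=lambda item: item[1]))


def generated_share(sizes: dict[str, int], names=("graph_objs", "validators")) -> float:
    """Fraction of the total size taken up by the named subdirectories."""
    total = sum(sizes.values())
    return sum(sizes.get(n, 0) for n in names) / total if total else 0.0


# Demo on a throwaway directory standing in for an installed package.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "fakepkg"
    (pkg / "validators").mkdir(parents=True)
    (pkg / "validators" / "_x.py").write_bytes(b"x" * 900)
    (pkg / "utils.py").write_bytes(b"y" * 100)
    demo_sizes = size_by_entry(pkg)
    demo_share = generated_share(demo_sizes)

print(demo_sizes, demo_share)
```

On a real install the analogous call would be `size_by_entry(Path(plotly.__file__).parent)`. The results will differ somewhat from `du`, which reports allocated disk blocks rather than byte counts.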

Since these are autogenerated, potentially they could be autogenerated in a way that makes them significantly smaller without changing behaviour or structure. Some ideas:

  • reduce unnecessary whitespace, like empty lines, and, particularly, leading whitespace in doc strings (and potentially other multiline strings) or indentation (one space is enough, rather than 4)
  • other minification techniques, like those supported by https://pypi.org/project/python-minifier

These will require disabling black and generally make the files harder to read, but I don't think they're designed to be human readable anyway?

(There's also other possibilities like combining multiple files into one, allowing sharing imports, but this is probably only a small win, and will require changing other code.)
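To illustrate the doc-string idea from the list above: `inspect.cleandoc` strips the common leading indentation, and `inspect.getdoc` (which `help()` relies on) applies the same cleanup when displaying documentation, so a generator could emit the already-cleaned text. This is only a sketch; the docstring below is a made-up stand-in rather than real generated output, and code that reads the raw `__doc__` attribute would see the difference.

```python
import inspect

# Hypothetical docstring, indented the way it would be inside a method
# of a generated class.
ORIGINAL_DOC = """
        Sets the maximum number of points to keep on the plots from an
        incoming stream.

        Returns
        -------
        int|float
        """


def compact_docstring(doc: str) -> str:
    """Strip common leading indentation and leading/trailing blank lines."""
    return inspect.cleandoc(doc)


compact = compact_docstring(ORIGINAL_DOC)
saved = len(ORIGINAL_DOC) - len(compact)

print(f"{len(ORIGINAL_DOC)} -> {len(compact)} characters ({saved} saved)")
```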

For example, starting with https://github.com/plotly/plotly.py/blob/v5.1.0/packages/python/plotly/plotly/graph_objs/bar/_stream.py one could save ~20%: https://gist.github.com/huonw/4b81b6825ebd508bbcd39f4bb2215f4e

| state | size (bytes) | relative size |
|---|---|---|
| original | 4104 | 100% |
| no leading whitespace in doc-strings | 3792 | 92% |
| no empty lines or lines with only # ---- comments | 3522 | 86% |
| 1 space indent | 3201 | 78% |

Assuming this 20% decrease generalises across all the autogenerated files, this would cut nearly 25MB off the 129M package.
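The arithmetic behind that estimate can be checked with a few lines. The byte counts come from the table above and the 123MB figure from the `du` output; the extrapolation to every generated file is the same assumption as in the text, not a measurement.

```python
# Byte counts for bar/_stream.py after each successive step, from the table.
STEPS = {
    "original": 4104,
    "no leading whitespace in doc-strings": 3792,
    "no empty lines or comment-only lines": 3522,
    "1 space indent": 3201,
}

baseline = STEPS["original"]
relative = {name: round(100 * size / baseline) for name, size in STEPS.items()}

GENERATED_MB = 123  # graph_objs + validators in 5.1.0, per the du output

# Saving measured on the single example file (about 22%).
saving_fraction = 1 - STEPS["1 space indent"] / baseline
projected_saving_mb = GENERATED_MB * saving_fraction

# The rounder 20% figure used in the text.
conservative_saving_mb = GENERATED_MB * 0.20

print(relative)
print(f"{projected_saving_mb:.1f} MB at the measured rate, "
      f"{conservative_saving_mb:.1f} MB at 20%")
```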

(Thanks again for plotly!)

@nicolaskruchten
Contributor

Thanks for the deep look at this issue! @jonmmease what do you think? I think we could probably implement a lot of this so long as the docstrings are still readable, right?

@huonw
Author

huonw commented Jul 11, 2021

Thanks for the quick response!

Two other potential options I thought of over the weekend could be:

  1. compress the files (e.g. gzip -9 -c _stream.py | wc -c reports 1163, i.e. less than 30% of the original), and lazily decompress and exec them on import, somehow (might require Python 3.7)
  2. place all the autogenerated files into a single zip file, and leverage https://docs.python.org/3/library/zipimport.html to import them, although this may require modifying sys.path in a way that may be fragile to support Python 3.6 (whereas Python 3.7+ might be able to be fancier and use zipimport.zipimporter.load_module directly)
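Both options can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not a proposal for how plotly should do it: the module names `demo_stream_gz` and `demo_stream_zip`, the `SOURCE` text, and both helper functions are hypothetical, and the zip variant uses the plain `sys.path` route rather than calling `zipimporter` methods directly.

```python
import gzip
import importlib
import sys
import tempfile
import types
import zipfile
from pathlib import Path

# Hypothetical stand-in for one autogenerated module.
SOURCE = '''
class Stream:
    """Stand-in for a generated graph object class."""

    def __init__(self, maxpoints=None):
        self.maxpoints = maxpoints
'''


def load_gzipped_module(name: str, blob: bytes) -> types.ModuleType:
    """Option 1: decompress gzip-compressed source and exec it into a new module."""
    module = types.ModuleType(name)
    code = compile(gzip.decompress(blob).decode("utf-8"), f"<gzip:{name}>", "exec")
    exec(code, module.__dict__)
    return module


def import_from_zip(archive: Path, name: str) -> types.ModuleType:
    """Option 2: put a zip archive on sys.path and let zipimport load from it."""
    sys.path.insert(0, str(archive))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(str(archive))


# Option 1: compress once (at build time), decompress and exec on demand.
blob = gzip.compress(SOURCE.encode("utf-8"), compresslevel=9)
mod1 = load_gzipped_module("demo_stream_gz", blob)

# Option 2: bundle the source into a zip file and import it from there.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "generated.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("demo_stream_zip.py", SOURCE)
    mod2 = import_from_zip(archive, "demo_stream_zip")

print(mod1.Stream(maxpoints=5).maxpoints, mod2.Stream(maxpoints=7).maxpoints)
```

Neither variant addresses the lazy-loading or packaging details a real change would need, such as hooking the loader into the package's import machinery or keeping tracebacks readable.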

@jonmmease
Contributor

Thanks for taking a look at this @huonw. I'd have no problem running the generated code through a minifier instead of black if that's helpful. The compression approaches would carry a bit more breakage risk, I think, so they would take some care.

The biggest wins would probably be in detecting the use of identical objects throughout the figure hierarchy and sharing those classes.
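A first, much cruder approximation of that idea is to look for generated files whose contents are byte-for-byte identical. The sketch below swaps in that simpler technique; it does not compare class structure, and it may well undercount if generated classes embed their position in the figure hierarchy. The function name and the demo files are hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_identical_files(root: Path, pattern: str = "*.py") -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only digests seen more than once indicate duplicated content.
    return [group for group in by_digest.values() if len(group) > 1]


# Demo on a throwaway tree with two identical files and one different one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for trace in ("bar", "scatter"):
        (root / trace).mkdir()
        (root / trace / "_stream.py").write_text("class Stream:\n    pass\n")
    (root / "bar" / "_marker.py").write_text("class Marker:\n    pass\n")
    duplicate_groups = [
        [p.relative_to(root).as_posix() for p in group]
        for group in find_identical_files(root)
    ]

print(duplicate_groups)
```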

@huonw
Author

huonw commented Jun 6, 2022

Just a status check: the autogenerated code appears to have crept upwards (123MB in 5.1 -> 128MB in 5.8), with both graph_objs and validators growing:

pip install --target=/tmp/plotly/ plotly==5.8.0
du -sch /tmp/plotly/plotly/* | sort -h

Output:

4.0K	/tmp/plotly/plotly/_version.py
4.0K	/tmp/plotly/plotly/_widget_version.py
4.0K	/tmp/plotly/plotly/animation.py
4.0K	/tmp/plotly/plotly/config.py
4.0K	/tmp/plotly/plotly/conftest.py
4.0K	/tmp/plotly/plotly/dashboard_objs.py
4.0K	/tmp/plotly/plotly/exceptions.py
4.0K	/tmp/plotly/plotly/files.py
4.0K	/tmp/plotly/plotly/grid_objs.py
4.0K	/tmp/plotly/plotly/missing_ipywidgets.py
4.0K	/tmp/plotly/plotly/optional_imports.py
4.0K	/tmp/plotly/plotly/presentation_objs.py
4.0K	/tmp/plotly/plotly/serializers.py
4.0K	/tmp/plotly/plotly/session.py
4.0K	/tmp/plotly/plotly/validator_cache.py
4.0K	/tmp/plotly/plotly/version.py
4.0K	/tmp/plotly/plotly/widgets.py
8.0K	/tmp/plotly/plotly/__init__.py
8.0K	/tmp/plotly/plotly/callbacks.py
8.0K	/tmp/plotly/plotly/colors
8.0K	/tmp/plotly/plotly/utils.py
 12K	/tmp/plotly/plotly/shapeannotation.py
 12K	/tmp/plotly/plotly/subplots.py
 16K	/tmp/plotly/plotly/data
 16K	/tmp/plotly/plotly/plotly
 20K	/tmp/plotly/plotly/graph_objects
 28K	/tmp/plotly/plotly/tools.py
 36K	/tmp/plotly/plotly/basewidget.py
 52K	/tmp/plotly/plotly/_subplots.py
 76K	/tmp/plotly/plotly/offline
220K	/tmp/plotly/plotly/basedatatypes.py
264K	/tmp/plotly/plotly/matplotlylib
352K	/tmp/plotly/plotly/__pycache__
368K	/tmp/plotly/plotly/io
380K	/tmp/plotly/plotly/express
668K	/tmp/plotly/plotly/figure_factory
3.7M	/tmp/plotly/plotly/package_data
 45M	/tmp/plotly/plotly/graph_objs
 84M	/tmp/plotly/plotly/validators
135M	total

(As always, thank you for plotly.)

@olivercoleman-switchdin

Any news on this? It's making it difficult to deploy AWS Lambda functions containing plotly, even when zipped.

@gvwilson
Contributor

Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if you'd like to submit a PR, we'd be happy to prioritize a review. Thank you - @gvwilson

gvwilson self-assigned this Jul 11, 2024

@huonw
Author

huonw commented Jul 12, 2024

Just observing that this continues to creep up: the package size is now 151MB in the latest release, 5.22.0.

| metric | 5.1.0 | 5.8.0 | 5.22.0 |
|---|---|---|---|
| graph_objs/ | 43 | 45 | 48 |
| validators/ | 80 | 84 | 97 |
| total package size | 129 | 135 | 151 |

(Sizes in MB)

pip install --target=/tmp/plotly plotly==5.22.0
du -sch /tmp/plotly/plotly/* | sort -h
4.0K    /tmp/plotly/plotly/_version.py
4.0K    /tmp/plotly/plotly/_widget_version.py
4.0K    /tmp/plotly/plotly/animation.py
4.0K    /tmp/plotly/plotly/config.py
4.0K    /tmp/plotly/plotly/conftest.py
4.0K    /tmp/plotly/plotly/dashboard_objs.py
4.0K    /tmp/plotly/plotly/exceptions.py
4.0K    /tmp/plotly/plotly/files.py
4.0K    /tmp/plotly/plotly/grid_objs.py
4.0K    /tmp/plotly/plotly/missing_ipywidgets.py
4.0K    /tmp/plotly/plotly/optional_imports.py
4.0K    /tmp/plotly/plotly/presentation_objs.py
4.0K    /tmp/plotly/plotly/serializers.py
4.0K    /tmp/plotly/plotly/session.py
4.0K    /tmp/plotly/plotly/validator_cache.py
4.0K    /tmp/plotly/plotly/version.py
4.0K    /tmp/plotly/plotly/widgets.py
8.0K    /tmp/plotly/plotly/__init__.py
8.0K    /tmp/plotly/plotly/callbacks.py
8.0K    /tmp/plotly/plotly/colors
8.0K    /tmp/plotly/plotly/utils.py
 12K    /tmp/plotly/plotly/shapeannotation.py
 12K    /tmp/plotly/plotly/subplots.py
 16K    /tmp/plotly/plotly/data
 16K    /tmp/plotly/plotly/plotly
 20K    /tmp/plotly/plotly/graph_objects
 28K    /tmp/plotly/plotly/tools.py
 36K    /tmp/plotly/plotly/basewidget.py
 52K    /tmp/plotly/plotly/_subplots.py
 76K    /tmp/plotly/plotly/offline
224K    /tmp/plotly/plotly/basedatatypes.py
256K    /tmp/plotly/plotly/matplotlylib
352K    /tmp/plotly/plotly/__pycache__
364K    /tmp/plotly/plotly/io
392K    /tmp/plotly/plotly/express
668K    /tmp/plotly/plotly/figure_factory
3.6M    /tmp/plotly/plotly/package_data
 48M    /tmp/plotly/plotly/graph_objs
 97M    /tmp/plotly/plotly/validators
151M    total

gvwilson removed their assignment Aug 2, 2024
gvwilson added the P3 (backlog) and infrastructure (build process etc.) labels and removed the sev-4 (cosmetic) label Aug 12, 2024

@sh-shahrokhi

Just wanted to add my voice to this.

@GirayEryilmaz

It appears the size keeps increasing:

plotly                    5.24.1
179.8M	/usr/local/lib/python3.12/site-packages/plotly

@acepace

acepace commented Nov 18, 2024

Just to add that, while the size keeps increasing, it's a problem not just for AWS Lambda but anywhere we deploy code (container images, VMDKs, etc.).

What versions of Python does plotly currently support? Maybe I can whip up a PR implementing some of the above ideas.

@nocnokneo

One consideration: we use Seekable OCI (SOCI) images as a workaround for slow startup times for AWS Fargate tasks. However, the sheer number of files in packages like plotly becomes a problem, because it's the total file count that affects the size of the SOCI index.
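For anyone wanting to track file count alongside size, a small sketch along these lines would do it. It assumes only the standard library; `file_counts` and the demo directory are hypothetical names, and files sitting directly in the package directory are reported under `"."`.

```python
from __future__ import annotations

import os
import tempfile
from collections import Counter
from pathlib import Path


def file_counts(package_dir: Path) -> Counter[str]:
    """Count files under each top-level entry of package_dir."""
    counts: Counter[str] = Counter()
    for root, _dirs, files in os.walk(package_dir):
        rel = Path(root).relative_to(package_dir)
        # Files directly inside package_dir are grouped under ".".
        top = rel.parts[0] if rel.parts else "."
        counts[top] += len(files)
    return counts


# Demo on a throwaway directory standing in for an installed package.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "fakepkg"
    (pkg / "validators" / "bar").mkdir(parents=True)
    (pkg / "validators" / "a.py").write_text("")
    (pkg / "validators" / "bar" / "b.py").write_text("")
    (pkg / "utils.py").write_text("")
    demo_counts = dict(file_counts(pkg))

print(demo_counts)
```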

@gvwilson
Contributor

@acepace we currently support Python >= 3.8 (but haven't started testing with 3.13). Reducing the bundle size is high on our list - getting the 6.0 release out the door has taken priority, but we'd be grateful for help or experiments.

gvwilson self-assigned this Feb 6, 2025
gvwilson added the P2 (considered for next cycle) label and removed the P3 (backlog) label Feb 6, 2025

@gvwilson
Contributor

Closed in favor of #4817.
