Avoid allocating RowDimension when the row is visible by default #3527

Closed
goetas wants to merge 2 commits

Conversation

@goetas goetas commented Apr 20, 2023

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

When an auto filter range such as A1:A100000 is defined, the filter iterates over every row in the range to decide whether to hide it ($this->workSheet->getRowDimension((int) $row)->setVisible($result);).

That call allocates a RowDimension object for every row of every sheet, which wastes a lot of memory.

This change allocates the RowDimension object only if the row visibility needs to be changed.
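
The idea can be sketched without the library. This is only an illustration, not the code from the PR: RowSettings, SheetSketch, rowExists(), getRow() and applyVisibility() are hypothetical stand-ins for RowDimension and the worksheet, and the filter result is faked by hiding every 1000th row.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a per-row settings object; not the library's class.
final class RowSettings
{
    public bool $visible = true;
}

// Hypothetical stand-in for a worksheet that allocates row settings lazily.
final class SheetSketch
{
    /** @var array<int, RowSettings> */
    private array $rows = [];

    public function rowExists(int $row): bool
    {
        return isset($this->rows[$row]);
    }

    // Allocates on first access, like the getRowDimension() call quoted above.
    public function getRow(int $row): RowSettings
    {
        return $this->rows[$row] ??= new RowSettings();
    }

    public function allocatedCount(): int
    {
        return count($this->rows);
    }
}

// Apply a filter result, touching a row object only when visibility would change.
function applyVisibility(SheetSketch $sheet, int $row, bool $visible): void
{
    // An unallocated row is visible by default, so "visible" needs no object.
    if ($visible && !$sheet->rowExists($row)) {
        return;
    }
    $sheet->getRow($row)->visible = $visible;
}

$sheet = new SheetSketch();
for ($row = 1; $row <= 100000; ++$row) {
    applyVisibility($sheet, $row, $row % 1000 !== 0); // fake filter: hide every 1000th row
}
echo $sheet->allocatedCount(), PHP_EOL; // prints 100
```

Only the 100 hidden rows end up with an object; the other 99,900 rows stay unallocated because they already have the default visibility.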

@oleibman
Collaborator

oleibman commented May 9, 2023

You need to add a unit test to demonstrate the effectiveness of your change, perhaps something which demonstrates that an auto-filtered cell is visible even though the row dimension is not allocated. (That's what I'm assuming your change does; please correct me if I've misunderstood.)

@oleibman
Collaborator

In addition, the effects of this change will be undone if the file is saved because the writers use getRowDimension without testing rowDimensionExists, so will wind up allocating the row dimension after all. Changing that would need to be part of this change.
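
A rough sketch of what "test before get" could look like on the writer side. Everything here is hypothetical and only mirrors the pattern: RowInfo, LazySheet and writeRows() are made-up names, and the emitted markup is illustrative rather than what any of the real writers produce.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a per-row settings object; not the library's class.
final class RowInfo
{
    public bool $visible = true;
}

// Hypothetical worksheet stand-in whose getter allocates on first access.
final class LazySheet
{
    /** @var array<int, RowInfo> */
    private array $rows = [];

    public function rowExists(int $row): bool
    {
        return isset($this->rows[$row]);
    }

    public function getRow(int $row): RowInfo
    {
        return $this->rows[$row] ??= new RowInfo();
    }

    public function allocatedCount(): int
    {
        return count($this->rows);
    }
}

// Hypothetical writer step: emit one line per row without allocating anything.
function writeRows(LazySheet $sheet, int $highestRow): string
{
    $out = '';
    for ($row = 1; $row <= $highestRow; ++$row) {
        // Check first; only read settings from rows that already have an object.
        $hidden = $sheet->rowExists($row) && !$sheet->getRow($row)->visible;
        $out .= $hidden ? "<row r=\"$row\" hidden=\"1\"/>\n" : "<row r=\"$row\"/>\n";
    }
    return $out;
}

$sheet = new LazySheet();
$sheet->getRow(3)->visible = false; // the only row a filter had to hide
$xml = writeRows($sheet, 5);
echo $xml;
echo $sheet->allocatedCount(), PHP_EOL; // prints 1: writing allocated nothing new
```

If writeRows() called getRow() unconditionally, the count after writing would be 5 instead of 1, which is the "undone on save" effect described above.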

@goetas
Author

goetas commented Aug 18, 2023

My guess is that I was pushing this library to its limits. We have ~5 tabs with ~10-90k rows each, with formulas, filters and styling... all of it was really CPU and memory intensive (even after dropping some features or monkey patching parts of the lib, 20G of memory and 8 cores could not give us a satisfactory generation time). We decided to use the less supported SpreadsheetML (https://en.wikipedia.org/wiki/SpreadsheetML), which has its own problems, but at least the CPU and memory problems are gone.

I'm closing this as I'm not actively working on it anymore.

@goetas goetas closed this Aug 18, 2023
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Aug 27, 2023
This PR builds on PR PHPOffice#3527, introduced but subsequently closed by @goetas. The observation was that AutoFilter did not need to allocate a new RowDimension when the row was not to be filtered. While vetting the PR, it became apparent that Xlsx Writer also allocates RowDimension unnecessarily, with some minor adjustments possible for Ods and Mpdf as well. My tests confirm the initial observation that there can be a considerable memory savings when RowDimension is allocated only when needed. So, even though the original PR is withdrawn, there seems to be value in proceeding with it anyhow.
oleibman added a commit that referenced this pull request Sep 2, 2023
* Avoid Allocating RowDimension Unneccesarily

* Add Some Tests