
Memory leak #2092


Closed
momala454 opened this issue May 12, 2021 · 12 comments · Fixed by #2106

Comments

@momala454

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

### What is the expected behavior?

No memory leak

### What is the current behavior?

Memory leak

### What are the steps to reproduce?

The memory doesn't stop growing. My files are .xls and .xlsx, ranging from 30 kB to 21 MB.

<?php

require __DIR__ . '/vendor/autoload.php';

$filesFolder = __DIR__ . '/files/'; // folder containing the spreadsheets to load

foreach (glob($filesFolder . '*') as $fileName) {
    echo number_format(memory_get_usage()) . "\n";
    $spreadsheet = \PhpOffice\PhpSpreadsheet\IOFactory::load($fileName);
    $spreadsheet->disconnectWorksheets();
    $spreadsheet->garbageCollect();
    unset($spreadsheet);
}

### Which versions of PhpSpreadsheet and PHP are affected?

PhpSpreadsheet 1.17.1
@MarkBaker
Member

A real memory leak would be a bug, but PhpSpreadsheet doesn't suffer from that. Rather, an "in memory" spreadsheet uses a lot of memory, and just because a file that you load is small doesn't mean that its "in memory" representation is small: it all depends on the number of worksheets, rows, and columns.

We provide a series of approaches for working with larger spreadsheets: loading only specific worksheets, loading only part of a worksheet, or loading only the raw data in the cells without formatting information. We also provide options for caching cell data, reducing the memory requirements. These are all documented.
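For reference, those documented reader options look roughly like this (a sketch only; the file name and the sheet name `Data` are hypothetical):

```php
<?php

require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\IOFactory;

// Create a reader for the file type instead of using IOFactory::load(),
// so that reader options can be configured before loading.
$reader = IOFactory::createReader('Xlsx');

// Load only cell values, discarding all formatting information.
$reader->setReadDataOnly(true);

// Load only the named worksheets ('Data' is a hypothetical sheet name).
$reader->setLoadSheetsOnly(['Data']);

$spreadsheet = $reader->load('large-file.xlsx');
```

Combined with a read filter (shown further down in this thread), these options keep the "in memory" representation far smaller than a full load.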

@momala454
Author

momala454 commented May 12, 2021

The memory requirement is not the problem. The problem is that each opened Excel file increases memory usage without ever releasing it.

Edit: it seems to plateau at around 400 MB of RAM, while everything should have been cleared from memory, so usage should drop back to the amount measured at the beginning.

Rather, an "in memory" spreadsheet uses a lot of memory

At the line where memory_get_usage() is called, the spreadsheet is no longer "in memory".

@MAKS-dev

Looks to me like your PHP garbage collection is not taking place immediately (some details: https://stackoverflow.com/a/584982/4712173).
You can try calling gc_collect_cycles() after the unset (https://www.php.net/manual/en/function.gc-collect-cycles.php),
or maybe a usleep() to give the CPU some free cycles (depending also on other processes, obviously).

Maybe it's something else, but worth a look.
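For background on why this suggestion can matter: PHP's reference counting cannot reclaim objects that reference each other in a cycle; only the cycle collector can. A minimal self-contained illustration (no PhpSpreadsheet involved):

```php
<?php

// Two objects that reference each other form a cycle, so their
// refcounts never reach zero when unset, and only the cycle
// collector can reclaim them.
class Node
{
    public ?Node $other = null;
}

$before = memory_get_usage();

// Stay below the default GC root-buffer threshold (10,000 roots)
// so the collector does not run automatically during the loop.
for ($i = 0; $i < 2000; $i++) {
    $a = new Node();
    $b = new Node();
    $a->other = $b;
    $b->other = $a; // cycle
    unset($a, $b);  // refcounts drop to 1, not 0: nothing is freed yet
}

$leaked = memory_get_usage() - $before;
$collected = gc_collect_cycles(); // force the cycle collector to run
$remaining = memory_get_usage() - $before;

echo 'after unset: +' . number_format($leaked) . " bytes\n";
echo 'cycles collected: ' . $collected . "\n";
echo 'after gc_collect_cycles: +' . number_format($remaining) . " bytes\n";
```

Spreadsheet objects hold exactly this kind of cycle (Spreadsheet ↔ Worksheet), which is why disconnectWorksheets() and gc_collect_cycles() come up repeatedly in this thread.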

@momala454
Author

momala454 commented May 13, 2021

I tried that after the unset, but it still uses more and more memory, until reaching around 400 MB. That must be the amount needed to load my biggest file: it reached 400 MB when my largest xlsx file is 20 MB.

gc_collect_cycles();
sleep(1);

PowerKiKi added a commit that referenced this issue May 16, 2021
This also better supports image cloning with a proper
clone of the GD resource.

#2092
@PowerKiKi
Member

PowerKiKi commented May 16, 2021

Three things:

  • PhpSpreadsheet uses internal caches (that can never be cleared) for some things, so it is expected that memory increases (a bit) and never fully decreases
  • Spreadsheet::garbageCollect() is not useful here; it is not really meant to free memory. More generally, I'm not sure a user should ever call it manually
  • However, I did reproduce a few leaks, the most significant one when the spreadsheet contains images. They have been fixed via #2101

I used the following script to load pretty much all files existing in, or generated by, the project. It reads each file twice: first to warm up the caches, and a second time for the real measurement. Before running the script, the normal unit tests should be run once to generate all files in /tmp/phpspreadsheet/:

<?php

use PhpOffice\PhpSpreadsheet\IOFactory;

require 'vendor/autoload.php';

function memory(): int
{
    gc_collect_cycles();

    return memory_get_usage();
}

function load(string $fileName): void
{
    $spreadsheet = IOFactory::load($fileName);
    $spreadsheet->disconnectWorksheets();
}

function monitorMemory(callable $callable): int
{
    $before = memory();
    $callable();
    $after = memory();

    $diff = $after - $before;

    echo number_format($before) . PHP_EOL;
    echo number_format($after) . PHP_EOL;
    echo '+ ' . number_format($diff) . PHP_EOL;
    echo PHP_EOL;

    return $diff;
}

$files = [
    ...glob('/tmp/phpspreadsheet/*.xls'),
    ...glob('/tmp/phpspreadsheet/*.xlsx'),
    ...glob('/tmp/phpspreadsheet/*.csv'),
    ...glob('tests/data/Reader/CSV/*'),
    ...glob('tests/data/Reader/Gnumeric/*'),
    ...glob('tests/data/Reader/Ods/*'),
    ...glob('tests/data/Reader/XLS/*'),
    ...glob('tests/data/Reader/XLSX/*'),
];

$except = [
    'tests/data/Reader/CSV/empty.csv',
    'tests/data/Reader/Ods/corruptMeta.ods',
    'tests/data/Reader/XLSX/double_attr_drawing.xlsx',
];

$files = array_diff($files, $except);

foreach ($files as $fileName) {
    $do = function () use ($fileName): void {
        load($fileName);
    };

    echo $fileName . PHP_EOL;

    // Warmup caches
    $do();

    // Do it for real while monitoring memory
    $diff = monitorMemory($do);

    if ($diff > 0) {
        throw new Exception('Memory leak found !');
    }
}

echo count($files) . ' files loaded' . PHP_EOL;
echo 'with a peak memory usage of : ' . number_format(memory_get_peak_usage()) . PHP_EOL;
echo 'with a final memory usage of: ' . number_format(memory()) . PHP_EOL;

This would end up with:

247 files loaded
with a peak memory usage of : 31,957,288
with a final memory usage of: 11,191,304

@momala454
Author

momala454 commented May 16, 2021

How do I clear the internal caches? I don't think any script is supposed to keep things in memory once everything has been unset.

Thanks for the fixes; once they're published I will test whether I see more leaks.

@PowerKiKi
Member

PhpSpreadsheet uses internal cache (that can never be cleared)

PowerKiKi added a commit that referenced this issue May 17, 2021
When creating a spreadsheet, and writing to Xlsx, then to Xls, then
reading the Xls, it would leak memory during reading.

Fixes #2092
@maximnl

maximnl commented May 4, 2022

We use a ChunkReadFilter; this loads only a range of rows and lets you limit the array sizes.
However, you need to take care of removing all leading NULL rows after the first chunk.
I use a PHP array function for that, which works OK.

Still, I am only able to load about 1,000 rows every few seconds, so large files simply time out after 40k+ rows (2-3 minutes).
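The ChunkReadFilter pattern mentioned here follows the "reading only specific cells" example in the PhpSpreadsheet documentation; a minimal sketch (signature as in current releases of the IReadFilter interface) looks like this:

```php
<?php

require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\IReadFilter;

// Read filter that only accepts cells within a given window of rows,
// so the reader skips everything outside the current chunk.
class ChunkReadFilter implements IReadFilter
{
    private int $startRow = 0;
    private int $endRow = 0;

    // Set the window of rows that the next load() call should read.
    public function setRows(int $startRow, int $chunkSize): void
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize;
    }

    public function readCell(string $columnAddress, int $row, string $worksheetName = ''): bool
    {
        // Always read the heading row, plus the rows in the current chunk.
        return $row === 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}
```

The filter is attached with `$reader->setReadFilter($chunkFilter)` before each `load()`, as in the snippets later in this thread.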

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Sep 13, 2022
All but 6 chart samples can be rendered by Sample35. Of those 6, 3 fail because the script runs out of memory processing them. Adopting a suggestion from @MAKS-dev in issue PHPOffice#2092, adding a call to gc_collect_cycles() after the charts from each spreadsheet are rendered appears to make it possible to include those 3 spreadsheets in Sample35 after all.

Also take advantage of this opportunity to correct a number (hopefully all) of Scrutinizer problems with JpgraphRendererBases.
@oleibman oleibman mentioned this issue Sep 13, 2022
7 tasks
oleibman added a commit that referenced this issue Sep 14, 2022
* Memory Leak in Sample35

All but 6 chart samples can be rendered by Sample35. Of those 6, 3 fail because the script runs out of memory processing them. Adopting a suggestion from @MAKS-dev in issue #2092, adding a call to gc_collect_cycles() after the charts from each spreadsheet are rendered appears to make it possible to include those 3 spreadsheets in Sample35 after all.

Also take advantage of this opportunity to correct a number (hopefully all) of Scrutinizer problems with JpgraphRendererBases.

* Minor Fix

Problem running 8.1 unit tests.

* Resolve Problems with Pie 3D Charts

Minor fix, leaving only one spreadsheet unusable in Sample35. The reasons for its unusability are now documented in the code.

* Mitoteam Made Changes

Discussing this problem with them, they decided they should make a change for Pie3D rather than forcing us to use the workaround pushed earlier. Change to require mitoteam 10.2.3, revert workaround.
@IronSean

PhpSpreadsheet uses internal cache (that can never be cleared)

This is pretty frustrating behaviour if you're using PhpSpreadsheet in a web server thread that gets reused to process Excel file uploads. Users are unlikely to upload the same Excel file as each other, so the cache is useless, but the memory can't be cleared.

@smenzer

smenzer commented Jul 18, 2023

I'm having similar problems. I'm using a ChunkReadFilter to process a large (250k-row) spreadsheet in chunks. However, memory continues to grow even though I unset variables when I am done with them. Here's a code snippet:

// self::$_CHUNK_SIZE = 25000;
for ($row = 2; $row <= $nb_rows; $row += self::$_CHUNK_SIZE) {
    $chunk_filter->setRows($row, self::$_CHUNK_SIZE);
    $reader->setReadFilter($chunk_filter);
    $sheet = $reader->load($filepath);

    $raw_data = $sheet->getActiveSheet()->toArray();
    unset($sheet);

    $upload_string = $this->prepareDataForBulkUpload($raw_data);
    unset($raw_data);

    // do something with $upload_string

    unset($upload_string);
}

I would expect the $sheet, $raw_data, and $upload_string variables to be quite large, but after unsetting them, the memory should be freed up. I have put several memory_get_usage() calls throughout the code, and I can see that unset($raw_data) and unset($upload_string) clear up tens of MBs, but unset($sheet) clears nothing. Because of this, the script's memory grows by 150 MB on each loop, which is not sustainable.

@PowerKiKi - What can be done to free up the memory usage?

@MarkBaker
Member

Suggestion #1
As the documentation explains, you can't simply unset the spreadsheet that you've loaded, because PHP can't clear cyclic references in the object tree. You need to explicitly break those references by calling the Spreadsheet's disconnectWorksheets() method first.

Suggestion #2
Avoid using toArray() if you possibly can; PHP arrays are an incredible memory hog.
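Applied to the loop in the previous comment, suggestion #1 amounts to this sketch ($reader, $chunk_filter, $filepath, $nb_rows and prepareDataForBulkUpload() are assumed to be set up as in that snippet):

```php
<?php

// Sketch of the fix: break the Spreadsheet <-> Worksheet cyclic
// references with disconnectWorksheets() BEFORE unsetting, so that
// reference counting can actually reclaim the memory on each pass.
$chunkSize = 25000;

for ($row = 2; $row <= $nb_rows; $row += $chunkSize) {
    $chunk_filter->setRows($row, $chunkSize);
    $reader->setReadFilter($chunk_filter);
    $spreadsheet = $reader->load($filepath);

    $raw_data = $spreadsheet->getActiveSheet()->toArray();

    // The crucial addition: disconnect before unset.
    $spreadsheet->disconnectWorksheets();
    unset($spreadsheet);

    $upload_string = $this->prepareDataForBulkUpload($raw_data);
    unset($raw_data);

    // do something with $upload_string

    unset($upload_string);
}
```

Per suggestion #2, replacing toArray() with row-by-row iteration would reduce the per-chunk peak further, at the cost of restructuring prepareDataForBulkUpload().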

@smenzer

smenzer commented Jul 18, 2023

@MarkBaker thanks for the prompt reply! disconnectWorksheets() did the trick!
