When using Mpdf as a writer exported PDF contains CSS #2432

Closed
sime15 opened this issue Dec 2, 2021 · 5 comments · Fixed by #2434

@sime15

sime15 commented Dec 2, 2021

This is:

- [x] a bug report

What is the expected behavior?

When exporting to PDF with Mpdf as the writer, the exported file should contain only the data to be exported, not CSS or any other HTML.

What is the current behavior?

The exported file contains CSS.

What are the steps to reproduce?

Try to export about 1000 or more rows to PDF. CSS is generated per row, so the stylesheet runs to more than 1000 lines. The exported file then contains broken CSS, because of this code in the Mpdf class:

$html = $this->generateHTMLAll();
foreach (\array_chunk(\explode(PHP_EOL, $html), 1000) as $lines) {
    $pdf->WriteHTML(\implode(PHP_EOL, $lines));
}
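
The effect of the 1000-line chunking can be checked without Mpdf at all. The following is a minimal sketch, not code from PhpSpreadsheet: `buildSampleHtml()` and `chunkIndexOfBody()` are hypothetical helpers, and the sample markup (one single-line CSS rule per row) is an assumption that only approximates what the writer generates.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the writer's HTML output: a <head> holding one
 * CSS rule per row, followed by a <body> with one table row per data row.
 */
function buildSampleHtml(int $rows): string
{
    $lines = ['<html>', '<head>', '<style>'];
    for ($i = 1; $i <= $rows; ++$i) {
        $lines[] = sprintf('tr.row%d { height: 15pt }', $i);
    }
    array_push($lines, '</style>', '</head>', '<body>', '<table>');
    for ($i = 1; $i <= $rows; ++$i) {
        $lines[] = sprintf('<tr class="row%d"><td>%d</td></tr>', $i, $i);
    }
    array_push($lines, '</table>', '</body>', '</html>');

    return implode(PHP_EOL, $lines);
}

/**
 * Chunk the HTML the same way as the snippet above and return the index of
 * the chunk that contains the opening <body> tag, or null if there is none.
 */
function chunkIndexOfBody(string $html, int $chunkSize = 1000): ?int
{
    foreach (array_chunk(explode(PHP_EOL, $html), $chunkSize) as $index => $lines) {
        if (strpos(implode(PHP_EOL, $lines), '<body') !== false) {
            return $index;
        }
    }

    return null;
}
```

With this sample markup, `chunkIndexOfBody(buildSampleHtml(10))` returns `0`, while `chunkIndexOfBody(buildSampleHtml(1500))` returns `1`: once the stylesheet outgrows the chunk size, the first chunk passed to `WriteHTML` holds CSS only and no `<body>` tag.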

Which versions of PhpSpreadsheet and PHP are affected?

1.20.0

@oleibman
Collaborator

oleibman commented Dec 2, 2021

What exactly is the problem you are seeing? I regularly test by generating a file which consists of more than 4000 lines of HTML (so 5 chunks are passed in turn to Mpdf), and I don't see a problem with "broken CSS" or with the look of the PDF file. The CSS (and HTML) absolutely needs to be passed to Mpdf so that its WriteHTML method can format the output correctly.

@sime15
Author

sime15 commented Dec 3, 2021

Of course the CSS and HTML need to be passed to Mpdf, but in my opinion the issue is with the chunks of 1000 lines. If the CSS contains more than 1000 lines, it is split across two chunks and the WriteHTML method won't process it properly. When the exported file contains a table with around 1000 rows (961 rows in my case), the methods generateHTMLAll(), buildCss(), buildCssPerSheet() and buildCssRowHeights() produce more than 1000 lines of CSS.

This is what the first page of my exported PDF looks like:
[image]

This is what the generated HTML looks like:
[image]

@oleibman
Collaborator

oleibman commented Dec 3, 2021

Interesting - thank you for the extra detail. When I generate a spreadsheet whose head section extends past the first chunk, I do see a problem. Chunking anywhere else does not seem to be a problem. This seems like a bug in Mpdf to me. At any rate, I can investigate further to see if there's anything that can be done. I do not know why the chunking is needed, although I suspect it's memory-related - I already had to override my default memory_limit just to get my script to run, even with chunking.

@sime15
Author

sime15 commented Dec 3, 2021

I also believe the chunking is there because of memory. One solution might be to first pass the whole <head> to Mpdf (explode by </head> or something like that) and then chunk everything else. I don't see why the CSS would exhaust memory no matter how long it is. Just a suggestion; I don't doubt that you will get to the best solution. Thank you for your cooperation.

@oleibman
Collaborator

oleibman commented Dec 3, 2021

This appears to be the relevant section in the Mpdf documentation:

If <body> tags are found, all $html outside these tags are discarded, and the rest is parsed as content for the document.

If no <body> tags are found, all remaining $html is parsed as content.
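
To see why a first chunk without a `<body>` tag ends up printing CSS, the quoted rule can be modelled in a few lines. This is a toy model of the documented behavior only, not Mpdf's parser; the function name `contentPerDocumentedRule()` is made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Toy model of the rule quoted above, NOT Mpdf's actual implementation:
 * if a <body> tag is present, keep only what lies inside it; otherwise
 * treat the entire string as document content.
 */
function contentPerDocumentedRule(string $html): string
{
    // Capture everything after the opening <body ...> tag up to </body>,
    // or up to the end of the string when the closing tag is missing.
    if (preg_match('/<body[^>]*>(.*?)(?:<\/body>|$)/is', $html, $m) === 1) {
        return $m[1];
    }

    // No <body> tag: everything, including any CSS, counts as content.
    return $html;
}
```

Under this model a chunk such as `<head><style>td { color: red }</style></head><body><p>hi</p>` reduces to `<p>hi</p>`, whereas a chunk that stops partway through the stylesheet is returned unchanged, so the CSS is treated as content.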

So, Mpdf is operating as designed here. It might be possible to get things working within that constraint.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Dec 4, 2021
Fix PHPOffice#2432. Probably for memory reasons, PhpSpreadsheet divides its data into chunks when writing to Mpdf. However, if the first chunk has so many styles that the `body` tag is not included in the chunk, Mpdf will not handle it correctly. Code is changed to ensure that the first chunk always contains the body tag.

Because this error becomes evident only when opening the PDF file itself, it is difficult to write a test case. I have instead added a new sample file which meets the conditions which would have led to the error, and which can be examined to show that it is created correctly.
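
One way to guarantee that the first chunk contains the body tag is sketched below. It is an illustration under assumptions, not the code from the commit: `splitKeepingBodyInFirstChunk()` is a hypothetical helper, and locating the tag with a regular expression is a choice made here for the sketch.

```php
<?php

declare(strict_types=1);

/**
 * Split HTML into pieces for successive WriteHTML calls so that the first
 * piece always runs through the opening <body ...> tag. Hypothetical helper;
 * the name and details are assumptions, not the merged fix.
 *
 * @return string[]
 */
function splitKeepingBodyInFirstChunk(string $html, int $chunkSize = 1000): array
{
    $pieces = [];

    // Emit everything up to and including the <body ...> tag as one piece,
    // however many lines of CSS the head contains.
    if (preg_match('/<body[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE) === 1) {
        $end = $m[0][1] + strlen($m[0][0]);
        $pieces[] = substr($html, 0, $end);
        $html = substr($html, $end);
    }

    // Chunk the remainder by line count, as the original loop does.
    foreach (array_chunk(explode(PHP_EOL, $html), $chunkSize) as $lines) {
        $pieces[] = implode(PHP_EOL, $lines);
    }

    return $pieces;
}
```

Each returned piece would then be handed to `$pdf->WriteHTML(...)` in order, in place of the loop quoted in the issue. The head is never chunked in this sketch, which matches the observation in the thread that chunking anywhere other than the head did not cause a problem.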
oleibman added a commit that referenced this issue Dec 7, 2021