htmlDocContentDumpFormatOutput Messing Up HTML Document #1312

Closed
ossie-git opened this issue Jan 28, 2017 · 2 comments

ossie-git commented Jan 28, 2017

The call to htmlDocContentDumpFormatOutput in msc_crypt.c truncates the document in strange ways. Even a document as simple as:

<!DOCTYPE html>
<html>
<body>

<a href="../2.html">This is a link</a>
<a href="../3.html">This is another link</a>

</body>
</html>

has the < on the first line truncated, and more complex documents show more damage. The HTML tree itself is generated correctly with the hashes (see my patch for the other bug). I verified this by dumping it to a file with the following:

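FILE *fp;
fp = fopen("/tmp/htmlDocDump.txt", "w");
...
/* Write the parsed tree to the debug file; skip if the file could not be opened. */
if (fp != NULL) {
    htmlDocDump(fp, (xmlDocPtr) msr->crypto_html_tree);
    fclose(fp);
}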

in inject_hashed_response_body. Any ideas?


ossie-git commented Jan 29, 2017

Just to add an example, the default Apache homepage on Ubuntu was truncated by about 8,300 bytes (from 11321 bytes before adding the hash to 3013 bytes after adding it). Here are the debug logs:

[/index.html][4] Hook insert_filter: Adding output filter (r 7f02fdacc0a0).
[/index.html][9] Output filter: Receiving output (f 7f02fdadac00, r 7f02fdacc0a0).
[/index.html][4] Starting phase RESPONSE_HEADERS.
[/index.html][9] This phase consists of 0 rule(s).
[/index.html][9] Content Injection: Nothing to inject.
[/index.html][9] Output filter: Bucket type MMAP contains 11321 bytes.
[/index.html][9] Output filter: Bucket type EOS contains 0 bytes.
[/index.html][4] Output filter: Completed receiving response body (buffered full - 11321 bytes).
[/index.html][4] init_response_body_html_parser: assuming ISO-8859-1.
[/index.html][4] init_response_body_html_parser: Successfully html parser generated.
[/index.html][4] Signing data [/docs/2.4/mod/mod_userdir.html]
[/index.html][4] hash_response_body_links: Processed [0] iframe src, [0] hashed.
[/index.html][4] hash_response_body_links: Processed [0] frame src, [0] hashed.
[/index.html][4] hash_response_body_links: Processed [0] form actions, [0] hashed.
[/index.html][4] hash_response_body_links: Processed [9] links, [1] hashed.
[/index.html][4] inject_hashed_response_body: Detected encoding type [ISO-8859-1].
[/index.html][4] inject_hashed_response_body: Using content-type [ISO-8859-1].
[/index.html][4] inject_hashed_response_body: Copying XML tree from CONV to stream buffer [%zu] bytes.
[/index.html][4] inject_hashed_response_body: Setting new content value 3013
[/index.html][4] inject_hashed_response_body: Stream buffer [3013]. Done
[/index.html][4] Hash completed in 7996 usec.
[/index.html][4] Starting phase RESPONSE_BODY.
[/index.html][9] This phase consists of 0 rule(s).
[/index.html][9] Content Injection: Data reinjected bytes [3013]
[/index.html][4] Output filter: Output forwarding complete.
[/index.html][4] Initialising logging.
[/index.html][4] Starting phase LOGGING.
[/index.html][9] This phase consists of 0 rule(s).
[/index.html][4] Recording persistent data took 0 microseconds.
[/index.html][4] Audit log: Logging this transaction.
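
The size drop can be read out of a log like this mechanically. This is a minimal sketch, assuming the line layouts shown in the log above; the helper names bytes_after and bytes_lost are hypothetical and not part of ModSecurity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the integer that follows `marker` in one
 * debug-log line, or -1 if the marker or a number is not present. */
long bytes_after(const char *line, const char *marker)
{
    assert(line != NULL && marker != NULL);

    const char *hit = strstr(line, marker);
    long n;

    if (hit == NULL || sscanf(hit + strlen(marker), "%ld", &n) != 1)
        return -1;
    return n;
}

/* Bytes lost between the buffered response body and the re-injected body. */
long bytes_lost(long before, long after)
{
    return before - after;
}
```

Calling bytes_after on the "Bucket type MMAP contains" line with the marker "MMAP contains " gives 11321, and on the "Setting new content value" line with the marker "content value " gives 3013, so bytes_lost reports 11321 - 3013 = 8308 bytes missing from the response.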

@zimmerle zimmerle self-assigned this May 22, 2017
zimmerle (Contributor) commented

Same as #742
