-
-
Notifications
You must be signed in to change notification settings - Fork 2
Performance Issue: fromHtml is 10x Slower than hast-util-from-parse5 #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Did you see the source? https://github.com/syntax-tree/hast-util-from-html/blob/main/lib/index.js I'm not sure what you're thinking. Like, source's pretty clear. You're probably turning some options on or off and profiling the difference? |
This comment was marked as off-topic.
This comment was marked as off-topic.
And I'm not sure what you're thinking since everything I did is described pretty clearly to me and easy to reproduce. It's actually 10 times slower, using parse5 ^7.1.2.
|
Setting file to true, which results in type a error, but returns an equal hast tree after removing only the position property from root object. import { fromHtml } from "hast-util-from-html";
import { fromParse5 } from "hast-util-from-parse5";
import { parseFragment } from "parse5";
import { describe, expect, it } from "vitest";
describe("hast parser hast-util-from-html vs hast-util-from-parse5", () => {
// Deep equal fromHtml and fromParse
it("should deep equal HAST correctly", async () => {
const html = `<div class="container"><header><h1>Welcome to My Website</h1><nav><ul><li><a href="/home">Home</a></li><li><a href="/about">About Us</a></li><li><a href="/services">Services</a></li><li><a href="/contact">Contact</a></li></ul></nav></header><main><section class="intro"><h2>About Our Company</h2><p>We are a <strong>leading</strong> company in the industry, with over <em>20 years</em> of experience.</p><p>Our mission is to provide <a href="/services#quality">top-notch services</a> to our clients.</p></section><section class="features"><h2>Our Features</h2><ul><li>Feature 1: High quality</li><li>Feature 2: Reliability</li><li>Feature 3: Affordability</li></ul></section><section class="testimonials"><h2>What Our Clients Say</h2><blockquote><p>"This company exceeded our expectations!"</p><footer>- Jane Doe, CEO of ExampleCorp</footer></blockquote><blockquote><p>"Outstanding service and support!"</p><footer>- John Smith, Manager at TechSolutions</footer></blockquote></section></main><aside><h3>Latest News</h3><article><h4>New Product Launch</h4><p>We are excited to announce the launch of our new product, <strong>ProductX</strong>. <a href="/news/productx">Read more</a></p></article><article><h4>Upcoming Webinar</h4><p>Join us for an exclusive webinar on industry trends. <a href="/webinar">Register now</a></p></article></aside><footer><p>© 2024 My Website. All rights reserved.</p><p><a href="/privacy-policy">Privacy Policy</a> | <a href="/terms-of-service">Terms of Service</a></p></footer></div>`;
const hast = fromHtml(html, { fragment: true });
const hast1 = fromParse5(parseFragment(html, { sourceCodeLocationInfo: true }), {
file: true
});
// Remove last position property for deep equal check
delete (hast as any).position;
delete (hast1 as any).position;
expect(hast).toEqual(hast1);
});
}); |
Please learn more about TypeScript.
Why stop there? If you are profiling things, you can find out what branches take time? |
I wish you good luck with the management of your projects @wooorm. |
Am I to understand that you are not at all intersted in contributing to open source? You just want to say that there might be a problem, and then expect someone to do work for you? |
@wooorm your attitude demotivates me significantly to even make any contribution to your repositories, it surely motivates me to fork and do a complete recreation of it... other then that, I am done here. |
You don‘t understand what open source is. What you describe is exactly what I give you: https://github.com/syntax-tree/hast-util-from-html/blob/main/license. What you don’t get, is free customer service. |
This comment has been minimized.
This comment has been minimized.
This comment was marked as abuse.
This comment was marked as abuse.
This comment was marked as abuse.
This comment was marked as abuse.
The benchmark here was indeed benchmarking different things. The comparison would be: import fs from 'node:fs/promises'
import {VFile} from 'vfile'
import {fromHtml} from 'hast-util-from-html'
import {fromParse5} from 'hast-util-from-parse5'
import {parse} from 'parse5'
const html = await fs.readFile('example.html', 'utf8')
const file = new VFile(html)
console.time('hast-util-from-html')
fromHtml(file)
console.timeEnd('hast-util-from-html')
console.time('parse + hast-util-from-parse5')
fromParse5(parse(html, {sourceCodeLocationInfo: true}), {file})
console.timeEnd('parse + hast-util-from-parse5') (note the There’s still a tiny bit more work done in |
VFile results in a significant performance reduction... |
Also as I've stated here... slows it down by 10 times. |
See the difference by yourself. Note: I've disabled the vFile. So I would have 10 times the performance it should have. import fs from "node:fs/promises";
import { fromHtml } from "hast-util-from-html";
import { fromParse5 } from "hast-util-from-parse5";
import { parse } from "parse5";
import { VFile } from "vfile";
function calculateStatistics(times) {
times.sort((a, b) => a - b);
const sum = times.reduce((a, b) => a + b, 0);
const mean = sum / times.length;
const p75 = times[Math.floor(0.75 * times.length)];
const p99 = times[Math.floor(0.99 * times.length)];
const p995 = times[Math.floor(0.995 * times.length)];
const p999 = times[Math.floor(0.999 * times.length)];
const min = times[0];
const max = times[times.length - 1];
const rme = (Math.max(...times.map(t => Math.abs(t - mean))) / mean) * 100;
return {
hz: (1000 / mean).toFixed(2), // operations per second
min: `${min.toFixed(2)}ms`,
max: `${max.toFixed(2)}ms`,
mean: `${mean.toFixed(2)}ms`,
p75: `${p75.toFixed(2)}ms`,
p99: `${p99.toFixed(2)}ms`,
p995: `${p995.toFixed(2)}ms`,
p999: `${p999.toFixed(2)}ms`,
rme: `${rme.toFixed(2)}%`,
samples: times.length
};
}
const html = await fs.readFile("src/example.html", "utf8");
const file = new VFile(html);
let times = [];
console.time("hast-util-from-html");
for (let i = 0; i < 10; i++) {
const start = performance.now();
fromHtml(file);
const end = performance.now();
times.push(end - start);
}
console.timeEnd("hast-util-from-html");
let stats = calculateStatistics(times);
console.log("hast-util-from-html stats:", stats);
times = [];
console.time("parse + hast-util-from-parse5");
for (let i = 0; i < 10; i++) {
const start = performance.now();
fromParse5(parse(html, { sourceCodeLocationInfo: true }), { file: true });
const end = performance.now();
times.push(end - start);
}
console.timeEnd("parse + hast-util-from-parse5");
stats = calculateStatistics(times);
console.log("parse + hast-util-from-parse5 stats:", stats); |
The MIT license grants you the right to use the code freely. However, it does not give you the right to demand maintainer time or to engage in sarcastic or disrespectful behavior. Performance is important, and there is a shared interest in maintaining it. I encourage you to adopt a more constructive tone in your feedback. If you're concerned about performance issues, consider generating a flamegraph to identify what is causing the delays on your system: https://nodejs.org/en/learn/diagnostics/flame-graphs. The tool https://github.com/davidmarkclements/0x is particularly effective for this purpose. |
@ChristianMurphy Hey for the record, I was trying to help to state that this vFile is causing significant performance reduction. I'm here being called someone who doesn't know typescript, my code is faulty and that I should read the code first. After that he's telling me I should read the MIT license and don't know about opensource. This @wooorm is being completely indecent with his comments and you expect me to keep my respect towards him?, |
I understand that you're frustrated, and I appreciate your intent to help by pointing out the performance issue. However, let's address a few things clearly:
You made a comment about this project not being in TypeScript, when in fact, it uses TypeScript through JSDoc syntax. The comment was dismissive and suggested a lack of familiarity with a feature of TypeScript. We all have gaps in our knowledge—what matters is how we address them. Being rude isn't necessary.
The point being made, which I support, is that while you're free to use, read, and modify the source code, that doesn't entitle you to free support or to treat others disrespectfully.
Yes, respect is non-negotiable. Open source relies on the generosity and collaboration of its community. The maintainers are dedicating their time to help, often without compensation. Keeping the discourse respectful is essential for a productive environment. I understand your frustration, but directing that frustration at the people trying to assist you is counterproductive. Let's work together constructively. |
Can you tell this Titus as well... not the first time he's being disrespectful towards others. If he's not willing to be able receive critics he might be not suited for handling issues. |
I hear your concerns, and I want to emphasize that maintaining a respectful tone is important for everyone involved, including Titus. It's clear that tensions are high, but we need to remember that open-source projects thrive on constructive criticism and collaboration. Titus, as the project owner and primary contributor, has invested a significant amount of time and effort into this project. His directness can sometimes come across as harsh, but his intention is to maintain the quality and integrity of the project. That said, it's crucial for everyone—regardless of their role—to communicate respectfully. I'll address your feedback with Titus, but I also encourage you to approach these discussions with the same respect you’re asking for. Let’s work on keeping the dialogue productive and focused on the issues at hand. |
Initial checklist
Affected packages and versions
hast-util-from-html
Link to runnable example
No response
Steps to reproduce
Set up a benchmark using src/parser.bench.ts.
Load html test: https://gist.github.com/zolero/6ecc0d238595b37fabb42cc86588998b
Run the benchmarks comparing hast-util-from-html and hast-util-from-parse5.
Observe the performance difference.
Expected behavior
I expected fromHtml to have similar or better performance compared to hast-util-from-parse5, considering both are intended to perform similar tasks.
Actual behavior
fromHtml is significantly slower than hast-util-from-parse5, which raises concerns about its efficiency in performance-critical applications.
Benchmark Results
Here are the results of my performance tests:
Additional benchmark data:
BENCH Summary
hast-util-from-parse5 + w/o sourceCodeLocationInfo is 1.47x faster than hast-util-from-parse5 + sourceCodeLocationInfo.
hast-util-from-parse5 + w/o sourceCodeLocationInfo is 7.32x faster than hast-util-from-html.
Affected runtime and version
v22.6.0
Affected package manager and version
1.22.22
Affected OS and version
Win 11
Build and bundle tools
Rollup
The text was updated successfully, but these errors were encountered: