HTML in text handling error? #373
Yeah, essentially this.
and then diffs the sequence of tokens. So without the space,
More thoughts: I'm hoping to significantly rework the
Then you could diff this with

The crux from my perspective is, though, that a tokenization approach optimised for meaningfully diffing HTML is likely to look very different from one meant for diffing natural language text, because the syntax of HTML differs meaningfully from that of ordinary text, and so I wouldn't even really want to add options to

It's fine, of course, to ignore this and use

For that reason, although this issue is not unreasonable, I'm gonna close it as "Won't fix".
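To make the point about HTML-specific tokenization concrete, here is a rough sketch of what such an approach might look like. Everything in it is an assumption made for illustration: `tokenizeHtml` and `diffTokens` are hypothetical helpers, not functions of this library, and the LCS diff is a simple stand-in rather than the algorithm the library uses.

```javascript
// Hypothetical HTML-aware tokenizer (illustration only, not a library API):
// a whole tag is one token, then whitespace runs, then runs of other text.
// A stray "<" that never closes is kept as its own token.
function tokenizeHtml(text) {
  return text.match(/<[^>]*>|\s+|[^\s<]+|</g) || [];
}

// Minimal LCS-based token diff. A simple stand-in, not an optimised algorithm.
function diffTokens(a, b) {
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both token lists, emitting unchanged, removed and added tokens.
  const changes = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      changes.push({ value: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      changes.push({ value: a[i++], removed: true });
    } else {
      changes.push({ value: b[j++], added: true });
    }
  }
  while (i < a.length) changes.push({ value: a[i++], removed: true });
  while (j < b.length) changes.push({ value: b[j++], added: true });
  return changes;
}

const changes = diffTokens(
  tokenizeHtml("<p>Guess what?</p>"),
  tokenizeHtml("<p>Guess what? </p>")
);
const edits = changes.filter((c) => c.added || c.removed);
console.log(edits); // only the inserted space is reported
```

Because each tag is a single token here, adding a space before `</p>` shows up as one added whitespace token and the closing tag itself is left unchanged. A real HTML diff would also need to deal with attributes, comments, entities and nesting, which this sketch ignores.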
In the demo, with the text `<p>Guess what?</p>` in the first field, and a space added after the `?` in the second, it seems to think the `</` has changed as well. Perhaps it groups all non-alphanumerics? I'd suggest adding a parameter that gives tags special treatment.

Edit: I guess this isn't designed to handle HTML tags at all :) Is there a way to render existing tags (such as `<p></p>` or `<h1></h1>`)?
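One plausible explanation for the `</` being flagged can be sketched as follows. The sketch assumes a word-level tokenizer that splits at whitespace runs and at word boundaries; that is an assumption about the behaviour, not the library's actual code, and `tokenize` is a hypothetical helper.

```javascript
// Hypothetical word-level tokenizer (an assumption for illustration):
// split at whitespace runs and at \b word boundaries, dropping empty pieces.
function tokenize(text) {
  return text.split(/(\s+|\b)/).filter((token) => token !== "");
}

const before = tokenize("<p>Guess what?</p>");
const after = tokenize("<p>Guess what? </p>");

console.log(before); // [ '<', 'p', '>', 'Guess', ' ', 'what', '?</', 'p', '>' ]
console.log(after);  // [ '<', 'p', '>', 'Guess', ' ', 'what', '?', ' ', '</', 'p', '>' ]
```

Under this assumption, the punctuation between `what` and `p` forms the single token `?</` in the first text. Adding a space splits it into `?`, a space and `</`, so a token-level diff sees `?</` removed and three new tokens added, rather than just an inserted space.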