Fix HTML parse with empty lines #537

Merged
merged 1 commit into from
Jan 24, 2017

facelessuser
Collaborator

If open and close were not both found in the first block, additional
blocks were evaluated without the context of the previous blocks. The
algorithm needs to evaluate a buffer in which the left bracket is present.
So feed in all items to find the right bracket, then adjust the data_index
to be relative to the last block. Ref: #452
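
The idea described above can be sketched roughly as follows. This is a simplified illustration, not the code from this pull request: the function name `right_bracket_in_last_block` and the `SEP` separator are hypothetical, and it assumes blocks are the pieces of text between blank lines. It also ignores details such as a `>` inside a quoted attribute value.

```python
SEP = "\n\n"  # assumption: blocks are the text between blank lines


def right_bracket_in_last_block(blocks):
    """Return the index of the first '>' after the first '<',
    expressed relative to blocks[-1].

    Returns -1 if there is no '<', no later '>', or the '>' falls
    in an earlier block rather than the last one.
    """
    # Search one buffer that still contains the left bracket.
    buffer = SEP.join(blocks)
    left = buffer.find("<")
    if left == -1:
        return -1
    right = buffer.find(">", left + 1)
    if right == -1:
        return -1
    # Position in the buffer where the last block starts.
    offset = len(buffer) - len(blocks[-1])
    # Translate the absolute index into one relative to the last block.
    return right - offset if right >= offset else -1


blocks = ['<div class="a"', 'id="b">text']
index = right_bracket_in_last_block(blocks)
```

Searching `blocks[-1]` on its own would find no `<` at all, which is the missing context the description refers to.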

@facelessuser
Collaborator Author

I wasn't planning on fixing anything else, but I had overlooked this as one that could be fixed before 3.0. I don't know much about the HTML parsing algorithm as a whole (it seems to take some shortcuts), but I was at least able to identify why it couldn't resolve the right bracket across a gap.

@facelessuser
Collaborator Author

Personal opinion on this: for 3.0, I would rewrite this to not use recursion and to track depth as state, so that when it is called on subsequent blocks it knows how many openings it still has to resolve. I don't recall whether this is already on your roadmap. The current algorithm gets slower the more blocks it has to process: if the end tag it is looking for has not been found after i blocks, the next attempt has to process i + 1 blocks. So if you had to process 3 blocks to find your close tag, you actually processed 1 + 2 + 3 = 6 blocks (re-iterating blocks you already processed). I think this is a fine fix for now, since that is how the algorithm is designed, but it is not ideal for the future. I wanted to fix the current algorithm, but I wasn't willing to tackle a rewrite at this time. Maybe later, if I'm feeling more ambitious and you haven't gotten to it first :).
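
To make the cost argument concrete, here is a rough sketch of both approaches. Nothing in it comes from the Python-Markdown code base: `rescan_visits`, `find_close_block`, and the `TAG` pattern are hypothetical names, and the tag matching is deliberately naive (it does not handle self-closing tags, comments, or quoted `>` characters).

```python
import re

# Naive tag matcher, for illustration only.
TAG = re.compile(r"<(/?)(\w+)[^>]*>")


def rescan_visits(n):
    """Blocks touched when attempt i re-reads blocks 1..i, for i = 1..n."""
    return sum(range(1, n + 1))  # n * (n + 1) / 2


def find_close_block(blocks, tag):
    """Visit each block once, carrying the nesting depth of `tag` forward.

    Returns the index of the block in which the depth returns to zero,
    or None if the tag is never closed.
    """
    depth = 0
    for i, block in enumerate(blocks):
        for match in TAG.finditer(block):
            if match.group(2) != tag:
                continue
            # A leading '/' marks a closing tag.
            depth += -1 if match.group(1) else 1
            if depth == 0:
                return i
    return None


blocks = ["<div>", "<div>inner</div>", "</div>"]
closing_block = find_close_block(blocks, "div")
visits = rescan_visits(len(blocks))
```

With three blocks, the rescanning approach touches 6 blocks, while the depth-tracking pass touches each of the 3 blocks once and still resolves the nested `<div>` correctly.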

@waylan waylan merged commit 94962cb into Python-Markdown:master Jan 24, 2017