Commit a79ac8b

Fix performance issue caused by using repeated > characters inside <!DOCTYPE root [<!-- PAYLOAD -->]> (#174)
A `>` is treated as a string delimiter. In certain cases, if `>` is used repeatedly in succession, the read and match steps are repeated once per `>`, which slows down parsing. Therefore, the parser now reads ahead to a specific terminator string before matching.
1 parent c33ea49 commit a79ac8b

2 files changed: +8 -1 lines changed

lib/rexml/parsers/baseparser.rb

+1 -1
@@ -378,7 +378,7 @@ def pull_event
               raise REXML::ParseException.new(message, @source)
             end
             return [:notationdecl, name, *id]
-          elsif md = @source.match(/--(.*?)-->/um, true)
+          elsif md = @source.match(/--(.*?)-->/um, true, term: Private::COMMENT_TERM)
             case md[1]
             when /--/, /-\z/
               raise REXML::ParseException.new("Malformed comment", @source)
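The effect of passing a terminator can be illustrated with a small, self-contained sketch. It is not REXML's implementation: `SketchSource` and `attempts_for` are hypothetical names, and the sketch assumes that a failed match triggers a read up to the next terminator followed by a retry, and that `Private::COMMENT_TERM` is the string `"-->"`. Under those assumptions, using `">"` as the terminator retries the regex once per `>` in the payload (each retry rescans a growing buffer), while using `"-->"` reads the whole comment in one step.

```ruby
require "stringio"

# Hypothetical stand-in for a chunked input source; NOT REXML's Source class.
class SketchSource
  attr_reader :match_attempts

  def initialize(text)
    @io = StringIO.new(text)
    @buffer = +""
    @match_attempts = 0
  end

  # Try `pattern` against the buffer. On failure, read more input up to and
  # including the next occurrence of `term`, then try again.
  def match(pattern, term:)
    loop do
      @match_attempts += 1
      md = @buffer.match(pattern)
      return md if md

      chunk = @io.gets(term)   # reads through the next `term`, or to EOF
      return nil if chunk.nil? # input exhausted without a match
      @buffer << chunk
    end
  end
end

# Count how many times the regex is tried for a comment holding n ">" characters.
def attempts_for(n, term)
  source = SketchSource.new("<!-- " + ">" * n + " -->")
  md = source.match(/--(.*?)-->/m, term: term)
  raise "comment not found" unless md

  source.match_attempts
end

puts attempts_for(1_000, ">")   # grows with n
puts attempts_for(1_000, "-->") # stays constant
```

In this sketch the attempt count with `">"` grows linearly with the payload size, and since each attempt scans the whole buffer the total work grows quadratically; with `"-->"` the count does not depend on the payload size.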

test/parse/test_document_type_declaration.rb

+7
@@ -290,6 +290,13 @@ def test_gt_linear_performance_malformed_entity
       end
     end

+    def test_gt_linear_performance_comment
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<!DOCTYPE root [<!-- ' + ">" * n + ' -->]>')
+      end
+    end
+
     private
     def parse(internal_subset)
       super(<<-DOCTYPE)
