Commit 67efb59

Fix performance issue caused by using repeated > characters inside <!DOCTYPE name [<!ENTITY>]> (#175)
A `>` is treated as a string delimiter when the source is read. If `>` appears many times in succession, the parser repeats a read and a match for each one, which slows parsing down. To avoid this, the parser now reads ahead to a specific terminator (`]>`, the end of the DOCTYPE) before matching an ENTITY declaration.
1 parent a79ac8b commit 67efb59
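
The slowdown described above can be illustrated with a small, self-contained sketch. It is not REXML's implementation: `match_with_read_ahead` and `ENTITY_PATTERN` are hypothetical names invented for this example, and the pattern is a simplified stand-in for the real ENTITY declaration pattern. The sketch assumes the source is read up to a terminator string and the pattern is retried on the accumulated buffer after each read.

```ruby
require "stringio"

# Simplified stand-in for an ENTITY declaration pattern (assumption):
# whitespace, a name, a double-quoted value, then the closing ">".
ENTITY_PATTERN = /\A\s+(\S+)\s+"([^"]*)"\s*>/

# Hypothetical helper, not part of REXML: reads from `io` up to each
# occurrence of `term`, appends the chunk to a buffer, and retries
# `pattern` on the whole buffer. Returns [MatchData or nil, attempts].
def match_with_read_ahead(io, pattern, term:)
  buffer = +""
  attempts = 0
  until io.eof?
    buffer << io.readline(term)
    attempts += 1
    md = pattern.match(buffer)
    return [md, attempts] if md
  end
  [nil, attempts]
end

n = 5
declaration = ' name "' + ">" * n + '">]>'

# Reading up to each ">" needs one read and one match attempt per ">".
_, slow_attempts = match_with_read_ahead(StringIO.new(declaration), ENTITY_PATTERN, term: ">")

# Reading up to "]>" pulls in the whole declaration in a single read.
_, fast_attempts = match_with_read_ahead(StringIO.new(declaration), ENTITY_PATTERN, term: "]>")

puts "term \">\":  #{slow_attempts} attempts"
puts "term \"]>\": #{fast_attempts} attempts"
```

With the `>` terminator the number of attempts in this sketch is `n + 1`, and each attempt rescans a growing buffer, so the total work grows faster than linearly. With the `]>` terminator a single attempt is enough.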

2 files changed: +13 -2 lines changed
lib/rexml/parsers/baseparser.rb

+6 -2
@@ -124,11 +124,15 @@ class BaseParser
       }
 
       module Private
-        INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
+        # Terminal requires two or more letters.
         INSTRUCTION_TERM = "?>"
         COMMENT_TERM = "-->"
         CDATA_TERM = "]]>"
         DOCTYPE_TERM = "]>"
+        # Read to the end of DOCTYPE because there is no proper ENTITY termination
+        ENTITY_TERM = DOCTYPE_TERM
+
+        INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
         TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
         CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
         ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -313,7 +317,7 @@ def pull_event
                 raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
                 return [ :elementdecl, "<!ELEMENT" + md[1] ]
               elsif @source.match("ENTITY", true)
-                match = [:entitydecl, *@source.match(Private::ENTITYDECL_PATTERN, true).captures.compact]
+                match = [:entitydecl, *@source.match(Private::ENTITYDECL_PATTERN, true, term: Private::ENTITY_TERM).captures.compact]
                 ref = false
                 if match[1] == '%'
                   ref = true

test/parse/test_entity_declaration.rb

+7
@@ -32,5 +32,12 @@ def test_empty
 <!ENTITY> ]> <r/>
       DETAIL
     end
+
+    def test_gt_linear_performance
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<!DOCTYPE rubynet [<!ENTITY rbconfig.ruby_version "' + '>' * n + '">')
+      end
+    end
   end
 end
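
The new test relies on `assert_linear_performance`, which this diff only calls and does not define. As a rough idea of what such a check measures, here is a minimal sketch. The helpers `measure` and `roughly_linear?` are hypothetical names invented for this example, not the real assertion, and the tolerance value is an arbitrary assumption.

```ruby
# Hypothetical helper (not the real assert_linear_performance): runs the
# block once for each n and returns [n, elapsed_seconds] pairs.
def measure(seq)
  seq.map do |n|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield n
    [n, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end
end

# Hypothetical check: the time for the largest input should not exceed the
# time for the smallest input scaled by the size ratio and a tolerance.
def roughly_linear?(samples, tolerance: 10.0)
  (n_min, t_min), (n_max, t_max) = samples.minmax_by { |n, _| n }
  t_max <= t_min * (n_max.to_f / n_min) * tolerance
end

# Example workload: building a string of n ">" characters.
samples = measure([10_000, 50_000, 100_000]) { |n| ">" * n }
samples.each { |n, seconds| puts format("n=%d: %.6fs", n, seconds) }
```

Timings this small are noisy, so a real check would repeat each measurement and warm up first, which is presumably what the `rehearsal: 10` argument in the test is for.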
