Commit c1b64c1

Fix performance issue caused by using repeated > characters inside comments (#171)
A `>` is treated as a string delimiter when reading from the source. In certain cases, if `>` is used many times in succession inside a comment, the read and the match are repeated for each occurrence, which slows parsing down. To avoid this, the change reads ahead to the comment terminator `-->` before matching.
1 parent 0af55fa commit c1b64c1
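
The cost described in the commit message can be illustrated with a small standalone sketch. It does not use REXML's internals: `reads_until_match` is a hypothetical helper, and `StringIO#gets` with a separator stands in for the parser's delimiter-based reads. It counts how many read-and-match rounds are needed before the comment pattern matches, first with `>` as the delimiter and then with `-->`.

```ruby
require "stringio"

# Hypothetical helper (not part of REXML): read from the input in chunks that
# end at `term`, retry the pattern after each chunk, and return the number of
# reads that were needed before the pattern matched.
def reads_until_match(text, pattern, term)
  io = StringIO.new(text)
  buffer = +""
  reads = 0
  until buffer.match?(pattern)
    chunk = io.gets(term)
    return nil if chunk.nil? # input exhausted without a match
    buffer << chunk
    reads += 1
  end
  reads
end

# Comment body as it looks after the opening "<!--" has been consumed.
body = ">" * 1_000 + " -->"
pattern = /(.*?)-->/m

reads_with_gt   = reads_until_match(body, pattern, ">")
reads_with_term = reads_until_match(body, pattern, "-->")

puts "delimiter '>':   #{reads_with_gt} reads"
puts "delimiter '-->': #{reads_with_term} reads"
```

With `>` as the delimiter, every `>` in the body ends a chunk and triggers another match attempt over a growing buffer. With `-->` as the delimiter, a single read already reaches the end of the comment.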

2 files changed: +13 -1 lines changed

lib/rexml/parsers/baseparser.rb

+2 -1
@@ -126,6 +126,7 @@ class BaseParser
       module Private
         INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
         INSTRUCTION_TERM = "?>"
+        COMMENT_TERM = "-->"
         TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
         CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
         ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -243,7 +244,7 @@ def pull_event
             return process_instruction(start_position)
           elsif @source.match("<!", true)
             if @source.match("--", true)
-              md = @source.match(/(.*?)-->/um, true)
+              md = @source.match(/(.*?)-->/um, true, term: Private::COMMENT_TERM)
               if md.nil?
                 raise REXML::ParseException.new("Unclosed comment", @source)
               end

test/parse/test_comment.rb

+11
@@ -1,8 +1,12 @@
 require "test/unit"
+require "core_assertions"
+
 require "rexml/document"
 
 module REXMLTests
   class TestParseComment < Test::Unit::TestCase
+    include Test::Unit::CoreAssertions
+
     def parse(xml)
       REXML::Document.new(xml)
     end
@@ -117,5 +121,12 @@ def test_after_root
 
       assert_equal(" ok comment ", events[:comment])
     end
+
+    def test_gt_linear_performance
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<!-- ' + ">" * n + ' -->')
+      end
+    end
   end
 end
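
For a quick manual check outside the test suite, a rough sketch along the following lines may help. It only uses REXML's public API and the standard `benchmark` library. The helper name `parse_gt_comment`, the input sizes, and the trailing `<root/>` element are assumptions made for this illustration and are not part of the commit.

```ruby
require "benchmark"
require "rexml/document"

# Hypothetical helper: parse a document whose leading comment contains `n`
# consecutive ">" characters and return the text of that comment.
def parse_gt_comment(n)
  xml = "<!-- " + ">" * n + " --><root/>"
  doc = REXML::Document.new(xml)
  comment = doc.children.find { |child| child.is_a?(REXML::Comment) }
  comment.string
end

# Time a few input sizes; with the fix applied, the time should grow roughly
# in proportion to n rather than much faster.
[1_000, 2_000, 4_000].each do |n|
  text = nil
  seconds = Benchmark.realtime { text = parse_gt_comment(n) }
  puts format("n=%5d  comment length=%5d  %.4fs", n, text.length, seconds)
end
```

Unlike `assert_linear_performance`, this sketch asserts nothing about timing. It only prints measurements for comparison between REXML versions.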
