
Commit c33ea49

Fix performance issue caused by using repeated `>` characters after `<!DOCTYPE name` (#173)
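`>` is treated as the delimiter when reading more of the source. In certain cases, when `>` appears many times in succession (such as the repeated `%>` in the new test), reading and matching are repeated over and over, which slows parsing down. To avoid this, the external-entity match in `pull_event` now passes `term: Private::DOCTYPE_TERM` (`"]>"`), so the source reads ahead to the end of the internal subset in advance.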
1 parent 9f1415a commit c33ea49


2 files changed (+16, -1 lines)


lib/rexml/parsers/baseparser.rb

+2 -1
```diff
@@ -128,6 +128,7 @@ module Private
 INSTRUCTION_TERM = "?>"
 COMMENT_TERM = "-->"
 CDATA_TERM = "]]>"
+DOCTYPE_TERM = "]>"
 TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
 CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
 ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -384,7 +385,7 @@ def pull_event
 end
 return [ :comment, md[1] ] if md
 end
-elsif match = @source.match(/(%.*?;)\s*/um, true)
+elsif match = @source.match(/(%.*?;)\s*/um, true, term: Private::DOCTYPE_TERM)
 return [ :externalentity, match[1] ]
 elsif @source.match(/\]\s*>/um, true)
 @document_status = :after_doctype
```
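To see why reading ahead helps, here is a toy model of the behaviour the commit message describes. `ToySource` is hypothetical and much simpler than REXML's real source classes: when a pattern fails to match, it reads the next chunk up to a delimiter (`>` by default, mirroring the delimiter described in the message) and re-runs the match over the whole buffer. On input shaped like the new test's internal subset, the default delimiter costs one read and one full re-scan per `>`, while passing `term: "]>"` pulls in everything up to the end of the internal subset in a single read.

```ruby
require "stringio"
require "strscan"

# Hypothetical toy model of a buffered parser source; this is NOT REXML's
# real Source/IOSource. It keeps already-read text in a StringScanner and,
# whenever a pattern fails to match, reads one more chunk up to a
# delimiter and retries the match from the current position.
class ToySource
  DEFAULT_DELIMITER = ">"

  attr_reader :reads, :scanned_bytes

  def initialize(text)
    @io = StringIO.new(text)
    @scanner = StringScanner.new(+"")
    @reads = 0          # number of chunks pulled from the IO
    @scanned_bytes = 0  # rough cost: bytes visible to each match attempt
  end

  # Consume `pattern` at the current position, reading more input up to
  # `term` (or ">" when no terminator is given) until it matches or EOF.
  def match(pattern, term: nil)
    loop do
      @scanned_bytes += @scanner.rest_size
      matched = @scanner.scan(pattern)
      return matched if matched

      chunk = @io.gets(term || DEFAULT_DELIMITER)
      return nil if chunk.nil? # EOF: no more input to try
      @reads += 1
      @scanner << chunk
    end
  end
end

text = "%>" * 1000 + "]>"     # no ";" anywhere, so the match never succeeds
pattern = /(%.*?;)\s*/um      # same pattern as the external-entity branch

per_gt = ToySource.new(text)
per_gt.match(pattern)                  # => nil
per_gt.reads                           # => 1001 (one read per ">")
per_gt.scanned_bytes                   # => 1003002 (grows ~quadratically)

read_ahead = ToySource.new(text)
read_ahead.match(pattern, term: "]>")  # => nil
read_ahead.reads                       # => 1
read_ahead.scanned_bytes               # => 2002
```

The doubling of `n` roughly quadruples `scanned_bytes` in the per-`>` case, whereas the read-ahead case stays proportional to the input length.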

test/parse/test_document_type_declaration.rb

+14
```diff
@@ -1,9 +1,13 @@
 # frozen_string_literal: false
 require "test/unit"
+require "core_assertions"
+
 require "rexml/document"

 module REXMLTests
 class TestParseDocumentTypeDeclaration < Test::Unit::TestCase
+include Test::Unit::CoreAssertions
+
 private
 def parse(doctype)
 REXML::Document.new(<<-XML).doctype
@@ -276,6 +280,16 @@ def test_notation_attlist
 doctype.children.collect(&:class))
 end

+def test_gt_linear_performance_malformed_entity
+seq = [10000, 50000, 100000, 150000, 200000]
+assert_linear_performance(seq, rehearsal: 10) do |n|
+begin
+REXML::Document.new('<!DOCTYPE root [' + "%>" * n + ']><test/>')
+rescue
+end
+end
+end
+
 private
 def parse(internal_subset)
 super(<<-DOCTYPE)
```
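The new test relies on `assert_linear_performance` from `core_assertions`, which roughly checks that the block's running time scales linearly across the sizes in `seq`; the `begin`/`rescue` is there because the input is malformed (the `%...` reference never ends in `;`). Wall-clock checks can be noisy, so the sketch below is a hypothetical, deterministic analogue that counts re-scanned bytes instead. `rescan_cost` and `linear_cost?` are illustrative names, not part of REXML or test-unit, and the cost model assumes that each failed match re-scans the whole buffer before the next read.

```ruby
require "stringio"

# Hypothetical cost model (not REXML code): before each read, a failed
# match re-scans the whole buffer, then one more chunk up to `delimiter`
# is appended. Returns the total number of bytes re-scanned.
def rescan_cost(text, delimiter)
  io = StringIO.new(text)
  buffered = 0
  cost = 0
  loop do
    cost += buffered            # one failed match over the current buffer
    chunk = io.gets(delimiter)
    break if chunk.nil?         # EOF
    buffered += chunk.bytesize
  end
  cost
end

# Hypothetical, deterministic stand-in for a linear-performance check:
# the cost per unit of n may vary by at most `slack` across all sizes.
def linear_cost?(seq, slack: 2.0)
  per_unit = seq.map { |n| yield(n).fdiv(n) }
  per_unit.max <= per_unit.min * slack
end

seq = [10000, 50000, 100000, 150000, 200000]   # sizes used by the new test
input = ->(n) { "<!DOCTYPE root [" + "%>" * n + "]><test/>" }

linear_cost?(seq) { |n| rescan_cost(input.(n), ">") }   # => false
linear_cost?(seq) { |n| rescan_cost(input.(n), "]>") }  # => true
```

With `>` as the delimiter the cost per `%>` grows with `n`, so the check fails; with `]>` it stays roughly constant, which is the property the real test asserts in terms of time.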
