Commit 3b9aefb

HTML API: Validate HTML Processor against external test suite from html5lib.

In this patch, the test suite from html5lib validates the tree-construction steps in the HTML Processor to ensure that they are behaving according to the HTML specification. This suite of tests is also used by the servo project to test its html5ever package.

A new test module in the HTML API transforms HTML Processor output to match the expected tree shape from the external tests. For cases where there are tests validating behaviors of unsupported HTML tags and constructs, the tests are marked as skipped. As the HTML API continues to expand its own support, the number of skipped tests will automatically shrink down towards zero.

Additional tests are skipped through the `SKIP_TEST` array in the test runner.

Fixes #60227.
See #58517.

Props azaozz, costdev, dmsnell, hellofromtonya, jonsurrell, jorbin, swisspidy.

git-svn-id: https://develop.svn.wordpress.org/trunk@58010 602fd350-edb4-49c9-b593-d223f7449a82

1 parent b39b75c commit 3b9aefb

64 files changed, with 24,645 additions and 0 deletions.

phpunit.xml.dist

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@
 			<group>ms-files</group>
 			<group>ms-required</group>
 			<group>external-http</group>
+			<group>html-api-html5lib-tests</group>
 		</exclude>
 	</groups>
 	<logging>

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+*.dat -text diff

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+Credits
+=======
+
+The ``html5lib`` test data is maintained by:
+
+- James Graham
+- Geoffrey Sneddon
+
+
+Contributors
+------------
+
+- Adam Barth
+- Andi Sidwell
+- Anne van Kesteren
+- David Flanagan
+- Edward Z. Yang
+- Geoffrey Sneddon
+- Henri Sivonen
+- Ian Hickson
+- Jacques Distler
+- James Graham
+- Lachlan Hunt
+- lantis63
+- Mark Pilgrim
+- Mats Palmgren
+- Ms2ger
+- Nolan Waite
+- Philip Taylor
+- Rafael Weinstein
+- Ryan King
+- Sam Ruby
+- Simon Pieters
+- Thomas Broyer

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+Copyright (c) 2006-2013 James Graham, Geoffrey Sneddon, and
+other contributors
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# html5lib-tests
+
+This directory contains a third-party test suite used for testing the WordPress HTML API.
+
+`html5lib-tests` can be found on GitHub at [html5lib/html5lib-tests](https://github.com/html5lib/html5lib-tests).
+
+The necessary files have been copied to this directory:
+
+- `AUTHORS.rst`
+- `LICENSE`
+- `README.md`
+- `tree-construction/README.md`
+- `tree-construction/*.dat`
+
+The version of these files was taken from the git commit with
+SHA [`a9f44960a9fedf265093d22b2aa3c7ca123727b9`](https://github.com/html5lib/html5lib-tests/commit/a9f44960a9fedf265093d22b2aa3c7ca123727b9).
+
+## Updating
+
+If there have been changes to the html5lib-tests repository, this test suite can be updated. In
+order to update:
+
+1. Check out the latest version of the git repository mentioned above.
+1. Copy the files listed above into this directory.
+1. Update the SHA mentioned in this README file with the new html5lib-tests SHA.

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+Tree Construction Tests
+=======================
+
+Each file containing tree construction tests consists of any number of
+tests separated by two newlines (LF) and a single newline before the end
+of the file. For instance:
+
+    [TEST]LF
+    LF
+    [TEST]LF
+    LF
+    [TEST]LF
+
+Where [TEST] is the following format:
+
+Each test must begin with a string "\#data" followed by a newline (LF).
+All subsequent lines until a line that says "\#errors" are the test data
+and must be passed to the system being tested unchanged, except with the
+final newline (on the last line) removed.
+
+Then there must be a line that says "\#errors". It must be followed by
+one line per parse error that a conformant checker would return. It
+doesn't matter what those lines are, although they can't be
+"\#new-errors", "\#document-fragment", "\#document", "\#script-off",
+"\#script-on", or empty; the only thing that matters is that there be
+the right number of parse errors.
+
+Then there \*may\* be a line that says "\#new-errors", which works like
+the "\#errors" section adding more errors to the expected number of
+errors.
+
+Then there \*may\* be a line that says "\#document-fragment", which must
+be followed by a newline (LF), followed by a string of characters that
+indicates the context element, followed by a newline (LF). If the string
+of characters starts with "svg ", the context element is in the SVG
+namespace and the substring after "svg " is the local name. If the
+string of characters starts with "math ", the context element is in the
+MathML namespace and the substring after "math " is the local name.
+Otherwise, the context element is in the HTML namespace and the string
+is the local name. If this line is present the "\#data" must be parsed
+using the HTML fragment parsing algorithm with the context element as
+context.
+
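Reviewer note: the context-element rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration and not the test runner added by this commit; the function name `parse_context_element` and the short namespace labels `"html"`, `"svg"`, and `"math"` are assumptions made for the example.

```python
from typing import Tuple


def parse_context_element(spec: str) -> Tuple[str, str]:
    """Split a "#document-fragment" context string into (namespace, local name).

    The namespace labels returned here are hypothetical short names, not
    identifiers taken from the html5lib tests or from WordPress.
    """
    # A leading "svg " or "math " selects the SVG or MathML namespace.
    for prefix, namespace in (("svg ", "svg"), ("math ", "math")):
        if spec.startswith(prefix):
            return namespace, spec[len(prefix):]
    # Otherwise the whole string is a local name in the HTML namespace.
    return "html", spec
```

For instance, `parse_context_element("svg path")` yields the SVG namespace with local name `path`, while `parse_context_element("td")` stays in the HTML namespace.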
+Then there \*may\* be a line that says "\#script-off" or
+"\#script-on". If a line that says "\#script-off" is present, the
+parser must set the scripting flag to disabled. If a line that says
+"\#script-on" is present, it must set it to enabled. Otherwise, the
+test should be run in both modes.
+
+Then there must be a line that says "\#document", which must be followed
+by a dump of the tree of the parsed DOM. Each node must be represented
+by a single line. Each line must start with "| ", followed by two spaces
+per parent node that the node has before the root document node.
+
+- Element nodes must be represented by a "`<`" then the *tag name
+string* "`>`", and all the attributes must be given, sorted
+lexicographically by UTF-16 code unit according to their *attribute
+name string*, on subsequent lines, as if they were children of the
+element node.
+- Attribute nodes must have the *attribute name string*, then an "="
+sign, then the attribute value in double quotes (").
+- Text nodes must be the string, in double quotes. Newlines aren't
+escaped.
+- Comments must be "`<`" then "`!-- `" then the data then "` -->`".
+- DOCTYPEs must be "`<!DOCTYPE `" then the name then if either of the
+system id or public id is non-empty a space, public id in
+double-quotes, another space and the system id in double-quotes, and
+then in any case "`>`".
+- Processing instructions must be "`<?`", then the target, then a
+space, then the data and then "`>`". (The HTML parser cannot emit
+processing instructions, but scripts can, and the WebVTT to DOM
+rules can emit them.)
+- Template contents are represented by the string "content" with the
+children below it.
+
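Reviewer note: the dump rules listed above can be sketched as a small serializer. This is a minimal Python illustration under stated assumptions, not the transformation this commit implements for the HTML Processor. The node classes (`Text`, `Comment`, `Doctype`, `Element`) and the `dump` function are hypothetical names, `tag_name` is assumed to already hold the *tag name string*, and processing instructions and template contents are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Doctype:
    name: str
    public_id: str = ""
    system_id: str = ""


@dataclass
class Element:
    tag_name: str  # assumed to be the "tag name string", e.g. "p" or "svg path"
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Comment, Doctype, Element]


def dump(nodes: List[Node], depth: int = 0) -> List[str]:
    """Return one dump line per node: "| " plus two spaces per parent."""
    indent = "| " + "  " * depth
    lines: List[str] = []
    for node in nodes:
        if isinstance(node, Text):
            lines.append(f'{indent}"{node.data}"')
        elif isinstance(node, Comment):
            lines.append(f"{indent}<!-- {node.data} -->")
        elif isinstance(node, Doctype):
            ids = ""
            if node.public_id or node.system_id:
                ids = f' "{node.public_id}" "{node.system_id}"'
            lines.append(f"{indent}<!DOCTYPE {node.name}{ids}>")
        else:
            lines.append(f"{indent}<{node.tag_name}>")
            child_indent = "| " + "  " * (depth + 1)
            # Big-endian UTF-16 bytes compare in the same order as code units.
            names = sorted(node.attributes, key=lambda n: n.encode("utf-16-be"))
            for name in names:
                lines.append(f'{child_indent}{name}="{node.attributes[name]}"')
            lines.extend(dump(node.children, depth + 1))
    return lines
```

Attributes are emitted before child nodes and at the child indentation level, which is how the sketch reads "as if they were children of the element node".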
+The *tag name string* is the local name prefixed by a namespace
+designator. For the HTML namespace, the namespace designator is the
+empty string, i.e. there's no prefix. For the SVG namespace, the
+namespace designator is "svg ". For the MathML namespace, the namespace
+designator is "math ".
+
+The *attribute name string* is the local name prefixed by a namespace
+designator. For no namespace, the namespace designator is the empty
+string, i.e. there's no prefix. For the XLink namespace, the namespace
+designator is "xlink ". For the XML namespace, the namespace designator
+is "xml ". For the XMLNS namespace, the namespace designator is "xmlns
+". Note the difference between "xlink:href" which is an attribute in no
+namespace with the local name "xlink:href" and "xlink href" which is an
+attribute in the xlink namespace with the local name "href".
+
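Reviewer note: the two naming rules above amount to a prefix lookup. The Python sketch below is illustrative only; the dictionary keys (`"html"`, `"svg"`, `"math"`, `None`, `"xlink"`, `"xml"`, `"xmlns"`) and the function names are assumptions chosen for the example, while the designator strings themselves come from the README text.

```python
from typing import Optional

# Namespace designators as described in the README; the keys are hypothetical labels.
TAG_DESIGNATORS = {"html": "", "svg": "svg ", "math": "math "}
ATTRIBUTE_DESIGNATORS = {None: "", "xlink": "xlink ", "xml": "xml ", "xmlns": "xmlns "}


def tag_name_string(namespace: str, local_name: str) -> str:
    """Prefix an element's local name with its namespace designator."""
    return TAG_DESIGNATORS[namespace] + local_name


def attribute_name_string(namespace: Optional[str], local_name: str) -> str:
    """Prefix an attribute's local name with its namespace designator."""
    return ATTRIBUTE_DESIGNATORS[namespace] + local_name
```

With these helpers, an attribute in no namespace named `xlink:href` keeps its colon, whereas `href` in the XLink namespace becomes `xlink href` with a space, which is the distinction the README calls out.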
+If there is also a "\#document-fragment" the bit following "\#document"
+must be a representation of the HTML fragment serialization for the
+context element given by "\#document-fragment".
+
+For example:
+
+    #data
+    <p>One<p>Two
+    #errors
+    3: Missing document type declaration
+    #document
+    | <html>
+    |   <head>
+    |   <body>
+    |     <p>
+    |       "One"
+    |     <p>
+    |       "Two"
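
Reviewer note: to make the section layout concrete, here is a minimal Python sketch that splits a single test into its sections. It is not the runner added in this commit; `parse_test` and `SECTION_NAMES` are hypothetical names, and the sketch assumes the test text has no trailing newline and that no line inside the `#document` dump happens to equal a section heading.

```python
from typing import Dict, List, Optional

SECTION_NAMES = {
    "#data", "#errors", "#new-errors", "#document-fragment",
    "#script-off", "#script-on", "#document",
}


def parse_test(text: str) -> Dict[str, List[str]]:
    """Split one test into sections keyed by heading, each a list of lines."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.split("\n"):
        is_heading = line in SECTION_NAMES
        if current == "#data":
            # Test data runs until "#errors"; other headings are data here.
            is_heading = line == "#errors"
        if is_heading:
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


example = "\n".join([
    "#data",
    "<p>One<p>Two",
    "#errors",
    "3: Missing document type declaration",
    "#document",
    "| <html>",
    "|   <head>",
    "|   <body>",
    "|     <p>",
    '|       "One"',
    "|     <p>",
    '|       "Two"',
])
test = parse_test(example)
data = "\n".join(test["#data"])          # input passed to the parser under test
expected_errors = len(test["#errors"])   # only the count of errors matters
```

Joining the `#data` lines with newlines reproduces the input without its final newline, and the length of the `#errors` list is the expected number of parse errors.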
