hast utility to transform to nlcst.
Note: You probably want to use rehype-retext.
npm:
npm install hast-util-to-nlcst
Say we have the following example.html:
<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>
…and next to it, index.js:
var rehype = require('rehype')
var vfile = require('to-vfile')
var English = require('parse-english')
var inspect = require('unist-util-inspect')
var toNlcst = require('hast-util-to-nlcst')
var file = vfile.readSync('example.html')
var tree = rehype().parse(file)
console.log(inspect(toNlcst(tree, file, English)))
Running index.js yields:
RootNode[2] (1:1-6:1, 0-134)
├─ ParagraphNode[3] (1:10-3:3, 9-24)
│  ├─ WhiteSpaceNode: "\n  " (1:10-2:3, 9-12)
│  ├─ SentenceNode[2] (2:3-2:12, 12-21)
│  │  ├─ WordNode[1] (2:3-2:11, 12-20)
│  │  │  └─ TextNode: "Implicit" (2:3-2:11, 12-20)
│  │  └─ PunctuationNode: "." (2:11-2:12, 20-21)
│  └─ WhiteSpaceNode: "\n  " (2:12-3:3, 21-24)
└─ ParagraphNode[1] (3:7-3:43, 28-64)
   └─ SentenceNode[4] (3:7-3:43, 28-64)
      ├─ WordNode[1] (3:7-3:15, 28-36)
      │  └─ TextNode: "Explicit" (3:7-3:15, 28-36)
      ├─ PunctuationNode: ":" (3:15-3:16, 36-37)
      ├─ WhiteSpaceNode: " " (3:16-3:17, 37-38)
      └─ WordNode[4] (3:25-3:43, 46-64)
         ├─ TextNode: "foo" (3:25-3:28, 46-49)
         ├─ TextNode: "s" (3:37-3:38, 58-59)
         ├─ PunctuationNode: "-" (3:38-3:39, 59-60)
         └─ TextNode: "ball" (3:39-3:43, 60-64)
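To read this output, it can help to see how text is recovered from an nlcst node. The following is a minimal sketch, not part of this package: `textOf` is a hypothetical helper (the nlcst-to-string package covers this properly), and `word` is a hand-written copy of the final WordNode, without positional info.

```javascript
// Hand-written copy of the final WordNode from the output above
// (positional info omitted for brevity).
var word = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'foo'},
    {type: 'TextNode', value: 's'},
    {type: 'PunctuationNode', value: '-'},
    {type: 'TextNode', value: 'ball'}
  ]
}

// Hypothetical helper: concatenate the values of all literal nodes, depth-first.
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(textOf).join('')
}

console.log(textOf(word)) // => 'foos-ball'
```

Note that the word spans the `<strong>` boundary: "foo" comes from inside the element and "s-ball" from outside it, yet they end up in a single WordNode.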
Transform the given hast tree to nlcst.
tree (HastNode): Tree with positional info
file (VFile): Virtual file
parser (Function): nlcst parser, such as parse-english, parse-dutch, or parse-latin
The algorithm supports implicit and explicit paragraphs, such as:
<article>
An implicit paragraph.
<h1>An explicit paragraph.</h1>
</article>
Overlapping paragraphs are also supported (see the tests or the HTML spec for more info).
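The split between implicit and explicit paragraphs can be illustrated with a simplified sketch. This is not the algorithm this package uses: `splitParagraphs` and the `EXPLICIT` list are hypothetical, the tree is written by hand, and overlapping paragraphs are not handled.

```javascript
// Assumption: a small, hypothetical subset of elements that form explicit paragraphs.
var EXPLICIT = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']

// Hypothetical helper: split the children of a hast element into paragraph groups.
// Runs of other content become implicit paragraphs; whitespace-only runs are dropped.
function splitParagraphs(parent) {
  var groups = []
  var run = []

  function flush() {
    var hasContent = run.some(function (node) {
      return node.type !== 'text' || /\S/.test(node.value)
    })
    if (hasContent) groups.push({kind: 'implicit', nodes: run})
    run = []
  }

  parent.children.forEach(function (node) {
    if (node.type === 'element' && EXPLICIT.indexOf(node.tagName) !== -1) {
      flush()
      groups.push({kind: 'explicit', nodes: [node]})
    } else {
      run.push(node)
    }
  })

  flush()
  return groups
}

// Hand-written hast tree for the <article> example above.
var article = {
  type: 'element',
  tagName: 'article',
  properties: {},
  children: [
    {type: 'text', value: '\n  An implicit paragraph.\n  '},
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{type: 'text', value: 'An explicit paragraph.'}]
    },
    {type: 'text', value: '\n'}
  ]
}

var kinds = splitParagraphs(article).map(function (group) {
  return group.kind
})

console.log(kinds) // => [ 'implicit', 'explicit' ]
```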
Some elements are ignored and their content will not be present in nlcst: <script>, <style>, <svg>, <math>, <del>.
To ignore other elements, add a data-nlcst attribute with a value of ignore:
<p>This is <span data-nlcst="ignore">hidden</span>.</p>
<p data-nlcst="ignore">Completely hidden.</p>
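In a hast tree, the `data-nlcst` attribute is exposed as the `dataNlcst` property. A check along these lines could decide whether an element is skipped. This is a minimal sketch under that assumption, not this package's code: `isIgnored` and `IGNORED` are hypothetical names, and the nodes are written by hand.

```javascript
// Element names listed above as ignored by default.
var IGNORED = ['script', 'style', 'svg', 'math', 'del']

// Hypothetical helper: is this hast node ignored, either by name or by attribute?
function isIgnored(node) {
  if (node.type !== 'element') return false
  var props = node.properties || {}
  return IGNORED.indexOf(node.tagName) !== -1 || props.dataNlcst === 'ignore'
}

// Hand-written hast nodes for <span data-nlcst="ignore">, <del>, and <em>.
var ignoredSpan = {type: 'element', tagName: 'span', properties: {dataNlcst: 'ignore'}, children: []}
var del = {type: 'element', tagName: 'del', properties: {}, children: []}
var em = {type: 'element', tagName: 'em', properties: {}, children: []}

console.log(isIgnored(ignoredSpan), isIgnored(del), isIgnored(em)) // => true true false
```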
<code> elements are mapped to Source nodes in nlcst.
To mark other elements as source, add a data-nlcst attribute with a value of source:
<p>This is <span data-nlcst="source">marked as source</span>.</p>
<p data-nlcst="source">Completely marked.</p>
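As a rough illustration of the mapping, the sketch below turns a marked element into an nlcst Source node (type `SourceNode`). It is not this package's implementation: `textContent`, `isSource`, and `toSource` are hypothetical helpers, positional info is left out, and the hast node is written by hand.

```javascript
// Hypothetical helper: collect the plain text inside a hast node.
function textContent(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(textContent).join('')
}

// Hypothetical helper: is this element <code>, or marked with data-nlcst="source"?
function isSource(node) {
  var props = node.properties || {}
  return (
    node.type === 'element' &&
    (node.tagName === 'code' || props.dataNlcst === 'source')
  )
}

// Hypothetical helper: build an nlcst Source node from a hast element.
function toSource(node) {
  return {type: 'SourceNode', value: textContent(node)}
}

// Hand-written hast node for <span data-nlcst="source">marked as source</span>.
var sourceSpan = {
  type: 'element',
  tagName: 'span',
  properties: {dataNlcst: 'source'},
  children: [{type: 'text', value: 'marked as source'}]
}

if (isSource(sourceSpan)) console.log(toSource(sourceSpan))
// => { type: 'SourceNode', value: 'marked as source' }
```

Because a Source node is a literal, its content is kept as one opaque value rather than being split into sentences and words.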
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a Code of Conduct. By interacting with this repository, organisation, or community you agree to abide by its terms.