From 96d70f33bbfdfb8ef807fb29c274f1dae442f011 Mon Sep 17 00:00:00 2001 From: BurdetteLamar Date: Sat, 31 Jul 2021 16:12:05 -0500 Subject: [PATCH] Tutorial --- doc/rexml/tutorial.rdoc | 1363 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 1363 insertions(+) create mode 100644 doc/rexml/tutorial.rdoc diff --git a/doc/rexml/tutorial.rdoc b/doc/rexml/tutorial.rdoc new file mode 100644 index 00000000..0bc3b874 --- /dev/null +++ b/doc/rexml/tutorial.rdoc @@ -0,0 +1,1363 @@ += \REXML Tutorial + +== Why \REXML? + +- Ruby's \REXML library is part of the Ruby distribution, + so using it requires no gem installations. +- \REXML is fully maintained. +- \REXML is mature, having been in use for long years. + +== To Include, or Not to Include? + +REXML is a module. +To use it, you must require it: + + require 'rexml' # => true + +If you do not also include it, you must fully qualify references to REXML: + + REXML::Document # => REXML::Document + +If you also include the module, you may optionally omit REXML::: + + include REXML + Document # => REXML::Document + REXML::Document # => REXML::Document + +== Preliminaries + +All examples here assume that the following code has been executed: + + require 'rexml' + include REXML + +The source XML for many examples here is from file +{books.xml}[https://www.w3schools.com/xml/books.xml] at w3schools.com. +You may find it convenient to open that page in a new tab +(Ctrl-click in some browsers). + +Note that your browser may display the XML with modified whitespace +and without the XML declaration, which in this case is: + + + +For convenience, we capture the XML into a string variable: + + require 'open-uri' + source_string = URI.open('https://www.w3schools.com/xml/books.xml').read + +And into a file: + + File.write('source_file.xml', source_string) + +Throughout these examples, variable +doc+ will hold only the document +derived from these sources: + + doc = Document.new(source_string) + +== Parsing \XML \Source + +=== Parsing a Document + +Use method REXML::Document::new to parse XML source. + +The source may be a string: + + doc = Document.new(source_string) + +Or an \IO stream: + + doc = File.open('source_file.xml', 'r') do |io| + Document.new(io) + end + +Method URI.open returns a StringIO object, +so the source can be from a web page: + + require 'open-uri' + io = URI.open("https://www.w3schools.com/xml/books.xml") + io.class # => StringIO + doc = Document.new(io) + +For any of these sources, the returned object is an REXML::Document: + + doc # => ... + doc.class # => REXML::Document + +Note: 'UNDEFINED' is the "name" displayed for a document, +even though doc.name returns an empty string "". + +A parsed document may produce \REXML objects of many classes, +but the two that are likely to be of greatest interest are +REXML::Document and REXML::Element. +These two classes are covered in great detail in this tutorial. + +=== Context (Parsing Options) + +The context for parsing a document is a hash that influences +the way the XML is read and stored. + +The context entries are: + +- +:respect_whitespace+: controls treatment of whitespace. +- +:compress_whitespace+: determines whether whitespace is compressed. +- +:ignore_whitespace_nodes+: determines whether whitespace-only nodes are to be ignored. +- +:raw+: controls treatment of special characters and entities. + +See {Element Context}[../context_rdoc.html]. + +== Exploring the Document + +An REXML::Document object represents an XML document. + +The object inherits from its ancestor classes: + +- REXML::Child (includes module REXML::Node) + - REXML::Parent (includes module {Enumerable}[rdoc-ref:Enumerable]). + - REXML::Element (includes module REXML::Namespace). + - REXML::Document + +This section covers only those properties and methods that are unique to a document +(that is, not inherited or included). + +=== Document Properties + +A document has several properties (other than its children); + +- Document type. +- Node type. +- Name. +- Document. +- XPath + +[Document Type] + + A document may have a document type: + + my_xml = '' + my_doc = Document.new(my_xml) + doc_type = my_doc.doctype + doc_type.class # => REXML::DocType + doc_type.to_s # => "" + +[Node Type] + + A document also has a node type (always +:document+): + + doc.node_type # => :document + +[Name] + + A document has a name (always an empty string): + + doc.name # => "" + +[Document] + + \Method REXML::Document#document returns +self+: + + doc.document == doc # => true + + An object of a different class (\REXML::Element or \REXML::Child) + may have a document, which is the document to which the object belongs; + if so, that document will be an \REXML::Document object. + + doc.root.document.class # => REXML::Document + +[XPath] + + \method REXML::Element#xpath returns the string xpath to the element, + relative to its most distant ancestor: + + doc.root.class # => REXML::Element + doc.root.xpath # => "/bookstore" + doc.root.texts.first # => "\n\n" + doc.root.texts.first.xpath # => "/bookstore/text()" + + If there is no ancestor, returns the expanded name of the element: + + Element.new('foo').xpath # => "foo" + +=== Document Children + +A document may have children of these types: + +- XML declaration. +- Root element. +- Text. +- Processing instructions. +- Comments. +- CDATA. + +[XML Declaration] + + A document may an XML declaration, which is stored as an REXML::XMLDecl object: + + doc.xml_decl # => + doc.xml_decl.class # => REXML::XMLDecl + + Document.new('').xml_decl # => + + my_xml = '"' + my_doc = Document.new(my_xml) + xml_decl = my_doc.xml_decl + xml_decl.to_s # => "" + + The version, encoding, and stand-alone values may be retrieved separately: + + my_doc.version # => "1.0" + my_doc.encoding # => "UTF-8" + my_doc.stand_alone? # => "yes" + +[Root Element] + + A document may have a single element child, called the _root_ _element_, + which is stored as an REXML::Element object; + it may be retrieved with method +root+: + + doc.root # => ... + doc.root.class # => REXML::Element + + Document.new('').root # => nil + +[Text] + + A document may have text passages, each of which is stored + as an REXML::Text object: + + doc.texts.each {|t| p [t.class, t] } + + Output: + + [REXML::Text, "\n"] + +[Processing Instructions] + + A document may have processing instructions, which are stored + as REXML::Instruction objects: + + + + Output: + + [REXML::Instruction, ] + [REXML::Instruction, ] + +[Comments] + + A document may have comments, which are stored + as REXML::Comment objects: + + my_xml = <<-EOT + + + EOT + my_doc = Document.new(my_xml) + my_doc.comments.each {|c| p [c.class, c] } + + Output: + + [REXML::Comment, # ... , @string="foo">] + [REXML::Comment, # ... , @string="bar">] + +[CDATA] + + A document may have CDATA entries, which are stored + as REXML::CData objects: + + my_xml = <<-EOT + + + EOT + my_doc = Document.new(my_xml) + my_doc.cdatas.each {|cd| p [cd.class, cd] } + + Output: + + [REXML::CData, "foo"] + [REXML::CData, "bar"] + +The payload of a document is a tree of nodes, descending from the root element: + + doc.root.children.each do |child| + p [child, child.class] + end + +Output: + + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + +== Exploring an Element + +An REXML::Element object represents an XML element. + +The object inherits from its ancestor classes: + +- REXML::Child (includes module REXML::Node) + - REXML::Parent (includes module {Enumerable}[rdoc-ref:Enumerable]). + - REXML::Element (includes module REXML::Namespace). + +This section covers methods: + +- Defined in REXML::Element itself. +- Inherited from REXML::Parent and REXML::Child. +- Included from REXML::Node. + +=== Inside the Element + +[Brief String Representation] + + Use method REXML::Element#inspect to retrieve a brief string representation. + + doc.root.inspect # => " ... " + + The ellipsis (...) indicates that the element has children. + When there are no children, the ellipsis is omitted: + + Element.new('foo').inspect # => "" + + If the element has attributes, those are also included: + + doc.root.elements.first.inspect # => " ... " + +[Extended String Representation] + + Use inherited method REXML::Child.bytes to retrieve an extended + string representation. + + doc.root.bytes # => "\n\n\n Everyday Italian\n Giada De Laurentiis\n 2005\n 30.00\n\n\n\n Harry Potter\n J K. Rowling\n 2005\n 29.99\n\n\n\n XQuery Kick Start\n James McGovern\n Per Bothner\n Kurt Cagle\n James Linn\n Vaidyanathan Nagarajan\n 2003\n 49.99\n\n\n\n Learning XML\n Erik T. Ray\n 2003\n 39.95\n\n\n" + +[Node Type] + + Use method REXML::Element#node_type to retrieve the node type (always +:element+): + + doc.root.node_type # => :element + +[Raw Mode] + + Use method REXML::Element#raw to retrieve whether (+true+ or +nil+) + raw mode is set. + + doc.root.raw # => nil + +[Context] + + Use method REXML::Element#context to retrieve the context hash + (see {Element Context}[../context_rdoc.html]): + + doc.root.context # => {} + +=== Relationships + +An element may have: + +- Ancestors. +- Siblings. +- Children. + +==== Ancestors + +[Containing Document] + + Use method REXML::Element#document to retrieve the containing document, if any: + + ele = doc.root.elements.first # => ... + ele.document # => ... + ele = Element.new('foo') # => + ele.document # => nil + +[Root Element] + + Use method REXML::Element#root to retrieve the root element: + + ele = doc.root.elements.first # => ... + ele.root # => ... + ele = Element.new('foo') # => + ele.root # => + +[Root Node] + + Use method REXML::Element#root_node to retrieve the most distant ancestor, + which is the containing document, if any, otherwise the root element: + + ele = doc.root.elements.first # => ... + ele.root_node # => ... + ele = Element.new('foo') # => + ele.root_node # => + +[Parent] + + Use inherited method REXML::Child#parent to retrieve the parent + + ele = doc.root # => ... + ele.parent # => ... + ele = doc.root.elements.first # => ... + ele.parent # => ... + + Use included method REXML::Node#index_in_parent to retrieve the index + of the element among all of its parents children (not just the element children). + Note that while the index for doc.root.elements[n] is 1-based, + the returned index is 0-based. + + doc.root.children # => + # ["\n\n", + # ... , + # "\n\n", + # ... , + # "\n\n", + # ... , + # "\n\n", + # ... , + # "\n\n"] + ele = doc.root.elements[1] # => ... + ele.index_in_parent # => 2 + ele = doc.root.elements[2] # => ... + ele.index_in_parent# => 4 + +==== Siblings + +[Next Element] + + Use method REXML::Element#next_element to retrieve the first following + sibling that is itself an element (+nil+ if there is none): + + ele = doc.root.elements[1] + while ele do + p [ele.class, ele] + ele = ele.next_element + end + p ele + + Output: + + p ele + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + nil + +[Previous Element] + + Use method REXML::Element#previous_element to retrieve the first preceding + sibling that is itself an element (+nil+ if there is none): + + ele = doc.root.elements[4] + while ele do + p [ele.class, ele] + ele = ele.previous_element + end + p ele + + Output: + + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + nil + +[Next Node] + + Use included method REXML::Node.next_sibling_node + (or its alias next_sibling) to retrieve the first following node + regardless of its class: + + node = doc.root.children[0] + while node do + p [node.class, node] + node = node.next_sibling + end + p node + + Output: + + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + nil + +[Previous Node] + + Use included method REXML::Node.previous_sibling_node + (or its alias previous_sibling) to retrieve the first preceding node + regardless of its class: + + node = doc.root.children[-1] + while node do + p [node.class, node] + node = node.previous_sibling + end + p node + + Output: + + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + nil + +==== Children + +[Child Count] + + Use inherited method REXML::Parent.size to retrieve the count + of nodes (of all types) in the element: + + doc.root.size # => 9 + +[Child Nodes] + + Use inherited method REXML::Parent.children to retrieve an array + of the child nodes (of all types): + + doc.root.children # => + # ["\n\n", + # ... , + # "\n\n", + # ... , + # "\n\n", + # ... , + # "\n\n", + # ... , + # "\n\n"] + +[Child at Index] + + Use method REXML::Element#[] to retrieve the child at a given numerical index, + or +nil+ if there is no such child: + + doc.root[0] # => "\n\n" + doc.root[1] # => ... + doc.root[7] # => ... + doc.root[8] # => "\n\n" + + doc.root[-1] # => "\n\n" + doc.root[-2] # => ... + + doc.root[50] # => nil + +[Index of Child] + + Use method REXML::Element#index to retrieve the zero-based child index + of the given object, or #size - 1 if there is no such child: + + ele = doc.root # => ... + ele.index(ele[0]) # => 0 + ele.index(ele[1]) # => 1 + ele.index(ele[7]) # => 7 + ele.index(ele[8]) # => 8 + + ele.index(ele[-1]) # => 8 + ele.index(ele[-2]) # => 7 + + ele.index(ele[50]) # => 8 + +[Element Children] + + Use method REXML::.has_elements? to retrieve whether the element + has element children: + + doc.root.has_elements? # => true + REXML::Element.new('foo').has_elements? # => false + + Use method REXML::Element#elements to retrieve the REXML::Elements object + containing the element children: + + eles = doc.root.elements + eles # => # ... > + eles.size # => 4 + eles.each {|e| p [e.class], e } + + Output: + + [ ... , + ... , + ... , + ... + ] + +Note that while in this example, all the element children of the root element are +elements of the same name, 'book', that is not true of all documents; +a root element (or any other element) may have any mixture of child elements. + +[CDATA Children] + + Use method REXML::Element#cdatas to retrieve a frozen array of CDATA children: + + my_xml = <<-EOT + + + + + EOT + my_doc = REXML::Document.new(my_xml) + cdatas my_doc.root.cdatas + cdatas.frozen? # => true + cdatas.map {|cd| cd.class } # => [REXML::CData, REXML::CData] + +[Comment Children] + + Use method REXML::Element#comments to retrieve a frozen array of comment children: + + my_xml = <<-EOT + + + + + EOT + my_doc = REXML::Document.new(my_xml) + comments = my_doc.root.comments + comments.frozen? # => true + comments.map {|c| c.class } # => [REXML::Comment, REXML::Comment] + comments.map {|c| c.to_s } # => ["foo", "bar"] + +[Processing Instruction Children] + + Use method REXML::Element#instructions to retrieve a frozen array + of processing instruction children: + + my_xml = <<-EOT + + + + + EOT + my_doc = REXML::Document.new(my_xml) + instrs = my_doc.root.instructions + instrs.frozen? # => true + instrs.map {|i| i.class } # => [REXML::Instruction, REXML::Instruction] + instrs.map {|i| i.to_s } # => ["", ""] + +[Text Children] + + Use method REXML::Element#has_text? to retrieve whether the element + has text children: + + doc.root.has_text? # => true + REXML::Element.new('foo').has_text? # => false + + Use method REXML::Element#texts to retrieve a frozen array of text children: + + my_xml = 'textmore' + my_doc = REXML::Document.new(my_xml) + texts = my_doc.root.texts + texts.frozen? # => true + texts.map {|t| t.class } # => [REXML::Text, REXML::Text] + texts.map {|t| t.to_s } # => ["text", "more"] + +[Parenthood] + + Use inherited method REXML::Parent.parent? to retrieve whether the element is a parent; + always returns +true+; only REXML::Child#parent returns +false+. + + doc.root.parent? # => true + +=== Element Attributes + +Use method REXML::Element#has_attributes? to return whether the element +has attributes: + + ele = doc.root # => ... + ele.has_attributes? # => false + ele = ele.elements.first # => ... + ele.has_attributes? # => true + +Use method REXML::Element#attributes to return the hash +containing the attributes for the element. +Each hash key is a string attribute name; +each hash value is an REXML::Attribute object. + + ele = doc.root # => ... + attrs = ele.attributes # => {} + + ele = ele.elements.first # => ... + attrs = ele.attributes # => {"category"=>category='cooking'} + attrs.size # => 1 + attr_name = attrs.keys.first # => "category" + attr_name.class # => String + attr_value = attrs.values.first # => category='cooking' + attr_value.class # => REXML::Attribute + +Use method REXML::Element#[] to retrieve the string value for a given attribute, +which may be given as either a string or a symbol: + + ele = doc.root.elements.first # => ... + attr_value = ele['category'] # => "cooking" + attr_value.class # => String + ele['nosuch'] # => nil + +Use method REXML::Element#attribute to retrieve the value of a named attribute: + + my_xml = "" + my_doc = REXML::Document.new(my_xml) + my_doc.root.attribute("x") # => x='x' + my_doc.root.attribute("x", "a") # => a:x='a:x' + +== Whitespace + +Use method REXML::Element#ignore_whitespace_nodes to determine whether +whitespace nodes were ignored when the XML was parsed; +returns +true+ if so, +nil+ otherwise. + +Use method REXML::Element#whitespace to determine whether whitespace +is respected for the element; returns +true+ if so, +false+ otherwise. + +== Namespaces + +Use method REXML::Element#namespace to retrieve the string namespace URI +for the element, which may derive from one of its ancestors: + + xml_string = <<-EOT + + + + + + + EOT + d = Document.new(xml_string) + b = d.elements['//b'] + b.namespace # => "1" + b.namespace('y') # => "2" + b.namespace('nosuch') # => nil + +Use method REXML::Element#namespaces to retrieve a hash of all defined namespaces +in the element and its ancestors: + + xml_string = <<-EOT + + + + + + + EOT + d = Document.new(xml_string) + d.elements['//a'].namespaces # => {"x"=>"1", "y"=>"2"} + d.elements['//b'].namespaces # => {"x"=>"1", "y"=>"2"} + d.elements['//c'].namespaces # => {"x"=>"1", "y"=>"2", "z"=>"3"} + +Use method REXML::Element#prefixes to retrieve an array of the string prefixes (names) +of all defined namespaces in the element and its ancestors: + + xml_string = <<-EOT + + + + + + + EOT + d = Document.new(xml_string, {compress_whitespace: :all}) + d.elements['//a'].prefixes # => ["x", "y"] + d.elements['//b'].prefixes # => ["x", "y"] + d.elements['//c'].prefixes # => ["x", "y", "z"] + +== Traversing + +You can use certain methods to traverse children of the element. +Each child that meets given criteria is yielded to the given block. + +[Traverse All Children] + + Use inherited method REXML::Parent#each (or its alias #each_child) to traverse + all children of the element: + + doc.root.each {|child| p [child.class, child] } + + Output: + + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + [REXML::Element, ... ] + [REXML::Text, "\n\n"] + +[Traverse Element Children] + + Use method REXML::Element#each_element to traverse only the element children + of the element: + + doc.root.each_element {|e| p [e.class, e] } + + Output: + + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + +[Traverse Element Children with Attribute] + + Use method REXML::Element#each_element_with_attribute with the single argument + +attr_name+ to traverse each element child that has the given attribute: + + my_doc = Document.new '' + my_doc.root.each_element_with_attribute('id') {|e| p [e.class, e] } + + Output: + + [REXML::Element, ] + [REXML::Element, ] + [REXML::Element, ] + + Use the same method with a second argument +value+ to traverse + each element child element that has the given attribute and value: + + my_doc.root.each_element_with_attribute('id', '1') {|e| p [e.class, e] } + + Output: + + [REXML::Element, ] + [REXML::Element, ] + + Use the same method with a third argument +max+ to traverse + no more than the given number of element children: + + my_doc.root.each_element_with_attribute('id', '1', 1) {|e| p [e.class, e] } + + Output: + + [REXML::Element, ] + + Use the same method with a fourth argument +xpath+ to traverse + only those element children that match the given xpath: + + my_doc.root.each_element_with_attribute('id', '1', 2, '//d') {|e| p [e.class, e] } + + Output: + + [REXML::Element, ] + +[Traverse Element Children with Text] + + Use method REXML::Element#each_element_with_text with no arguments + to traverse those element children that have text: + + my_doc = Document.new 'bbd' + my_doc.root.each_element_with_text {|e| p [e.class, e] } + + Output: + + [REXML::Element, ... ] + [REXML::Element, ... ] + [REXML::Element, ... ] + + Use the same method with the single argument +text+ to traverse + those element children that have exactly that text: + + my_doc.root.each_element_with_text('b') {|e| p [e.class, e] } + + Output: + + [REXML::Element, ... ] + [REXML::Element, ... ] + + Use the same method with additional second argument +max+ to traverse + no more than the given number of element children: + + my_doc.root.each_element_with_text('b', 1) {|e| p [e.class, e] } + + Output: + + [REXML::Element, ... ] + + Use the same method with additional third argument +xpath+ to traverse + only those element children that also match the given xpath: + + my_doc.root.each_element_with_text('b', 2, '//c') {|e| p [e.class, e] } + + Output: + + [REXML::Element, ... ] + +[Traverse Element Children's Indexes] + + Use inherited method REXML::Parent#each_index to traverse all children's indexes + (not just those of element children): + + doc.root.each_index {|i| print i } + + Output: + + 012345678 + +[Traverse Children Recursively] + + Use included method REXML::Node#each_recursive to traverse all children recursively: + + doc.root.each_recursive {|child| p [child.class, child] } + + Output: + + [REXML::Element, ... ] + [REXML::Element, ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <year> ... </>] + [REXML::Element, <price> ... </>] + [REXML::Element, <book category='children'> ... </>] + [REXML::Element, <title lang='en'> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <year> ... </>] + [REXML::Element, <price> ... </>] + [REXML::Element, <book category='web'> ... </>] + [REXML::Element, <title lang='en'> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <year> ... </>] + [REXML::Element, <price> ... </>] + [REXML::Element, <book category='web' cover='paperback'> ... </>] + [REXML::Element, <title lang='en'> ... </>] + [REXML::Element, <author> ... </>] + [REXML::Element, <year> ... </>] + [REXML::Element, <price> ... </>] + +== Searching + +You can use certain methods to search among the descendants of an element. + +Use method REXML::Element#get_elements to retrieve all element children of the element +that match the given +xpath+: + + xml_string = <<-EOT + <root> + <a level='1'> + <a level='2'/> + </a> + </root> + EOT + d = Document.new(xml_string) + d.root.get_elements('//a') # => [<a level='1'> ... </>, <a level='2'/>] + +Use method REXML::Element#get_text with no argument to retrieve the first text node +in the first child: + + my_doc = Document.new "<p>some text <b>this is bold!</b> more text</p>" + text_node = my_doc.root.get_text + text_node.class # => REXML::Text + text_node.to_s # => "some text " + +Use the same method with argument +xpath+ to retrieve the first text node +in the first child that matches the xpath: + + my_doc.root.get_text(1) # => "this is bold!" + +Use method REXML::Element#text with no argument to retrieve the text +from the first text node in the first child: + + my_doc = Document.new "<p>some text <b>this is bold!</b> more text</p>" + text_node = my_doc.root.text + text_node.class # => String + text_node # => "some text " + +Use the same method with argument +xpath+ to retrieve the text from the first text node +in the first child that matches the xpath: + + my_doc.root.text(1) # => "this is bold!" + +Use included method REXML::Node#find_first_recursive +to retrieve the first descendant element +for which the given block returns a truthy value, or +nil+ if none: + + doc.root.find_first_recursive do |ele| + ele.name == 'price' + end # => <price> ... </> + doc.root.find_first_recursive do |ele| + ele.name == 'nosuch' + end # => nil + +== Editing + +=== Editing a Document + +[Creating a Document] + + Create a new document with method REXML::Document::new: + + doc = Document.new(source_string) + empty_doc = REXML::Document.new + +[Adding to the Document] + + Add an XML declaration with method REXML::Document#add + and an argument of type REXML::XMLDecl: + + my_doc = Document.new + my_doc.xml_decl.to_s # => "" + my_doc.add(XMLDecl.new('2.0')) + my_doc.xml_decl.to_s # => "<?xml version='2.0'?>" + + Add a document type with method REXML::Document#add + and an argument of type REXML::DocType: + + my_doc = Document.new + my_doc.doctype.to_s # => "" + my_doc.add(DocType.new('foo')) + my_doc.doctype.to_s # => "<!DOCTYPE foo>" + + Add a node of any other REXML type with method REXML::Document#add and an argument + that is not of type REXML::XMLDecl or REXML::DocType: + + my_doc = Document.new + my_doc.add(Element.new('foo')) + my_doc.to_s # => "<foo/>" + + Add an existing element as the root element with method REXML::Document#add_element: + + ele = Element.new('foo') + my_doc = Document.new + my_doc.add_element(ele) + my_doc.root # => <foo/> + + Create and add an element as the root element with method REXML::Document#add_element: + + my_doc = Document.new + my_doc.add_element('foo') + my_doc.root # => <foo/> + +=== Editing an Element + +==== Creating an Element + +Create a new element with method REXML::Element::new: + + ele = Element.new('foo') # => <foo/> + +==== Setting Element Properties + +Set the context for an element with method REXML::Element#context= +(see {Element Context}[../context_rdoc.html]): + + ele.context # => nil + ele.context = {ignore_whitespace_nodes: :all} + ele.context # => {:ignore_whitespace_nodes=>:all} + +Set the parent for an element with inherited method REXML::Child#parent= + + ele.parent # => nil + ele.parent = Element.new('bar') + ele.parent # => <bar/> + +Set the text for an element with method REXML::Element#text=: + + ele.text # => nil + ele.text = 'bar' + ele.text # => "bar" + +==== Adding to an Element + +Add a node as the last child with inherited method REXML::Parent#add (or its alias #push): + + ele = Element.new('foo') # => <foo/> + ele.push(Text.new('bar')) + ele.push(Element.new('baz')) + ele.children # => ["bar", <baz/>] + +Add a node as the first child with inherited method REXML::Parent#unshift: + + ele = Element.new('foo') # => <foo/> + ele.unshift(Element.new('bar')) + ele.unshift(Text.new('baz')) + ele.children # => ["bar", <baz/>] + +Add an element as the last child with method REXML::Element#add_element: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_element(Element.new('baz')) + ele.children # => [<bar/>, <baz/>] + +Add a text node as the last child with method REXML::Element#add_text: + + ele = Element.new('foo') # => <foo/> + ele.add_text('bar') + ele.add_text(Text.new('baz')) + ele.children # => ["bar", "baz"] + +Insert a node before a given node with method REXML::Parent#insert_before: + + ele = Element.new('foo') # => <foo/> + ele.add_text('bar') + ele.add_text(Text.new('baz')) + ele.children # => ["bar", "baz"] + target = ele[1] # => "baz" + ele.insert_before(target, Text.new('bat')) + ele.children # => ["bar", "bat", "baz"] + +Insert a node after a given node with method REXML::Parent#insert_after: + + ele = Element.new('foo') # => <foo/> + ele.add_text('bar') + ele.add_text(Text.new('baz')) + ele.children # => ["bar", "baz"] + target = ele[0] # => "bar" + ele.insert_after(target, Text.new('bat')) + ele.children # => ["bar", "bat", "baz"] + +Add an attribute with method REXML::Element#add_attribute: + + ele = Element.new('foo') # => <foo/> + ele.add_attribute('bar', 'baz') + ele.add_attribute(Attribute.new('bat', 'bam')) + ele.attributes # => {"bar"=>bar='baz', "bat"=>bat='bam'} + +Add multiple attributes with method REXML::Element#add_attributes: + + ele = Element.new('foo') # => <foo/> + ele.add_attributes({'bar' => 'baz', 'bat' => 'bam'}) + ele.add_attributes([['ban', 'bap'], ['bah', 'bad']]) + ele.attributes # => {"bar"=>bar='baz', "bat"=>bat='bam', "ban"=>ban='bap', "bah"=>bah='bad'} + +Add a namespace with method REXML::Element#add_namespace: + + ele = Element.new('foo') # => <foo/> + ele.add_namespace('bar') + ele.add_namespace('baz', 'bat') + ele.namespaces # => {"xmlns"=>"bar", "baz"=>"bat"} + +==== Deleting from an Element + +Delete a specific child object with inherited method REXML::Parent#delete: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.children # => [<bar/>, "baz"] + target = ele[1] # => "baz" + ele.delete(target) # => "baz" + ele.children # => [<bar/>] + target = ele[0] # => <baz/> + ele.delete(target) # => <baz/> + ele.children # => [] + +Delete a child at a specific index with inherited method REXML::Parent#delete_at: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.children # => [<bar/>, "baz"] + ele.delete_at(1) + ele.children # => [<bar/>] + ele.delete_at(0) + ele.children # => [] + +Delete all children meeting a specified criterion with inherited method +REXML::Parent#delete_if: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + ele.delete_if {|child| child.instance_of?(Text) } + ele.children # => [<bar/>, <bat/>] + +Delete an element at a specific 1-based index with method REXML::Element#delete_element: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + ele.delete_element(2) # => <bat/> + ele.children # => [<bar/>, "baz", "bam"] + ele.delete_element(1) # => <bar/> + ele.children # => ["baz", "bam"] + +Delete a specific element with the same method: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + target = ele.elements[2] # => <bat/> + ele.delete_element(target) # => <bat/> + ele.children # => [<bar/>, "baz", "bam"] + +Delete an element matching an xpath using the same method: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + ele.delete_element('./bat') # => <bat/> + ele.children # => [<bar/>, "baz", "bam"] + ele.delete_element('./bar') # => <bar/> + ele.children # => ["baz", "bam"] + +Delete an attribute by name with method REXML::Element#delete_attribute: + + ele = Element.new('foo') # => <foo/> + ele.add_attributes({'bar' => 'baz', 'bam' => 'bat'}) + ele.attributes # => {"bar"=>bar='baz', "bam"=>bam='bat'} + ele.delete_attribute('bam') + ele.attributes # => {"bar"=>bar='baz'} + +Delete a namespace with method REXML::delete_namespace: + + ele = Element.new('foo') # => <foo/> + ele.add_namespace('bar') + ele.add_namespace('baz', 'bat') + ele.namespaces # => {"xmlns"=>"bar", "baz"=>"bat"} + ele.delete_namespace('xmlns') + ele.namespaces # => {} # => {"baz"=>"bat"} + ele.delete_namespace('baz') + ele.namespaces # => {} # => {} + +Remove an element from its parent with inherited method REXML::Child#remove: + + ele = Element.new('foo') # => <foo/> + parent = Element.new('bar') # => <bar/> + parent.add_element(ele) # => <foo/> + parent.children.size # => 1 + ele.remove # => <foo/> + parent.children.size # => 0 + +==== Replacing Nodes + +Replace the node at a given 0-based index with inherited method REXML::Parent#[]=: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + ele[2] = Text.new('bad') # => "bad" + ele.children # => [<bar/>, "baz", "bad", "bam"] + +Replace a given node with another node with inherited method REXML::Parent#replace_child: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + target = ele[2] # => <bat/> + ele.replace_child(target, Text.new('bah')) + ele.children # => [<bar/>, "baz", "bah", "bam"] + +Replace +self+ with a given node with inherited method REXML::Child#replace_with: + + ele = Element.new('foo') # => <foo/> + ele.add_element('bar') + ele.add_text('baz') + ele.add_element('bat') + ele.add_text('bam') + ele.children # => [<bar/>, "baz", <bat/>, "bam"] + target = ele[2] # => <bat/> + target.replace_with(Text.new('bah')) + ele.children # => [<bar/>, "baz", "bah", "bam"] + +=== Cloning + +Create a shallow clone of an element with method REXML::Element#clone. +The clone contains the name and attributes, but not the parent or children: + + ele = Element.new('foo') + ele.add_attributes({'bar' => 0, 'baz' => 1}) + ele.clone # => <foo bar='0' baz='1'/> + +Create a shallow clone of a document with method REXML::Document#clone. +The XML declaration is copied; the document type and root element are not cloned: + + my_xml = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo><root/>' + my_doc = Document.new(my_xml) + clone_doc = my_doc.clone + + my_doc.xml_decl # => <?xml ... ?> + clone_doc.xml_decl # => <?xml ... ?> + + my_doc.doctype.to_s # => "<?xml version='1.0' encoding='UTF-8'?>" + clone_doc.doctype.to_s # => "" + + my_doc.root # => <root/> + clone_doc.root # => nil + +Create a deep clone of an element with inherited method REXML::Parent#deep_clone. +All nodes and attributes are copied: + + doc.to_s.size # => 825 + clone = doc.deep_clone + clone.to_s.size # => 825 + +== Writing the Document + +Write a document to an \IO stream (defaults to <tt>$stdout</tt>) +with method REXML::Document#write: + + doc.write + +Output: + + <?xml version='1.0' encoding='UTF-8'?> + <bookstore> + + <book category='cooking'> + <title lang='en'>Everyday Italian + Giada De Laurentiis + 2005 + 30.00 + + + + Harry Potter + J K. Rowling + 2005 + 29.99 + + + + XQuery Kick Start + James McGovern + Per Bothner + Kurt Cagle + James Linn + Vaidyanathan Nagarajan + 2003 + 49.99 + + + + Learning XML + Erik T. Ray + 2003 + 39.95 + + +