Install
sudo gem install nokogiri
Contribute
github.com/tenderlove/nokogiri

An HTML, XML, SAX, & Reader parser with the ability to search documents via XPath or CSS3 selectors… and much more

Nokogiri

CHANGELOG.rdoc

1.5.0 beta1 / 2010/05/22

  • Notes

    • JRuby support is provided by a new pure-java backend.

  • Deprecations

    • Ruby 1.8.6 is deprecated. Nokogiri will install, but official support has ended.

    • LibXML 2.6.16 and earlier are deprecated. Nokogiri will refuse to install.

    • FFI support is removed.

1.4.3 / UNRELEASED

  • New Features

    • XML::Reader#empty_element? returns true for empty elements. #262

    • Node#remove_namespaces! now removes namespace declarations as well. #294

  • Bugfixes

    • XML::NodeSet#{include?,delete,push} accept an XML::Namespace

    • XML::Document#parse added for parsing in the context of a document

    • XML::DocumentFragment#inner_html= works with contextual parsing! #298, #281

    • Reparenting text nodes is safe, even when the operation frees adjacent merged nodes. #283

1.4.2 / 2010/05/22

  • New Features

    • XML::Node#parse will parse XML or HTML fragments with respect to the context node.

    • XML::Node#namespaces returns all namespaces defined in the node and all ancestor nodes (previously did not return ancestors’ namespace definitions).

    • Added Enumerable to XML::Node

    • XML::Document#create_entity will create new EntityDecl objects. GH #174

    • JRuby FFI implementation no longer uses ObjectSpace._id2ref, instead using Charles Nutter’s rocking Weakling gem.

    • Node#fragment? indicates whether a node is a DocumentFragment.

  • Bugfixes

    • XML::NodeSet is now always decorated (if the document has decorators). GH #198

    • XML::NodeSet#slice gracefully handles offset+length larger than the set length. GH #200

    • XML::Node#content= safely unlinks previous content. GH #203

    • XML::Node#namespace= takes nil as a parameter

    • XML::Node#xpath returns things other than NodeSet objects. GH #208

    • XSLT::StyleSheet#transform accepts hashes for parameters. GH #223

    • Pseudo selectors inside not() work. GH #205

    • XML::Builder doesn’t break when nodes are unlinked. Thanks to vihai! GH #228

    • Encoding can be forced on the SAX parser. Thanks Eugene Pimenov! GH #204

    • XML::DocumentFragment uses XML::Node#parse to determine children.

    • Fixed a memory leak in the XML reader. Thanks sdor! GH #244

    • Node#replace returns the new child node as claimed in the RDoc. Previously returned self.

  • Notes

    • The Windows gems now bundle DLLs for libxml 2.7.6 and libxslt 1.1.26. Prior to this release, libxml 2.7.3 and libxslt 1.1.24 were bundled.

1.4.1 / 2009/12/10

  • New Features

    • Added Nokogiri::LIBXML_ICONV_ENABLED

    • Alias Node#[] to Node#attr

    • XML::Node#next_element added

    • XML::Node#> added for searching a node’s immediate children

    • XML::NodeSet#reverse added

    • Added fragment support to Node#add_child, Node#add_next_sibling, Node#add_previous_sibling, and Node#replace.

    • XML::Node#previous_element implemented

    • Rubinius support

    • The CSS selector engine now supports :has()

    • XML::NodeSet#filter() was added

    • XML::Node#next= and XML::Node#previous= are aliases for add_next_sibling and add_previous_sibling. GH #183

  • Bugfixes

    • XML fragments with namespaces do not raise an exception (regression in 1.4.0)

    • Node#matches? works in nodes contained by a DocumentFragment. GH #158

    • Document should not define add_namespace() method. GH #169

    • XPath queries returning namespace declarations do not segfault.

    • Node#replace works with nodes from different documents. GH #162

    • Adding XML::Document#collect_namespaces

    • Fixed bugs in the SOAP4R adapter

    • Fixed bug in XML::Node#next_element for certain edge cases

    • Fixed load path issue with JRuby under Windows. GH #160.

    • XSLT#apply_to will honor the “output method”. Thanks richardlehane!

    • Fragments containing leading text nodes with newlines now parse properly. GH #178.

1.4.0 / 2009/10/30

  • Happy Birthday!

  • New Features

    • Node#at_xpath returns the first element of the NodeSet matching the XPath expression.

    • Node#at_css returns the first element of the NodeSet matching the CSS selector.

    • NodeSet#| for unions GH #119 (Thanks Serabe!)

    • NodeSet#inspect makes prettier output

    • Node#inspect implemented for more rubyish document inspecting

    • Added XML::DTD#external_id

    • Added XML::DTD#system_id

    • Added XML::ElementContent for DTD Element content validity

    • Added XML::Node#external_subset

    • Added XML::Node#create_external_subset

    • Added XML::Node#create_internal_subset

    • XML Builder can append raw strings (GH #141, patch from dudleyf)

    • XML::SAX::ParserContext added

    • XML::Document#remove_namespaces! for the namespace-impaired

  • Bugfixes

    • returns nil when HTML documents do not declare a meta encoding tag. GH #115

    • Uses RbConfig::CONFIG['host_os'] to adjust ENV['PATH'] GH #113

    • NodeSet#search is more efficient GH #119 (Thanks Serabe!)

    • NodeSet#xpath handles custom xpath functions

    • Fixing a SEGV when XML::Reader gets attributes for current node

    • Node#inner_html takes the same arguments as Node#to_html GH #117

    • DocumentFragment#css delegates to its child nodes GH #123

    • NodeSet#[] works with slices larger than NodeSet#length GH #131

    • Reparented nodes maintain their namespace. GH #134

    • Fixed SEGV when adding an XML::Document to NodeSet

    • XML::SyntaxError can be duplicated. GH #148

  • Deprecations

    • Hpricot compatibility layer removed

1.3.3 / 2009/07/26

  • New Features

    • NodeSet#children returns all children of all nodes

  • Bugfixes

    • Override libxml-ruby’s global error handler

    • ParseOption#strict fixed

    • Fixed a segfault when sending an empty string to Node#inner_html= GH #88

    • String encoding is now set to UTF-8 in Ruby 1.9

    • Fixed a segfault when moving root nodes between documents. GH #91

    • Fixed an O(n) penalty on node creation. GH #101

    • Allowing XML documents to be output as HTML documents

  • Deprecations

    • Hpricot compatibility layer will be removed in 1.4.0

1.3.2 / 2009-06-22

1.3.1 / 2009-06-07

  • Bugfixes

    • extconf.rb checks for optional RelaxNG and Schema functions

    • Namespace nodes are added to the Document node cache

1.3.0 / 2009-05-30

1.2.3 / 2009-03-22

  • Bugfixes

    • Fixing bug where a node is passed in to Node#new

    • Namespace should be assigned on DocumentFragment creation. LH #66

    • Nokogiri::HTML returns an empty Document when given a blank string GH#11

    • Adding a child will remove duplicate namespace declarations LH #67

    • Builder methods take a hash as a second argument

1.2.2 / 2009-03-14

  • New features

    • Nokogiri builder interface improvements

  • Bugfixes

    • Fixed a tag nesting problem in the Builder API (LH #41)

    • Nokogiri::HTML.fragment will properly handle text only nodes (LH #43)

    • Nokogiri::HTML::NamedCharacters delegates to libxml2

    • Nokogiri::XML::Node#[] can take a symbol (LH #48)

    • vasprintf for Windows updated. Thanks Geoffroy Couprie!

    • Nokogiri::XML::Node#[]= should not encode entities (LH #55)

    • Namespaces should be copied to reparented nodes (LH #56)

    • Nokogiri uses encoding set on the string for default in Ruby 1.9

    • Document#dup should create a new document of the same type (LH #59)

    • Document should not have a parent method (LH #64)

1.2.1 / 2009-02-23

  • Bugfixes

    • Fixed a CSS selector space bug

    • Fixed Ruby 1.9 String Encoding (Thanks 角谷さん!)

1.2.0 / 2009-02-22

1.1.1

  • New features

    • Added XML::Node#elem?

    • Added XML::Node#attribute_nodes

    • Added XML::Attr

    • XML::Node#delete added.

    • XML::NodeSet#inner_html added.

  • Bugfixes

    • Not including an HTML entity for \r for HTML nodes.

    • Removed CSS::SelectorHandler and XML::XPathHandler

    • XML::Node#attributes returns an Attr node for the value.

    • XML::NodeSet implements to_xml

1.1.0

  • New Features

    • Nokogiri::XML::Node#<< will add a child to the current node

  • Bugfixes

    • Mutex lock on CSS cache access

    • Fixed build problems with GCC 3.3.5

    • XML::Node#to_xml now takes an indentation argument

    • XML::Node#dup takes an optional depth argument

    • XML::Node#add_previous_sibling returns new sibling node.

1.0.7

  • Bugfixes

    • Fixed memory leak when using Dike

    • SAX parser now parses IO streams

    • Comment nodes have their own class

    • Nokogiri() should delegate to Nokogiri.parse()

    • Prepending rather than appending to ENV['PATH'] on Windows

    • Fixed a bug in complex CSS negation selectors

1.0.6

  • 5 Bugfixes

    • XPath Parser raises a SyntaxError on parse failure

    • CSS Parser raises a SyntaxError on parse failure

    • filter() and not() hpricot compatibility added

    • CSS searches via Node#search are now always relative

    • CSS to XPath conversion is now cached

1.0.5

  • Bugfixes

    • Added mailing list and ticket tracking information to the README.txt

    • Sets ENV['PATH'] on Windows if it doesn’t exist

    • Caching results of NodeSet#[] on Document

1.0.4

  • Bugfixes

    • Changed memory management from weak refs to document refs

    • Plugged some memory leaks

    • Builder blocks can call methods from surrounding contexts

1.0.3

  • 5 Bugfixes

    • NodeSet now implements to_ary

    • XML::Document should not implement parent

    • More GC Bugs fixed. (Mike is AWESOME!)

    • Removed RARRAY_LEN for 1.8.5 compatibility. Thanks Shane Hanna.

    • inner_html fixed. (Thanks Yehuda!)

1.0.2

  • 1 Bugfix

    • extconf.rb should not check for frex and racc

1.0.1

  • 1 Bugfix

    • Made sure extconf.rb searched libdir and prefix so that ports libxml/ruby will link properly. Thanks lucsky!

1.0.0 / 2008-07-13

  • 1 major enhancement

    • Birthday!