Install
sudo gem install nokogiri
Contribute
github.com/tenderlove/nokogiri

An HTML, XML, SAX, & Reader parser with the ability to search documents via XPath or CSS3 selectors… and much more

Nokogiri

CHANGELOG.rdoc

HEAD

  • New Features
    • Added Nokogiri::LIBXML_ICONV_ENABLED
    • Alias Node#[] to Node#attr
    • XML::Node#next_element added
    • XML::Node#> added for searching a node's immediate children
    • XML::NodeSet#reverse added
    • Added fragment support to Node#add_child, Node#add_next_sibling, Node#add_previous_sibling, and Node#replace.
    • XML::Node#previous_element implemented
    • Rubinius support
    • The CSS selector engine now supports :has()
    • XML::NodeSet#filter() was added
    • XML::Node#next= and #previous= are aliases for add_next_sibling and add_previous_sibling. GH 183
  • Bugfixes
    • XML fragments with namespaces do not raise an exception (regression in 1.4.0)
    • Node#matches? works in nodes contained by a DocumentFragment. GH 158
    • Document should not define add_namespace() method. GH 169
    • XPath queries returning namespace declarations do not segfault.
    • Node#replace works with nodes from different documents. GH 162
    • Added XML::Document#collect_namespaces
    • Fixed bugs in the SOAP4R adapter
    • Fixed bug in XML::Node#next_element for certain edge cases
    • Fixed load path issue with JRuby under Windows. GH 160.
    • XSLT#apply_to will honor the “output method”. Thanks richardlehane!
    • Fragments containing leading text nodes with newlines now parse properly. GH 178.

1.4.0 / 2009-10-30

  • Happy Birthday!
  • New Features
    • Node#at_xpath returns the first element of the NodeSet matching the XPath expression.
    • Node#at_css returns the first element of the NodeSet matching the CSS selector.
    • NodeSet#| for unions GH 119 (Thanks Serabe!)
    • NodeSet#inspect makes prettier output
    • Node#inspect implemented for more rubyish document inspecting
    • Added XML::DTD#external_id
    • Added XML::DTD#system_id
    • Added XML::ElementContent for DTD Element content validity
    • Better namespace declaration support in Nokogiri::XML::Builder
    • Added XML::Node#external_subset
    • Added XML::Node#create_external_subset
    • Added XML::Node#create_internal_subset
    • XML Builder can append raw strings (GH 141, patch from dudleyf)
    • XML::SAX::ParserContext added
    • XML::Document#remove_namespaces! for the namespace-impaired
  • Bugfixes
    • returns nil when HTML documents do not declare a meta encoding tag. GH 115
    • Uses RbConfig::CONFIG['host_os'] to adjust ENV['PATH'] GH 113
    • NodeSet#search is more efficient GH 119 (Thanks Serabe!)
    • NodeSet#xpath handles custom xpath functions
    • Fixing a SEGV when XML::Reader gets attributes for current node
    • Node#inner_html takes the same arguments as Node#to_html GH 117
    • DocumentFragment#css delegates to its child nodes GH 123
    • NodeSet#[] works with slices larger than NodeSet#length GH 131
    • Reparented nodes maintain their namespace. GH 134
    • Fixed SEGV when adding an XML::Document to NodeSet
    • XML::SyntaxError can be duplicated. GH 148
  • Deprecations
    • Hpricot compatibility layer removed

1.3.3 / 2009-07-26

  • New Features
    • NodeSet#children returns all children of all nodes
  • Bugfixes
    • Override libxml-ruby’s global error handler
    • ParseOption#strict fixed
    • Fixed a segfault when sending an empty string to Node#inner_html= GH 88
    • String encoding is now set to UTF-8 in Ruby 1.9
    • Fixed a segfault when moving root nodes between documents. GH 91
    • Fixed an O(n) penalty on node creation. GH 101
    • Allowing XML documents to be output as HTML documents
  • Deprecations
    • Hpricot compatibility layer will be removed in 1.4.0

1.3.2 / 2009-06-22

1.3.1 / 2009-06-07

  • Bugfixes
    • extconf.rb checks for optional RelaxNG and Schema functions
    • Namespace nodes are added to the Document node cache

1.3.0 / 2009-05-30

1.2.3 / 2009-03-22

  • Bugfixes
    • Fixing bug where a node is passed in to Node#new
    • Namespace should be assigned on DocumentFragment creation. LH 66
    • Nokogiri::XML::NodeSet#dup works GH 10
    • Nokogiri::HTML returns an empty Document when given a blank string GH 11
    • Adding a child will remove duplicate namespace declarations LH 67
    • Builder methods take a hash as a second argument

1.2.2 / 2009-03-14

1.2.1 / 2009-02-23

  • Bugfixes
    • Fixed a CSS selector space bug
    • Fixed Ruby 1.9 String Encoding (Thanks 角谷さん!)

1.2.0 / 2009-02-22

1.1.1

  • New Features
    • Added XML::Node#elem?
    • Added XML::Node#attribute_nodes
    • Added XML::Attr
    • XML::Node#delete added.
    • XML::NodeSet#inner_html added.
  • Bugfixes
    • Not including an HTML entity for \r for HTML nodes.
    • Removed CSS::SelectorHandler and XML::XPathHandler
    • XML::Node#attributes returns an Attr node for the value.
    • XML::NodeSet implements to_xml

1.1.0

  • New Features
  • Bugfixes
    • Mutex lock on CSS cache access
    • Fixed build problems with GCC 3.3.5
    • XML::Node#to_xml now takes an indentation argument
    • XML::Node#dup takes an optional depth argument
    • XML::Node#add_previous_sibling returns new sibling node.

1.0.7

  • Bugfixes
    • Fixed memory leak when using Dike
    • SAX parser now parses IO streams
    • Comment nodes have their own class
    • Nokogiri() should delegate to Nokogiri.parse()
    • Prepending rather than appending to ENV['PATH'] on Windows
    • Fixed a bug in complex CSS negation selectors

1.0.6

  • 5 Bugfixes
    • XPath Parser raises a SyntaxError on parse failure
    • CSS Parser raises a SyntaxError on parse failure
    • filter() and not() Hpricot compatibility added
    • CSS searches via Node#search are now always relative
    • CSS to XPath conversion is now cached

1.0.5

  • Bugfixes
    • Added mailing list and ticket tracking information to the README.txt
    • Sets ENV['PATH'] on Windows if it doesn't exist
    • Caching results of NodeSet#[] on Document

1.0.4

  • Bugfixes
    • Changed memory management from weak refs to document refs
    • Plugged some memory leaks
    • Builder blocks can call methods from surrounding contexts

1.0.3

  • 5 Bugfixes
    • NodeSet now implements to_ary
    • XML::Document should not implement parent
    • More GC Bugs fixed. (Mike is AWESOME!)
    • Removed RARRAY_LEN for 1.8.5 compatibility. Thanks Shane Hanna.
    • inner_html fixed. (Thanks Yehuda!)

1.0.2

  • 1 Bugfix
    • extconf.rb should not check for frex and racc

1.0.1

  • 1 Bugfix
    • Made sure extconf.rb searched libdir and prefix so that ports libxml/ruby will link properly. Thanks lucsky!

1.0.0 / 2008-07-13

  • 1 major enhancement
    • Birthday!