module Nokogiri

Nokogiri parses and searches XML/HTML very quickly, and it has a correctly implemented CSS3 selector engine as well as XPath 1.0 support.

Parsing a document returns either a Nokogiri::XML::Document or a Nokogiri::HTML4::Document, depending on the kind of document you parse (or a Nokogiri::HTML5::Document when you use Nokogiri::HTML5).

Here is an example:

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML4::Document for the page we’re interested in...

doc = Nokogiri::HTML4(URI.open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

See also:

Constants

HTML

Alias for Nokogiri::HTML4

JAR_DEPENDENCIES

Generated by the :vendor_jars rake task.

NEKO_VERSION

VERSION

The version of Nokogiri you are using

XERCES_VERSION

Public Class Methods

HTML(input, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block) → Nokogiri::HTML4::Document

Parse HTML. Convenience method for Nokogiri::HTML4::Document.parse

# File lib/nokogiri/html.rb, line 10
  
HTML4(...)

Convenience method for Nokogiri::HTML4::Document.parse

# File lib/nokogiri/html4.rb, line 7
def HTML4(...)
  Nokogiri::HTML4::Document.parse(...)
end

HTML5(...)

Convenience method for Nokogiri::HTML5::Document.parse

# File lib/nokogiri/html5.rb, line 28
def self.HTML5(...)
  Nokogiri::HTML5::Document.parse(...)
end

Slop(*args, &block)

Parse a document and add the Slop decorator. The Slop decorator implements method_missing so that child element names can be called as methods instead of writing CSS or XPath queries. For example:

doc = Nokogiri::Slop(<<-eohtml)
  <html>
    <body>
      <p>first</p>
      <p>second</p>
    </body>
  </html>
eohtml
assert_equal('second', doc.html.body.p[1].text)
# File lib/nokogiri.rb, line 91
def Slop(*args, &block)
  Nokogiri(*args, &block).slop!
end

XML(...)

Convenience method for Nokogiri::XML::Document.parse

# File lib/nokogiri/xml.rb, line 6
def XML(...)
  Nokogiri::XML::Document.parse(...)
end

XSLT(...)

Convenience method for Nokogiri::XSLT.parse

# File lib/nokogiri/xslt.rb, line 7
def XSLT(...)
  XSLT.parse(...)
end

make(input = nil, opts = {}, &blk)

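Create a new node from HTML markup. When input is given, it is parsed with Nokogiri::HTML4.fragment and the fragment's first child node is returned (not the Nokogiri::XML::DocumentFragment itself); otherwise the block is passed on to Nokogiri(&blk). The opts argument is not used.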

# File lib/nokogiri.rb, line 68
def make(input = nil, opts = {}, &blk)
  if input
    Nokogiri::HTML4.fragment(input).children.first
  else
    Nokogiri(&blk)
  end
end

parse(string, url = nil, encoding = nil, options = nil) { |doc| ... }

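Parse an HTML or XML document; string contains the document. If string responds to read (an IO-like object), or if any line within its first 512 characters begins, after optional whitespace, with <html or <!DOCTYPE html followed by whitespace or > (case-insensitive), it is parsed with Nokogiri.HTML4, using XML::ParseOptions::DEFAULT_HTML unless options is given. Otherwise it is parsed with Nokogiri.XML, using XML::ParseOptions::DEFAULT_XML unless options is given. If a block is given, the parsed document is yielded to it; the document is returned in either case.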

# File lib/nokogiri.rb, line 42
def parse(string, url = nil, encoding = nil, options = nil)
  if string.respond_to?(:read) ||
      /^\s*<(?:!DOCTYPE\s+)?html[\s>]/i.match?(string[0, 512])
    # Expect an HTML indicator to appear within the first 512
    # characters of a document. (<?xml ?> + <?xml-stylesheet ?>
    # shouldn't be that long)
    Nokogiri.HTML4(
      string,
      url,
      encoding,
      options || XML::ParseOptions::DEFAULT_HTML,
    )
  else
    Nokogiri.XML(
      string,
      url,
      encoding,
      options || XML::ParseOptions::DEFAULT_XML,
    )
  end.tap do |doc|
    yield doc if block_given?
  end
end