class Nokogiri::HTML5::Document

Since v1.12.0

💡 HTML5 functionality is not available when running JRuby.

Attributes

url[R]

The URL of this document, as passed to Document.parse, Document.read_io, or Document.read_memory.

Public Class Methods

parse(input)
parse(input, url=nil, encoding=nil, **options)
parse(input, url=nil, encoding=nil) { |options| ... }

Parse HTML5 input.

Parameters
  • input may be a String, or any object that responds to read and close, such as an IO or StringIO.

  • url (optional) is a String indicating the canonical URI where this document is located.

  • encoding (optional) is the encoding that should be used when processing the document.

  • options (optional) is a configuration Hash (or keyword arguments) to set options during parsing. The three currently supported options are :max_errors, :max_tree_depth and :max_attributes, described at Nokogiri::HTML5.

    ⚠ Note that these options are different from those made available by Nokogiri::XML::Document and Nokogiri::HTML4::Document.

  • block (optional) is passed a configuration Hash on which parse options may be set. See Nokogiri::HTML5 for more information and usage.

Returns

Nokogiri::HTML5::Document

# File lib/nokogiri/html5/document.rb, line 61
def parse(string_or_io, url = nil, encoding = nil, **options, &block)
  yield options if block
  string_or_io = "" unless string_or_io

  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  do_parse(string_or_io, url, encoding, options)
end
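
To make the defaulting rules in the source above easier to follow, here is a standalone sketch that mirrors them using only the Ruby standard library. The helper name infer_parse_arguments is hypothetical and not part of Nokogiri; it returns the url and encoding that would be handed on to the parser.

```ruby
require "tempfile"

# Hypothetical helper (NOT part of Nokogiri): mirrors how Document.parse
# fills in url and encoding when the caller leaves them as nil.
def infer_parse_arguments(input, url = nil, encoding = nil)
  input = "" unless input

  # Strings carry an encoding; binary (ASCII-8BIT) input gives no hint.
  if input.respond_to?(:encoding) && input.encoding != Encoding::ASCII_8BIT
    encoding ||= input.encoding.name
  end

  # File-like objects expose a path, which becomes the default url.
  if input.respond_to?(:read) && input.respond_to?(:path)
    url ||= input.path
  end

  unless input.respond_to?(:read) || input.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  [url, encoding]
end

utf8_html = String.new("<p>hi</p>", encoding: "UTF-8")
string_args = infer_parse_arguments(utf8_html)    # encoding taken from the String
binary_args = infer_parse_arguments(utf8_html.b)  # binary String: encoding stays nil
file_url = Tempfile.create("page") { |file| infer_parse_arguments(file).first }
```

An explicitly passed url or encoding always wins, because both assignments use ||=.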

read_io(io, url = nil, encoding = nil, **options)

Create a new document from an IO object.

💡 Most users should prefer Document.parse to this method.

# File lib/nokogiri/html5/document.rb, line 82
def read_io(io, url = nil, encoding = nil, **options)
  raise ArgumentError, "io object doesn't respond to :read" unless io.respond_to?(:read)

  do_parse(io, url, encoding, options)
end

read_memory(string, url = nil, encoding = nil, **options)

Create a new document from a String.

💡 Most users should prefer Document.parse to this method.

# File lib/nokogiri/html5/document.rb, line 91
def read_memory(string, url = nil, encoding = nil, **options)
  raise ArgumentError, "string object doesn't respond to :to_str" unless string.respond_to?(:to_str)

  do_parse(string, url, encoding, options)
end

Public Instance Methods

fragment() → Nokogiri::HTML5::DocumentFragment
fragment(markup) → Nokogiri::HTML5::DocumentFragment

Parse an HTML5 document fragment from markup, returning a Nokogiri::HTML5::DocumentFragment.

Properties
  • markup (String) The HTML5 markup fragment to be parsed

Returns

Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if markup is not passed, is empty, or is nil.

# File lib/nokogiri/html5/document.rb, line 127
def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end

xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig

Returns

The document type which determines CSS-to-XPath translation.

See CSS::XPathVisitor for more information.

# File lib/nokogiri/html5/document.rb, line 143
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end