class Nokogiri::HTML5::Document

Since v1.12.0

💡 HTML5 functionality is not available when running JRuby.

Attributes

quirks_mode[R]

Get the parser’s quirks mode value. See HTML5::QuirksMode.

This method returns nil if the parser was not invoked (e.g., Nokogiri::HTML5::Document.new).

Since v1.14.0

url[R]

Get the URL of this document, as passed into Document.parse, Document.read_io, or Document.read_memory.

Public Class Methods

parse(input) { |options| ... } → HTML5::Document
parse(input, url:, encoding:) { |options| ... } → HTML5::Document
parse(input, **options) → HTML5::Document

Parse HTML input with a parser compliant with the HTML5 spec. When the encoding: parameter is given, it is used; otherwise this method uses the encoding of input if it can be determined.

Required Parameters
  • input (String | IO) the HTML content to be parsed.

Optional Parameters
  • url: (String) the base URI of the document.

Optional Keyword Arguments
  • encoding: (Encoding) The name of the encoding that should be used when processing the document. When not provided, the encoding will be determined based on the document content.

  • max_errors: (Integer) The maximum number of parse errors to record. (default Nokogiri::Gumbo::DEFAULT_MAX_ERRORS which is currently 0)

  • max_tree_depth: (Integer) The maximum depth of the parse tree. (default Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)

  • max_attributes: (Integer) The maximum number of attributes allowed on an element. (default Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES)

  • parse_noscript_content_as_text: (Boolean) Whether to parse the content of noscript elements as text. (default false)

See Parsing options at HTML5 for a complete description of these parsing options.

Yields

If present, the block will be passed a Hash object to modify with parse options before the input is parsed. See Parsing options at HTML5 for a list of available options.

⚠ Note that url: and encoding: cannot be set by the configuration block.

Returns

Nokogiri::HTML5::Document

Example: Parse input read from an IO object (here, socket) with a specific encoding and a custom limit on recorded errors.

Nokogiri::HTML5::Document.parse(socket, encoding: "ISO-8859-1", max_errors: 10)

Example: Parse a string setting the :parse_noscript_content_as_text option using the configuration block parameter.

Nokogiri::HTML5::Document.parse(input) { |c| c[:parse_noscript_content_as_text] = true }
# File lib/nokogiri/html5/document.rb, line 103
def parse(
  string_or_io,
  url_ = nil, encoding_ = nil,
  url: url_, encoding: encoding_,
  **options, &block
)
  yield options if block
  string_or_io = "" unless string_or_io

  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  do_parse(string_or_io, url, encoding, **options)
end
read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)

Create a new document from an IO object.

💡 Most users should prefer Document.parse to this method.

# File lib/nokogiri/html5/document.rb, line 129
def read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  raise ArgumentError, "io object doesn't respond to :read" unless io.respond_to?(:read)

  do_parse(io, url, encoding, **options)
end
read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)

Create a new document from a String.

💡 Most users should prefer Document.parse to this method.

# File lib/nokogiri/html5/document.rb, line 138
def read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  raise ArgumentError, "string object doesn't respond to :to_str" unless string.respond_to?(:to_str)

  do_parse(string, url, encoding, **options)
end

Public Instance Methods

fragment() → Nokogiri::HTML5::DocumentFragment
fragment(markup) → Nokogiri::HTML5::DocumentFragment

Parse an HTML5 document fragment from markup, returning a Nokogiri::HTML5::DocumentFragment.

Parameters
  • markup (String) The HTML5 markup fragment to be parsed

Returns

Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if markup is not passed, is empty, or is nil.

# File lib/nokogiri/html5/document.rb, line 178
def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end
xpath_doctype() → Nokogiri::CSS::XPathVisitor::DoctypeConfig
Returns

The document type which determines CSS-to-XPath translation.

See CSS::XPathVisitor for more information.

# File lib/nokogiri/html5/document.rb, line 194
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end