class Nokogiri::HTML5::Document
Since v1.12.0
💡 HTML5 functionality is not available when running JRuby.
Attributes
Get the parser’s quirks mode value. See HTML5::QuirksMode.
This method returns nil if the parser was not invoked (e.g., Nokogiri::HTML5::Document.new).
Since v1.14.0
Get the URL name for this document, as passed into Document.parse, Document.read_io, or Document.read_memory.
Public Class Methods
Parse HTML input with a parser compliant with the HTML5 spec. This method uses the encoding of input if it can be determined, or else falls back to the encoding: parameter.
- Required Parameters
  - input (String | IO) the HTML content to be parsed.
- Optional Parameters
  - url: (String) the base URI of the document.
- Optional Keyword Arguments
  - encoding: (Encoding) The name of the encoding that should be used when processing the document. When not provided, the encoding will be determined based on the document content.
  - max_errors: (Integer) The maximum number of parse errors to record. (default Nokogiri::Gumbo::DEFAULT_MAX_ERRORS, which is currently 0)
  - max_tree_depth: (Integer) The maximum depth of the parse tree. (default Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
  - max_attributes: (Integer) The maximum number of attributes allowed on an element. (default Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES)
  - parse_noscript_content_as_text: (Boolean) Whether to parse the content of noscript elements as text. (default false)
  See Parsing options at HTML5 for a complete description of these parsing options.
- Yields
  - If present, the block will be passed a Hash object to modify with parse options before the input is parsed. See Parsing options at HTML5 for a list of available options.
    ⚠ Note that url: and encoding: cannot be set by the configuration block.
- Returns
  - Nokogiri::HTML5::Document
Example: Parse input from an IO object (here, a socket) with a specific encoding and a custom max errors limit.
Nokogiri::HTML5::Document.parse(socket, encoding: "ISO-8859-1", max_errors: 10)
Example: Parse a string, setting the :parse_noscript_content_as_text option using the configuration block parameter.
Nokogiri::HTML5::Document.parse(input) { |c| c[:parse_noscript_content_as_text] = true }
# File lib/nokogiri/html5/document.rb, line 103
def parse(
  string_or_io,
  url_ = nil, encoding_ = nil,
  url: url_, encoding: encoding_,
  **options, &block
)
  yield options if block
  string_or_io = "" unless string_or_io

  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  do_parse(string_or_io, url, encoding, **options)
end
Create a new document from an IO object.
💡 Most users should prefer Document.parse to this method.
# File lib/nokogiri/html5/document.rb, line 129
def read_io(io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  raise ArgumentError, "io object doesn't respond to :read" unless io.respond_to?(:read)

  do_parse(io, url, encoding, **options)
end
Create a new document from a String.
💡 Most users should prefer Document.parse to this method.
# File lib/nokogiri/html5/document.rb, line 138
def read_memory(string, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options)
  raise ArgumentError, "string object doesn't respond to :to_str" unless string.respond_to?(:to_str)

  do_parse(string, url, encoding, **options)
end
Public Instance Methods
Parse an HTML5 document fragment from markup, returning a Nokogiri::HTML5::DocumentFragment.
- Properties
  - markup (String) The HTML5 markup fragment to be parsed
- Returns
  - Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if markup is not passed, is empty, or is nil.
# File lib/nokogiri/html5/document.rb, line 178
def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end
- Returns
  - The document type which determines CSS-to-XPath translation.
See CSS::XPathVisitor for more information.
# File lib/nokogiri/html5/document.rb, line 194
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end