class Nokogiri::XML::ParseOptions

Options that control the parsing behavior for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

These options directly expose libxml2’s parse options, which are all boolean in the sense that an option is “on” or “off”.

💡 Note that HTML5 parsing has a separate, orthogonal set of options due to the nature of the HTML5 specification. See Nokogiri::HTML5.

⚠ Not all parse options are supported on JRuby. Nokogiri will attempt to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when it’s possible.

Setting and unsetting parse options

You can build your own combinations of parse options by using any of the following methods:

ParseOptions method chaining

Every option has an equivalent method in lowercase. You can chain these methods together to set various combinations.

# Set the HUGE & PEDANTIC options
po = Nokogiri::XML::ParseOptions.new.huge.pedantic
doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)

Every option has an equivalent no{option} method in lowercase. You can call these methods on an instance of ParseOptions to unset the option.

# Set the HUGE & PEDANTIC options
po = Nokogiri::XML::ParseOptions.new.huge.pedantic

# later we want to modify the options
po.nohuge # Unset the HUGE option
po.nopedantic # Unset the PEDANTIC option

💡 Note that some options begin with “no”, which leads to a logical but perhaps unintuitive double negative:

po.nocdata # Set the NOCDATA parse option
po.nonocdata # Unset the NOCDATA parse option

💡 Note that negation is not available for STRICT, which is itself a negation of all other features.

Using Ruby Blocks

Most parsing methods will accept a block for configuration of parse options, and we recommend chaining the setter methods:

doc = Nokogiri::XML::Document.parse(xml) { |config| config.huge.pedantic }

ParseOptions constants

You can also use the constants declared under Nokogiri::XML::ParseOptions to set various combinations. They are bits in a bitmask, and so can be combined with bitwise operators:

po = Nokogiri::XML::ParseOptions.new(Nokogiri::XML::ParseOptions::HUGE | Nokogiri::XML::ParseOptions::PEDANTIC)
doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)

Constants

BIG_LINES

Support line numbers up to long int (the default is short int). On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

COMPACT

Compact small text nodes. Off by default.

⚠ When this option is set, no modification of the DOM tree is allowed after parsing. libxml2 may crash if you try to modify the tree.

DEFAULT_HTML

The options mask used by default for parsing HTML4::Document and HTML4::DocumentFragment.

DEFAULT_SCHEMA

The options mask used by default for parsing XML::Schema.

DEFAULT_XML

The options mask used by default for parsing XML::Document and XML::DocumentFragment.

DEFAULT_XSLT

The options mask used by default for parsing XSLT::Stylesheet.

DTDATTR

Default DTD attributes. On by default for XSLT::Stylesheet.

DTDLOAD

Load external subsets. On by default for XSLT::Stylesheet.

It is UNSAFE to set this option when parsing untrusted documents.

DTDVALID

Validate with the DTD. Off by default.

HUGE

Relax any hardcoded limit from the parser. Off by default.

It is UNSAFE to set this option when parsing untrusted documents.

NOBASEFIX

Do not fix up XInclude xml:base URIs. Off by default.

NOBLANKS

Remove blank nodes. Off by default.

NOCDATA

Merge CDATA as text nodes. On by default for XSLT::Stylesheet.

NODICT

Do not reuse the context dictionary. Off by default.

NOENT

Substitute entities. Off by default.

⚠ This option enables entity substitution, contrary to what the name implies.

It is UNSAFE to set this option when parsing untrusted documents.

NOERROR

Suppress error reports. On by default for HTML4::Document and HTML4::DocumentFragment.

NONET

Forbid network access. On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

It is UNSAFE to unset this option when parsing untrusted documents.

NOWARNING

Suppress warning reports. On by default for HTML4::Document and HTML4::DocumentFragment.

NOXINCNODE

Do not generate XInclude START/END nodes. Off by default.

NSCLEAN

Remove redundant namespace declarations. Off by default.

OLD10

Parse using the XML 1.0 rules from before update 5 (the Fifth Edition). Off by default.

PEDANTIC

Enable pedantic error reporting. Off by default.

RECOVER

Recover from errors. On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

SAX1

Use the SAX1 interface internally. Off by default.

STRICT

Strict parsing. Its value is 0, so it represents the absence of all other options; in particular, RECOVER is off.

XINCLUDE

Implement XInclude substitution. Off by default.

Attributes

options[RW]
to_i[RW]

Public Class Methods

new(options = STRICT)
# File lib/nokogiri/xml/parse_options.rb, line 165
def initialize(options = STRICT)
  @options = options
end

Public Instance Methods

==(other)
# File lib/nokogiri/xml/parse_options.rb, line 198
def ==(other)
  other.to_i == to_i
end

inspect()
Calls superclass method
# File lib/nokogiri/xml/parse_options.rb, line 204
def inspect
  options = []
  self.class.constants.each do |k|
    options << k.downcase if send(:"#{k.downcase}?")
  end
  super.sub(/>$/, " " + options.join(", ") + ">")
end
strict()
# File lib/nokogiri/xml/parse_options.rb, line 189
def strict
  @options &= ~RECOVER
  self
end

strict?()
# File lib/nokogiri/xml/parse_options.rb, line 194
def strict?
  @options & RECOVER == STRICT
end