class Nokogiri::XML::ParseOptions
Options that control the parsing behavior for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.
These options directly expose libxml2’s parse options, which are all boolean in the sense that an option is “on” or “off”.
💡 Note that HTML5 parsing has a separate, orthogonal set of options due to the nature of the HTML5 specification. See Nokogiri::HTML5.
⚠ Not all parse options are supported on JRuby. Nokogiri will attempt to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when it’s possible.
Setting and unsetting parse options
You can build your own combinations of parse options by using any of the following methods:
- ParseOptions method chaining
-
Every option has an equivalent method in lowercase. You can chain these methods together to set various combinations.
# Set the HUGE & PEDANTIC options
po = Nokogiri::XML::ParseOptions.new.huge.pedantic
doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)
Every option has an equivalent no{option} method in lowercase. You can call these methods on an instance of ParseOptions to unset the option.
# Set the HUGE & PEDANTIC options
po = Nokogiri::XML::ParseOptions.new.huge.pedantic
# later we want to modify the options
po.nohuge # Unset the HUGE option
po.nopedantic # Unset the PEDANTIC option
💡 Note that some options begin with “no”, leading to the logical but perhaps unintuitive double negative:
po.nocdata # Set the NOCDATA parse option
po.nonocdata # Unset the NOCDATA parse option
💡 Note that negation is not available for STRICT, which is itself a negation of all other features.
- Using Ruby Blocks
-
Most parsing methods will accept a block for configuration of parse options, and we recommend chaining the setter methods:
doc = Nokogiri::XML::Document.parse(xml) { |config| config.huge.pedantic }
- ParseOptions constants
-
You can also use the constants declared under Nokogiri::XML::ParseOptions to set various combinations. They are bits in a bitmask, and so can be combined with bitwise operators:
po = Nokogiri::XML::ParseOptions.new(Nokogiri::XML::ParseOptions::HUGE | Nokogiri::XML::ParseOptions::PEDANTIC)
doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)
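All three approaches above manipulate a single integer bitmask. The following is a minimal standalone sketch of that idea, not the library's implementation: it does not load Nokogiri, and the class name `SketchOptions`, the `FLAGS` table, and the bit positions are assumptions chosen for illustration.

```ruby
# Standalone model of chainable option flags; Nokogiri is not required.
# SketchOptions, FLAGS, and the bit positions are illustrative assumptions.
class SketchOptions
  FLAGS = { huge: 1 << 0, pedantic: 1 << 1, nocdata: 1 << 2 }.freeze

  def initialize(mask = 0)
    @mask = mask
  end

  def to_i
    @mask
  end

  FLAGS.each do |name, bit|
    # Setter: turn the bit on and return self so calls can be chained.
    define_method(name) do
      @mask |= bit
      self
    end

    # Unsetter: clear the bit, again returning self.
    define_method(:"no#{name}") do
      @mask &= ~bit
      self
    end

    # Predicate: report whether the bit is currently on.
    define_method(:"#{name}?") { (@mask & bit) == bit }
  end
end

# Chained setters and a bitwise OR of the flag values build the same mask.
chained   = SketchOptions.new.huge.pedantic
from_bits = SketchOptions.new(SketchOptions::FLAGS[:huge] | SketchOptions::FLAGS[:pedantic])

chained.nohuge    # clears the hypothetical HUGE bit
chained.nocdata   # sets the flag whose own name begins with "no"
chained.nonocdata # the double negative clears it again
```

Because every setter and unsetter returns `self`, the same object can be configured step by step or in one chained expression, which is also what makes the block form `{ |config| config.huge.pedantic }` read naturally.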
Constants
- BIG_LINES
-
Support line numbers up to long int (default is a short int). On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.
- COMPACT
-
Compact small text nodes. Off by default.
⚠ No modification of the DOM tree is allowed after parsing. libxml2 may crash if you try to modify the tree.
- DEFAULT_HTML
-
The options mask used by default for parsing HTML4::Document and HTML4::DocumentFragment.
- DEFAULT_SCHEMA
-
The options mask used by default for parsing XML::Schema.
- DEFAULT_XML
-
The options mask used by default for parsing XML::Document and XML::DocumentFragment.
- DEFAULT_XSLT
-
The options mask used by default for parsing XSLT::Stylesheet.
- DTDATTR
-
Apply default attributes from the DTD. On by default for XSLT::Stylesheet.
- DTDLOAD
-
Load external subsets. On by default for XSLT::Stylesheet.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
- DTDVALID
-
Validate with the DTD. Off by default.
- HUGE
-
Relax any hardcoded limit from the parser. Off by default.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
- NOBASEFIX
-
Do not fix up XInclude xml:base URIs. Off by default.
- NOBLANKS
-
Remove blank nodes. Off by default.
- NOCDATA
-
Merge CDATA as text nodes. On by default for XSLT::Stylesheet.
- NODICT
-
Do not reuse the context dictionary. Off by default.
- NOENT
-
Substitute entities. Off by default.
⚠ This option enables entity substitution, contrary to what the name implies.
⚠ It is UNSAFE to set this option when parsing untrusted documents.
- NOERROR
-
Suppress error reports. On by default for HTML4::Document and HTML4::DocumentFragment.
- NONET
-
Forbid network access. On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.
⚠ It is UNSAFE to unset this option when parsing untrusted documents.
- NOWARNING
-
Suppress warning reports. On by default for HTML4::Document and HTML4::DocumentFragment.
- NOXINCNODE
-
Do not generate XInclude START/END nodes. Off by default.
- NSCLEAN
-
Remove redundant namespace declarations. Off by default.
- OLD10
-
Parse using XML-1.0 before update 5. Off by default.
- PEDANTIC
-
Enable pedantic error reporting. Off by default.
- RECOVER
-
Recover from errors. On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.
- SAX1
-
Use the SAX1 interface internally. Off by default.
- STRICT
-
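Strict parsing: the parser does not recover from errors (RECOVER is off). This is the default mask for ParseOptions.new.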
- XINCLUDE
-
Implement XInclude substitution. Off by default.
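The ⚠ notes in the list above single out four settings as unsafe for untrusted input: NOENT, DTDLOAD, or HUGE being set, and NONET being unset. As a rough sketch of how a mask could be audited against those notes, the standalone helper below reports which of them apply. The module name `OptionAudit`, the method `risky_settings`, and the bit positions are assumptions for illustration; real code would read the values from the Nokogiri::XML::ParseOptions constants instead.

```ruby
# Standalone sketch; Nokogiri is not required.
# OptionAudit and risky_settings are hypothetical names, and the bit
# positions are illustrative assumptions.
module OptionAudit
  NOENT   = 1 << 1
  DTDLOAD = 1 << 2
  NONET   = 1 << 11
  HUGE    = 1 << 19

  # Return a list of the settings that the warnings flag as unsafe
  # when parsing untrusted documents.
  def self.risky_settings(mask)
    findings = []
    findings << "NOENT is set"     if (mask & NOENT) != 0
    findings << "DTDLOAD is set"   if (mask & DTDLOAD) != 0
    findings << "HUGE is set"      if (mask & HUGE) != 0
    findings << "NONET is not set" if (mask & NONET).zero?
    findings
  end
end

# A mask that only forbids network access raises no findings.
safe_report = OptionAudit.risky_settings(OptionAudit::NONET)

# Entity substitution plus relaxed limits, without NONET, raises three.
risky_report = OptionAudit.risky_settings(OptionAudit::NOENT | OptionAudit::HUGE)
```

Note that NONET is the one flag where the risk comes from the bit being off, which is why its check is inverted relative to the others.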
Attributes
Public Class Methods
# File lib/nokogiri/xml/parse_options.rb, line 165
def initialize(options = STRICT)
  @options = options
end
Public Instance Methods
# File lib/nokogiri/xml/parse_options.rb, line 198
def ==(other)
  other.to_i == to_i
end

# File lib/nokogiri/xml/parse_options.rb, line 204
def inspect
  options = []
  self.class.constants.each do |k|
    options << k.downcase if send(:"#{k.downcase}?")
  end
  super.sub(/>$/, " " + options.join(", ") + ">")
end

# File lib/nokogiri/xml/parse_options.rb, line 189
def strict
  @options &= ~RECOVER
  self
end

# File lib/nokogiri/xml/parse_options.rb, line 194
def strict?
  @options & RECOVER == STRICT
end
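The `strict` and `strict?` methods shown above work purely on the RECOVER bit: `strict` clears it, and `strict?` is true whenever it is off. In Ruby, `&` binds more tightly than `==`, so the source expression groups as `(@options & RECOVER) == STRICT`. The standalone sketch below mirrors those methods to make the behavior easy to try; the class name `StrictSketch` and the bit positions for RECOVER and NONET are assumptions for illustration, and STRICT is taken to be the empty mask, as the comparison in `strict?` implies.

```ruby
# Standalone sketch mirroring the methods shown above; Nokogiri is not
# required. StrictSketch and the bit positions are illustrative assumptions.
class StrictSketch
  STRICT  = 0        # the empty mask
  RECOVER = 1 << 0
  NONET   = 1 << 11

  def initialize(options = STRICT)
    @options = options
  end

  def to_i
    @options
  end

  # Clear the RECOVER bit and return self for chaining.
  def strict
    @options &= ~RECOVER
    self
  end

  # True when the RECOVER bit is off, whatever else is set.
  def strict?
    (@options & RECOVER) == STRICT
  end

  # Two option sets are equal when their masks are equal.
  def ==(other)
    other.to_i == to_i
  end
end

default_options = StrictSketch.new
recovering      = StrictSketch.new(StrictSketch::RECOVER | StrictSketch::NONET)

strict_before = recovering.strict? # RECOVER is still set at this point
recovering.strict                  # clears RECOVER, leaves NONET untouched
```

One consequence worth noticing is that `strict?` says nothing about the other bits: a mask with NONET set and RECOVER clear still counts as strict.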