class Nokogiri::XML::SAX::Document

The SAX::Document class is used for registering types of events you are interested in handling. All of the methods on this class are available as possible events while parsing an XML document. To register for any particular event, subclass this class and implement the methods you are interested in knowing about.

To be notified only about start and end element events, write a class like this:

class MyHandler < Nokogiri::XML::SAX::Document
  def start_element name, attrs = []
    puts "#{name} started!"
  end

  def end_element name
    puts "#{name} ended"
  end
end

You can use this event handler for any SAX-style parser included with Nokogiri.


Entity Handling

⚠ Entity handling is complicated in a SAX parser! Please read this section carefully if you’re not getting the behavior you expect.

Entities will be reported to the user via callbacks to #characters, to #reference, or possibly to both. The behavior is determined by a combination of entity type and the value of ParserContext#replace_entities. (Recall that the default value of ParserContext#replace_entities is false.)

It is UNSAFE to set ParserContext#replace_entities to true when parsing untrusted documents.

💡 For more information on entity types, see Wikipedia’s page on DTDs.

| Entity type | Reported via #characters | Reported via #reference |
|---|---|---|
| Char ref (e.g., &#146;) | always | never |
| Predefined (e.g., &amp;) | always | never |
| Undeclared † | never | when #replace_entities == false |
| Internal | always | when #replace_entities == false |
| External † | when #replace_entities == true | when #replace_entities == false |


† When the replacement text for an entity is unknown (e.g., an undeclared entity, or an external entity that could not be resolved because of network issues), the replacement text is not reported. If ParserContext#replace_entities is true, this means the #characters callback is not invoked. If ParserContext#replace_entities is false, the #reference callback is invoked, but with nil as the content argument.