module Nokogiri::XML::SAX

SAX Parsers are event-driven parsers.

Two SAX parsers for XML are available: a parser that reads from a string or IO object as needed, and a parser that you explicitly feed XML in chunks. If you want Nokogiri to handle reading your XML, use Nokogiri::XML::SAX::Parser. If you want fine-grained control over the XML input, use Nokogiri::XML::SAX::PushParser.

If you want to do SAX style parsing of HTML, check out Nokogiri::HTML4::SAX.

A SAX-style parser works in three steps: create a parser, tell it which events you are interested in, then give it some XML to process. The parser notifies you whenever it encounters one of those events.

To register for events, subclass Nokogiri::XML::SAX::Document and implement the methods for which you would like notification.

For example, to be notified when a document ends and when an element starts, I would write a class like this:

class MyHandler < Nokogiri::XML::SAX::Document
  def end_document
    puts "the document has ended"
  end

  def start_element(name, attributes = [])
    puts "#{name} started"
  end
end

Then I would instantiate a SAX parser with this document and feed the parser some XML:

# Create a new parser
parser = Nokogiri::XML::SAX::Parser.new(MyHandler.new)

# Feed the parser some XML (the block form closes the file afterwards)
File.open(ARGV[0]) { |file| parser.parse(file) }

Now my document handler will be called when each element starts and when the document ends. To see what kinds of events are available, take a look at Nokogiri::XML::SAX::Document.