
Roadmap for API Changes

Overhaul the serialization/pretty-printing API

  • #530 XHTML formatting can't be turned off

  • #415 XML formatting should be no formatting

Overhaul and optimize SAX parsing

  • see fairy wing throwdown - SAX parsing is wicked slow.

Node should not be Enumerable; and should have a better attributes API

  • #679 Mixing in Enumerable has some unintended consequences; plus we want to improve the attributes API

  • Some ideas for a better attributes API:

    • #666 (closed)
    • #765
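
To make the Enumerable concern concrete, here is a minimal sketch using a stand-in class, not Nokogiri itself. It assumes, as I understand Node#each to work, that iteration yields attribute name/value pairs rather than child nodes. The EnumerableSketch module and its Node class are hypothetical, and the surprise shown is only one possible kind, not necessarily the one reported in #679.

```ruby
module EnumerableSketch
  # Hypothetical stand-in for a node that mixes in Enumerable.
  class Node
    include Enumerable

    attr_reader :name, :attributes, :children

    def initialize(name, attributes = {}, children = [])
      @name = name
      @attributes = attributes
      @children = children
    end

    # Iterates over attribute name/value pairs, not over child nodes.
    def each(&block)
      attributes.each(&block)
    end
  end
end

child = EnumerableSketch::Node.new("span")
node  = EnumerableSketch::Node.new("a", { "href" => "/", "class" => "x" }, [child])

node.count            # counts attributes (2), not children (1)
node.first            # the first attribute pair, ["href", "/"]
node.include?(child)  # false, even though child is one of node's children
```

Every Enumerable method (count, first, include?, map, and so on) silently answers questions about attributes, which is easy to misread as answering questions about children.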

Improve CSS query parsing

  • #528 support :not() with a nontrivial argument, like :not(div p.c)

  • #451 chained :not pseudoselectors

  • better jQuery selector and CSS pseudo-selector support:

    • #621
    • #342
    • #628
    • #652
    • #688

  • #394 nth-of-type is wrong, and possibly other selectors as well

  • #309 incorrect query being executed

  • #350 :has is wrong?

DocumentFragment

  • Several tickets report searches behaving incorrectly depending on whether or not the context node is used as part of the search:

    • #213
    • #370
    • #454

  • #572 could we fix this by making DocumentFragment a subclass of NodeSet?

Better Syntax for custom XPath function handler

Better Syntax around Node#xpath and NodeSet#xpath

  • look at those methods, and use of Node#extract_params in Node#{css,search}
  • we should standardize on a hash of options for these and other calls
  • what should NodeSet#xpath return?
  • #656
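
A rough sketch of what "a hash of options" could look like, compared with sorting positional arguments by type. This is a simplification for discussion, not Nokogiri's actual extract_params; the QuerySketch module, both method names, and the namespaces:/handler: keywords are all hypothetical.

```ruby
module QuerySketch
  # Positional style: callers pass strings, a namespace Hash, and a handler
  # object in any order, so the method has to guess each argument's role.
  def self.positional(*args)
    paths = []
    namespaces = {}
    handler = nil
    args.each do |arg|
      case arg
      when String then paths << arg
      when Hash   then namespaces.merge!(arg)
      else             handler = arg
      end
    end
    { paths: paths, namespaces: namespaces, handler: handler }
  end

  # Hypothetical keyword style: each role is named, so nothing is inferred.
  def self.with_options(*paths, namespaces: {}, handler: nil)
    { paths: paths, namespaces: namespaces, handler: handler }
  end
end

ns = { "a" => "http://example.com/a" }
QuerySketch.positional("//a:b", ns)
QuerySketch.with_options("//a:b", namespaces: ns)
```

Both calls carry the same information, but the keyword form leaves room to add further named options later without changing how existing arguments are interpreted.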

Encoding

We have a lot of issues open around encoding. How bad are things? Somebody who knows encoding well should head this up.

Reader

It's fundamentally broken, in that we can't stop people from crashing their application if they use object references unsafely.

Class methods that require Document

A few methods, like Nokogiri::XML::Comment.new, require a Document object.

We should probably add Document instance methods that wrap these calls, since requiring a Document argument is a non-obvious expectation and thus fails as a convention.

So, instead, let's make alternative methods like Nokogiri::XML::Document#new_comment, and recommend those as the proper convention.
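
The shape of that convention, sketched with stand-in classes so it runs without Nokogiri. FactorySketch::Document and FactorySketch::Comment are hypothetical toys; only the new_comment name comes from the proposal above, and nothing here is an existing Nokogiri method.

```ruby
module FactorySketch
  # Stand-in for a node type whose constructor needs its owning document.
  class Comment
    attr_reader :document, :content

    def initialize(document, content)
      raise ArgumentError, "a Document is required" unless document.is_a?(Document)

      @document = document
      @content = content
    end
  end

  # Stand-in for the document class.
  class Document
    # Proposed convention: the receiver supplies the document argument,
    # so callers cannot forget it or pass the wrong object.
    def new_comment(content)
      Comment.new(self, content)
    end
  end
end

doc = FactorySketch::Document.new
comment = doc.new_comment("reviewed")
comment.document.equal?(doc)  # the comment is tied to the document that made it
```

The constructor keeps working for existing callers; the instance method is just the recommended entry point.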

collect_namespaces is just broken

collect_namespaces returns a hash keyed by prefix, which means it can't return more than one namespace with the same prefix. See this issue for background:

#885

Do we care? This seems like a useless method, but then again I hate XML, so what do I know?
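
The collision is easy to reproduce with plain Ruby data, no Nokogiri required. This is a sketch under assumptions: the declaration list is made-up example data standing in for what a document walk might gather, and the grouped return shape is just one hypothetical alternative, not a decided API.

```ruby
module NamespaceSketch
  # Namespace declarations as [prefix, href] pairs, as they might be gathered
  # from a document in which two elements bind the same prefix differently.
  DECLARATIONS = [
    ["xmlns:a", "http://example.com/one"],
    ["xmlns:b", "http://example.com/b"],
    ["xmlns:a", "http://example.com/two"],
  ].freeze

  # Folding into a Hash keyed by prefix keeps only one href per prefix.
  def self.as_hash(declarations)
    declarations.to_h
  end

  # Hypothetical alternative: keep every href, grouped under its prefix.
  def self.grouped(declarations)
    declarations.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
  end
end

NamespaceSketch.as_hash(NamespaceSketch::DECLARATIONS).size
# 2 entries: one of the two "xmlns:a" bindings is gone

NamespaceSketch.grouped(NamespaceSketch::DECLARATIONS)["xmlns:a"]
# both hrefs survive: ["http://example.com/one", "http://example.com/two"]
```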

Overhaul ParseOptions

Currently we mirror libxml2's parse options, and then retrofit those options on top of Xerces-J for JRuby.

  • I'd like to identify which options work across both parsers,
  • and overhaul the parse methods so that these options are easier to use.

By "easier to use" I mean:

  • it's unwieldy to create a block to set/unset parse options
  • it's unwieldy to create a constant like MY_PARSE_OPTIONS = Nokogiri::XML::ParseOptions::STRICT | Nokogiri::XML::ParseOptions::RECOVER ...
  • some options are named dangerously poorly, like NOENT, which does the opposite of what its name suggests (it turns entity substitution on)
  • semantically, some options should be set/unset together: declaring "this is a trusted document" or "this is an untrusted document" should flip the senses of NONET, NOENT, and DTDLOAD together.
  • we need the ability to invent new parse options, like the one suggested in #1582 that would allow local entities but not external entities.
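
One possible shape for this, as a sketch only. The bit values are copied from libxml2's parser option enum so the example runs without Nokogiri; real code would use the Nokogiri::XML::ParseOptions constants. The ParseOptionSketch module, the readable flag names, the :trusted/:untrusted profiles, and the build method are all hypothetical, and the profile contents are my reading of "flip the senses together".

```ruby
module ParseOptionSketch
  # Values mirror libxml2's option bits; copied here for a standalone example.
  RECOVER = 1 << 0
  NOENT   = 1 << 1   # despite the name, this turns entity substitution ON
  DTDLOAD = 1 << 2
  NONET   = 1 << 11

  # Hypothetical, clearer names for the existing flags.
  FLAGS = {
    recover: RECOVER,
    substitute_entities: NOENT,
    load_external_dtd: DTDLOAD,
    forbid_network: NONET,
  }.freeze

  # Hypothetical trust profiles that move the related flags together.
  PROFILES = {
    trusted:   { substitute_entities: true,  load_external_dtd: true,  forbid_network: false },
    untrusted: { substitute_entities: false, load_external_dtd: false, forbid_network: true },
  }.freeze

  # Build an integer bitmask from a profile plus per-flag overrides.
  # Unknown profile or flag names raise KeyError instead of being ignored.
  def self.build(profile = :untrusted, **overrides)
    settings = PROFILES.fetch(profile).merge(overrides)
    settings.reduce(0) do |mask, (name, enabled)|
      bit = FLAGS.fetch(name)
      enabled ? (mask | bit) : (mask & ~bit)
    end
  end
end

ParseOptionSketch.build(:untrusted)                 # only NONET is set
ParseOptionSketch.build(:trusted)                   # NOENT | DTDLOAD
ParseOptionSketch.build(:untrusted, recover: true)  # NONET | RECOVER
```

A symbolic layer like this would also give a natural home to invented options, since a new name need not correspond to a single libxml2 bit.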