
2022-12 Hide libxml2 and libxslt symbols on Darwin in Ruby 3.2 native gem

Status

Accepted, but reversible if an alternative technical solution can be found.

Context

In the final days of shipping Nokogiri v1.14.0 with native (precompiled) support for Ruby 3.2, we're struggling a bit with symbol resolution.

Ruby 3.2, when compiling C extensions on Darwin, uses the -bundle_loader linker flag to resolve symbols against the Ruby executable as if it were a shared library. (This means that, when running on a Ruby compiled with the --enable-shared flag, the extension will fail to resolve Ruby symbols like rb_cObject, because in that configuration those symbols live in the libruby dylib rather than in the executable.)

We can work around that with the -flat_namespace linker flag, which mimics the behavior we already see on Linux: symbols are resolved at runtime by searching all loaded images, and the first definition found wins. But for reasons I don't fully understand, many Rubies on Darwin seem to load the libxml2 and libxslt dylibs that ship with the Xcode Command Line Tools ("CLT"), and so every libxml2 symbol collides and resolves to the wrong libxml2 (not the version we've patched and statically linked into the extension).
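To make these two failure modes concrete, here is a deliberately simplified Python model of symbol lookup. It is an illustration under stated assumptions, not how dyld is actually implemented: each image is reduced to a name and a set of exported symbols, the image names are illustrative, and, following the observation above, the extension's references to its own vendored libxml2 symbols are assumed to be looked up at runtime under a flat namespace.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Image:
    """A loaded image (executable, dylib, or bundle) in a toy model, not real dyld."""
    name: str
    exports: set[str] = field(default_factory=set)


def resolve_two_level(images: list[Image], symbol: str, recorded_image: str) -> Optional[str]:
    """Two-level namespace: only the image recorded at link time is searched.

    With -bundle_loader, the recorded image for Ruby's C API is the ruby executable.
    """
    for image in images:
        if image.name == recorded_image:
            return image.name if symbol in image.exports else None
    return None  # the recorded image is not loaded at all


def resolve_flat(images: list[Image], symbol: str) -> Optional[str]:
    """Flat namespace: search every loaded image in load order; the first definition wins."""
    for image in images:
        if symbol in image.exports:
            return image.name
    return None


# --disable-shared Ruby: the C API is defined in the executable itself.
static_ruby = [Image("ruby", {"rb_cObject"})]

# --enable-shared Ruby: the executable is thin and the C API lives in libruby.
# Assumption (per the ADR): the CLT libxml2 is loaded before our extension.
shared_ruby = [
    Image("ruby"),
    Image("libruby.dylib", {"rb_cObject"}),
    Image("libxml2.dylib (CLT)", {"xmlReadMemory"}),
    Image("nokogiri.bundle", {"xmlReadMemory"}),  # vendored, patched libxml2
]

print(resolve_two_level(static_ruby, "rb_cObject", "ruby"))  # ruby
print(resolve_two_level(shared_ruby, "rb_cObject", "ruby"))  # None
print(resolve_flat(shared_ruby, "rb_cObject"))               # libruby.dylib
print(resolve_flat(shared_ruby, "xmlReadMemory"))            # libxml2.dylib (CLT)
```

In this model the two-level binding only succeeds when the executable itself defines rb_cObject, while the flat search finds it in either kind of Ruby but hands back whichever libxml2 happened to be loaded first.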

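To work around this last problem, the best solution we know of right now seems to be to avoid exporting those symbols by linking the static libraries with the -load_hidden linker flag, which treats every global symbol in the archive as hidden so that it is not exported from the extension. (There are several similar mechanisms we could choose from.)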
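A similar toy model shows why hiding the vendored symbols helps, and what it costs. This is again a sketch under assumptions rather than dyld's real behavior: a hidden symbol is treated as bound inside its own image at static link time, the model does not distinguish -load_hidden from other hiding mechanisms, and `downstream.bundle` is a hypothetical stand-in for a gem that calls libxml2's C API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Image:
    """A loaded image in a toy model of flat-namespace lookup, not real dyld."""
    name: str
    exports: set[str] = field(default_factory=set)  # visible to the global search
    hidden: set[str] = field(default_factory=set)   # defined here but not exported


def bind(images: list[Image], requester: Image, symbol: str) -> Optional[str]:
    """Return the name of the image that `requester`'s reference to `symbol` binds to."""
    # Assumption: a hidden symbol is resolved inside the same image at static
    # link time, so the runtime search never sees it.
    if symbol in requester.hidden:
        return requester.name
    # Everything else goes through the flat search: first loaded definition wins.
    for image in images:
        if symbol in image.exports:
            return image.name
    return None


clt_libxml2 = Image("libxml2.dylib (CLT)", exports={"xmlReadMemory"})

# Vendored libxml2 linked normally: its symbols are exported from the bundle.
exported_bundle = Image("nokogiri.bundle", exports={"Init_nokogiri", "xmlReadMemory"})
# Vendored libxml2 linked as hidden: its symbols stay private to the bundle.
hidden_bundle = Image("nokogiri.bundle", exports={"Init_nokogiri"}, hidden={"xmlReadMemory"})

print(bind([clt_libxml2, exported_bundle], exported_bundle, "xmlReadMemory"))  # libxml2.dylib (CLT)
print(bind([clt_libxml2, hidden_bundle], hidden_bundle, "xmlReadMemory"))      # nokogiri.bundle

# The cost: another extension can no longer reach Nokogiri's copy of libxml2.
downstream = Image("downstream.bundle")  # hypothetical downstream gem
print(bind([hidden_bundle, downstream], downstream, "xmlReadMemory"))          # None
```

The last case is the trade-off described under Consequences: once the symbols are hidden, only code inside the extension itself can call the vendored libxml2.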

Decision

Nokogiri v1.14.0's precompiled native gem for Darwin (macOS) Ruby 3.2 will be built with:

  • the -flat_namespace flag to ensure the extension can be used by both --enable-shared and --disable-shared Rubies,
  • and the -load_hidden flag for both libxml2 and libxslt to avoid accidentally resolving to non-vendored versions of those libraries.

Consequences

This would prevent accidental symbol collisions such as the one in https://github.com/sparklemotion/nokogiri/pull/2106 on Linux, and would ensure that we always pull in the desired version of libxml2, avoiding problems like the ones we're currently experiencing with Ruby 3.2 (see https://github.com/rake-compiler/rake-compiler-dock/issues/87 for extended discussion and more links).

This would also, however, prevent a small but non-zero number of downstream gems from integrating with Nokogiri's C API, or with the C API of libxml2, libxslt, or libgumbo. A notable gem that did this was https://github.com/rubys/nokogumbo (now merged into Nokogiri itself). Another notable gem that I know does this is nokogiri-xmlsec (and its various forks, the most popular of which seems to be https://github.com/instructure/nokogiri-xmlsec-instructure). So this may prevent experimentation and innovation (see Nokogumbo) as well as put hurdles in front of useful integrations like xmlsec.

Alternatives considered

Remove the -bundle_loader flag from the link line. Although this works, it feels a bit like fighting the toolchain and the Ruby core team. It's a somewhat more complicated solution, it's harder for me to reason about, and I'm not positive we won't discover some weird side effect later on.

Fully hide all symbols everywhere. This takes the chosen solution to the extreme, and may be what we decide to do in the future (see RFC: Stop exporting symbols · Discussion #2746 · sparklemotion/nokogiri). For now, though, I'd like to keep our options open and not break compatibility completely in this v1.14.0 release. By hiding symbols only where we're forced to, we have a chance to learn how the API is being used, and we buy some time to hear feedback and find an alternative solution.

Stop precompiling, or stop vendoring libraries. These should always be options we consider, because offering native gems and vendoring libraries introduces complexity. However, I covered many of the reasons I think it's good for Nokogiri to do this in my RubyConf 2021 talk titled "Building Native Extensions. This Could Take A While...", and those reasons are still valid, notably our ability to patch libxml2 for performance (see #2144), functional (see #2403), or security (see #2294) reasons.

References