Building DuckDB Extensions with Zig and Nix

2024-03-03

This blog post is intended for Nix users and Zig developers who are familiar with DuckDB and want to extend its capabilities with custom extensions. If you’re not part of these communities, I hope this post is still useful enough to teach you something new.

Giving the Duck New Wings

DuckDB is a blazing fast in-process analytical database that ships in a multitude of forms, from a binary CLI to a C++ shared library.

The creators have been clever enough to provide extensions as a first-class primitive, and they maintain an officially supported C++ extension template. As someone who comes to programming with a C, bytes-over-types mindset, I find it challenging to work with C++ idioms. Over the last few years I’ve been enamoured with the productivity of a new programming language called Zig, which has the audacious goal of becoming a better C while making it easy to maintain and integrate existing C & C++ codebases. This post will teach you how to use Zig to call the exposed C & C++ DuckDB API.

Nix: Reproducible Environments Across Multiple Hosts

Nix is a powerful package manager that offers a reproducible and declarative approach to managing dependencies. It will be responsible for providing a consistent build environment across your development hosts.

We’ll start by initializing a Nix flake template that includes multiple versions of DuckDB (v0.9.2, v0.10.0 & main), Clang and libcxx.

console
> mkdir /tmp/myextension && cd /tmp/myextension
> nix flake init -t github:rupurt/duckdb-extension-template-zig#multi
wrote: /tmp/myextension/flake.nix

Activate the Nix shell

Let’s activate the default dev shell in the flake.

console
> nix develop -c $SHELL

Grab yourself a Tetley’s or a fresh brew; Nix will take a few minutes to download and build our dependencies. When it’s finished, verify that the tools are available in the shell.

console
> duckdb --version
v0.9.2
console
> zig version
0.12.0-dev.3124+9e402704e
console
> clang --version
clang version 16.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/naz8m910ndn4wzt6f7r_bp3c0000gn-clang-16.0.6/bin
console
# linux
> ldd $(which duckdb)
        ...
        libc++abi.so.1 => /nix/store/83qw6n5pllv5h6njrchgsd4kxkvmzlqg-libcxxabi-16.0.6/lib/libc++abi.so.1 (0x00007ffff7f7c000)
        libc++.so.1 => /nix/store/id50601k5gphbwbmp4ibn6agf1ma8kz5-libcxx-16.0.6/lib/libc++.so.1 (0x00007ffff4efc000)
        libgcc_s.so.1 => /nix/store/nzvnm5wsigdi3akjkavhx0rmvc8q8q17-gcc-13.2.0-libgcc/lib/libgcc_s.so.1 (0x00007ffff552d000)
        libc.so.6 => /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/libc.so.6 (0x00007ffff4c31000)
        /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2 => /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fca000)
        ...
console
# mac
> otool -L $(which duckdb)
...

Liberating libcxx Pitfalls

Welcome back! I hope you’re feeling refreshed.

If you’ve ever worked with C++ projects, no doubt you’ve encountered headaches when linking against libcxx, especially when dealing with different versions or incompatible configurations.

To avoid the most common issues, ensure all libraries are built against the same version of libcxx. We manage this with Nix by building a DuckDB derivation against libcxx, and we will configure the Zig build system to use pkg-config to link against this same version.
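
If you’re ever unsure which C++ standard library a translation unit was compiled against, the library’s own version macros can tell you. The sketch below is not part of the template; `cxx_stdlib_name` is a hypothetical helper name, and it only relies on `_LIBCPP_VERSION` (defined by LLVM’s libc++) and `__GLIBCXX__` (defined by GNU libstdc++).

```cpp
#include <cstddef> // any standard header pulls in the library's version macros
#include <string>

// Hypothetical helper: report which C++ standard library this file was
// compiled against, based on the macros each implementation defines.
std::string cxx_stdlib_name() {
#if defined(_LIBCPP_VERSION)
    // LLVM libc++ (the library the Nix shell in this post provides)
    return "libc++ " + std::to_string(_LIBCPP_VERSION);
#elif defined(__GLIBCXX__)
    // GNU libstdc++
    return "libstdc++ " + std::to_string(__GLIBCXX__);
#else
    return "unknown";
#endif
}
```

Printing the result from a scratch program built inside the dev shell is a quick sanity check before you start chasing linker errors.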

Waddling Over the C++ Bridge

While Zig integrates seamlessly with C, calling C++ code requires a bridge: C++ mangles symbol names and has no stable ABI for classes, templates or exceptions, so Zig cannot import C++ headers directly. Fortunately Zig ships with a C & C++ compiler, so we can build a C++ bridge that exposes a C API that Zig can call directly.
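
Before touching DuckDB, here is the bridge pattern in miniature. This is a sketch with made-up names (`demo::Greeter`, `greeter_create`, `greeter_greet` and `greeter_destroy` are all hypothetical): a C++ class is hidden behind unmangled `extern "C"` functions that pass the object around as an opaque pointer, which is the only shape a C or Zig caller needs to understand.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// A C++-only API (hypothetical stand-in for something like DuckDB's classes).
namespace demo {
class Greeter {
public:
    explicit Greeter(std::string prefix) : prefix_(std::move(prefix)) {}
    std::string Greet(const std::string &name) const { return prefix_ + " " + name; }

private:
    std::string prefix_;
};
} // namespace demo

// The bridge: unmangled C functions that treat the object as an opaque handle.
// A production bridge would also catch exceptions here, since they must not
// unwind into C or Zig callers.
extern "C" {

void *greeter_create(const char *prefix) { return new demo::Greeter(prefix); }

// Copies the greeting into a caller-owned buffer (always NUL-terminated when
// out_len > 0) and returns the full greeting length, excluding the NUL.
std::size_t greeter_greet(const void *handle, const char *name, char *out, std::size_t out_len) {
    const auto *greeter = static_cast<const demo::Greeter *>(handle);
    const std::string greeting = greeter->Greet(name);
    if (out != nullptr && out_len > 0) {
        const std::size_t n = std::min(greeting.size(), out_len - 1);
        std::memcpy(out, greeting.data(), n);
        out[n] = '\0';
    }
    return greeting.size();
}

void greeter_destroy(void *handle) { delete static_cast<demo::Greeter *>(handle); }

} // extern "C"
```

The caller never sees `std::string` or the class layout, only pointers and plain C types. The DuckDB bridge in this post follows the same idea with two functions instead of three.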

Initialize the Zig package

Create a Zig project and scaffold the initial files.

console
> zig init
info: created build.zig
info: created build.zig.zon
info: created src/main.zig
info: created src/root.zig
info: see 'zig build --help' for a menu of options

Create a C++ bridge

Create src/bridge.cpp and add the extension bridge definitions.

cpp
#define DUCKDB_EXTENSION_MAIN

#include "duckdb.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/string_util.hpp"
#include "duckdb/function/scalar_function.hpp"
#include "duckdb/main/extension_util.hpp"
#include "duckdb/parser/parsed_data/create_scalar_function_info.hpp"

namespace duckdb {
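// Scalar function body: prepends "Quack " to every input string.
inline void QuackScalarFun(DataChunk &args, ExpressionState &state, Vector &result) {
  auto &name_vector = args.data[0];
  // The lambda captures `result` by reference so each new string is added to
  // the result vector.
  UnaryExecutor::Execute<string_t, string_t>(
      name_vector, result, args.size(), [&](string_t name) {
        return StringVector::AddString(result, "Quack " + name.GetString() + " 🐥");
      });
}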

static void LoadInternal(DatabaseInstance &instance) {
  auto quack_scalar_function = ScalarFunction(
      "quack", {LogicalType::VARCHAR}, LogicalType::VARCHAR, QuackScalarFun);
  ExtensionUtil::RegisterFunction(instance, quack_scalar_function);
}

class QuackExtension : public Extension {
public:
  void Load(DuckDB &db) override;
  std::string Name() override;
};

void duckdb::QuackExtension::Load(duckdb::DuckDB &db) {
  LoadInternal(*db.instance);
}

std::string duckdb::QuackExtension::Name() { return "quack"; }
} // namespace duckdb

// We will call these extern functions in Zig via the C ABI
// DuckDB requires the version returned from the extension to match the version
// calling it. Here we use the linked version reported by libduckdb.
extern "C" char const *extension_version(void) {
  return duckdb::DuckDB::LibraryVersion();
}

// This function is responsible for bootstrapping the extension into the DuckDB
// internals. The quack extension is trivial and only registers a single scalar
// function.
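extern "C" void extension_init(duckdb::DatabaseInstance &db) {
  duckdb::DuckDB db_wrapper(db);
  // LoadExtension is a template; it needs the extension class to instantiate.
  db_wrapper.LoadExtension<duckdb::QuackExtension>();
}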
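
To make the version requirement in the comments above concrete, here is a sketch of what an exact-match check could look like from the host’s side. It is an illustration only, not DuckDB’s actual loader code: `extension_is_compatible`, `version_fn_t` and `fake_extension_version` are hypothetical names, and the stub returns a hard-coded string instead of calling into libduckdb.

```cpp
#include <cstring>

extern "C" {
// Signature of the version entry point, as seen through the C ABI.
typedef const char *(*version_fn_t)(void);

// Hypothetical stand-in for an extension's version function.
const char *fake_extension_version(void) { return "v0.9.2"; }
}

// Hypothetical host-side check: accept the extension only when the version it
// reports is byte-for-byte equal to the host's own version string.
bool extension_is_compatible(const char *host_version, version_fn_t version_fn) {
    if (host_version == nullptr || version_fn == nullptr) {
        return false;
    }
    const char *extension_version = version_fn();
    return extension_version != nullptr &&
           std::strcmp(host_version, extension_version) == 0;
}
```

Under this sketch a v0.9.2 host accepts the stub and a v0.10.0 host rejects it, which is why the bridge reports the version of the libduckdb it was linked against rather than a hard-coded string.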

Create src/include/bridge.h

Create an include header with the extern function signatures so Zig can call into them.

c
char const *extension_version(void);
void extension_init(void *);

Tie the bridge from Zig

Open src/root.zig and define symbols matching the extension name. These functions will call the two extern functions defined above.

zig
const std = @import("std");

pub const c_bridge = @cImport({
    @cInclude("bridge.h");
});

export fn quack_version() [*c]const u8 {
    // `char const *` is translated to `[*c]const u8`, so no casts are needed.
    return c_bridge.extension_version();
}

export fn quack_init(db: *anyopaque) void {
    c_bridge.extension_init(db);
}
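
The exported names matter because they have to line up with the extension name. As a sketch, assuming the loader derives both entry points from the base name of the extension file (the convention the quack_version and quack_init exports rely on), the mapping looks like this. `EntryPoints` and `entry_points_for` are hypothetical names used only for illustration.

```cpp
#include <filesystem>
#include <string>

// Hypothetical pair of symbol names a loader would look up in the library.
struct EntryPoints {
    std::string init;
    std::string version;
};

// Derive the entry point names from the extension file name, e.g.
// "quack.duckdb_extension" -> "quack_init" and "quack_version".
EntryPoints entry_points_for(const std::filesystem::path &extension_file) {
    // stem() drops the directory and the final ".duckdb_extension" suffix.
    const std::string name = extension_file.stem().string();
    return {name + "_init", name + "_version"};
}
```

If you rename the extension, the exported Zig functions and the installed file name need to change together.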

Zig Build System: Producing Artifacts We Can Love

Thanks for hanging in there, folks. This is where things start to get exciting! After all our hard work we’re going to compile our code and produce a native binary.

When we initialized our Zig project you might have noticed it created a build.zig file. Open it in your editor and copy the build configuration below.

zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // ...
    const lib = b.addSharedLibrary(.{
        .name = "quack",
        .root_source_file = .{ .path = "src/root.zig" },
        .target = target,
        .optimize = optimize,
    });
    lib.addIncludePath(.{ .path = "src/include" });
    lib.addCSourceFiles(.{
        .files = &.{
            "src/bridge.cpp",
        },
    });
    lib.linkLibC();
    lib.linkSystemLibrary("c++");
    lib.linkSystemLibrary("duckdb");

    const install_lib = b.addInstallArtifact(
        lib,
        .{ .dest_sub_path = "quack.duckdb_extension" },
    );
    b.getInstallStep().dependOn(&install_lib.step);
}

Build and verify outputs

The source code for the Zig build system explains some of the challenges of linking against libcxx with Zig in more detail.

It’s showtime! Let’s build the extension artifact.

console
> zig build
LLD Link...

Verify loadable library output

console
> ls -l zig-out/lib
-rwxr-xr-x 1 me me 9052040 Mar  3 11:49 quack.duckdb_extension

Validate libcxx linkage

console
> ldd zig-out/lib/quack.duckdb_extension
...
libduckdb.so => /nix/store/3r3qcq4zhw05xri3c78d3jsfs42x45al-duckdb-0.9.2/lib/libduckdb.so (0x00007ffff4c00000)
libc++abi.so.1 => /nix/store/83qw6n5pllv5h6njrchgsd4kxkvmzlqg-libcxxabi-16.0.6/lib/libc++abi.so.1 (0x00007ffff7f7c000)
libc++.so.1 => /nix/store/id50601k5gphbwbmp4ibn6agf1ma8kz5-libcxx-16.0.6/lib/libc++.so.1 (0x00007ffff4efc000)
...
console
> ldd /nix/store/3r3qcq4zhw05xri3c78d3jsfs42x45al-duckdb-0.9.2/lib/libduckdb.so
...
libc++abi.so.1 => /nix/store/83qw6n5pllv5h6njrchgsd4kxkvmzlqg-libcxxabi-16.0.6/lib/libc++abi.so.1 (0x00007ffff7f7c000)
libc++.so.1 => /nix/store/id50601k5gphbwbmp4ibn6agf1ma8kz5-libcxx-16.0.6/lib/libc++.so.1 (0x00007ffff4efc000)
...
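
Comparing store paths by eye gets tedious once you have more than a couple of libraries. As a rough sketch, assuming the `name => path (address)` line format shown in the ldd output above, a small helper can pull out the resolved path and compare it across two outputs. `resolved_path` and `same_library` are hypothetical helpers, not part of the template.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Find the line "<library> => <path> (<address>)" in ldd-style output and
// return <path>, or nothing when the library is not listed.
std::optional<std::string> resolved_path(const std::string &ldd_output, const std::string &library) {
    const std::string arrow = " => ";
    std::istringstream lines(ldd_output);
    std::string line;
    while (std::getline(lines, line)) {
        const auto start = line.find_first_not_of(" \t"); // skip indentation
        if (start == std::string::npos) continue;
        const auto arrow_pos = line.find(arrow, start);
        if (arrow_pos == std::string::npos) continue;
        // The text before the arrow must be exactly the library name.
        if (line.compare(start, arrow_pos - start, library) != 0) continue;
        const auto path_start = arrow_pos + arrow.size();
        const auto path_end = line.find(" (", path_start);
        return line.substr(path_start,
                           path_end == std::string::npos ? std::string::npos : path_end - path_start);
    }
    return std::nullopt;
}

// True when both outputs resolve `library` to the same file.
bool same_library(const std::string &ldd_a, const std::string &ldd_b, const std::string &library) {
    const auto a = resolved_path(ldd_a, library);
    const auto b = resolved_path(ldd_b, library);
    return a && b && *a == *b;
}
```

Feeding it the two ldd outputs and asking about libc++.so.1 and libc++abi.so.1 answers the question this section cares about: do the extension and libduckdb agree on which libcxx they load?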

Using Our Extension

Because we’re building an out-of-tree extension, we’ll need to tell DuckDB that we’re OK with loading unsigned extensions by starting the CLI with the -unsigned flag.

console
> duckdb -unsigned
D LOAD 'zig-out/lib/quack.duckdb_extension';
D SELECT quack('howdy');
┌────────────────┐
│ quack('howdy') │
│    varchar     │
├────────────────┤
│ Quack howdy 🐥 │
└────────────────┘

That’s All Folks

Hopefully you made the duck quack and got the same output! The concepts explored in this post have been wrapped up in a GitHub template repository, duckdb-extension-template-zig.

If you create an extension with the template, drop me a line and I’ll link to it in the README. In future posts we’ll dive deeper into DuckDB internals such as the catalog system, and look at how Zig’s cross compilation abilities can simplify building and distributing your extensions for multiple architectures.