buildwarden

On Independent Guarantees of Completeness

Completeness is the guarantee that nothing was missed for lack of measurement. Attesting that inputs are complete is not the same as attesting that every input is human-identifiable, safe, or even used in its entirety. It attests only that every resource has been accounted for, either by including the content itself or by recording a verifiable identifier of it (e.g. a hash). Documents that provide a completeness guarantee should therefore ideally also be tamper-resistant and clearly show which inputs were used directly and which were available but unmeasured. It is also important that the process not attempt to characterize the inputs beyond what can be measured outside the influence of the build process. Bundling or vendoring a tool or dependency into an output artifact is common practice for many packages and can be done in ways that are hard to detect without specific expertise (e.g. static compilation), so such characterizations cannot be guaranteed with the same confidence as a naive completeness guarantee.

Today, a naive guarantee of completeness could be written for any resource simply by stating that it was open to all influences when the ledger was written. Such a guarantee would be of very little use, since it means the resources described are open to potentially any vulnerability. There are, however, notable ways to guarantee completeness while narrowing the possibilities. One example that limits external influences is a hermetic build process, which cuts off all network access and allows only resources prescribed in the source, which are downloaded independently before the build and can be fully tracked. It isn’t the only way to make that guarantee, just the simplest at face value: the output can’t contain more software than can be created from what was on disk at the start.

However, most developers today don’t build or test locally in hermetic environments. Full network isolation requires knowing how to store and supply every remote resource the environment needs before the network is cut off. This isn’t universally configurable, and the relevant standards (e.g. proxy support) are opt-in rather than opt-out, so they are followed inconsistently even where they are common. If completeness is the goal, though, then simply measuring everything that goes in during the process is equivalent for consumers of that software, and any additional consistency guarantees that come from the author being prescriptive aren’t strictly required to secure the supply chain itself.

Note that I am not saying the author needs to guarantee completeness. The guarantor need not be the author and, in most cases, shouldn’t be. An independent guarantee of completeness of inputs gives the most confidence when the guarantor can be held accountable specifically for that completeness and for the security of the platform that generates it, not for the security of the artifact itself. This guarantee can come from for-profit CI platforms or non-profit organizations, and the same hash-identified artifact can have completeness documents from multiple providers. As an example, if a vulnerability is found in an SSL library and was recently patched, a rebuild of a consuming package that uses the patched version and generates a content-identical artifact shows the artifact could not have been vulnerable originally and so can still be safely used. This applies to vulnerabilities across the spectrum, including the builder’s own platform being compromised. It also opens possibilities for standards that can be audited by independent entities and security experts, rather than relying solely on a self-signed attestation.

What is a Build Ledger?

A build ledger is a document that provides a completeness guarantee for inputs and outputs during a build process as strictly-ordered entries with a chaining signature. Its logical specification is as follows:

The first entry is the Certificate header, containing the public portion of an ephemeral, self-signed certificate pair used to sign all of the entries in the ledger. Its signature is based on the content of the public portion of the certificate and signing by the private portion, so it validates the private portion matches the public portion.
All following entries have a signature that covers the previous entry and their own content hash.
- Additional metadata can be included as annotations, such as URLs for remote resources or directory/file paths for local resources. It is neither strictly required nor signed, and only helps during later analysis steps. This metadata can also be stripped out or modified for internal resources that need to be anonymized but remain identifiable.
All requests are considered asynchronous and bidirectional. Each has explicit opening and closing entries and transfers in each direction, and the closing entry references the opening signature. For any entity in the document, the ledger is considered complete at the entry where that entity and all entities opened before it have closed. This covers the potential attack surface of a streamed resource being partially used before it can be fully logged in the ledger.
- Note that even when the ledger is complete, an input that cannot be identified at later stages is essentially a known unknown, which in general is a yellow/red flag that not all inputs can be properly accounted for.
There is no specific entry marking the end of a ledger. This is intentional as it allows for truncation of ledgers when additional verification, testing, and documentation generation are done before considering the process successful.
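
The chaining and completeness rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not BuildWarden's actual format: an HMAC with an ephemeral key stands in for the certificate-based signature, the certificate header entry and the closing entry's reference to the opening signature are omitted, and the `Entry` fields and the names `append`, `verify`, and `complete_at` are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    kind: str          # "open", "data", or "close" (hypothetical names)
    entity: str        # hypothetical identifier for a request or resource
    content_hash: str  # SHA-256 hex digest of the entry's content
    signature: str     # covers the previous signature and this content hash


def sign(key: bytes, prev_signature: str, content_hash: str) -> str:
    # Assumption: HMAC-SHA256 stands in for a real asymmetric signature.
    message = (prev_signature + content_hash).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append(ledger: List[Entry], key: bytes, kind: str, entity: str,
           content: bytes) -> Entry:
    # Chain each new entry to the signature of the entry before it.
    prev = ledger[-1].signature if ledger else ""
    content_hash = hashlib.sha256(content).hexdigest()
    entry = Entry(kind, entity, content_hash, sign(key, prev, content_hash))
    ledger.append(entry)
    return entry


def verify(ledger: List[Entry], key: bytes) -> bool:
    # Recompute every signature in order; any edit or reordering breaks the chain.
    prev = ""
    for entry in ledger:
        expected = sign(key, prev, entry.content_hash)
        if not hmac.compare_digest(expected, entry.signature):
            return False
        prev = entry.signature
    return True


def complete_at(ledger: List[Entry], entity: str) -> Optional[int]:
    # Index of the first entry at which `entity` and every entity opened
    # before it have closed, or None if that point is never reached.
    order = [e.entity for e in ledger if e.kind == "open"]
    if entity not in order:
        return None
    required = set(order[: order.index(entity) + 1])
    closed = set()
    for index, entry in enumerate(ledger):
        if entry.kind == "close":
            closed.add(entry.entity)
            if required <= closed:
                return index
    return None


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # ephemeral key for this ledger only
    ledger: List[Entry] = []
    append(ledger, key, "open", "a", b"request a")
    append(ledger, key, "open", "b", b"request b")
    append(ledger, key, "close", "b", b"")
    append(ledger, key, "close", "a", b"")
    print(verify(ledger, key))        # True
    print(complete_at(ledger, "b"))   # 3: "a" opened first and closes last
    print(verify(ledger[:2], key))    # True: a truncated prefix still verifies
```

Because each signature depends only on the entries before it, any prefix of the ledger verifies on its own, which is consistent with the absence of an end-of-ledger entry.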

BuildWarden

BuildWarden is intended to be a reference implementation that generates build ledgers from a simple ContainerFile, which declares the starting environment through a parent container, followed by instructions whose http/https inputs and outputs are fully tracked. All exported artifacts and metadata are sent as POST/PUT requests to endpoints without a valid top-level domain, e.g. https://artifacts or https://file-inputs.

HTTP/HTTPS network resources are authoritatively restricted: each is retrieved by content hash with URL metadata, its headers and content are logged, and all other network interfaces are blocked.
Available filesystem resources are accounted for by a concrete, fully resolved container hash.
- This could be improved with a content hash plus path for every file read or written, but due to the added complexity around ensuring completeness, it may not be part of the initial implementation.
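
As a rough illustration of the per-file idea, the sketch below records a content hash for every path under a directory. It covers only the simpler half of the problem: it snapshots the files that exist, whereas knowing which files a build actually read or wrote would need OS-level tracing that is not shown here. The function name `file_manifest` is hypothetical and not part of BuildWarden.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_manifest(root) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; only regular file content is hashed.
        if path.is_symlink() or not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest
```

Comparing two such manifests, one taken before and one after a build step, would show which paths were added, removed, or changed in content, though not which unchanged files were read.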
