We maintain a single BigQuery table of all known X.509 certificates. This article describes the dataset.
The certificates collection contains every unique certificate that we have seen in any of our scans (of the IPv4 address space or of Alexa-listed domains) or obtained by synchronizing with known public certificate transparency servers. When scanning, we collect the certificates presented in any TLS handshake, including handshakes for non-HTTPS protocols such as SMTP with STARTTLS.
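As a rough illustration of what "unique" can mean here, the sketch below deduplicates observed certificates by the SHA-256 digest of their raw bytes. Keying on the SHA-256 fingerprint is an assumption made for this example (the dataset does use SHA-256 fingerprints to refer to parent certificates), and the function names and source labels are hypothetical, not part of the pipeline described in this article.

```python
import hashlib


def fingerprint_sha256(der_bytes: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def deduplicate(observations):
    """Keep the first copy of each certificate, keyed by SHA-256 fingerprint.

    `observations` is an iterable of (source, der_bytes) pairs. The source
    labels are placeholders for this example, not values from the dataset.
    """
    unique = {}
    for source, der_bytes in observations:
        fp = fingerprint_sha256(der_bytes)
        # setdefault leaves an existing entry untouched, so repeats are dropped.
        unique.setdefault(fp, {"raw": der_bytes, "first_source": source})
    return unique
```

Because the key is derived from the certificate bytes alone, the same certificate seen in a scan and in a certificate transparency log collapses to a single entry.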
We parse certificates using ZCrypto, an open source Go-based cryptographic library.
We validate certificates against the published root stores for Mozilla NSS, Microsoft, and Apple. To accommodate "trans-valid" certificates (where an intermediate certificate needed for validation is missing), we maintain a known set of intermediates for each root store and consult it during validation. Validation is performed using ZCrypto.
Note: We do not currently process browser-blacklisted certificates (e.g., certificates in OneCRL).
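The "trans-valid" case described above can be pictured with a deliberately simplified model. The sketch below is not ZCrypto and does not verify signatures, validity dates, or constraints; it only follows issuer names from a leaf toward a trusted root, falling back to a pool of known intermediates when the presented chain is incomplete. The `Cert` type, the `build_chain` function, and all of its parameters are hypothetical names introduced for this example.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Cert(NamedTuple):
    subject: str
    issuer: str


def build_chain(
    leaf: Cert,
    presented: List[Cert],
    known_intermediates: List[Cert],
    root_subjects: Set[str],
    max_depth: int = 8,
) -> Optional[List[Cert]]:
    """Follow issuer names from the leaf up to a trusted root.

    Returns the chain (leaf first, root excluded) or None if no path exists.
    Signatures, validity dates, and constraints are deliberately NOT checked.
    """
    # Index candidate issuers by subject; presented certificates are added
    # last, so they take priority over the known-intermediates pool.
    by_subject: Dict[str, Cert] = {}
    for cert in known_intermediates + presented:
        by_subject[cert.subject] = cert

    chain = [leaf]
    current = leaf
    for _ in range(max_depth):
        if current.issuer in root_subjects:
            return chain
        parent = by_subject.get(current.issuer)
        # Stop if the issuer is unknown or the chain would loop on itself.
        if parent is None or parent in chain:
            return None
        chain.append(parent)
        current = parent
    return None
```

In this model, a leaf whose issuing intermediate was not presented fails to chain on its own but succeeds once the intermediate is available in the known set, which is the situation the per-root-store intermediate sets are meant to handle.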
We lint (i.e., check for construction issues) certificates using the ZLint library.
The certificates table is composed of several top-level objects:
- raw. Raw unparsed certificate
- parsed. Parsed out fields in the X.509 certificate (e.g.,
- validation. Browser validation information for each root store (e.g.,
- ct. Data on which certificate transparency servers contain the certificate
- zlint. Linting data on the certificate from ZLint
- audit. CCADB information about the certificate (e.g.,
- metadata. Metadata about the certificate's inclusion in Censys (e.g.,
- parents. List of SHA-256 fingerprints of certificates that are parents
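Since each row lists the SHA-256 fingerprints of its parents, ancestry can be reconstructed by following those links from row to row. The sketch below assumes the rows have already been loaded into a dictionary keyed by fingerprint, with each row exposing a `parents` list of hex strings. That in-memory layout and the `ancestors` function are assumptions for illustration, not a documented access pattern.

```python
from collections import deque


def ancestors(fingerprint, rows_by_fingerprint):
    """Return every ancestor fingerprint reachable via `parents`, breadth-first.

    `rows_by_fingerprint` maps a SHA-256 fingerprint to a row dict that may
    contain a "parents" list. Missing rows are treated as having no parents.
    """
    visited = {fingerprint}  # never report the starting certificate itself
    order = []
    queue = deque(rows_by_fingerprint.get(fingerprint, {}).get("parents", []))
    while queue:
        fp = queue.popleft()
        if fp in visited:
            continue  # already handled; also guards against cycles
        visited.add(fp)
        order.append(fp)
        queue.extend(rows_by_fingerprint.get(fp, {}).get("parents", []))
    return order
```

The visited set matters because a certificate can have more than one parent and several paths may converge on the same ancestor.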