We maintain a single BigQuery table of all known X.509 certificates. This article describes the dataset.

Data Collection

The certificates collection comprises all unique certificates we have seen in any of our scans (of the IPv4 address space or the Alexa Top Million domains) and through synchronizing with known public Certificate Transparency logs. When scanning, we collect certificates presented in any TLS handshake (including non-HTTPS protocols such as SMTP with STARTTLS).
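As a rough illustration of the handshake side of collection, the Go sketch below records the unique certificates a server presents during a TLS handshake, keyed by SHA-256 fingerprint. The target host, port, and in-memory dedup map are illustrative only and are not part of our collection pipeline; non-HTTPS protocols would first negotiate STARTTLS before the same handshake occurs.

package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"log"
)

func main() {
	// Hypothetical target; real scans cover the IPv4 space and Alexa domains.
	conn, err := tls.Dial("tcp", "example.com:443", &tls.Config{
		InsecureSkipVerify: true, // a scanner records certificates even when the chain does not validate
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	seen := map[string]bool{} // deduplicate by SHA-256 fingerprint, as the dataset does
	for _, cert := range conn.ConnectionState().PeerCertificates {
		fp := sha256.Sum256(cert.Raw)
		key := hex.EncodeToString(fp[:])
		if !seen[key] {
			seen[key] = true
			fmt.Println(key)
		}
	}
}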

X.509 Parsing

We parse certificates using ZCrypto, an open source Go-based cryptographic library.
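For readers who want to reproduce the parsing locally, here is a minimal sketch using ZCrypto's x509 package, which mirrors Go's standard crypto/x509 API. The input file name is a placeholder, and the JSON output roughly corresponds to the parsed fields described under Table Structure below.

package main

import (
	"encoding/json"
	"encoding/pem"
	"fmt"
	"log"
	"os"

	"github.com/zmap/zcrypto/x509"
)

func main() {
	// Read a PEM-encoded certificate from a (hypothetical) local file.
	pemBytes, err := os.ReadFile("cert.pem")
	if err != nil {
		log.Fatal(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		log.Fatal("no PEM block found")
	}

	// ZCrypto mirrors crypto/x509, so ParseCertificate takes raw DER bytes.
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}

	// Marshal the parsed certificate to JSON for inspection.
	out, err := json.MarshalIndent(cert, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}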

Browser Validation

We validate certificates against the published root stores for Mozilla NSS, Microsoft, and Apple. To accommodate "trans-valid" certificates (where an intermediate certificate needed for validation was not presented by the server), we maintain a known set of intermediates for each root store, which we use to complete chains during validation. Validation is performed using ZCrypto.
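The sketch below illustrates the general idea of chain building with a supplemental intermediate pool. It uses Go's standard crypto/x509 package (from which ZCrypto is derived) rather than ZCrypto itself, and the root, intermediate, and leaf file names are placeholders.

package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"log"
	"os"
)

// loadPool reads a PEM bundle (e.g., a root store or a set of collected
// intermediates) into a certificate pool. File names are illustrative.
func loadPool(path string) *x509.CertPool {
	pool := x509.NewCertPool()
	pemBytes, err := os.ReadFile(path)
	if err != nil {
		log.Fatal(err)
	}
	if !pool.AppendCertsFromPEM(pemBytes) {
		log.Fatalf("no certificates loaded from %s", path)
	}
	return pool
}

func main() {
	roots := loadPool("nss_roots.pem")             // published root store
	intermediates := loadPool("intermediates.pem") // known intermediates for this store

	pemBytes, err := os.ReadFile("leaf.pem")
	if err != nil {
		log.Fatal(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		log.Fatal("no PEM block found")
	}
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}

	// Supplying the intermediate pool lets a "trans-valid" leaf chain to a
	// trusted root even though the server never presented the intermediate.
	chains, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: intermediates,
	})
	if err != nil {
		fmt.Println("not valid in this root store:", err)
		return
	}
	fmt.Printf("valid: %d chain(s) to a trusted root\n", len(chains))
}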

Note: We do not currently process browser-blacklisted certificates (e.g., certificates listed in OneCRL).

Certificate Linting

We lint (i.e., check for construction issues) certificates using the ZLint library.
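A minimal linting sketch is shown below, assuming ZLint v3's Go API (zlint.LintCertificate over a ZCrypto-parsed certificate). The exact package paths and result fields may differ between ZLint versions, and the input file name is a placeholder.

package main

import (
	"encoding/pem"
	"fmt"
	"log"
	"os"

	"github.com/zmap/zcrypto/x509"
	"github.com/zmap/zlint/v3"
	"github.com/zmap/zlint/v3/lint"
)

func main() {
	pemBytes, err := os.ReadFile("cert.pem") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		log.Fatal("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}

	// Run every registered lint and report anything at warning level or worse.
	results := zlint.LintCertificate(cert)
	for name, result := range results.Results {
		if result.Status >= lint.Warn {
			fmt.Printf("%s: %v\n", name, result.Status)
		}
	}
}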

Table Structure

The certificates table is composed of several top-level objects:

  1. raw. The raw, unparsed certificate
  2. parsed. Parsed fields of the X.509 certificate (e.g., subject.common_name and extensions.basic_constraints.is_ca)
  3. validation. Browser validation information for each root store (e.g., validation.apple.valid)
  4. ct. Data on which Certificate Transparency logs contain the certificate
  5. zlint. Linting results for the certificate from ZLint
  6. audit. CCADB information about the certificate (e.g., ccadb.certification_practice_statement)
  7. metadata. Metadata about the certificate's inclusion in Censys (e.g., updated_at and parse_version)
  8. parents. List of SHA-256 fingerprints of the certificate's known parent (issuing) certificates
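As an example of how these fields fit together, the following sketch queries the table from Go with the BigQuery client library. The project ID and table path are placeholders (substitute the dataset path you have access to), and validation.nss.valid is an assumed field name following the validation.apple.valid pattern above.

package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()

	// Project ID is a placeholder; substitute your own.
	client, err := bigquery.NewClient(ctx, "myproject")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Pull common names of CA certificates that NSS currently trusts,
	// using the parsed and validation fields described in the list above.
	// The table path is a placeholder.
	q := client.Query(`
SELECT parsed.subject.common_name AS cn
FROM myproject.certificates.certificates
WHERE parsed.extensions.basic_constraints.is_ca
  AND validation.nss.valid
LIMIT 10`)

	it, err := q.Read(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(row)
	}
}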