Certificates BigQuery Dataset
The certificates collection contains every unique certificate we have observed, either while scanning hosts with an IPv4 or IPv6 address or while synchronizing with known public Certificate Transparency (CT) log servers. When scanning, we collect the certificates presented in any TLS handshake, including handshakes for non-HTTPS protocols such as SMTP with STARTTLS.
We parse certificates using ZCrypto, an open source Go-based cryptographic library.
We validate certificates against the published root stores for Mozilla NSS, Microsoft, and Apple. To accommodate "trans-valid" certificates (where an intermediate certificate needed for validation is missing from the presented chain), we maintain a known set of intermediates for each root store and use it to complete chains during validation. Validation is performed using ZCrypto.
We lint certificates (that is, check them for construction issues) using the ZLint library.
The full Certificates dataset schema can be explored here.
Calculate the breakdown of certificates by public key algorithm:
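SELECT parsed.subject_key_info.key_algorithm.name AS key_algorithm, COUNT(*) AS certificate_count
FROM `PROJECT.DATASET.certificates` -- placeholder; substitute the actual certificates table path
GROUP BY key_algorithm
ORDER BY certificate_count DESC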