The certificates collection contains every unique certificate we have seen in any of our scans (of IPv4 or Alexa) or obtained by synchronizing with known public Certificate Transparency servers. During scanning, we collect the certificates presented in every TLS handshake, including handshakes on non-HTTPS protocols such as SMTP with STARTTLS.
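As a rough illustration of what "unique" can mean here, the Go sketch below deduplicates certificate observations by the SHA-256 fingerprint of their DER encoding and records which sources saw each one. That identity rule, the source labels (`ipv4-scan`, `alexa-scan`, `ct-log`), and the helper names are assumptions for illustration only. The placeholder byte strings stand in for real DER-encoded certificates.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprintSHA256 returns the hex-encoded SHA-256 digest of a certificate's
// DER encoding. Assumption: two observations are the same certificate when
// their DER bytes are identical.
func fingerprintSHA256(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// dedupe merges observations from several sources (scans, CT logs) into one
// map keyed by fingerprint, holding the set of sources that saw each one.
func dedupe(observations map[string][][]byte) map[string]map[string]bool {
	unique := make(map[string]map[string]bool)
	for source, certs := range observations {
		for _, der := range certs {
			fp := fingerprintSHA256(der)
			if unique[fp] == nil {
				unique[fp] = make(map[string]bool)
			}
			unique[fp][source] = true
		}
	}
	return unique
}

func main() {
	// Hypothetical sources; placeholder bytes stand in for DER certificates.
	obs := map[string][][]byte{
		"ipv4-scan":  {[]byte("cert-A"), []byte("cert-B")},
		"alexa-scan": {[]byte("cert-B")},
		"ct-log":     {[]byte("cert-A"), []byte("cert-C")},
	}
	unique := dedupe(obs)
	fmt.Println(len(unique), "unique certificates")
}
```

Keying by a digest of the exact encoding means that two certificates differing in even one byte (for example, a precertificate and its final certificate) count as distinct entries.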
We parse certificates using ZCrypto, an open source Go-based cryptographic library.
We validate certificates against the published root stores of Mozilla NSS, Microsoft, and Apple. To accommodate "trans-valid" certificates (certificates that chain to a trusted root only through an intermediate certificate that was not presented alongside them), we maintain a set of known intermediates for each root store and use it to complete chains during validation. Validation is performed with ZCrypto.
Note: We do not currently process browser-blacklisted certificates (e.g., certificates in OneCRL).
We lint certificates (i.e., check them for construction issues) using the ZLint library.
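ZLint is organized as many individual lints, each checking one construction rule. The Go sketch below shows the general shape of one such check, written against the standard `crypto/x509` package: it flags a subscriber certificate whose subject common name does not appear among its subjectAltName DNS names. The check, the `LintStatus` values, and the helper names are hypothetical and do not reproduce ZLint's API.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"strings"
	"time"
)

// LintStatus is a hypothetical lint outcome.
type LintStatus string

const (
	LintPass  LintStatus = "pass"
	LintError LintStatus = "error"
	LintNA    LintStatus = "not applicable"
)

// lintCNInSAN is a hypothetical check: a subscriber certificate with a
// subject CN should list that CN among its SAN DNS names. IP-address CNs are
// ignored here for brevity; names are compared case-insensitively.
func lintCNInSAN(c *x509.Certificate) LintStatus {
	if c.IsCA || c.Subject.CommonName == "" {
		return LintNA
	}
	for _, name := range c.DNSNames {
		if strings.EqualFold(name, c.Subject.CommonName) {
			return LintPass
		}
	}
	return LintError
}

// selfSigned builds a throwaway certificate with the given CN and SANs.
func selfSigned(cn string, sans []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     sans,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	good, err := selfSigned("www.example.test", []string{"www.example.test"})
	if err != nil {
		panic(err)
	}
	bad, err := selfSigned("www.example.test", []string{"example.test"})
	if err != nil {
		panic(err)
	}
	fmt.Println(lintCNInSAN(good), lintCNInSAN(bad))
}
```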
The full Certificates dataset schema can be explored here.
Example query: calculate the breakdown of certificates by public-key algorithm:
SELECT COUNT(*) AS num_certs, parsed.subject_key_info.key_algorithm.name
FROM certificates.certificates
GROUP BY parsed.subject_key_info.key_algorithm.name
ORDER BY num_certs DESC