Data Collection
The certificates collection contains every unique certificate that we have seen in any of our scans (of IPv4 or Alexa) or obtained by synchronizing with known public Certificate Transparency servers. When scanning, we collect the certificates presented in any TLS handshake, including handshakes for non-HTTPS protocols such as SMTP with STARTTLS.
X.509 Parsing
We parse certificates using ZCrypto, an open source Go-based cryptographic library.
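As a rough illustration of the parsing step, the sketch below decodes a PEM-encoded certificate and reads its subject and public key algorithm. It uses Go's standard crypto/x509 package as a stand-in, on the assumption that ZCrypto's parser is used in a broadly similar way; it is not ZCrypto's actual API. The helper names (makeSelfSignedPEM, parseCertificatePEM) are hypothetical, and the certificate is a throwaway generated only so the example has input.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// makeSelfSignedPEM (hypothetical helper) generates a throwaway self-signed
// ECDSA certificate and returns it PEM-encoded.
func makeSelfSignedPEM(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// parseCertificatePEM (hypothetical helper) decodes the first PEM block and
// parses it as an X.509 certificate using the standard library.
func parseCertificatePEM(data []byte) (*x509.Certificate, error) {
	block, _ := pem.Decode(data)
	if block == nil || block.Type != "CERTIFICATE" {
		return nil, errors.New("no CERTIFICATE PEM block found")
	}
	return x509.ParseCertificate(block.Bytes)
}
```

After parsing, fields such as cert.Subject.CommonName and cert.PublicKeyAlgorithm are available on the returned value; the key algorithm is the kind of attribute the example query under Usage groups by.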
Browser Validation
We validate certificates against the published root stores for Mozilla NSS, Microsoft, and Apple. To accommodate "trans-valid" certificates (those for which an intermediate certificate needed for validation is missing), we maintain a known set of intermediates for each root store and use it to complete chains during validation. Validation is performed using ZCrypto.
Note: We do not currently process browser-blacklisted certificates (e.g., certificates in OneCRL).
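The role of the known intermediates set can be sketched as follows: a leaf certificate that does not chain to a root on its own may still validate once a previously seen intermediate is supplied. This is a minimal sketch using Go's standard crypto/x509 package rather than ZCrypto, and it does not reflect how the actual pipeline is implemented. The helpers issueCert, poolOf, and chainsToRoot are hypothetical, and the root, intermediate, and leaf are throwaway certificates generated for the example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// issueCert (hypothetical helper) creates a certificate signed by parent.
// If parent is nil, the certificate is self-signed.
func issueCert(cn string, serial int64, isCA bool, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  isCA,
		BasicConstraintsValid: true,
	}
	if isCA {
		tmpl.KeyUsage = x509.KeyUsageCertSign
	} else {
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}
	}
	signer, signerKey := tmpl, key
	if parent != nil {
		signer, signerKey = parent, parentKey
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, signer, &key.PublicKey, signerKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// poolOf (hypothetical helper) builds a certificate pool from the given certificates.
func poolOf(certs ...*x509.Certificate) *x509.CertPool {
	pool := x509.NewCertPool()
	for _, c := range certs {
		pool.AddCert(c)
	}
	return pool
}

// chainsToRoot reports whether leaf verifies against roots, drawing on the
// supplied intermediates to fill gaps in the chain.
func chainsToRoot(leaf *x509.Certificate, roots, intermediates *x509.CertPool) bool {
	_, err := leaf.Verify(x509.VerifyOptions{Roots: roots, Intermediates: intermediates})
	return err == nil
}
```

With a root, an intermediate issued by that root, and a leaf issued by the intermediate, chainsToRoot fails when the intermediates pool is empty and succeeds once the intermediate is added to it, which mirrors the trans-valid case described above.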
Certificate Linting
We lint certificates (i.e., check them for construction issues) using the ZLint library.
Usage
Information
BigQuery dataset: censys-io.certificates_public
The full Certificates dataset schema can be explored here [more information]
Example
Count certificates by public key algorithm:
SELECT
  parsed.subject_key_info.key_algorithm.name AS key_algorithm,
  COUNT(*) AS certificate_count
FROM `censys-io.certificates_public.certificates`
GROUP BY key_algorithm
ORDER BY certificate_count DESC