Certificates BigQuery Dataset
The Censys certificates collection is composed of all unique certificates seen in any historical scanning of hosts and through synchronizing with known public certificate transparency servers.
A BigQuery account is required to add these Censys resources.
This article shows you how to:
Query effectively over the most complete and up-to-date data Censys has about certificates.
View the dataset’s schema so you can write effective queries.
If you need an introduction to the data in a Censys certificate record, this intro to Censys certificates is a 5-minute read.
The Two Datasets
You can add two certificate datasets:
censys-io.certificates_v2.certificates: Contains all final certificates and any pre-certificates whose final certificates are not known to Censys.
censys-io.certificates_v2.certificates_all: Contains all final certificates and all pre-certificates known to Censys.
Understanding the Datasets
certificates_v2 datasets are indexed on a certificate’s SHA-256 fingerprint.
Although the contents of a certificate are immutable, accompanying data—such as which certificate transparency logs it’s been submitted to, whether it has been revoked, or whether it has ever been seen in a Censys scan of the Internet—can change.
Because the certificate dataset is append-only, any change to a certificate record results in a new row.
Changes to the dataset appear throughout the day.
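Because every change adds a row, a query that wants one record per certificate has to pick the newest row for each fingerprint. The following is a minimal sketch of that idea, written as a Python helper that builds the SQL text. The column name fingerprint_sha256 is an assumption about the schema (confirm it in the BigQuery interface), and the helper name latest_rows_sql is hypothetical. The fields not_valid_after and inserted_at are the top-level fields described in this article.

```python
CERTS_TABLE = "censys-io.certificates_v2.certificates"


def latest_rows_sql(expiry_year: int, table: str = CERTS_TABLE) -> str:
    """Return SQL that keeps only the newest row for each certificate."""
    if not 1 <= expiry_year <= 9998:
        raise ValueError("expiry_year must be a four-digit year")
    start = f"{expiry_year:04d}-01-01"
    end = f"{expiry_year + 1:04d}-01-01"
    return "\n".join([
        "SELECT",
        "  fingerprint_sha256,  -- assumed name of the SHA-256 fingerprint column",
        "  not_valid_after,",
        "  inserted_at",
        f"FROM `{table}`",
        "-- String literals are coerced to the column's date or timestamp type.",
        f"WHERE not_valid_after >= '{start}'",
        f"  AND not_valid_after < '{end}'",
        "-- Rank rows per fingerprint, newest first, and keep rank 1.",
        "QUALIFY ROW_NUMBER() OVER (",
        "  PARTITION BY fingerprint_sha256",
        "  ORDER BY inserted_at DESC",
        ") = 1",
    ])


if __name__ == "__main__":
    print(latest_rows_sql(2024))
```

The printed statement can be pasted into the BigQuery console. Restricting the expiration year first keeps the window function from ranking every row in the table.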
Use Partitions and Clusters
One way to reduce cost is to restrict a query to specific partitions, which helps when you are only interested in certificates that expire in a particular year. The dataset is partitioned by expiration date, stored in a top-level field called not_valid_after, with a granularity of one year.
Within a partition, certificates are clustered by the date they were added or updated, stored in a top-level field called inserted_at. Filtering on this field also reduces cost when you are only interested in new or newly updated certificates.
The certificates dataset is wide, with about 400 columns. Name only the columns you need in the SELECT clause of your SQL statements, rather than using SELECT *, to reduce the amount of data processed.
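The three cost controls in this section (a partition filter on not_valid_after, a cluster filter on inserted_at, and an explicit column list) can be combined in one query. The sketch below assembles such a statement in Python. The helper name recent_certs_sql and the column fingerprint_sha256 used in the example call are assumptions rather than documented names, and the date literals rely on BigQuery coercing strings to the columns' date or timestamp type.

```python
from datetime import date

CERTS_TABLE = "censys-io.certificates_v2.certificates"


def recent_certs_sql(expiry_year, inserted_since, columns, table=CERTS_TABLE):
    """Build a query limited to one expiry-year partition, recent rows,
    and an explicit list of columns."""
    if not columns:
        raise ValueError("list at least one column instead of using SELECT *")
    for name in columns:
        # Allow only plain identifiers and dotted field paths.
        if not name.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unexpected column name: {name!r}")
    select_list = ",\n  ".join(columns)
    return (
        f"SELECT\n  {select_list}\n"
        f"FROM `{table}`\n"
        # Partition filter: one year of expiration dates.
        f"WHERE not_valid_after >= '{expiry_year:04d}-01-01'\n"
        f"  AND not_valid_after < '{expiry_year + 1:04d}-01-01'\n"
        # Cluster filter: rows added or updated on or after this date.
        f"  AND inserted_at >= '{inserted_since.isoformat()}'"
    )


if __name__ == "__main__":
    print(recent_certs_sql(2025, date(2025, 3, 1),
                           ["fingerprint_sha256", "inserted_at"]))
```

Rejecting an empty column list is a deliberate choice here: it prevents the helper from silently falling back to a query that reads all of the roughly 400 columns.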
View the Certificate Schema
You can view the certificate schema directly in the BigQuery interface.
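The schema can also be listed with a query, which is convenient for searching across several hundred fields. The sketch below builds a statement against BigQuery's INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view, which returns one row per column and nested field. It assumes your account is permitted to read table metadata for the Censys dataset, which this article does not confirm. The helper name schema_fields_sql is hypothetical.

```python
DATASET = "censys-io.certificates_v2"
KNOWN_TABLES = ("certificates", "certificates_all")


def schema_fields_sql(table_name="certificates", prefix=None):
    """Build a query listing field paths and types for one table,
    optionally limited to paths that start with `prefix`."""
    if table_name not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table_name!r}")
    lines = [
        "SELECT field_path, data_type",
        f"FROM `{DATASET}.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`",
        f"WHERE table_name = '{table_name}'",
    ]
    if prefix is not None:
        # Allow only plain identifiers and dotted field paths.
        if not prefix.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unexpected prefix: {prefix!r}")
        lines.append(f"  AND STARTS_WITH(field_path, '{prefix}')")
    lines.append("ORDER BY field_path")
    return "\n".join(lines)


if __name__ == "__main__":
    # List the fields whose path begins with "not_valid".
    print(schema_fields_sql("certificates", prefix="not_valid"))
```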