Certificates BigQuery Dataset
The Censys certificates collection contains every unique certificate that Censys has seen in its historical scans of hosts or obtained by synchronizing with known public certificate transparency servers.
A BigQuery account is required to add these Censys resources.
This guide will show you how to:
- Query effectively over the most complete and up-to-date data Censys has about certificates.
- View the dataset’s schema to effectively write queries.
If you need an introduction to the data in a Censys certificate record, this intro to Censys certificates is a 5-minute read.
The Two Datasets
There are two certificate datasets that can be added.
- censys-io.certificates_v2.certificates: Contains all final certificates and any pre-certificates whose final certificates are not known to Censys.
- censys-io.certificates_v2.certificates_all: Contains all final certificates and all pre-certificates known to Censys.
Understanding the Datasets
The certificates_v2 datasets are indexed on a certificate’s SHA-256 fingerprint.
Although the contents of a certificate are immutable, accompanying data—such as which certificate transparency logs it’s been submitted to, whether it has been revoked, or whether it has ever been seen in a Censys scan of the Internet—can change.
Because the certificate dataset is append-only, any change to a certificate record results in a new row.
Changes to the dataset appear throughout the day.
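Because each change appends a new row, a query usually needs to keep only the newest row for each certificate. The following is a minimal sketch of one way to do that, written as a Python helper that builds a BigQuery Standard SQL string. It assumes the fingerprint is stored in a column named fingerprint_sha256 (an assumption; confirm the name in the schema) and uses the inserted_at field to order rows. The helper name latest_rows_query is hypothetical.

```python
# Sketch only: builds a query that keeps the most recent row per certificate.
# Assumption: the SHA-256 fingerprint column is named `fingerprint_sha256`.
TABLE = "censys-io.certificates_v2.certificates"


def latest_rows_query(columns):
    """Return SQL that selects the newest row for each certificate fingerprint."""
    if not columns:
        raise ValueError("list at least one column")
    selected = ", ".join(columns)
    return (
        "SELECT * EXCEPT (row_num)\n"
        "FROM (\n"
        f"  SELECT {selected},\n"
        "    ROW_NUMBER() OVER (\n"
        "      PARTITION BY fingerprint_sha256\n"
        "      ORDER BY inserted_at DESC\n"
        "    ) AS row_num\n"
        f"  FROM `{TABLE}`\n"
        ")\n"
        "WHERE row_num = 1"
    )


print(latest_rows_query(["fingerprint_sha256", "inserted_at"]))
```

As written, this query reads every partition of the table, so in practice it would likely be combined with a filter on not_valid_after to limit the data scanned.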
Controlling Cost
Use Partitions and Clusters
One way to reduce cost is to restrict a query to specific partitions. The dataset is partitioned by expiration date, on a top-level field called not_valid_after, with a granularity of one year, so filtering on this field helps when you are interested in certificates that expire within a particular year.
Within a partition, certificates are clustered by the date they were added or updated, which is stored in a top-level field called inserted_at. Filtering on this field can also reduce cost if you are interested in new or newly updated certificates.
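The partition and cluster filters described above might look like the sketch below, again as a Python helper that builds a BigQuery Standard SQL string. It assumes that not_valid_after and inserted_at are TIMESTAMP columns and that the fingerprint column is named fingerprint_sha256; both are assumptions to check against the schema. The helper name expiring_in_year_query is hypothetical.

```python
from datetime import date

TABLE = "censys-io.certificates_v2.certificates"


def expiring_in_year_query(year, updated_since=None):
    """Build SQL limited to one expiration year, optionally to recently updated rows."""
    start = date(year, 1, 1).isoformat()
    end = date(year + 1, 1, 1).isoformat()
    sql = (
        # Assumption: `fingerprint_sha256` is the fingerprint column name.
        "SELECT fingerprint_sha256, not_valid_after, inserted_at\n"
        f"FROM `{TABLE}`\n"
        # Filter on the partitioning field so only one yearly partition is read.
        f"WHERE not_valid_after >= TIMESTAMP('{start}')\n"
        f"  AND not_valid_after < TIMESTAMP('{end}')"
    )
    if updated_since is not None:
        # Filter on the clustering field to narrow the scan within the partition.
        sql += f"\n  AND inserted_at >= TIMESTAMP('{updated_since.isoformat()}')"
    return sql


print(expiring_in_year_query(2024, updated_since=date(2024, 6, 1)))
```

The half-open date range (on or after January 1, before the next January 1) keeps the filter aligned with a single yearly partition.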
Specify Columns
The certificates dataset is wide, with about 400 columns. Name only the columns you need in the SELECT clause of your SQL statements, rather than using SELECT *, to reduce the amount of data processed.
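As an illustration of selecting columns explicitly, the sketch below builds a query from a list of column names and refuses a bare asterisk. The helper name select_columns_query is hypothetical, and fingerprint_sha256 is an assumed column name; not_valid_after and inserted_at are the top-level fields described above.

```python
TABLE = "censys-io.certificates_v2.certificates"


def select_columns_query(columns):
    """Build a SELECT that names each column explicitly and rejects `*`."""
    cleaned = [c.strip() for c in columns]
    if not cleaned or any(c in ("", "*") for c in cleaned):
        raise ValueError("name each column explicitly instead of using SELECT *")
    return "SELECT\n  " + ",\n  ".join(cleaned) + f"\nFROM `{TABLE}`"


# Assumption: `fingerprint_sha256` is the fingerprint column name.
print(select_columns_query(["fingerprint_sha256", "not_valid_after", "inserted_at"]))
```

BigQuery stores data by column, so the bytes processed by a query depend on which columns it references.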
View the Certificate Schema
The certificate schema can be viewed directly in the BigQuery interface.