Downloading Certs 2.0 Data
The Censys certificate repository is the largest in the world, and growing every day.
Although a certificate’s contents are immutable once issued, accompanying data—such as which certificate transparency logs it has been submitted to, whether it has been revoked, or whether it has ever been seen in a Censys scan of the Internet—can change.
Censys publishes these changes, together with records for newly seen certificates, as a daily download.
Each day's incremental dataset contains only that day's additions and changes: new certificate records and diffs of existing certificate records' metadata.
How Downloads Work
Customers who want to download certificate data for unlimited querying and use in custom workflows complete a one-time download of the full certificate repository, then download each day's changes and apply them to their local copy of the full dataset.
A full snapshot of all of the certificates in the repository is available on the 1st of every month.
- certificates-v2-full
Incremental downloads are available every day, including on the 1st of every month. If a client runs less than once per day, it is important to apply the changes from every incremental update, in order.
- certificates-v2-incremental
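A client that runs less than once per day needs to work out which incremental updates it has missed and apply them oldest first. The following is a minimal sketch of that bookkeeping only. It assumes each incremental update can be identified by its calendar date, which this article does not specify; the function name `pending_incrementals` and its parameters are hypothetical and not part of any Censys tooling.

```python
from datetime import date, timedelta
from typing import Iterable, List


def pending_incrementals(last_applied: date, available: Iterable[date]) -> List[date]:
    """Return the dates of incremental updates still to apply, oldest first.

    Assumption: each incremental update is identified by its calendar date.
    Raises ValueError if a day is missing from `available`, since every
    incremental update has to be applied, in order.
    """
    # Keep only updates newer than the last one applied, in chronological order.
    todo = sorted(d for d in set(available) if d > last_applied)

    # Walk the sequence and confirm there is exactly one update per day.
    expected = last_applied + timedelta(days=1)
    for d in todo:
        if d != expected:
            raise ValueError(f"missing incremental for {expected.isoformat()}")
        expected += timedelta(days=1)
    return todo
```

Treating a missing day as an error, rather than skipping it, is a design choice of this sketch: it stops the client before it applies later changes on top of an incomplete local copy.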
Resource Preparation for the Dataset
The certificate dataset is large, both in terms of the overall storage space necessary to accommodate its daily growth, and the client requirements for downloading the incremental changes published daily.
Series Sizes
The certificates dataset is continually growing.
Since the beginning of 2023, Censys has added about 500,000,000 new certificates to the repository each month.
As of summer 2023, the certificates-v2-full dataset is about 12TB.
As of summer 2023, the certificates-v2-incremental dataset is about 30-60GB.
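For rough capacity planning, the figures above can be turned into an estimate of how much data a client has to download after missing several days. This sketch assumes the 30-60GB figure applies to each daily incremental dataset and that summer 2023 sizes still hold; the function name `catch_up_download_gb` is hypothetical.

```python
from typing import Tuple

# Size range of one incremental dataset in GB, as quoted for summer 2023.
# Assumption: the range applies to each daily incremental dataset.
INCREMENTAL_GB_RANGE = (30, 60)


def catch_up_download_gb(missed_days: int) -> Tuple[int, int]:
    """Return the (low, high) download volume in GB for `missed_days` of updates."""
    if missed_days < 0:
        raise ValueError("missed_days must not be negative")
    low, high = INCREMENTAL_GB_RANGE
    return missed_days * low, missed_days * high


# A client that has been offline for a week:
low_gb, high_gb = catch_up_download_gb(7)
print(f"Expect to download between {low_gb} and {high_gb} GB")
```

The estimate covers download volume only. The space the local copy needs after the changes are applied depends on how the data is stored.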
Data Formatting
The files containing the Certificates 2.0 datasets are serialized in Avro binary format, with the data schema stored in the files themselves.
To get started with Avro, see the Avro documentation.
How to Download
Follow this step-by-step guide.