How to Download Certs 2.0 Dataset
Follow this article to start downloading and searching the new and improved certificate data available from Censys!
You only need to download the full snapshot once.
After that, you download the incremental dataset each day and apply its changes to your copy of the dataset.
Use the Search URL:
- Base URL: https://search.censys.io
With this API path:
- Path: /api/v1/data/
And this series endpoint:
- Series name: certificates-v2-full
GET https://search.censys.io/api/v1/data/certificates-v2-full
Example 200 Response
{ "id": "certificates-v2-full", "name": "Full Set of X.509 Certificates", "description": "Parsed X.509 certificates featuring all certificates known to Censys. Schema version 2.", "results": { "latest": { "id": "2023-03-01T12:50:16.804634Z", "timestamp": "20230301T125017", "details_url": "https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z" }, "historical": [ { "id": "2023-03-01T12:50:16.804634Z", "timestamp": "20230301T125017", "details_url": "https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z" } ] } }
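The series lookup above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: it assumes the Search API accepts HTTP basic authentication with an API ID and secret (this article does not say how to authenticate), and the function names are illustrative.

```python
import base64
import json
import urllib.request

BASE_URL = "https://search.censys.io"
DATA_PATH = "/api/v1/data/"


def series_url(series_name: str) -> str:
    """Join the base URL, API path, and series name into the series endpoint."""
    return f"{BASE_URL}{DATA_PATH}{series_name}"


def latest_details_url(series: dict) -> str:
    """Return the details_url of the latest result in a series response."""
    return series["results"]["latest"]["details_url"]


def get_json(url: str, api_id: str, api_secret: str) -> dict:
    """GET a URL and decode the JSON body.

    Assumption: HTTP basic auth with an API ID and secret is accepted.
    """
    token = base64.b64encode(f"{api_id}:{api_secret}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_latest_details_url(series_name: str, api_id: str, api_secret: str) -> str:
    """Look up a series and return the details_url of its latest result."""
    series = get_json(series_url(series_name), api_id, api_secret)
    return latest_details_url(series)
```

Calling `fetch_latest_details_url("certificates-v2-full", api_id, api_secret)` with your own credentials should return a URL of the same shape as the `details_url` in the example response.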
Then, follow up with a GET request to the details_url to see the list of files comprising the result.
GET https://search.censys.io/api/v1/data/certificates-v2-full/2023-03-01T12:50:16.804634Z
Example 200 Response (truncated to a single file for display)
{ "series": { "id": "certificates-v2-full", "name": "Full Set of X.509 Certificates" }, "id": "2023-03-01T12:50:16.804634Z", "timestamp": "20230301T125017", "task_id": null, "metadata": null, "total_size": 12336264834346, "files": { "certificates-000000000000.avro": { "compressed_size": 73423483, "download_path": "https://file-host-02.censys.io/snapshots/certificates-v2-full/2023-03-01T12:50:16.804634Z/certificates-000000000000.avro", "compressed_md5_fingerprint": "c399b93f9cb1e6c5b697955b718c96e", "file_type": null, "compression_type": null } } }
Finally, download each file by issuing a GET request to each download_path.
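The download step can be sketched as follows. This is a hedged example, not a reference implementation: it assumes that compressed_md5_fingerprint is the hex MD5 digest of the file as served, and that each download_path can be fetched with a plain GET (add credentials if your account requires them). The function names are illustrative.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 digest of a file in chunks, so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path: Path, file_info: dict) -> bool:
    """Compare a downloaded file with the fingerprint listed in the details response.

    Assumption: compressed_md5_fingerprint is a hex MD5 of the served file.
    """
    return md5_hex(path) == file_info["compressed_md5_fingerprint"].lower()


def download_result(details: dict, target_dir: Path) -> list[Path]:
    """Download every file listed under "files" and verify each one."""
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, info in details["files"].items():
        destination = target_dir / name
        with urllib.request.urlopen(info["download_path"]) as response:
            with open(destination, "wb") as out:
                shutil.copyfileobj(response, out)  # stream to disk
        if not matches_fingerprint(destination, info):
            raise ValueError(f"MD5 mismatch for {name}")
        saved.append(destination)
    return saved
```

Verifying each file as it arrives means a corrupted transfer is caught before you spend time processing a multi-terabyte dataset.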
The incremental dataset contains more than new certificate records. The Censys global scanning engine now regularly re-validates the trust and revocation information of unexpired certificates and updates the relevant values in the structured data and labels.
Using the same URL and endpoint, request the new incremental dataset series:
- Series name: certificates-v2-incremental
Retrieve the ID of the latest result or the ID of historical datasets if you need to apply changes from more than one.
Only incremental datasets with a timestamp that is after the full dataset you’ve downloaded contain updates that need to be applied.
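Choosing which incremental results to apply can be sketched like this. The sketch assumes the timestamp field always uses the compact format seen in the example responses (such as 20230307T125012) and that timestamps from both series are directly comparable. The helper names are hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"  # assumed from examples such as "20230307T125012"


def parse_timestamp(value: str) -> datetime:
    """Parse the compact timestamp used in series responses."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def pending_incrementals(series: dict, full_timestamp: str) -> list[dict]:
    """Return incremental results newer than the full snapshot, oldest first."""
    results = series["results"]
    candidates = [results["latest"], *results.get("historical", [])]
    cutoff = parse_timestamp(full_timestamp)
    unique = {}
    for result in candidates:
        if parse_timestamp(result["timestamp"]) > cutoff:
            # The latest result may also be listed under "historical".
            unique[result["id"]] = result
    return sorted(unique.values(), key=lambda r: parse_timestamp(r["timestamp"]))
```

Sorting oldest first matters because a later incremental may update a record that an earlier one introduced.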
GET https://search.censys.io/api/v1/data/certificates-v2-incremental
Example 200 Response
{ "id": "certificates-v2-incremental", "name": "Incremental Updates to X.509 Certificates", "description": "Parsed X.509 certificates as incremental updates to the last full series snapshot. Schema version 2.", "results": { "latest": { "id": "2023-03-07T12:50:11.773781Z", "timestamp": "20230307T125012", "details_url": "https://search.censys.io/api/v1/data/certificates-v2-incremental/2023-03-07T12:50:11.773781Z" }, "historical": [] } }
Then, follow up with a GET request to the details_url of the result you need, to see the list of files comprising that result.
GET https://search.censys.io/api/v1/data/certificates-v2-incremental/2023-03-07T12:50:11.773781Z
Example 200 Response (truncated to a single file for display)
{ "series": { "id": "certificates-v2-incremental", "name": "Incremental Updates to X.509 Certificates" }, "id": "2023-03-07T12:50:11.773781Z", "timestamp": "20230307T125012", "task_id": null, "metadata": null, "total_size": 24252152323, "files": { "certificates-000000000000.avro": { "compressed_size": 34138, "download_path": "https://file-host-02.censys.io/snapshots/certificates-v2-incremental/2023-03-07T12:50:11.773781Z/certificates-000000000000.avro", "compressed_md5_fingerprint": "2f69439ebada1bc20bc6391a2ffa484f", "file_type": null, "compression_type": null }, ... } }
Finally, download each file by issuing a GET request to each download_path.
The files containing the Certificates 2.0 datasets are serialized in Avro binary, which has the data schema stored within it.
Thanks to the compression features of the Avro format, the full dataset is now about 12 TB (compared to 26 TB when the dataset was encoded in JSON). Make sure your client has enough storage before you start downloading.
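A quick pre-flight check can compare the total_size from a details response with the free space on the target volume. This sketch assumes total_size is expressed in bytes (the example value 12336264834346 is consistent with that) and uses an arbitrary 10% headroom, which is an illustrative choice rather than a Censys recommendation.

```python
import shutil


def has_room_for(total_size: int, directory: str = ".", headroom: float = 0.1) -> bool:
    """Return True if free space under `directory` covers total_size plus headroom.

    Assumptions: total_size is in bytes; headroom is a hypothetical safety margin.
    """
    required = int(total_size * (1 + headroom))
    return shutil.disk_usage(directory).free >= required
```

For example, `has_room_for(details["total_size"], "/data/censys")` could be called before starting the download, where `/data/censys` stands in for your own target directory.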
To get started with Avro, visit the Apache Avro documentation.
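To illustrate that each file carries its own schema, the sketch below reads only the header of an Avro object container file using the standard library. It follows the header layout described in the Avro specification (magic bytes, then a metadata map holding entries such as avro.schema and avro.codec). It is a way to peek at the schema, not a replacement for a full Avro library, which you will want for reading the records themselves. The function names are illustrative.

```python
import io
import json
from typing import BinaryIO

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream: BinaryIO) -> int:
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def read_bytes(stream: BinaryIO) -> bytes:
    """Decode a length-prefixed Avro bytes or string value."""
    return stream.read(read_long(stream))


def read_header(stream: BinaryIO) -> dict:
    """Return the file metadata map (keys as str, values as raw bytes)."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return metadata


def header_from_bytes(data: bytes) -> dict:
    """Parse a header from bytes already in memory, e.g. the start of a file."""
    return read_header(io.BytesIO(data))


def embedded_schema(path: str):
    """Load the JSON schema stored in the header of an Avro file on disk."""
    with open(path, "rb") as handle:
        return json.loads(read_header(handle)["avro.schema"])
```

Running `embedded_schema("certificates-000000000000.avro")` on a downloaded file should return the schema as parsed JSON, which you can inspect before choosing how to load the records.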