Downloading Raw Files
Enterprise customers can download host data from the Censys API.
About the Files
The files containing the Universal Internet Dataset are serialized in the Avro binary format, which stores the data schema inside each file.
Even with Avro's compression features, each snapshot of the Universal Internet Dataset contains thousands of serialized files amounting to about 2 terabytes of data.
To get started with Avro, visit the site documentation.
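Because the schema travels inside each file, it can be inspected without any external definition. The sketch below is a minimal, standard-library-only reader for the header of an Avro object container file (magic bytes followed by a metadata map); it illustrates the file layout and is not a full Avro decoder. The function names are hypothetical, and for reading actual records a dedicated Avro library is the usual choice.

```python
import json
from typing import BinaryIO, Dict

AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro container file


def _read_long(stream: BinaryIO) -> int:
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    acc = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of Avro header")
        acc |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def _read_bytes(stream: BinaryIO) -> bytes:
    """Read a length-prefixed byte string."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_header(stream: BinaryIO) -> Dict[str, bytes]:
    """Return the metadata map stored at the start of an Avro file."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta: Dict[str, bytes] = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # an empty block terminates the map
            break
        if count < 0:  # a negative count is followed by the block's byte size
            count = -count
            _read_long(stream)
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            meta[key] = _read_bytes(stream)
    return meta


def embedded_schema(path: str) -> dict:
    """Parse the JSON schema embedded in the file at `path`."""
    with open(path, "rb") as fh:
        return json.loads(read_avro_header(fh)["avro.schema"])
```

Calling embedded_schema() on one downloaded file returns the schema as a Python dictionary; the "avro.codec" metadata key, when present, names the compression codec used for the data blocks.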
How to Download
Download a snapshot by getting the list of files and making follow-up calls to the file paths.
Get the ID of the Snapshot
The IDs of the snapshots of the Universal Internet Dataset reflect the date taken. For example, a snapshot with an ID of 20210920 was taken on September 20, 2021.
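Since the ID is a date in YYYYMMDD form, it can be converted to a date object for sorting or filtering. A small sketch (the helper name is hypothetical):

```python
from datetime import date, datetime


def snapshot_date(snapshot_id: str) -> date:
    """Convert a snapshot ID such as '20210920' into a date."""
    return datetime.strptime(snapshot_id, "%Y%m%d").date()
```

A malformed ID raises ValueError, which makes the helper usable as a quick sanity check on IDs read from the API.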
First, request the universal-internet-dataset endpoint to retrieve the ID of the latest result or the IDs of historical datasets.
GET https://search.censys.io/api/v1/data/universal-internet-dataset
Example 200 Response
{
  "id": "universal-internet-dataset",
  "name": "Universal Internet DataSet",
  "description": "Deep Scans of more than 2,000 popular ports featuring Automatic Protocol Detection across all hosts in the IPv4 address space.",
  "results": {
    "latest": {
      "id": "20210721",
      "timestamp": "20210722T070504",
      "details_url": "https://search.censys.io/api/v1/data/universal-internet-dataset/20210721"
    },
    "historical": []
  }
}
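The request and the lookup of the latest snapshot might be scripted as follows. This is a sketch under assumptions: it assumes the API accepts HTTP Basic authentication with an API ID and secret, which this article does not specify, and the function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Tuple

SERIES_URL = "https://search.censys.io/api/v1/data/universal-internet-dataset"


def fetch_json(url: str, api_id: str, api_secret: str) -> dict:
    """GET a URL and parse the JSON body (assumes HTTP Basic auth)."""
    token = base64.b64encode(f"{api_id}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def latest_snapshot(series: dict) -> Tuple[str, str]:
    """Return (snapshot ID, details_url) for the latest result."""
    latest = series["results"]["latest"]
    return latest["id"], latest["details_url"]
```

With credentials in hand, latest_snapshot(fetch_json(SERIES_URL, api_id, api_secret)) yields the ID and the details_url to request next.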
Get the List of Files in the Snapshot
Follow up with a GET request to the details_url to see the list of files composing the result.
Example 200 Response (Truncated to a single file for display)
{
  "series": {
    "id": "universal-internet-dataset",
    "name": "Universal Internet DataSet"
  },
  "id": "20210721",
  "timestamp": "20210722T070504",
  "task_id": null,
  "metadata": null,
  "total_size": 615197866948,
  "files": {
    "universal-internet-dataset-20210721-000000000001.avro": {
      "compressed_size": 732481,
      "download_path": "https://file-host-0.censys.io/snapshots/observations/20210721/universal-internet-dataset-20210721-000000000427.avro",
      "compressed_md5_fingerprint": "934fef4b131b92d144306df58c04ee02",
      "file_type": null,
      "compression_type": null
    },
    ...
  }
}
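Once the details response is parsed into a dictionary, the file list can be walked to collect each download URL and checksum. A minimal sketch, with hypothetical helper names, that relies only on the fields shown in the example response:

```python
from typing import Iterator, Tuple


def iter_downloads(details: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (file name, download_path, MD5 fingerprint) in name order."""
    for name, info in sorted(details["files"].items()):
        yield name, info["download_path"], info["compressed_md5_fingerprint"]


def total_compressed_size(details: dict) -> int:
    """Sum the compressed_size of every file listed in the snapshot."""
    return sum(info["compressed_size"] for info in details["files"].values())
```

Summing compressed_size before downloading is a convenient way to confirm that enough disk space is available.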
Download Each File
Send a GET request to the download_path URL of each file listed in the response above:
GET https://file-host-0.censys.io/snapshots/observations/20210919/universal-internet-dataset-20210919-000000000428.avro
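Given the size of a snapshot, each file is best streamed to disk rather than held in memory. The sketch below does that and then compares the result against the file's compressed_md5_fingerprint. It assumes the fingerprint is the hexadecimal MD5 of the file exactly as downloaded, and it sends no credentials, since this article does not say whether the file host requires them. The function names are hypothetical.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_file(url: str, dest_dir: Path, expected_md5: Optional[str] = None) -> Path:
    """Stream `url` into `dest_dir` and optionally verify its MD5."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / url.rsplit("/", 1)[-1]  # keep the remote file name
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    if expected_md5 is not None and md5_of_file(dest) != expected_md5:
        dest.unlink()  # discard the corrupt or partial download
        raise ValueError(f"MD5 mismatch for {dest.name}")
    return dest
```

Looping download_file() over every download_path in the file list fetches the whole snapshot; a failed checksum raises ValueError so the file can be retried.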
With your snapshot downloaded, you are ready to begin querying the data! Need an introduction to the data model? Or a list of every field in the schema?