How to Download the Censys Universal Internet Dataset
Enterprise customers can download host data using the Censys API.
The Censys Universal Internet Dataset contains hosts and virtual hosts in the IPv4 and IPv6 address spaces as observed in scans and enriched with third-party data.
Four series are included in the Censys Universal Internet Dataset:
- IPv4 hosts: New snapshot available every day.
- IPv6 hosts: New snapshot available every day.
- IPv4 virtual hosts: New snapshot available every Tuesday.
- IPv6 virtual hosts: New snapshot available every Tuesday.
The files containing the Censys Universal Internet Dataset data are serialized in Avro binary using Google BigQuery Avro Export Format.
Even with the compression features of Avro, the snapshots in each series of the Censys Universal Internet Dataset contain thousands of serialized files amounting to about 2 terabytes of data.
To get started with Avro, visit the Apache Avro documentation.
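Before handing a downloaded file to an Avro library, a quick sanity check can confirm that it at least looks like an Avro object container file. The sketch below relies only on the fact that such files begin with the four bytes `Obj` followed by `0x01`; the helper name `looks_like_avro` is hypothetical, and decoding the records themselves still requires an Avro library, which is outside this sketch.

```python
# First four bytes of every Avro object container file: "Obj" followed by 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(path):
    """Return True if the file at `path` starts with the Avro magic bytes.

    Hypothetical helper: it only inspects the header and does not
    validate the schema or the data blocks that follow.
    """
    with open(path, "rb") as fh:
        return fh.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

A file that fails this check was most likely truncated or is not an Avro file at all, for example an error page saved under an `.avro` name.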
Download a snapshot by getting the list of files for a series snapshot and making follow-up calls to the file paths.
The 4 available series are:
- IPv4 hosts: universal-internet-dataset-v2-ipv4
- IPv6 hosts: universal-internet-dataset-v2-ipv6
- IPv4 virtual hosts: universal-internet-dataset-v2-ipv4-virtual-hosts
- IPv6 virtual hosts: universal-internet-dataset-v2-ipv6-virtual-hosts
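As a convenience, the series IDs can be kept in a small lookup table. This is a minimal sketch: the short keys (`ipv4`, `ipv6-virtual-hosts`, and so on) and the function name `series_url` are made up for illustration, and it assumes that every series lives under the same base URL as the IPv4 series used in this article.

```python
# Base URL taken from the IPv4 example in this article; assumed to be
# the same for the other three series.
API_BASE = "https://search.censys.io/api/v1/data"

# Hypothetical short keys mapped to the documented series IDs.
SERIES = {
    "ipv4": "universal-internet-dataset-v2-ipv4",
    "ipv6": "universal-internet-dataset-v2-ipv6",
    "ipv4-virtual-hosts": "universal-internet-dataset-v2-ipv4-virtual-hosts",
    "ipv6-virtual-hosts": "universal-internet-dataset-v2-ipv6-virtual-hosts",
}


def series_url(key):
    """Return the series endpoint URL for one of the short keys above."""
    return f"{API_BASE}/{SERIES[key]}"
```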
The IDs of the snapshots of each Censys Universal Internet Dataset series reflect the date taken. For example, a snapshot with an ID of 20231107 was taken on Nov. 7, 2023.
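Because the ID is a date in `YYYYMMDD` form, it can be turned into a date object with the standard library. The function name `snapshot_date` below is hypothetical.

```python
from datetime import date, datetime


def snapshot_date(snapshot_id):
    """Convert a snapshot ID such as "20231107" into a datetime.date.

    Raises ValueError if the ID is not a valid YYYYMMDD date.
    """
    return datetime.strptime(snapshot_id, "%Y%m%d").date()
```

Since the IDs are fixed-width, sorting them as plain strings also orders them chronologically.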
First, send a GET request to the series endpoint to retrieve the ID of the latest snapshot or the IDs of historical snapshots.
curl -g -X 'GET' \
  'https://search.censys.io/api/v1/data/universal-internet-dataset-v2-ipv4' \
  -H 'Accept: application/json' \
  --user "$CENSYS_API_ID:$CENSYS_API_SECRET"
Example 200 Response
{
  "id": "universal-internet-dataset-v2-ipv4",
  "name": "Universal Internet DataSet of IPv4 Hosts",
  "description": "Deep Scans of more than 3,500 popular ports featuring Automatic Protocol Detection across hosts in the IPv4 address space. Schema version 2.",
  "results": {
    "latest": {
      "id": "20231107",
      "timestamp": "20231107T000000",
      "details_url": "https://search.censys.io/api/v1/data/universal-internet-dataset-v2-ipv4/20230416"
    },
    "historical": [...]
  }
}
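The same request can be made from Python using only the standard library. This is a rough sketch rather than an official client: it assumes the API accepts HTTP Basic authentication with the API ID and secret (which is what curl's `--user` option sends), and the function names `build_request`, `fetch_json`, and `latest_snapshot` are hypothetical.

```python
import base64
import json
import urllib.request


def build_request(url, api_id, api_secret):
    """Build a GET request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{api_id}:{api_secret}".encode()).decode()
    headers = {
        "Accept": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, api_id, api_secret):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(build_request(url, api_id, api_secret)) as resp:
        return json.load(resp)


def latest_snapshot(series_doc):
    """Return (snapshot_id, details_url) from a series response."""
    latest = series_doc["results"]["latest"]
    return latest["id"], latest["details_url"]
```

With credentials in the `CENSYS_API_ID` and `CENSYS_API_SECRET` environment variables, `fetch_json(url, os.environ["CENSYS_API_ID"], os.environ["CENSYS_API_SECRET"])` would return the series document as a dictionary, which can then be passed to `latest_snapshot`.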
Follow up with a GET request to the details_url of the snapshot you want to download to see the list of files comprising the result.
Example 200 Response (Truncated to a single file for display)
{
  "series": {
    "id": "universal-internet-dataset-v2-ipv4",
    "name": "Universal Internet DataSet of IPv4 Hosts"
  },
  "id": "20231107",
  "timestamp": "20231107T000000",
  "task_id": null,
  "metadata": null,
  "total_size": 704279348958,
  "files": {
    "ipv4-000000000000.avro": {
      "compressed_size": 71374181,
      "download_path": "https://file-host-02.censys.io/snapshots/universal-internet-dataset-v2-ipv4/20230416/ipv4-000000000000.avro",
      "compressed_md5_fingerprint": "a65be1938e1be56132ff48ac460384d9",
      "file_type": null,
      "compression_type": null
    }
  }
}
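Once the snapshot details are decoded into a dictionary, the file list can be flattened into a form that is easy to iterate over. The sketch below uses only the field names shown in the example response; the function names `list_files` and `total_compressed_size` are hypothetical.

```python
def list_files(snapshot_doc):
    """Return (name, download_path, compressed_size, md5) tuples sorted by name."""
    return [
        (
            name,
            info["download_path"],
            info["compressed_size"],
            info["compressed_md5_fingerprint"],
        )
        for name, info in sorted(snapshot_doc["files"].items())
    ]


def total_compressed_size(snapshot_doc):
    """Sum the compressed_size of every listed file, in bytes."""
    return sum(info["compressed_size"] for info in snapshot_doc["files"].values())
```

Comparing the summed size against the free disk space before starting is a cheap way to avoid a failed download partway through a snapshot.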
Send a GET request to the URL in the download_path field of each file listed in the response above:
GET https://file-host-02.censys.io/snapshots/universal-internet-dataset-v2-ipv4/20230416/ipv4-000000000000.avro
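Given the size of each file, it may be worth streaming the download to disk and checking it against the listed fingerprint. The sketch below rests on two assumptions that this article does not confirm: that `compressed_md5_fingerprint` is the hexadecimal MD5 digest of the file exactly as downloaded, and that the download URL can be fetched without extra headers (if the file host requires credentials, they would need to be added to the request). The function name `download_and_verify` is hypothetical.

```python
import hashlib
import urllib.request


def download_and_verify(url, dest, expected_md5, chunk_size=1 << 20):
    """Stream `url` to `dest` and compare its MD5 digest with `expected_md5`.

    Assumes `expected_md5` is a hex digest of the bytes as downloaded.
    Raises ValueError on a mismatch; the file is left on disk for inspection.
    """
    md5 = hashlib.md5()
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            md5.update(chunk)
    actual = md5.hexdigest()
    if actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch for {dest}: expected {expected_md5}, got {actual}")
    return dest
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, and computing the digest during the download avoids a second pass over the file.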
With your snapshot downloaded, you're ready to begin querying the data! Need an introduction to the host Data Model for the Censys Universal Internet Dataset?