How to Download UIDS Dataset
Enterprise customers can download host data using the Censys API.
About the Available Series
Censys' Universal Internet DataSet (UIDS) contains hosts and virtual hosts in the IPv4 and IPv6 address spaces, as observed in scans and enriched with third-party data.
UIDS comprises four series that can be downloaded:
- IPv4 hosts: new snapshot available every day
- IPv6 hosts: new snapshot available every day
- IPv4 virtual hosts: new snapshot available every Tuesday
- IPv6 virtual hosts: new snapshot available every Tuesday
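The release cadence above can be expressed as a small check, which may help when scheduling a download job. This is a minimal sketch: the short series labels and the function name are hypothetical and are not part of the Censys API.

```python
from datetime import date

# Hypothetical short labels for the four series; the cadence itself
# comes from the list above.
DAILY_SERIES = {"ipv4-hosts", "ipv6-hosts"}
TUESDAY_SERIES = {"ipv4-virtual-hosts", "ipv6-virtual-hosts"}


def new_snapshot_expected(series: str, day: date) -> bool:
    """Return True if a new snapshot of `series` is expected on `day`."""
    if series in DAILY_SERIES:
        return True
    if series in TUESDAY_SERIES:
        # date.weekday() numbers Monday as 0, so Tuesday is 1.
        return day.weekday() == 1
    raise ValueError(f"unknown series label: {series}")
```

For example, new_snapshot_expected("ipv4-virtual-hosts", date(2023, 4, 18)) is true because that date is a Tuesday.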
About the Files
The files containing the Universal Internet DataSet data are serialized as Avro binary in the Google BigQuery Avro export format.
Even with the compression features of Avro, the snapshots in the series that make up the Censys UIDS dataset contain thousands of serialized files, amounting to about 2 terabytes of data.
To get started with Avro, see the Apache Avro documentation.
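As a quick sanity check on a downloaded file, you can confirm that it begins with the Avro object container magic bytes (the ASCII characters "Obj" followed by the byte 0x01). This is a minimal sketch that assumes the snapshot files are Avro object container files; the function name is illustrative, and the check does not validate the rest of the file.

```python
# First four bytes of every Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path: str) -> bool:
    """Return True if the file at `path` starts with the Avro magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

To read the records themselves, use an Avro library for your language.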
How to Download
To download a snapshot, retrieve the list of files for that snapshot of a series, then make follow-up requests to each file path.
The four available series are:
- IPv4 hosts: universal-internet-dataset-v2-ipv4
- IPv6 hosts: universal-internet-dataset-v2-ipv6
- IPv4 virtual hosts: universal-internet-dataset-v2-ipv4-virtual-hosts
- IPv6 virtual hosts: universal-internet-dataset-v2-ipv6-virtual-hosts
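The series IDs above can be kept in a small lookup table and turned into series endpoint URLs. This is a minimal sketch: the dictionary keys and the function name are hypothetical, and the base URL is taken from the curl example in this article.

```python
# Base URL as used in the curl example in this article.
API_BASE = "https://search.censys.io/api/v1/data"

# Keys are hypothetical short labels; values are the series IDs listed above.
SERIES_IDS = {
    "ipv4": "universal-internet-dataset-v2-ipv4",
    "ipv6": "universal-internet-dataset-v2-ipv6",
    "ipv4-virtual-hosts": "universal-internet-dataset-v2-ipv4-virtual-hosts",
    "ipv6-virtual-hosts": "universal-internet-dataset-v2-ipv6-virtual-hosts",
}


def series_url(label: str) -> str:
    """Build the series endpoint URL for one of the four UIDS series."""
    return f"{API_BASE}/{SERIES_IDS[label]}"
```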
Get the ID of the Snapshot
The ID of each snapshot in a Universal Internet DataSet series reflects the date on which the snapshot was taken.
For example, a snapshot with an ID of 20230416 was begun on Apr. 16, 2023.
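Because a snapshot ID is a date in YYYYMMDD form, it can be converted to a date object for sorting or filtering. This is a minimal sketch; the function name is illustrative.

```python
from datetime import date, datetime


def snapshot_date(snapshot_id: str) -> date:
    """Parse a snapshot ID such as '20230416' into a date."""
    return datetime.strptime(snapshot_id, "%Y%m%d").date()
```

For example, snapshot_date("20230416") returns date(2023, 4, 16).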
First, query the series endpoint to retrieve the ID of the latest snapshot or the IDs of historical snapshots.
curl -g -X 'GET' \
  'https://search.censys.io/api/v1/data/universal-internet-dataset-v2-ipv4' \
  -H 'Accept: application/json' \
  --user "$CENSYS_API_ID:$CENSYS_API_SECRET"
Example 200 Response
{
  "id": "universal-internet-dataset-v2-ipv4",
  "name": "Universal Internet DataSet of IPv4 Hosts",
  "description": "Deep Scans of more than 3,500 popular ports featuring Automatic Protocol Detection across hosts in the IPv4 address space. Schema version 2.",
  "results": {
    "latest": {
      "id": "20230416",
      "timestamp": "20230416T000000",
      "details_url": "https://search.censys.io/api/v1/data/universal-internet-dataset-v2-ipv4/20230416"
    },
    "historical": [...]
  }
}
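The same request can be made from Python using only the standard library. This is a minimal sketch: it assumes HTTP basic authentication (the equivalent of curl's --user option) and a response shaped like the example above. The function names are illustrative.

```python
import base64
import json
import urllib.request


def fetch_json(url: str, api_id: str, api_secret: str) -> dict:
    """GET `url` with HTTP basic auth and return the decoded JSON body."""
    token = base64.b64encode(f"{api_id}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def latest_snapshot(series: dict) -> tuple:
    """Return (snapshot ID, details_url) of the latest result in a series."""
    latest = series["results"]["latest"]
    return latest["id"], latest["details_url"]
```

A typical call reads the credentials from the CENSYS_API_ID and CENSYS_API_SECRET environment variables via os.environ, passes the series URL to fetch_json, and hands the result to latest_snapshot.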
Get the List of Files in the Snapshot
Follow up with a GET request to the details_url of the snapshot you wish to download to see the list of files comprising the result.
Example 200 Response (Truncated to a single file for display)
{
  "series": {
    "id": "universal-internet-dataset-v2-ipv4",
    "name": "Universal Internet DataSet of IPv4 Hosts"
  },
  "id": "20230416",
  "timestamp": "20230416T000000",
  "task_id": null,
  "metadata": null,
  "total_size": 704279348958,
  "files": {
    "ipv4-000000000000.avro": {
      "compressed_size": 71374181,
      "download_path": "https://file-host-02.censys.io/snapshots/universal-internet-dataset-v2-ipv4/20230416/ipv4-000000000000.avro",
      "compressed_md5_fingerprint": "a65be1938e1be56132ff48ac460384d9",
      "file_type": null,
      "compression_type": null
    }
  }
}
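Once the snapshot details have been decoded from JSON, the files object can be flattened into a list that is convenient to loop over. This is a minimal sketch that assumes the response has the shape shown above; the function name is illustrative.

```python
def list_files(snapshot: dict) -> list:
    """Return (file name, download_path, compressed_size, md5) tuples."""
    return [
        (
            name,
            info["download_path"],
            info["compressed_size"],
            info["compressed_md5_fingerprint"],
        )
        for name, info in snapshot["files"].items()
    ]
```

Summing the third element of each tuple gives the total number of compressed bytes to be downloaded, which is useful for checking available disk space first.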
Download Each File
Send a GET request to the URL in the download_path field of each file listed in the response above:
GET https://file-host-02.censys.io/snapshots/universal-internet-dataset-v2-ipv4/20230416/ipv4-000000000000.avro
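Each file can be streamed to disk and then checked against its listed fingerprint. This is a minimal sketch under two assumptions that the article does not confirm: that the download_path URL can be fetched with a plain GET and no additional headers, and that compressed_md5_fingerprint is the hexadecimal MD5 digest of the file as downloaded. The function names are illustrative.

```python
import hashlib
import shutil
import urllib.request


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify(url: str, dest: str, expected_md5: str) -> bool:
    """Stream `url` to `dest`, then compare its MD5 with `expected_md5`.

    Assumes the fingerprint is a hex MD5 of the downloaded bytes.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return md5_of_file(dest) == expected_md5
```

Because a snapshot contains thousands of files, it is worth skipping any file that already exists locally with a matching digest so that an interrupted download can be resumed.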
With your snapshot downloaded, you are ready to begin querying the data! Need an introduction to the host data model? Or a list of every field in the schema?