UIDS Download Release & Timeline FAQ
Answers to commonly asked questions about downloading the Censys Universal Internet Dataset.
What is the Universal Internet Dataset?
The Universal Internet Dataset (UIDS) from Censys is the industry's most comprehensive collection of Internet-wide scan data.
It is composed of four downloadable datasets: IPv4 hosts, IPv6 hosts, IPv4 virtual (i.e., name-based) hosts, and IPv6 virtual hosts.
What is changing?
The four existing downloadable series that make up UIDS are being replaced.
The replacement series contain the same information but are now encoded using Google’s BigQuery Avro Export Format.
The schema of each new series is unchanged from its deprecated counterpart. One data encoding has been improved, and default values are now more compliant with strict decoders.
IPv4 Hosts (Unnamed)
Old Series Name: universal-internet-dataset
New Series Name: universal-internet-dataset-v2-ipv4
IPv6 Hosts (Unnamed)
Old Series Name: universal-internet-dataset-ipv6
New Series Name: universal-internet-dataset-v2-ipv6
IPv4 Virtual Hosts (Name-Based Scans)
Old Series Name: universal-internet-dataset-named-ipv4
New Series Name: universal-internet-dataset-v2-ipv4-virtual-hosts
IPv6 Virtual Hosts (Name-Based Scans)
Old Series Name: universal-internet-dataset-named-ipv6
New Series Name: universal-internet-dataset-v2-ipv6-virtual-hosts
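For scripts that reference series by name, the old-to-new mapping above can be kept in a small lookup table. The following is a minimal sketch in Python. The series names are copied from the list above, while the helper name `to_v2_series` is hypothetical and not part of any Censys tooling.

```python
# Old-to-new series names, copied from the list above.
V1_TO_V2_SERIES = {
    "universal-internet-dataset": "universal-internet-dataset-v2-ipv4",
    "universal-internet-dataset-ipv6": "universal-internet-dataset-v2-ipv6",
    "universal-internet-dataset-named-ipv4": "universal-internet-dataset-v2-ipv4-virtual-hosts",
    "universal-internet-dataset-named-ipv6": "universal-internet-dataset-v2-ipv6-virtual-hosts",
}


def to_v2_series(name: str) -> str:
    """Return the v2 series name for a v1 name (hypothetical helper).

    Names that are already v2 series names pass through unchanged.
    Anything else raises ValueError so that typos surface early.
    """
    if name in V1_TO_V2_SERIES.values():
        return name
    try:
        return V1_TO_V2_SERIES[name]
    except KeyError:
        raise ValueError(f"unknown UIDS series: {name!r}") from None


if __name__ == "__main__":
    for old_name in V1_TO_V2_SERIES:
        print(f"{old_name} -> {to_v2_series(old_name)}")
```

Raising on unknown names, rather than returning the input unchanged, is a design choice here: it keeps a misspelled series name from silently reaching a download job.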
How has the schema changed?
It hasn’t.
The encoding of timestamp fields has been changed from a string to a long, annotated as a timestamp-micros logical type.
Other encodings remain the same but now set safe default values throughout (e.g., 0 was previously set as the default value for latitude, but strict decoders want a default of 0.0).
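If your pipeline reads timestamp fields as raw longs, they need to be converted before use. In Avro, a timestamp-micros value is the number of microseconds since the Unix epoch (1970-01-01 00:00:00 UTC). The sketch below, in Python using only the standard library, shows one way to do that conversion, along with an illustration of why a type-strict reader may treat a default of 0 differently from 0.0. The function names are hypothetical, and the default check is a simplified assumption about how a strict decoder might behave, not a description of any particular decoder.

```python
import json
from datetime import datetime, timedelta, timezone

# Avro timestamp-micros counts microseconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Convert a timestamp-micros long to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding at microsecond precision.
    return EPOCH + timedelta(microseconds=micros)


def datetime_to_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime back to a timestamp-micros long."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def is_strict_double_default(raw_json: str) -> bool:
    """Illustrative assumption: a strict reader accepts only a JSON float
    literal as the default for a double field, so "0" fails and "0.0" passes."""
    return isinstance(json.loads(raw_json), float)


if __name__ == "__main__":
    # 1672531200000000 microseconds is midnight UTC on 1 January 2023.
    print(micros_to_datetime(1_672_531_200_000_000).isoformat())
    print(is_strict_double_default("0"), is_strict_double_default("0.0"))
```

Many Avro libraries can apply logical types automatically and return timestamp objects directly, in which case the manual conversion is unnecessary. Check how your reader handles timestamp-micros before adding it.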
When will the change happen?
The v2 Universal Internet Dataset series became available in January 2023, and we plan to discontinue publishing the legacy datasets on May 31, 2023.
How will the change be rolled out?
Beginning January 2023, the new and old datasets are both available for a limited time. After existing enterprise customers have switched to the new datasets, we will discontinue publishing the old datasets for consumption via download. We will continue to provide historical downloadable v1 UIDS files for which there is no v2 equivalent. No new v1 files will be published after the switchover.