Once you've been given access to the
censys-io.ipv4_public dataset use this article to learn how to add the dataset to your Google BigQuery instance.
Contact Sales to inquire about purchasing a Censys Enterprise data license for commercial access. Read Research Access to Censys Data for more information about how to request non-commercial access to this dataset.
New to BigQuery?
Read BigQuery Introduction for an overview of Google BigQuery and how it can be used to query and export Censys data.
Add the Censys IPv4 Dataset to your BigQuery account
If your BigQuery interface looks similar to the screenshot you are ready to query the ipv4_public dataset.
Querying the Censys IPv4 Datasets
Standard SQL Only
Before writing your first query, note that you'll need to use Standard SQL to query the Censys datasets. Unfortunately, Google BigQuery defaults to Legacy SQL and the option to change your query to Standard SQL is hidden. Once you are in the Compose Query screen click Show Options and uncheck Use Legacy SQL.
Another option is to include the following command at the top of your SQL statement in the web interface as shown below:
<SQL GOES HERE>
For more information on constructing SQL statements, check out the BigQuery SQL Reference.
Get a row count of the latest IPv4 scan
We'll start with a simple one. This query will return the total number of rows in the latest IPv4 scan published by Censys.
select count(*) from `censys-io.ipv4_public.current`
This should return a single integer in the hundreds of millions count.
You can also query a specific scan by replacing current with a date in YYYYMMDD format.
select count(*) from `censys-io.ipv4_public.20171231`
The number of known IPv4 hosts known to Censys on December 31, 2017 was 160,686,016.
Get a list of all hosts found on a specific IP subnet
Here's a simple query that will list every IP address on a /24 allocation that Censys discovered in it's most recent scan.
select ip from `censys-io.ipv4_public.current`
where ip like '141.211.243.%'
This should return 100+ rows of hosts found in 220.127.116.11/24.
A note on BigQuery processing costs
Grabbing a single column filtered by a single column is a good way to start and keep your BigQuery processing costs low. Note how many GB of data was processed by each query after it completes running.
Query complete (3.4s elapsed, 2.28 GB processed)
Review BigQuery Pricing for details about how much Google charges. As of January 2, 2018 Google charges $5 per TB of data processed. The first 1 TB of data processed per month is free.
Get list of ports found on each host on a specific IP subnet
Add the following columns to the previous query to get more detail about what was found on each host in 18.104.22.168/24.
select ip, ports from `censys-io.ipv4_public.current`
where ip like '141.211.243.%'
This will return an array port numbers for each IP address in that subnet.
Get HTTP server strings
For every IP address where the HTTP server was found on port 80 return the string that Censys discovered.
select ip, ports, p80.http.get.headers.server from `censys-io.ipv4_public.current`
where ip like '141.211.243.%' and p80.http.get.headers.server is not null
Click on the YYYYMMDD table under the ipv4_public dataset to view the Schema for that dataset. View the Schema for December 31, 2017.
The name of each column describes its use. BigQuery also shows the type of each column and whether or not it is nullable.
See Data Definitions > IPv4 Hosts on the Censys Overview page for a searchable data schema.