Once you've been given access to the censys-io.ipv4_public dataset use this article to learn how to add the dataset to your Google BigQuery instance.

Contact Sales to inquire about purchasing a Censys Enterprise data license for commercial access. Read Research Access to Censys Data for more information about how to request non-commercial access to this dataset.

New to BigQuery?

Read BigQuery Introduction for an overview of Google BigQuery and how it can be used to query and export Censys data.

Add the Censys IPv4 Dataset to your BigQuery account

Read Adding Censys Datasets to BigQuery and verify that you see the censys-io project and ipv4_public dataset in your Google BigQuery web interface.

If your BigQuery interface looks similar to the screenshot you are ready to query the ipv4_public dataset.

Querying the Censys IPv4 Datasets

Standard SQL Only

Before writing your first query, note that you'll need to use Standard SQL to query the Censys datasets. Unfortunately, Google BigQuery defaults to Legacy SQL and the option to change your query to Standard SQL is hidden. Once you are in the Compose Query screen click Show Options and uncheck Use Legacy SQL. 

Another option is to include the following command at the top of your SQL statement in the web interface as shown below:

#standardsql
<SQL GOES HERE>

For more information on constructing SQL statements, check out the BigQuery SQL Reference

Get a row count of the latest IPv4 scan

We'll start with a simple one. This query will return the total number of rows in the latest IPv4 scan published by Censys.

#standardsql
select count(*) from `censys-io.ipv4_public.current`

This should return a single integer in the hundreds of millions count. 

You can also query a specific scan by replacing current with a date in YYYYMMDD format.

#standardsql
select count(*) from `censys-io.ipv4_public.20171231`

The number of known IPv4 hosts known to Censys on December 31, 2017 was 160,686,016.

Get a list of all hosts found on a specific IP subnet

Here's a simple query that will list every IP address on a /24 allocation that Censys discovered in it's most recent scan.

#standardsql
select ip from `censys-io.ipv4_public.current`
where ip like '141.211.243.%'

This should return 100+ rows of hosts found in 141.211.243.0/24.

 A note on BigQuery processing costs

Grabbing a single column filtered by a single column is a good way to start and keep your BigQuery processing costs low. Note how many GB of data was processed by each query after it completes running. 

Query complete (3.4s elapsed, 2.28 GB processed)

Review BigQuery Pricing for details about how much Google charges. As of January 2, 2018 Google charges $5 per TB of data processed. The first 1 TB of data processed per month is free.

Get list of ports found on each host on a specific IP subnet

Add the following columns to the previous query to get more detail about what was found on each host in 141.211.243.0/24.

#standardsql
select ip, ports from `censys-io.ipv4_public.current`
where ip like '141.211.243.%'

This will return an array port numbers for each IP address in that subnet.

Get HTTP server strings

For every IP address where the HTTP server was found on port 80 return the string that Censys discovered.

#standardsql
select ip, ports, p80.http.get.headers.server from `censys-io.ipv4_public.current`
where ip like '141.211.243.%' and p80.http.get.headers.server is not null

Data Schema

Click on the YYYYMMDD table under the ipv4_public dataset to view the Schema for that dataset. View the Schema for December 31, 2017.

The name of each column describes its use. BigQuery also shows the type of each column and whether or not it is nullable.

See Data Definitions > IPv4 Hosts on the Censys Overview page for a searchable data schema.

Did this answer your question?