Data Collection
The IPv4 collection is composed of data about the services (e.g., HTTP, SMTP, MySQL) running on all publicly accessible IPv4 hosts. We collect this data through regularly scheduled scans of the IPv4 address space, aggregate it into host-based records, and label those records with additional metadata (e.g., geographic location, network topology, operating system, and device type).
Usage
Information
BigQuery dataset: censys-io.ipv4_public
The full IPv4 dataset schema can be explored here [more information]
The list of TCP ports on which we perform a deep scan can be explored here [more information]
Examples
Get a row count of the latest IPv4 scan
We'll start with a simple one. This query will return the total number of rows in the latest IPv4 scan published by Censys.
SELECT count(*)
FROM `censys-io.ipv4_public.current`
This will return a single integer with a count in the hundreds of millions.
You can also query a specific scan by replacing current with a date in YYYYMMDD format.
SELECT count(*)
FROM `censys-io.ipv4_public.20171226`
On December 26, 2017, the number of IPv4 hosts known to Censys was 161,806,247.
Get a list of all hosts found on a specific IP subnet
Here's a simple query that will list every IP address on a /24 allocation that Censys discovered in its most recent scan.
SELECT ip
FROM `censys-io.ipv4_public.current`
WHERE ip LIKE '141.211.243.%'
This will return all of the hosts found in 141.211.243.0/24.
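The LIKE pattern above only works when the subnet boundary falls on an octet boundary (/8, /16, /24). For other prefix lengths, one possible approach is BigQuery's NET functions. The following is a minimal sketch, assuming the ip column is a STRING holding a dotted-quad IPv4 address (which the LIKE query suggests); the 141.211.240.0/22 block is chosen purely for illustration.

```sql
-- Sketch: match an arbitrary CIDR block (here 141.211.240.0/22).
-- Assumption: ip is a STRING such as '141.211.243.17'.
-- NET.SAFE_IP_FROM_STRING returns NULL for unparsable values instead of failing,
-- and NET.IP_TRUNC keeps only the first 22 bits of the address.
SELECT ip
FROM `censys-io.ipv4_public.current`
WHERE NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(ip), 22) = NET.IP_FROM_STRING('141.211.240.0')
```

Like the LIKE version, this still reads the entire ip column, so it should process a similar amount of data.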
Get a list of ports found on each host on a specific IP subnet
Add the ports column to the previous query to get more detail about what was found on each host in 141.211.243.0/24.
SELECT ip, ports
FROM `censys-io.ipv4_public.current`
WHERE ip LIKE '141.211.243.%'
This will return an array of port numbers for each IP address in that subnet.
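Because ports is an array, it can also be used as a filter. The sketch below narrows the result to hosts with a particular port open; it assumes ports is an array of integers (ARRAY<INT64>), which is not confirmed here, and port 443 is only an example value.

```sql
-- Sketch: hosts in the subnet that have port 443 in their ports array.
-- Assumption: ports is ARRAY<INT64>; if the array holds strings instead,
-- compare against '443' rather than 443.
SELECT ip, ports
FROM `censys-io.ipv4_public.current`
WHERE ip LIKE '141.211.243.%'
  AND 443 IN UNNEST(ports)
```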
Get HTTP server strings
Add the p80.http.get.headers.server column to the previous query, and filter out rows where it is null, to return the HTTP server string reported on port 80 for each host that has one.
SELECT ip, ports, p80.http.get.headers.server
FROM `censys-io.ipv4_public.current`
WHERE ip LIKE '141.211.243.%' AND p80.http.get.headers.server IS NOT NULL
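To summarize the results rather than list them per host, the same column can be grouped and counted. This is a sketch under the same assumptions as the query above; the aliases server_string and host_count are names introduced here for illustration.

```sql
-- Sketch: count hosts in the subnet per HTTP server string, most common first.
-- server_string and host_count are illustrative aliases, not schema fields.
SELECT p80.http.get.headers.server AS server_string,
       COUNT(*) AS host_count
FROM `censys-io.ipv4_public.current`
WHERE ip LIKE '141.211.243.%'
  AND p80.http.get.headers.server IS NOT NULL
GROUP BY server_string
ORDER BY host_count DESC
```

Only the ip and p80.http.get.headers.server columns are referenced, which keeps the amount of data processed small.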
A note on BigQuery processing costs
Review BigQuery Pricing for details about how much Google charges. As of January 1, 2020, Google charges $5 per TB of data processed. The first 1 TB of data processed per month is free.
Selecting only the columns that you are interested in is a good way to keep your BigQuery processing costs down. The query validator will show you how much data will be processed before you execute your query.