We allow users to perform complex queries against Censys data by providing access to a collection of BigQuery tables. Google BigQuery is Google's cloud data warehouse service, which allows developers to query arbitrarily large datasets interactively using SQL and JavaScript.

You can access Censys datasets in BigQuery through a web UI, command-line tool, and REST API as well as through Dataproc (Google's Spark and Hadoop offering). There are also a variety of third-party tools—like Looker and Tableau—that you can use to interact with Censys BigQuery data.

Censys Datasets

We expose three datasets through BigQuery:

  1. Certificates. We have a single certificates table (certificates_public.certificates) that contains all unique X.509 certificates we've found during any of our scans and by synchronizing with publicly known CT servers. [more information]
  2. IPv4. We provide daily snapshots of the current state of known IPv4 hosts and how their public protocols are configured. [more information]
  3. Popular Websites. We provide a daily snapshot of the current state of the Alexa Top Million Domains collection.

Writing Queries

BigQuery's SQL dialect is compliant with the SQL 2011 standard, and you can query the tables once you have added the Censys datasets [more information].

Here are a few simple examples:

Find IPv4 hosts with a specific HTTP server:

#standardsql
SELECT ip, autonomous_system.description
FROM ipv4_public.current
WHERE p80.http.headers.server LIKE '%Apache%';

Find websites using a specific cipher suite:

#standardsql
SELECT domain, alexa_rank
FROM domain_public.20171006
WHERE p443.https.tls.cipher_suite = 'TLS_RSA_WITH_AES_256_CBC_SHA';

Calculate the breakdown of cryptographic keys for browser trusted certificates:

#standardsql
SELECT COUNT(*), parsed.subject_key_info.key_algorithm
FROM certificates_public.browser_trusted
GROUP BY parsed.subject_key_info.key_algorithm;

Standard SQL Only

You need to use Standard SQL to query the Censys datasets. Unfortunately, Google BigQuery defaults to Legacy SQL, and the option to switch a query to Standard SQL is hidden in the web interface.

Another option is to include the following prefix at the top of your SQL statement in the web interface:

#standardsql
<SQL GOES HERE>

For more information on constructing SQL statements, check out the BigQuery SQL Reference.

Helpful Hints

A couple of notes from the Censys development team:

  1. Censys datasets are extremely wide (often containing thousands of columns). This means that seemingly simple queries (e.g., finding a single record about a host with select * from ipv4_public.current where ip = '8.8.8.8') will process several terabytes of data. This can be costly ($5-10 for a single query) if you're not careful. You can reduce these costs by selecting only the columns you need and by combining small queries into a single larger query.
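
As a rough illustration of the advice above, the sketch below pairs a narrowed version of the single-host query with a dry run that reports how much data a query would scan before it is actually run. It assumes the google-cloud-bigquery Python client is installed and that credentials are already configured. The rate of $5 per terabyte is an assumption inferred from the figures above, not an official price, and the table name mirrors the earlier examples (it may need a project prefix in your environment).

```python
# Narrow query: select only the columns you need instead of SELECT *.
# The table name mirrors the examples above and may need a project prefix.
NARROW_QUERY = """
SELECT ip, autonomous_system.description
FROM ipv4_public.current
WHERE ip = '8.8.8.8'
"""

# Assumed on-demand rate, inferred from the cost figures in this article.
ASSUMED_USD_PER_TB = 5.0


def estimate_cost_usd(bytes_processed, usd_per_tb=ASSUMED_USD_PER_TB):
    """Convert a byte count into an approximate on-demand cost in USD."""
    return bytes_processed / 10**12 * usd_per_tb


def dry_run_bytes(sql, project_id):
    """Ask BigQuery how many bytes a query would scan, without running it."""
    # Imported lazily so the cost helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

For example, estimate_cost_usd(dry_run_bytes(NARROW_QUERY, "my-project")) would give an approximate cost for the narrowed query, where "my-project" is a placeholder for your own Google Cloud project ID.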

Query Limitations

The data we directly expose to users is accessible through BigQuery Views, which are subject to a few limitations:

  1. Standard SQL Only. BigQuery does not allow mixing standard and legacy SQL. This means that Censys views can only be queried using Standard SQL.
  2. No Direct Export Jobs. The results of queries against Censys views cannot be directly exported to a file on Google Cloud Storage. If you want to export results, you'll need to save the results of your query to a temporary table and export that to an external file.
  3. No Wildcard Queries. You cannot reference a view in a wildcard table query. Instead, you'll need to explicitly reference the tables you want to query.
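
One way to work around the wildcard limitation is to generate a query that names each table explicitly and joins the results with UNION ALL. The helper below is a minimal sketch of that idea. The function name and the extra snapshot column are illustrative, and any snapshot date other than the 20171006 table shown earlier is hypothetical.

```python
def union_snapshots(dataset, snapshot_dates, columns, where=None):
    """Build one Standard SQL query that reads several explicitly named tables.

    Inputs are interpolated directly into the SQL text, so only pass
    trusted values (this is a sketch, not a general-purpose query builder).
    """
    if not snapshot_dates:
        raise ValueError("at least one snapshot date is required")
    column_list = ", ".join(columns)
    selects = []
    for date in snapshot_dates:
        # Snapshot tables are assumed to be named YYYYMMDD, as in domain_public.20171006.
        if not (date.isdigit() and len(date) == 8):
            raise ValueError(f"unexpected snapshot name: {date!r}")
        # Backticks quote the table path, since the table name starts with a digit.
        select = (
            f"SELECT '{date}' AS snapshot, {column_list}\n"
            f"FROM `{dataset}.{date}`"
        )
        if where:
            select += f"\nWHERE {where}"
        selects.append(select)
    return "\nUNION ALL\n".join(selects)
```

Calling union_snapshots("domain_public", ["20171005", "20171006"], ["domain", "alexa_rank"]) returns a single statement with one SELECT per snapshot, which can then be submitted as an ordinary Standard SQL query.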

If these limitations are causing significant problems, please reach out to our team at support@censys.io.

Connecting to BigQuery in a Script

There are two methods of authenticating against Google BigQuery in a non-interactive environment:

  1. Your User Account. You can use OAuth if you're running a script personally. [more information]
  2. Service Account. The service accounts you create in your Google Cloud Project will not automatically be granted access to the Censys BigQuery datasets (access is only granted to your user account and this does not propagate to the cloud project). If you'd like to have a service account added, please reach out to us at support@censys.io.
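
A script-side query might look like the sketch below. It assumes the google-cloud-bigquery Python client is installed and that Application Default Credentials are available, either from your user account or from a service account that has been granted access as described above. The run_query helper and its parameters are illustrative and not part of any Censys tooling.

```python
QUERY = """
SELECT COUNT(*) AS certificates, parsed.subject_key_info.key_algorithm
FROM certificates_public.browser_trusted
GROUP BY parsed.subject_key_info.key_algorithm
"""


def run_query(sql, project_id=None, client=None):
    """Run a Standard SQL query and return the rows as plain dictionaries."""
    if client is None:
        # Imported lazily. Client() picks up Application Default Credentials,
        # e.g. from `gcloud auth application-default login` or a key file
        # named in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
        from google.cloud import bigquery

        client = bigquery.Client(project=project_id)
    # The Python client submits queries as Standard SQL by default.
    job = client.query(sql)
    return [dict(row.items()) for row in job.result()]
```

Passing an existing client object is optional. It makes it possible to reuse one authenticated client across several queries, or to substitute a stand-in when testing the script without network access.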