Google BigQuery Introduction
Censys customers can perform complex queries against Censys data via a collection of Google BigQuery tables. Google BigQuery is the Google public data warehouse service that allows developers to interact with arbitrarily large datasets in real time using SQL and Javascript.
Google BigQuery offers a web UI, command-line tool, and REST API for querying, as well as Dataproc (Google Spark and Hadoop offering). A variety of third-party tools —like Looker and Tableau—are available so you can interact with Censys Google BigQuery data.
Current datasets:
-
Censys Universal Internet Dataset: A single table (
censys-io.universal_internet_dataset.current
) that contains structured/banner data from over 3500 ports on IPv4, IPv6 hosts and virtual hosts. -
Certificates: 2 certificates tables (
certificates_v2.certificates_all
) that contain all unique X.509 certificates found during scan and by synchronizing with publicly known CT servers.
Historical Datasets:
-
Domains (deprecated): A daily snapshot (
domain_public.current
) of the current state of the Alexa Top Million Domains collection. -
IPv4 (deprecated): A daily snapshot (
ipv4_public.current
) of the current state of known IPv4 hosts and how public protocols are configured. -
IPv4 Banners (deprecated): A daily snapshot (
ipv4_banners_public.current
) of the (banner) results of our continuous light-weight scanning on 2029 of the most popular ports.
Google BigQuery SQL dialect is compliant with the SQL 2011 standard, and you can query tables after you add the Censys datasets.
Discover the distribution of services with Coinhive miner indicators of compromise across ports in today's Internet snapshot:
SELECT port, count(*) FROM `censys-io.universal_internet_dataset.current`, UNNEST(services) WHERE service_name = 'HTTP' AND REGEXP_CONTAINS(SAFE_CONVERT_BYTES_TO_STRING(http.response.body), r'(?i)coinhive.min.js') GROUP BY 1 ORDER BY 2 DESC
Calculate the breakdown of cryptographic keys for certificates:
SELECT COUNT(*), parsed.subject_key_info.key_algorithm.name FROM `censys-io.certificates_public.certificates` GROUP BY parsed.subject_key_info.key_algorithm.name
Find websites using a specific cipher suite:
SELECT domain, alexa_rank FROM `censys-io.domain_public.current` WHERE p443.https.tls.cipher_suite.name = 'TLS_RSA_WITH_AES_256_CBC_SHA'
You need to use Standard SQL to query the Censys datasets, which is now the default for Google BigQuery. If you encounter issues running queries, confirm that your query is using Standard SQL by opening the query editor and selecting the More list. From the list, select Query Settings to open a window. Clear any references to Legacy SQL.
Another option is to include the following command at the top of your SQL statement in the web interface as shown below:
#standardsql <SQL GOES HERE>
For more information on constructing SQL statements, check out the BigQuery SQL Reference.
Censys datasets are extremely wide, often containing thousands of columns, so seemingly simple queries (for example, finding a single record about a host: SELECT * FROM censys-io.universal_internet_dataset.current WHERE host_identifer.ipv4 = '8.8.8.8'
processes several terabytes of data. This can be costly ($5-10 for the single query). You can reduce these costs by only selecting the columns you need or by combining several small queries into a single larger query.
The data Censys directly exposes is accessible through BigQuery Views, which are subject to a few limitations:
-
Standard SQL Only. Google BigQuery does not allow mixing standard and legacy SQL. This means that Censys views can only be queried using Standard SQL.
-
No Direct Export Jobs. The results of queries against Censys views cannot be directly exported to a file on Google Cloud Storage. If you want to export results, save the results of your query to a temporary table and export that to an external file.
-
No Wildcard Queries. You cannot reference a view in a wildcard table query. Instead, you need to explicitly reference the tables you want to query.
If these limitations cause significant problems, reach out to our team at support@censys.io.
You can authenticate against Google BigQuery in 2 ways in a non-interactive environment:
-
Your User Account. You can use OAUTH if you're running a script personally.
-
Service Account. The service accounts you create in your Google Cloud Project are not automatically granted access to the Censys Google BigQuery datasets. Access is only granted to your user account and this does not propagate to the cloud project. If you want a service account added, reach out to us at support@censys.io.
big query
Comments
0 comments
Article is closed for comments.