Inventory Aggregation API
Aggregations provide detailed counts of data points that are deeply nested in structured data models, such as Censys Attack Surface Management representations of Internet-facing hosts and web entities.
Use aggregations to discover patterns, gain insight, and better understand the makeup of an external attack surface.
Collect counts of values using the Aggregate endpoint in the Inventory API.
This endpoint returns a single-page result that reports how often each value of a specified field occurs in an inventory, across all assets matching a search query.
ASM API URL
https://app.censys.io/api/
Method and Path
POST /inventory/v1/aggregate
Request Body
JSON-formatted object containing an aggregate specification.
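As a rough sketch of how a call to this endpoint might be assembled in Python, the snippet below builds (but does not send) the POST request using only the standard library. The build_aggregate_request helper is hypothetical, and the Censys-Api-Key header name is an assumption; confirm the authentication scheme against your API credentials before relying on it.

```python
import json
import urllib.request

API_BASE = "https://app.censys.io/api/"      # ASM API URL from this article
AGGREGATE_PATH = "inventory/v1/aggregate"    # POST /inventory/v1/aggregate


def build_aggregate_request(api_key, body):
    """Build (without sending) a POST request for the Aggregate endpoint."""
    return urllib.request.Request(
        API_BASE + AGGREGATE_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is passed in a Censys-Api-Key header.
            "Censys-Api-Key": api_key,
        },
        method="POST",
    )


body = {
    "workspaces": ["your-workspace-id"],
    "query": "type=HOST and host.cloud:*",
    "aggregation": {"term": {"field": "host.cloud", "number_of_buckets": 10}},
}
request = build_aggregate_request("your-api-key", body)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.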
Several aggregation types are supported:
- Cardinality Aggregation: A count of the unique values for a field.
- Filter Aggregation: A count filtered by a provided query.
- Nested Aggregation: A count of all the documents nested in a repeated field.
- Reverse Nested Aggregation: A count of parents of a nested field.
- Rare Term Aggregation: A breakdown of least frequent values for a field.
- Term Aggregation: A breakdown of most frequent values for a field.
These types can be used recursively to produce counts within counts.
A term aggregation returns a count for each of the highest frequency values present in the inventory for a provided field (for example, term) across all assets matching a search.
Note
This aggregation is the most familiar to current users. It is the type available in Censys Search on the Report page.
In the body of the request, as part of the term object, supply the dot-delimited key of the field to be aggregated, as well as the maximum number_of_buckets (that is, values) to provide counts for. The maximum allowed is 1000.
Return the top 10 unique values present in the workspace’s inventory for the cloud field with a count of hosts reporting that value, from most to least.
{ "workspaces": [ "your-workspace-id" ], "query": "type=HOST and host.cloud:*", "aggregation": { "term": { "field": "host.cloud", "number_of_buckets": 10 } } }
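The term object can also be generated programmatically. The following is a minimal sketch, assuming a hypothetical term_aggregation helper; it enforces the documented maximum of 1000 buckets before the request is sent. The lower bound of 1 is an assumption, not something this article states.

```python
MAX_TERM_BUCKETS = 1000  # documented maximum for number_of_buckets


def term_aggregation(field, number_of_buckets):
    """Return a term aggregation object, enforcing the documented bucket cap."""
    # Assumption: at least 1 bucket must be requested.
    if not 1 <= number_of_buckets <= MAX_TERM_BUCKETS:
        raise ValueError(
            f"number_of_buckets must be between 1 and {MAX_TERM_BUCKETS}"
        )
    return {"term": {"field": field, "number_of_buckets": number_of_buckets}}


body = {
    "workspaces": ["your-workspace-id"],
    "query": "type=HOST and host.cloud:*",
    "aggregation": term_aggregation("host.cloud", 10),
}
```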
The aggregate executed successfully and found 1,169 entities matching the query. The key of each bucket is a value for the host.cloud field, and the count is the number of hosts with that value.
{
  "queryDurationMillis": 167,
  "totalCount": 1169, // the number of entities matching the query
  "result": {
    "term": {
      "buckets": [
        {
          "key": "CloudFlare Inc", // the most common value for the field
          "count": 948, // the number of entities with this value
          "subResult": null
        },
        { "key": "Amazon AWS", "count": 77, "subResult": null },
        { "key": "Microsoft Corporation", "count": 50, "subResult": null },
        { "key": "Akamai Technologies, Inc.", "count": 39, "subResult": null },
        { "key": "Confluence Networks Inc", "count": 17, "subResult": null },
        { "key": "GoDaddy Operating Company, LLC.", "count": 19, "subResult": null },
        { "key": "Microsoft Azure", "count": 18, "subResult": null }
      ],
      "otherCount": 0,
      "errorUpperBound": 0
    }
  }
}
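One way to consume a term response in Python is sketched below. The term_buckets helper is hypothetical, and the sample dictionary is an abbreviated copy of the response above; the // annotations in that sample are explanatory and are presumably not part of the actual payload, so they are omitted here.

```python
def term_buckets(response):
    """Return (key, count) pairs from a term aggregation response, largest first."""
    buckets = response["result"]["term"]["buckets"]
    return sorted(
        ((b["key"], b["count"]) for b in buckets),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Abbreviated copy of the sample response (first three buckets only).
response = {
    "totalCount": 1169,
    "result": {"term": {"buckets": [
        {"key": "CloudFlare Inc", "count": 948, "subResult": None},
        {"key": "Amazon AWS", "count": 77, "subResult": None},
        {"key": "Microsoft Corporation", "count": 50, "subResult": None},
    ], "otherCount": 0, "errorUpperBound": 0}},
}

for key, count in term_buckets(response):
    share = 100 * count / response["totalCount"]
    print(f"{key}: {count} hosts ({share:.1f}%)")
```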
Warning
Nested fields (such as host.services) in the asset schemas won’t work with simple term aggregations because these fields contain an array of objects. Use the nested aggregation instead.
A nested aggregation returns a count of the total number of documents nested within a repeated field present across all of the entities matching a query.
In the body of the request, in the nested object, supply the dot-delimited path to the nested field.
The aggregate executed successfully and found 16,336 services across the 4,435 virtual hosts matching the query.
{
  "queryDurationMillis": 186,
  "totalCount": 4435, // the number of entities matching the query
  "result": {
    "nested": {
      "count": 16336, // the number of nested documents across all the entities matching the query
      "subResult": null
    }
  }
}
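The request that produced this result is not shown here. A body of roughly the following shape would be consistent with it; the host.name: * query is an assumption borrowed from the name-based examples elsewhere in this article, not the confirmed query behind these numbers. The sketch also shows one way to combine the two counts into an average.

```python
def nested_aggregation(path):
    """Return a nested aggregation object for a repeated field."""
    return {"nested": {"path": path}}


body = {
    "workspaces": ["your-workspace-id"],
    "query": "host.name: *",  # assumed query, for illustration only
    "aggregation": nested_aggregation("host.services"),
}

# Counts taken from the sample nested response.
response = {
    "totalCount": 4435,
    "result": {"nested": {"count": 16336, "subResult": None}},
}

services = response["result"]["nested"]["count"]
hosts = response["totalCount"]
average = services / hosts
print(f"{services} services on {hosts} hosts: about {average:.2f} per host")
```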
A sub aggregation performs an aggregation within a previously specified aggregation. Sub aggregations support the same types as top-level aggregations.
In the body of the request, add a sub_aggregation object after the initial aggregation, and embed another aggregation in the object.
... "aggregation": { ..., "sub_aggregation":{...} }
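Because each level simply carries the next one under sub_aggregation, deep chains can be assembled from a flat list. The chain_aggregations helper below is a hypothetical convenience, sketched on the assumption that every level follows the layout shown in this article's requests.

```python
def chain_aggregations(*levels):
    """Nest aggregation objects so each becomes the sub_aggregation of the previous."""
    if not levels:
        raise ValueError("at least one aggregation is required")
    chained = None
    # Work from the innermost level outward.
    for level in reversed(levels):
        node = dict(level)  # shallow copy so the inputs are not modified
        if chained is not None:
            node["sub_aggregation"] = chained
        chained = node
    return chained


aggregation = chain_aggregations(
    {"nested": {"path": "host.services"}},
    {"filter": {"query": "software.risks:*"}},
)
print(aggregation)
```

Listing the levels in order, outermost first, tends to be easier to read and review than hand-writing several layers of braces.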
A filter aggregation narrows the counted documents to only those that match a query. This aggregation is often used as a sub_aggregation.
In the body of the request, in the filter object, supply the query in the Censys Search Language that filters the counted results.
Return the count of the name-based services that have a software risk in the workspace’s inventory.
{ "workspaces": [ "your-workspace-id" ], "query": "host.name: * and host.services.software.risks:*", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "filter": { "query": "software.risks:*" } } } }
The aggregate executed successfully and found 183 services with a software risk out of the total 267 services on the 90 virtual hosts matching the query.
{
  "queryDurationMillis": 26272,
  "totalCount": 90, // the number of entities matching the query
  "result": {
    "nested": {
      "count": 267, // the number of nested documents across all the entities matching the query
      "subResult": {
        "filter": {
          "count": 183, // the number of nested documents filtered by the filter query
          "subResult": null
        }
      }
    }
  }
}
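Results mirror the shape of the request: each level holds a count and a subResult for the next level. A small sketch of walking that chain follows; the walk_counts helper is hypothetical and only handles count-style levels (nested, filter, reverse nested), not bucketed term results.

```python
def walk_counts(result):
    """Yield (aggregation_type, count) for each level of a count-only result chain."""
    while result:
        # Each level holds a single key naming the aggregation type.
        (agg_type, payload), = result.items()
        yield agg_type, payload["count"]
        result = payload.get("subResult")


# Copy of the sample response, without the inline annotations.
response = {
    "totalCount": 90,
    "result": {"nested": {"count": 267, "subResult": {
        "filter": {"count": 183, "subResult": None},
    }}},
}

levels = dict(walk_counts(response["result"]))
risky_share = 100 * levels["filter"] / levels["nested"]
print(f"{levels['filter']} of {levels['nested']} services ({risky_share:.1f}%)")
```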
A rare term aggregation returns a count for each of the lowest frequency values present in the inventory for a specified field (for example, term) across all assets matching a search. Unlike term aggregations, this type of aggregation takes a numerical definition of "rare" instead of a number of buckets.
Why? If 20 unique values each appear in only 1 document, there is no accurate way to return "the 10 least common values." Defining rare by a count instead lets the aggregate include every value that fits the definition, however many or few there are.
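The reasoning can be illustrated locally with a few lines of Python. This is only a sketch of the idea using made-up sample data and a hypothetical rare_values helper; it does not describe how the API computes its results.

```python
from collections import Counter


def rare_values(values, max_count):
    """Return {value: count} for every value seen at most max_count times."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n <= max_count}


# Made-up sample data: three provinces appear once, one appears three times.
provinces = ["Iowa", "Haifa", "Alaska", "Texas", "Texas", "Texas"]

# A threshold of 1 returns all three tied values instead of an arbitrary subset.
print(rare_values(provinces, max_count=1))
# A larger threshold admits the more frequent value as well.
print(rare_values(provinces, max_count=10))
```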
In the body of the request, in the rare_term object, supply the dot-delimited key of the field to be aggregated, as well as the maximum count (maxCount) a value can have and still be considered rare. The maximum allowed is 100.
Return the provinces with 10 or fewer hosts that have a critical or high risk, and include the count of hosts.
{ "workspaces": [ "{{workspace_id}}" ], "query": "host.services.risks.severity:{critical, high}", "aggregation": { "rareTerm": { "field": "host.location.province", "maxCount": 10 } } }
The aggregate executed successfully and returned the provinces that 10 or fewer of the 485 matching hosts report as their location.
{
  "queryDurationMillis": 192,
  "totalCount": 485, // the number of entities matching the query
  "result": {
    "rareTerm": {
      "buckets": [
        {
          "key": "Alabama", // the least common value for the term of the entities matching the query
          "count": 1, // the number of entities with the province value shown in the key
          "subResult": null
        },
        { "key": "Alaska", "count": 1, "subResult": null },
        { "key": "Baladiyat ad Dawhah", "count": 1, "subResult": null },
        { "key": "Colorado", "count": 1, "subResult": null },
        { "key": "Haifa", "count": 1, "subResult": null },
        { "key": "Iowa", "count": 1, "subResult": null },
        { "key": "Jerusalem", "count": 1, "subResult": null },
        { "key": "Land Berlin", "count": 1, "subResult": null },
        { "key": "Maryland", "count": 1, "subResult": null },
        { "key": "Massachusetts", "count": 1, "subResult": null }
      ]
    }
  }
}
A reverse nested aggregation enables aggregating on parent documents from nested documents.
This aggregation is used in conjunction with the nested aggregation.
In the body of the request, as part of the reverse_nested object, supply the dot-delimited path to the field to be aggregated.
Return a count of hosts with 1 of the top 10 most common extended service names in the inventory.
{ "workspaces": [ "your-workspace-id" ], "query": "host.ip:* and not host.name:*", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "term": { "field": "host.services.extended_service_name", "number_of_buckets": 10 }, "sub_aggregation": { "reverse_nested": { "path": "host" } } } } }
The aggregate executed successfully and found the count of hosts with at least 1 of the 10 most common extended service names in the inventory.
{
  "queryDurationMillis": 1793,
  "totalCount": 8778, // the number of entities matching the query
  "result": {
    "nested": {
      "count": 6214, // the number of nested documents across all the entities matching the query
      "subResult": {
        "term": {
          "buckets": [
            {
              "key": "HTTP", // the most common value for the term on hosts matching the query
              "count": 3167, // the number of nested documents whose value for the term is the key
              "subResult": {
                "reverseNested": {
                  "count": 1776, // the number of parent documents with at least 1 of the services counted above
                  "subResult": null
                }
              }
            },
            { "key": "HTTPS", "count": 1928, "subResult": { "reverseNested": { "count": 1549, "subResult": null } } },
            { "key": "UNKNOWN", "count": 293, "subResult": { "reverseNested": { "count": 252, "subResult": null } } },
            { "key": "SSH", "count": 123, "subResult": { "reverseNested": { "count": 109, "subResult": null } } },
            { "key": "ANYCONNECT", "count": 99, "subResult": { "reverseNested": { "count": 99, "subResult": null } } },
            { "key": "DNS", "count": 91, "subResult": { "reverseNested": { "count": 91, "subResult": null } } },
            { "key": "SMTP-STARTTLS", "count": 83, "subResult": { "reverseNested": { "count": 52, "subResult": null } } },
            { "key": "NTP", "count": 70, "subResult": { "reverseNested": { "count": 70, "subResult": null } } },
            { "key": "IMAPS", "count": 64, "subResult": { "reverseNested": { "count": 34, "subResult": null } } },
            { "key": "POP3S", "count": 58, "subResult": { "reverseNested": { "count": 33, "subResult": null } } }
          ],
          "otherCount": 238,
          "errorUpperBound": 0
        }
      }
    }
  }
}
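Each bucket in this result carries two numbers: the service count from the term level and the host count from the reverse nested level. A minimal sketch of pulling both out is shown below, using a hypothetical services_and_hosts helper and two buckets copied from the sample response.

```python
def services_and_hosts(response):
    """Return (service_name, service_count, host_count) for each term bucket."""
    buckets = response["result"]["nested"]["subResult"]["term"]["buckets"]
    return [
        (b["key"], b["count"], b["subResult"]["reverseNested"]["count"])
        for b in buckets
    ]


# Two buckets copied from the sample response, without the inline annotations.
response = {"result": {"nested": {"count": 6214, "subResult": {"term": {"buckets": [
    {"key": "HTTP", "count": 3167,
     "subResult": {"reverseNested": {"count": 1776, "subResult": None}}},
    {"key": "NTP", "count": 70,
     "subResult": {"reverseNested": {"count": 70, "subResult": None}}},
]}}}}}

for name, services, hosts in services_and_hosts(response):
    ratio = services / hosts
    print(f"{name}: {services} services on {hosts} hosts ({ratio:.2f} per host)")
```

A ratio above 1 indicates that some hosts run the same service on more than one port.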
A cardinality aggregation returns only the count of the unique values for a field present in the workspace’s inventory.
This aggregation is useful when trying to figure out the number_of_buckets needed for a term aggregation.
In the body of the request, in the cardinality object, provide the dot-delimited field whose unique values are counted.
The aggregate executed successfully and found 19 unique operating system vendors reported by hosts in the inventory.
{
  "queryDurationMillis": 81,
  "totalCount": 13186, // the total number of entities matching the query
  "result": {
    "cardinality": {
      "value": 19 // the number of unique values for OS vendor across all entities matching the query
    }
  }
}
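The request for this result is not shown here. The sketch below assumes a field name of host.operating_system.vendor and a query of host.ip: *; both are guesses for illustration and should be checked against the asset schema. It then shows how the cardinality value could feed the number_of_buckets of a follow-up term aggregation, capped at the documented maximum of 1000. The buckets_needed helper is hypothetical.

```python
MAX_TERM_BUCKETS = 1000  # documented maximum for number_of_buckets

# Assumption: the field name for the OS vendor; the article does not show it.
OS_VENDOR_FIELD = "host.operating_system.vendor"

cardinality_body = {
    "workspaces": ["your-workspace-id"],
    "query": "host.ip: *",  # assumed query, for illustration only
    "aggregation": {"cardinality": {"field": OS_VENDOR_FIELD}},
}


def buckets_needed(cardinality_response):
    """Pick a number_of_buckets that covers every unique value, up to the cap."""
    unique_values = cardinality_response["result"]["cardinality"]["value"]
    return max(1, min(unique_values, MAX_TERM_BUCKETS))


# Copy of the sample response, without the inline annotations.
response = {"totalCount": 13186, "result": {"cardinality": {"value": 19}}}

term_body = {
    "workspaces": ["your-workspace-id"],
    "query": cardinality_body["query"],
    "aggregation": {"term": {
        "field": OS_VENDOR_FIELD,
        "number_of_buckets": buckets_needed(response),
    }},
}
```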
These requests can be copied and pasted into your API client. Replace the placeholder text in the workspaces array with your organization’s workspace ID.
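If the requests are kept as templates in a script rather than pasted by hand, the placeholder can be filled in programmatically. This is a small sketch with a hypothetical with_workspace helper; the workspace ID shown is a made-up value.

```python
import copy


def with_workspace(body, workspace_id, placeholder="your-workspace-id"):
    """Return a copy of a request body with the placeholder workspace replaced."""
    filled = copy.deepcopy(body)  # leave the template untouched for reuse
    filled["workspaces"] = [
        workspace_id if w == placeholder else w for w in filled["workspaces"]
    ]
    return filled


template = {
    "workspaces": ["your-workspace-id"],
    "query": "host.services.software.risks:*",
    "aggregation": {"nested": {"path": "host.services"}},
}
# "example-workspace-id" is a made-up value; use your own workspace ID.
body = with_workspace(template, "example-workspace-id")
```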
What are the 100 most common non-HTTP services on hosts in the inventory and what are the 5 most common ports each of those services run on?
This aggregate returns:
- The number of hosts with a service that is not in the HTTP family.
- The total number of services on those hosts.
- The number of non-HTTP services on those hosts.
- The 100 most common service names.
- The 5 most common ports the services are running on.
{ "workspaces": [ "your-workspace-id" ], "query": "host.services: (not service_name: {HTTP, CWMP, KUBERNETES, PROMETHEUS, ELASTICSEARCH})", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "filter": { "query": "not host.services.service_name: {HTTP, CWMP, KUBERNETES, PROMETHEUS, ELASTICSEARCH}" }, "sub_aggregation": { "term": { "field": "host.services.service_name", "number_of_buckets": 100 }, "sub_aggregation": { "term": { "field": "host.services.port", "number_of_buckets": 5 } } } } } }
What are the 1,000 most common HTML titles of name-based HTTPS services returning a 200 status code?
{ "workspaces": [ "your-workspace-id" ], "query": "host.name: * and host.services:(extended_service_name: HTTPS and http.response.status_code: 200)", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "filter": { "query": "extended_service_name: HTTPS and http.response.status_code: 200" }, "sub_aggregation": { "term": { "field": "host.services.http.response.html_title", "number_of_buckets": 1000 } } } } }
What are the 1,000 most common HTML titles of HTTP services not returning a 301 status code?
This aggregation returns:
- The number of hosts with a service on port 80 not returning an HTTP 301.
- The total number of services on those hosts.
- The number of services on port 80 not returning an HTTP 301 (the same as the first number).
- The 1,000 most common HTML titles for those services.
- The 5 most common HTTP status codes returned by services with each of those titles.
{ "workspaces": [ "your-workspace-id" ], "query": "host.services: (port: 80 and not http.response.status_code: 301)", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "filter": { "query": "port: 80 and not http.response.status_code: 301" }, "sub_aggregation": { "term": { "field": "host.services.http.response.html_title", "number_of_buckets": 154 }, "sub_aggregation": { "term": { "field": "host.services.http.response.status_code", "number_of_buckets": 5 } } } } } }
This aggregation returns:
- The number of hosts with a high severity risk.
- The total number of services on all of those hosts.
- The number of services with a high severity risk.
- The total number of risks on those services.
- The number of high severity risks on those services.
- The 10 most common risk categories of the high severity risks.
- The number of services that each high risk category is on.
- The 10 most common port numbers of those services.
{ "workspaces": [ "your-workspace-id" ], "query": "host.services.risks.severity: high", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "filter": { "query": "host.services.risks.severity: high" }, "sub_aggregation": { "nested": { "path": "host.services.risks" }, "sub_aggregation": { "filter": { "query": "severity: high" }, "sub_aggregation": { "term": { "field": "host.services.risks.categories", "number_of_buckets": 10 }, "sub_aggregation": { "reverse_nested": { "path": "host.services" }, "sub_aggregation": { "term": { "field": "host.services.port", "number_of_buckets": 10 } } } } } } } } }
This aggregation returns:
- The number of hosts with a software risk.
- The total number of services on those hosts.
- The number of services with a software risk.
- The number of software risks.
- The top 10 software risk types.
- The number of services with each of the top 10 risks.
- The 10 most common software packages reported by the services with each risk type.
{ "workspaces": [ "your-workspace-id" ], "query": "host.services.software.risks:*", "aggregation": { "nested": { "path": "host.services" }, "sub_aggregation": { "filter": { "query": "software.risks:*" }, "sub_aggregation": { "nested": { "path": "host.services.software.risks" }, "sub_aggregation": { "term": { "field": "host.services.software.risks.type", "number_of_buckets": 10 }, "sub_aggregation": { "reverse_nested": { "path": "host.services" }, "sub_aggregation": { "term": { "field": "host.services.software.uniform_resource_identifier", "number_of_buckets": 10 } } } } } } } }