Search 2.0 Troubleshooting Q&A
Discover details and nuances about the Search 2.0 host dataset and the Censys Search Language in this question-and-answer-style document.
Questions about Censys Search
Q: How do I specify a historical date for my search?
A: You can’t. Searches executed in the web UI and API are always for hosts and virtual hosts as they are currently known.
On any host page, you can select Host History to see a chronology of events and go back to a historical view, but searches using history are not supported.
(Enterprise customers who download or access daily snapshots in BigQuery can search the Internet as it was known to Censys at a historical point in time.)
Q: Can I search using the observation timestamp for a service?
A: No, service observation timestamps change so rapidly across our ~3B indexed services that we can’t publish changes to this field fast enough to allow searching on it.
The host-level last_updated_at field is searchable.
This field is updated in the search index only when a service observation or enrichment event changes the data.
For example, a host whose service has been observed by a Censys scanner every day for the past 5 days without change will have a last_updated_at timestamp in the searchable index from 5 days ago.
Viewing the host on its details page will show the up-to-date timestamp.
To see all of the observations Censys made of a host’s services, even ones that resulted in no change to its representation, open the History tab and toggle the See all observations button to blue.
Q: How is the equals sign operator (=) different from the colon (:)?
A: The equals sign requires an exact match: the value you provide for a field must match the entire value stored in Censys for the host to count as a hit. The colon, by contrast, is the fuzzy match operator.
Q: Why are my searches for HTML values not getting good results?
A: A search that uses the fuzzy match operator (:) for services.http.response.body only searches the contents of the HTML body, while the exact match operator (=) searches the full markup of the HTML body (including HTML tags).
Remember, if you use the exact match operator, only hosts whose entire HTTP response body exactly matches the specified value are returned, so use wildcards (*) to account for surrounding content.
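To see why the wildcards matter, the following minimal Python sketch imitates whole-value matching locally. It uses the standard-library fnmatch module as a stand-in for exact matching with *; this is an analogy only, not how Censys evaluates queries, and the sample body and the helper name whole_value_match are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical HTTP response body, used only for illustration.
body = "<html><head><title>Login</title></head><body>Welcome</body></html>"


def whole_value_match(value: str, pattern: str) -> bool:
    """Return True only if the pattern covers the entire value.

    '*' stands for any run of characters, mimicking a wildcard.
    This is a local analogy, not the Censys matching engine.
    """
    return fnmatchcase(value, pattern)


# Without wildcards the pattern must equal the whole body, so this fails.
print(whole_value_match(body, "<title>Login</title>"))    # False
# Wildcards on both sides absorb the surrounding markup.
print(whole_value_match(body, "*<title>Login</title>*"))  # True
```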
Q: How do I restrict results to hosts with IPv6 addresses?
A: Append this criterion to your query: and labels=ipv6
Q: How do I exclude truncated superhosts and their pseudo services from my search results?
A: Add and truncated: false to a query.
Suspected superhosts (that is, hosts with more than 100 services) are truncated, and only a sample of their services is indexed for searching: for each unique service name on the host, the (truncated) service on the lowest numerical port number is indexed.
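If you build queries programmatically, the two clauses from the answers above can be appended with a small helper. This is a minimal sketch: the function name restrict_query is hypothetical, and wrapping the base query in parentheses is an assumption made here so that an existing "or" in the query is not regrouped.

```python
def restrict_query(query: str, ipv6_only: bool = False,
                   drop_truncated: bool = False) -> str:
    """Append the filter clauses quoted in the answers above to a query.

    The base query is wrapped in parentheses (an assumption of this
    sketch) so the appended clauses apply to the query as a whole.
    """
    clauses = [f"({query})"]
    if ipv6_only:
        clauses.append("labels=ipv6")
    if drop_truncated:
        clauses.append("truncated: false")
    return " and ".join(clauses)


print(restrict_query("services.service_name: HTTP", drop_truncated=True))
# (services.service_name: HTTP) and truncated: false
```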
Questions about Host Fields
Q: What does the truncated boolean field mean?
A: When services.truncated: true, Censys is distinguishing a low-quality pseudoservice from a regular service.
Analysis of Censys scan data reveals that hosts with more than 100 services are very likely to be either honeypots or firewalled hosts whose exposed services are qualitatively inferior to real services.
Because of the irrelevance and poor data quality of these 'pseudo services,' Censys truncates the service data itself and the number of searchable services for these 'superhosts.'
Want to exclude superhosts and pseudo services from results? See how above.
Q: Why are there no results for services.service_name: HTTPS?
A: The service name field does not recognize the TLS indicator. You must search the extended_service_name field instead.
For example, a search for services.service_name: HTTP will return hosts running HTTP and HTTPS services.
If you want to restrict results to just HTTPS, you can use the services.extended_service_name field, whose values do reflect the use of TLS.
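The difference between the two fields can be illustrated locally with a few service records. This is a sketch under stated assumptions: the records are hypothetical, the helper ports_where is not part of any Censys library, and the value "HTTPS" for extended_service_name is inferred from the answer above.

```python
# Hypothetical service records; field names follow this document,
# the values are illustrative only.
services = [
    {"port": 80, "service_name": "HTTP", "extended_service_name": "HTTP"},
    {"port": 443, "service_name": "HTTP", "extended_service_name": "HTTPS"},
    {"port": 53, "service_name": "DNS", "extended_service_name": "DNS"},
]


def ports_where(records, field, value):
    """Return the ports of all records whose field equals value."""
    return [r["port"] for r in records if r.get(field) == value]


# service_name ignores TLS, so both web services match "HTTP".
print(ports_where(services, "service_name", "HTTP"))            # [80, 443]
# extended_service_name reflects TLS, so only the TLS service matches.
print(ports_where(services, "extended_service_name", "HTTPS"))  # [443]
# Nothing has service_name "HTTPS", which is why that search is empty.
print(ports_where(services, "service_name", "HTTPS"))           # []
```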
Q: How do observation and update timestamps differ?
A: The observed_at field within a service record marks the time that the service information was obtained via a Censys scan.
Location and routing data also have their own update timestamps (location_updated_at and autonomous_system_updated_at in the example response below) to reflect when they were last updated.
The last_updated_at field located at the root level of a host or virtual host reflects the time of the latest change to any host or virtual host data, including a service observation or an update to location or routing data.
Example API Response for View Host 8.8.8.8 to show timestamps
{
  "status": "OK",
  "code": 200,
  "result": {
    "ip": "8.8.8.8",
    "last_updated_at": "2022-01-19T16:23:57.883843845Z",
    "services": [
      {
        "service_name": "DNS",
        "extended_service_name": "DNS",
        "transport_protocol": "UDP",
        "port": 53,
        "observed_at": "2022-01-19T16:23:57.883843845Z",
        "source_ip": "167.94.138.113",
        "perspective_id": "PERSPECTIVE_TATA",
        "truncated": false,
        "_decoded": "dns",
        "dns": {...}
      }
    ],
    "location": {...},
    "location_updated_at": "2022-01-10T17:15:15.925739Z",
    "autonomous_system": {...},
    "autonomous_system_updated_at": "2022-01-05T16:45:47.109054Z",
    "dns": {}
  }
}
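As a rough illustration of how these timestamps relate, the Python sketch below parses the values from the example response and checks which component changed most recently. The helper names parse_ts and component_timestamps are hypothetical, the host dictionary is a trimmed copy of the response with the elided objects left out, and the nanosecond digits are kept as a separate integer because Python's datetime stores only microseconds.

```python
import re
from datetime import datetime, timezone

_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_ts(value):
    """Parse a UTC RFC3339 timestamp into (datetime, nanoseconds)."""
    match = _TS.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # Pad the fraction to nine digits so it always counts nanoseconds.
    nanos = int((match.group(2) or "").ljust(9, "0"))
    return base.replace(tzinfo=timezone.utc), nanos


def component_timestamps(host):
    """Collect the per-service, location, and routing timestamps."""
    stamps = {
        f"services[{i}].observed_at": svc["observed_at"]
        for i, svc in enumerate(host.get("services", []))
    }
    for key in ("location_updated_at", "autonomous_system_updated_at"):
        if key in host:
            stamps[key] = host[key]
    return stamps


# Trimmed copy of the "result" object from the example response.
host = {
    "last_updated_at": "2022-01-19T16:23:57.883843845Z",
    "services": [{"port": 53, "observed_at": "2022-01-19T16:23:57.883843845Z"}],
    "location_updated_at": "2022-01-10T17:15:15.925739Z",
    "autonomous_system_updated_at": "2022-01-05T16:45:47.109054Z",
}

stamps = component_timestamps(host)
newest = max(stamps, key=lambda name: parse_ts(stamps[name]))
print(newest)  # services[0].observed_at
print(parse_ts(stamps[newest]) == parse_ts(host["last_updated_at"]))  # True
```

In this particular response the root-level timestamp coincides with the DNS service observation, which is the most recent of the three component timestamps.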
Q: Why do some hosts have multiple fields with the same key?
A: Key names are not guaranteed unique for a host because the same key can appear many times across a host’s services.
For example, in the legacy host dataset, SMTP fields could only ever appear once on a host because Censys only ever found SMTP on port 25. But now that Censys can find this service on any port, one host could potentially have multiple SMTP services, and therefore multiple fields with the flattened key name, services.smtp.ehlo.
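The effect of flattening can be reproduced with a short Python sketch. The host record, its ports, and its EHLO values are hypothetical, and the flatten helper is written for this illustration rather than taken from any Censys tooling; it simply drops array positions from the key, which is what makes the key repeat.

```python
from collections import defaultdict

# Hypothetical host with SMTP on two ports; all values are illustrative.
host = {
    "services": [
        {"port": 25, "service_name": "SMTP", "smtp": {"ehlo": "250-mail.example.test"}},
        {"port": 587, "service_name": "SMTP", "smtp": {"ehlo": "250-mx.example.test"}},
    ]
}


def flatten(node, prefix=""):
    """Yield (flattened_key, value) pairs; list positions are not part of the key."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, prefix)
    else:
        yield prefix, node


values_by_key = defaultdict(list)
for key, value in flatten(host):
    values_by_key[key].append(value)

# The same flattened key now carries one value per SMTP service.
print(values_by_key["services.smtp.ehlo"])
# ['250-mail.example.test', '250-mx.example.test']
```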
Tip: Software and TLS fields are most likely to be repeated across a host, since many services report their software and utilize TLS encryption.
In some Censys Search API endpoints, such as /hosts/{ip}/diff, the JSONPointers seen in the path values are "array aware," so each service is indexed.
This creates a unique path to a key that is not unique.
Example
This JSONPatch object, extracted from a GET /hosts/{ip}/diff response, shows the update of an observation timestamp for the second service in a host’s services array. Arrays are zero-indexed, so index 1 in the path refers to the second service.
{
  "op": "replace",
  "path": "/services/1/observed_at",
  "value": "2021-09-21T17:48:00.428159173Z"
}
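To make the "array aware" path concrete, here is a minimal Python sketch that applies a single replace operation to a host record by walking the pointer one token at a time. The function apply_replace is hypothetical and handles only the replace operation; the host record and its earlier timestamps are made up, while the patch itself is the one shown above.

```python
def apply_replace(document, patch):
    """Apply a single JSON Patch 'replace' operation in place."""
    if patch["op"] != "replace":
        raise ValueError("only 'replace' is handled in this sketch")
    # JSON Pointer tokens are separated by '/', with '~1' meaning '/'
    # and '~0' meaning '~'. The leading empty token is skipped.
    tokens = [
        token.replace("~1", "/").replace("~0", "~")
        for token in patch["path"].split("/")[1:]
    ]
    target = document
    for token in tokens[:-1]:
        # Numeric tokens index into arrays; other tokens are object keys.
        target = target[int(token)] if isinstance(target, list) else target[token]
    last = tokens[-1]
    if isinstance(target, list):
        target[int(last)] = patch["value"]
    else:
        if last not in target:
            raise KeyError(last)  # 'replace' requires an existing key
        target[last] = patch["value"]
    return document


# Hypothetical host with two services and made-up earlier timestamps.
host = {
    "services": [
        {"port": 53, "observed_at": "2021-09-20T00:00:00Z"},
        {"port": 443, "observed_at": "2021-09-20T00:00:00Z"},
    ]
}
patch = {
    "op": "replace",
    "path": "/services/1/observed_at",
    "value": "2021-09-21T17:48:00.428159173Z",
}

apply_replace(host, patch)
print(host["services"][0]["observed_at"])  # 2021-09-20T00:00:00Z
print(host["services"][1]["observed_at"])  # 2021-09-21T17:48:00.428159173Z
```

Only the service at index 1 changes, even though both services share the key observed_at.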
Questions about the API
Q: Can I specify the host or certificate fields I want returned by the Search API?
A: Yes! Use the optional fields parameter to list up to 25 fields (including any embedded field for a certificate record) to be returned for each hit in a search result. Only a few large host fields (HTTP bodies and banners) cannot be returned.
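A client can enforce the 25-field limit before sending a request. The sketch below is an assumption-laden illustration: select_fields is a hypothetical helper, the flattened field names are examples in the style used elsewhere in this document, and how the resulting list is serialized into the fields parameter is deliberately left out because it is not described here.

```python
MAX_FIELDS = 25  # limit stated in the answer above


def select_fields(fields):
    """Validate and de-duplicate field names for the `fields` parameter."""
    unique = list(dict.fromkeys(fields))  # drops repeats, keeps first-seen order
    if len(unique) > MAX_FIELDS:
        raise ValueError(
            f"{len(unique)} fields requested; at most {MAX_FIELDS} are allowed"
        )
    return unique


wanted = select_fields(
    ["ip", "last_updated_at", "services.port", "services.service_name", "ip"]
)
print(wanted)
# ['ip', 'last_updated_at', 'services.port', 'services.service_name']
```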
Q: The API accepts timestamps with nanosecond precision. How many decimal places is that?
A: Nine.
Any endpoint that uses the at_time parameter accepts an RFC3339-formatted timestamp with up to nanosecond precision, which is nine digits after the decimal.
Example: 2021-09-21T15:04:05.999999999Z
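Python's datetime type stores only microseconds, so producing all nine digits takes a little care. The following sketch formats an integer count of nanoseconds since the Unix epoch as a UTC RFC3339 string; the function name rfc3339_nanos is hypothetical and the approach is one possible way to do it, not a Censys-provided utility.

```python
from datetime import datetime, timezone


def rfc3339_nanos(epoch_ns: int) -> str:
    """Format nanoseconds since the Unix epoch as a UTC RFC3339 timestamp."""
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Zero-pad the fractional part to exactly nine digits.
    return f"{stamp.strftime('%Y-%m-%dT%H:%M:%S')}.{nanos:09d}Z"


whole = int(datetime(2021, 9, 21, 15, 4, 5, tzinfo=timezone.utc).timestamp())
print(rfc3339_nanos(whole * 1_000_000_000 + 999_999_999))
# 2021-09-21T15:04:05.999999999Z
```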