Regular Expressions in Censys Search 2.0
In Censys Search, support for regular expressions unlocks advanced search capabilities for teams with a paid subscription.
What is Regex?
A regular expression (regex) is a mechanism for describing a specific pattern instead of a static value as search criteria.
Whereas a wildcard stands in for any number of characters with no constraint on what those characters are, a regex acts as a "more specific wildcard" that also constrains which characters may appear and in what order.
Why Regex?
A regular expression provides great flexibility in defining search criteria to return relevant matches from large data sets.
As a simple example, take a wildcard search such as
services.http.response.body: *.js*
which asks the question, "Which (unnamed) hosts with an HTTP service contain a reference to any string containing .js?"
The query above, while simple, is noisy, returning ~26,300,000 results (https://search.censys.io/search?resource=hosts&sort=RELEVANCE&per_page=100&virtual_hosts=EXCLUDE&q=services.http.response.body%3A+*.js*).
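As a rough local analogy for why this wildcard is noisy, the sketch below applies the same pattern shape to a few made-up body snippets using Python's fnmatch module. The snippets are hypothetical, and fnmatch is only a stand-in here: it is not how Censys evaluates wildcards.

```python
from fnmatch import fnmatchcase

# Hypothetical snippets that might appear in HTTP response bodies.
samples = [
    '<script src="/static/app.js"></script>',
    '{"type": "data.json"}',
    '<a href="login.jsp">Sign in</a>',
    "<p>No scripts here</p>",
]

WILDCARD = "*.js*"  # same shape as the wildcard in the query above

# '*' matches any run of characters, so anything containing ".js" is a hit,
# including ".json" and ".jsp", which are probably not what was intended.
wildcard_hits = [s for s in samples if fnmatchcase(s, WILDCARD)]

for hit in wildcard_hits:
    print(hit)
```

Only the last snippet is excluded, which illustrates how little the wildcard narrows the result set.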
A regular expression is a way to provide criteria about what the value must look like without limiting it to a single static string.
For example:
services.http.response.headers.location=/.*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*/
asks the question, "Which hosts have an HTTP location header that includes the sequence ../ (which is associated with directory traversal attacks), followed by one of the more common executable page types like .js, .php, or .asp?"
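To see what the pattern above accepts and rejects, here is a minimal sketch that tests it locally against a few hypothetical location header values. It uses Python's re module, which is not the engine Censys runs, and it assumes the expression must match the entire field value (hence re.fullmatch), in line with the leading and trailing .* in the query.

```python
import re

# The pattern from the query above, without the surrounding slashes.
LOCATION_PATTERN = re.compile(r".*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*")


def location_matches(value: str) -> bool:
    """Return True if the whole value matches the pattern (assumed semantics)."""
    return LOCATION_PATTERN.fullmatch(value) is not None


# Hypothetical header values, for illustration only.
samples = [
    "../../admin/login.php",   # has ../ followed by .php
    "/a/../b.cgi?x=1",         # has ../ followed by .cgi
    "/static/app.js",          # has .js but no ../
    "/app.js/../x",            # has ../ but no page type after it
    "/index.html",             # has neither
]

results = {value: location_matches(value) for value in samples}

for value, matched in results.items():
    print(f"{value!r}: {matched}")
```

Only the first two values satisfy both conditions in the required order.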
What are all the backslashes for?
A backslash is used to escape a character that would otherwise be interpreted as an operator.
For example, since a period (.) matches any single character in a regular expression, you must put a backslash before a period in order to actually look for that character.
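The effect of escaping can be checked locally. The sketch below uses Python's re module as a stand-in (Censys uses its own regex engine, so treat this as an illustration of the general rule); the file names are made up.

```python
import re

unescaped = re.compile(r"app.js")   # '.' matches any single character
escaped = re.compile(r"app\.js")    # '\.' matches only a literal period

# The unescaped pattern accepts the intended string and an unintended one.
unescaped_literal = unescaped.fullmatch("app.js") is not None
unescaped_other = unescaped.fullmatch("appxjs") is not None

# The escaped pattern accepts only the literal period.
escaped_literal = escaped.fullmatch("app.js") is not None
escaped_other = escaped.fullmatch("appxjs") is not None

# re.escape adds the backslashes for you when the input is a fixed string.
auto_escaped = re.escape("app.js")

print(unescaped_literal, unescaped_other)
print(escaped_literal, escaped_other)
print(auto_escaped)
```

When part of an expression is a fixed string, a helper such as re.escape avoids having to remember which characters need a backslash.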
Regex Best Practices
Regular expressions are extremely powerful but computationally expensive (i.e., slow). While understanding when to use a regex comes with experience, the most basic guideline is to use a regex when a simple pattern match won’t do.
Regex Against Fields Whose Values Are Robust Strings
While regular expressions are valid for most fields, the best use case is for fields whose values are long strings.
Popular host fields to write regular expressions for:
- services.http.response.body
- services.banner
Important: In order to search the full HTML markup in HTTP response bodies, use the exact match (=) operator. When doing so, be sure to add generic regex wildcards (.*) before and after the expression to account for "everything else" in the body.
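The wrapping step described above can be sketched as a small helper. The function names, the sample body, and the jQuery-style expression are all hypothetical, and the local check uses Python's re module with re.DOTALL (so that . also matches newlines) as an approximation of whole-value matching; the generated query string has not been validated against Censys.

```python
import re


def wrap_for_body_search(expression: str) -> str:
    """Surround an expression with .* so it can match anywhere in the body."""
    return f".*{expression}.*"


def body_query(expression: str) -> str:
    """Build an exact-match (=) regex query on the response body field."""
    return f"services.http.response.body=/{wrap_for_body_search(expression)}/"


# Hypothetical response body and expression, for illustration only.
body = '<html>\n<head><script src="jquery-3.6.0.min.js"></script></head>\n</html>'
expression = r"jquery-[0-9]+\.[0-9]+\.[0-9]+\.min\.js"

# Without the wrapping, the expression has to match the entire body and fails.
bare_match = re.fullmatch(expression, body, re.DOTALL) is not None

# With the wrapping, the surrounding markup is absorbed by the .* on each side.
wrapped_match = (
    re.fullmatch(wrap_for_body_search(expression), body, re.DOTALL) is not None
)

query = body_query(expression)

print(bare_match, wrapped_match)
print(query)
```

Expressions that contain a forward slash would also need that slash escaped, as in the location header example earlier in this article.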
Popular certificate fields to write regular expressions for:
- parsed.subject_dn
- parsed.names
Regex Examples in Censys Queries
We’ve put together a handful of queries utilizing regular expressions that help show the power of regex for Censys use cases.
Regular Expression | Description | Link
| HTTP responses originating from behind a proxy |
| HTTP sites with an executable file listed in their index file |
| Certificates that contain an eTLD+4-formatted subdomain of |
| Hosts with a service serving a Hook URL |
Syntax Reference
Use this official regex syntax reference to see how to construct regular expressions.