Regular Expressions in Censys Search 2.0
In Censys Search, support for regular expressions unlocks advanced search capabilities for teams with a paid subscription.
What Is Regex?
A regular expression (regex) lets you use a pattern, rather than a static value, as search criteria.
Whereas wildcards can be used as a substitute for any number of characters with no specificity, a regex is like a "more specific wildcard."
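To make the "more specific wildcard" idea concrete, the sketch below uses Python's standard fnmatch module as a stand-in for a shell-style * wildcard and the re module for a regex. Both are only analogies: Censys has its own wildcard and regex handling, and the syntax may differ in details. The file names are made-up examples.

```python
import fnmatch
import re

# Hypothetical file names that might appear in an HTTP response body.
names = ["app.js", "jquery.min.js", "vendor.bundle.js", "notes.txt"]

# Wildcard: "*" stands in for any run of characters, with no further constraint.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.js")]

# Regex: describe what the characters must look like, e.g. only minified scripts.
# re.fullmatch requires the pattern to cover the whole name.
regex_hits = [n for n in names if re.fullmatch(r"[a-z0-9.-]+\.min\.js", n)]

print(wildcard_hits)  # ['app.js', 'jquery.min.js', 'vendor.bundle.js']
print(regex_hits)     # ['jquery.min.js']
```

The wildcard accepts every script, while the regex narrows the result to names with a specific shape.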
Why Regex?
A regular expression provides great flexibility in defining search criteria to return relevant matches from large data sets.
As a simple example, take a wildcard search such as
services.http.response.body: *.js
which asks the question, "Which hosts have an HTTP response body that references any string containing .js?"
The query above, while simple, is noisy, with ~26,300,000 results.
A regular expression lets you describe what the value must look like without limiting it to a single static string.
For example:
services.http.response.headers.location: /.*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*/
asks the question, "Which hosts have an HTTP Location header that includes one or more ../ sequences (a pattern associated with directory traversal attacks), followed by one of the more common executable page types, such as .asp, .php, .js, or .cgi?"
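One way to sanity-check a pattern like this before running it is to try it locally. The sketch below uses Python's re module only as an approximation of the Censys regex engine (the two differ in places), and it assumes, as the leading and trailing .* suggest, that a pattern must match the entire header value, which is why it uses re.fullmatch. The header values are made up for illustration.

```python
import re

# The pattern from the query above, without the surrounding /.../ delimiters.
TRAVERSAL_TO_EXECUTABLE = r".*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*"


def location_matches(location: str) -> bool:
    """True if the whole Location value matches (approximating anchored matching)."""
    return re.fullmatch(TRAVERSAL_TO_EXECUTABLE, location) is not None


# Hypothetical Location header values and whether the pattern should accept them.
samples = {
    "/../../cgi-bin/status.cgi": True,        # "../" sequences followed by .cgi
    "https://example.com/login.php": False,   # executable page, but no "../"
    "/../../etc/passwd": False,               # "../" but none of the listed extensions
    "/../static/data.json": True,             # trailing .* lets ".js" match inside ".json"
}

for value, expected in samples.items():
    result = location_matches(value)
    assert result == expected, value
    print(result, value)
```

The last sample shows a side effect worth knowing: because the pattern ends in .*, the .js alternative also matches longer extensions such as .json.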
What’s with all the backslashes?
A backslash escapes a character that would otherwise be interpreted as an operator. For example, an unescaped period (.) matches any single character, so you must put a backslash before a period (\.) to search for an actual period.
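The effect of the backslash is easy to see in a quick local test. This sketch uses Python's re module, which treats . and \. the same way as described above; the strings are hypothetical.

```python
import re

# Unescaped ".": matches any single character, so "appXjs" is accepted too.
print(re.fullmatch(r"app.js", "app.js") is not None)   # True
print(re.fullmatch(r"app.js", "appXjs") is not None)   # True

# Escaped "\.": matches only a literal period.
print(re.fullmatch(r"app\.js", "app.js") is not None)  # True
print(re.fullmatch(r"app\.js", "appXjs") is not None)  # False
```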
Regex Best Practices
Regular expressions are extremely powerful but computationally expensive (i.e., slow). Knowing when to use a regex comes with experience, but the most basic guideline is to use one only when a simpler query, such as an exact match or a wildcard, won’t do.
Don’t Loop If You Can Avoid It
This applies mainly to scripts (that is, searches performed via the API), but it is still important. If a script needs the results of a regex query inside a loop, run the query once, store the results in a variable, and reference that variable inside the loop instead of re-running the regex query on every iteration.
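A minimal sketch of this guideline in Python, assuming a hypothetical search callable that sends one regex query to the API and returns matching host IPs. It is not the Censys client library; FakeSearch is a hypothetical stand-in that only counts calls, so the difference between the two approaches is visible.

```python
from typing import Callable, Iterable

# Hypothetical: any function that runs one regex query via the API and returns host IPs.
SearchFn = Callable[[str], list[str]]


def flag_hosts_slow(search: SearchFn, query: str, candidates: Iterable[str]) -> list[str]:
    """Anti-pattern: re-runs the expensive regex query on every loop iteration."""
    return [ip for ip in candidates if ip in search(query)]


def flag_hosts_fast(search: SearchFn, query: str, candidates: Iterable[str]) -> list[str]:
    """Preferred: run the regex query once, store the result, reuse it inside the loop."""
    matches = set(search(query))  # single query; the set also speeds up membership checks
    return [ip for ip in candidates if ip in matches]


class FakeSearch:
    """Hypothetical stand-in for an API call that records how often it was invoked."""

    def __init__(self, results: list[str]) -> None:
        self.results = results
        self.calls = 0

    def __call__(self, query: str) -> list[str]:
        self.calls += 1
        return list(self.results)


query = r"services.http.response.headers.location: /.*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*/"
hosts = ["192.0.2.10", "198.51.100.7", "203.0.113.5"]

slow = FakeSearch(["192.0.2.10"])
fast = FakeSearch(["192.0.2.10"])
print(flag_hosts_slow(slow, query, hosts), slow.calls)  # ['192.0.2.10'] 3
print(flag_hosts_fast(fast, query, hosts), fast.calls)  # ['192.0.2.10'] 1
```

Both functions return the same hosts, but the slow version issues one query per candidate, while the fast version issues exactly one.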
Regex Against Fields Whose Values Are Long Strings
While regular expressions are valid for most fields, the best use case is for fields whose values are long strings.
Popular host fields to write regular expressions for:
- services.http.response.body
- services.banner
Important: To search the full HTML markup in HTTP response bodies, use the exact match (=) operator. When doing so, be sure to add generic regex wildcards (.*) before and after the expression to account for "everything else" in the body.
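The reason for the surrounding .* can be sketched locally, assuming (as the note implies) that an exact-match regex has to describe the entire body rather than just a piece of it. Python's re.fullmatch stands in for that anchored behavior here, and re.DOTALL is an added assumption so that .* can also cross the line breaks found in real HTML. The body is a made-up sample.

```python
import re


def matches_whole(pattern: str, value: str) -> bool:
    """True only if the pattern describes the entire value (anchored matching)."""
    return re.fullmatch(pattern, value, flags=re.DOTALL) is not None


# Hypothetical HTTP response body.
body = "<html>\n<head><title>Index of /</title></head>\n<body>setup.exe</body>\n</html>"
fragment = r"<title>Index of /</title>"

print(matches_whole(fragment, body))                # False: the rest of the body is unaccounted for
print(matches_whole(".*" + fragment + ".*", body))  # True: .* absorbs "everything else"
```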
Popular certificate fields to write regular expressions for:
- parsed.subject_dn
- parsed.names
Regex Examples in Censys Queries
We’ve put together a handful of queries utilizing regular expressions that help show the power of regex for Censys use cases.
- Search for HTTP responses originating from behind a proxy
- HTTP sites with an available executable file in their index file
- All certificates that contain an eTLD+3-formatted subdomain of
- Hosts with a service serving a Hook URL
Syntax Reference
Use the official regex syntax reference to see how to construct regular expressions.