Regular Expressions (Regex) in Censys Search
A regular expression (regex) is a mechanism for describing a specific pattern, rather than a static value, as search criteria. While a wildcard can stand in for any number of characters with no constraint on what they are, a regex acts as a more specific wildcard.
Tip
Censys regex searches are case-insensitive except when the exact match operator = is used. For example, services.software.vendor:/De[l]+/ returns results where the word is either capitalized or lowercase, while services.software.vendor=/De[l]+/ only returns results for the capitalized word.
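The difference between the two operators can be imitated locally with Python's re module. This is only an analogy: Censys evaluates regexes in its own search engine, and the use of re.IGNORECASE, whole-value matching with re.fullmatch, and the sample vendor names are all assumptions made for illustration.

```python
import re

PATTERN = r"De[l]+"  # the pattern from the tip above

# Hypothetical vendor values, not taken from real Censys data.
vendors = ["Dell", "dell", "DELL", "Debian"]

# Rough analogue of the ":" operator: case is ignored.
insensitive = [v for v in vendors if re.fullmatch(PATTERN, v, re.IGNORECASE)]

# Rough analogue of the "=" operator: case must match exactly.
sensitive = [v for v in vendors if re.fullmatch(PATTERN, v)]

print(insensitive)
print(sensitive)
```

In this sketch the case-insensitive pass keeps every spelling of the vendor name, while the case-sensitive pass keeps only the capitalized one.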
A regular expression provides great flexibility in defining search criteria to return relevant matches from large data sets.
As a simple example, take a wildcard search such as:
services.http.response.body: *.js*
which asks, "Which (unnamed) hosts with an HTTP service contain a reference to any string containing .js?"
The query above, while simple, is noisy, with ~26,300,000 results (https://search.censys.io/search?resource=hosts&sort=RELEVANCE&per_page=100&virtual_hosts=EXCLUDE&q=services.http.response.body%3A+*.js*).
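To see why a bare wildcard is noisy, the following sketch compares a wildcard with a slightly stricter regex on a few made-up strings. It uses Python's fnmatch and re modules as local stand-ins for the Censys engine; the sample values and the stricter regex are hypothetical and are not taken from the Censys documentation.

```python
import fnmatch
import re

# Hypothetical fragments of HTTP response bodies.
values = ["<script src=app.js>", "see notes.json", "plain text"]

# Wildcard: anything, then ".js", then anything.
wildcard_hits = [v for v in values if fnmatch.fnmatchcase(v, "*.js*")]

# Regex: ".js" must be followed by a non-letter or the end of the value,
# so ".json" no longer counts as a hit.
regex_hits = [v for v in values if re.fullmatch(r".*\.js([^a-z].*)?", v)]

print(wildcard_hits)
print(regex_hits)
```

The wildcard also picks up the .json reference, while the regex keeps only the value that actually references a .js file.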
A regular expression is a way to provide criteria about what the value must look like without limiting it to a single static string.
For example:
services.http.response.headers.location=/.*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*/
asks, "Which hosts have an HTTP location header that includes the sequence ../ (which is associated with directory traversal attacks), followed by one of the more common executable page types like .js, .php, or .asp?"
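A pattern like this can be checked locally before running it as a search. The sketch below applies the same expression to a handful of hypothetical Location header values using Python's re module, assuming that exact-match regexes must match the whole field value (hence re.fullmatch). Python's regex dialect may differ from the one Censys uses, so treat this as a rough check only.

```python
import re

# Pattern copied from the query above (the text between the slashes).
TRAVERSAL = re.compile(r".*(\.\.\/)+.*(\.asp|\.php|\.js|\.cgi).*")

# Hypothetical Location header values, for illustration only.
locations = [
    "/app/../../admin/login.php",     # "../" followed by ".php"
    "https://example.com/index.php",  # ".php" but no "../"
    "../static/app.js?v=2",           # "../" followed by ".js"
    "/docs/../readme.txt",            # "../" but no listed extension
]

flagged = [loc for loc in locations if TRAVERSAL.fullmatch(loc)]
print(flagged)
```

Only the values that contain both the ../ sequence and, after it, one of the listed extensions are flagged.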
What are all the backslashes for?
A backslash is used to escape a character that is otherwise interpreted as an operator. For example, because a period (.) in a regular expression matches any single character, you must put a backslash before a period to actually look for that character.
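The effect of escaping is easy to see in a small local test. This sketch uses Python's re module and two made-up strings; it assumes the period behaves the same way in the Censys regex dialect, which the escaping advice above suggests but which is not verified here.

```python
import re

samples = ["app.js", "appXjs"]

# Unescaped: "." matches any single character, so both samples match.
unescaped = [s for s in samples if re.fullmatch(r"app.js", s)]

# Escaped: "\." matches only a literal period.
escaped = [s for s in samples if re.fullmatch(r"app\.js", s)]

print(unescaped)
print(escaped)
```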
Regular expressions are extremely powerful but computationally slow. While understanding when to use a regex comes with experience, the most basic guideline is to use a regex when a simple pattern match won’t do.
Regex against fields whose values are robust strings
While regular expressions are valid for most fields, the best use case is for fields whose values are long strings.
Popular host fields to write regular expressions for:
- http.response.body
- services.banner
Important
To search the full HTML markup in HTTP response bodies, use the exact match (=) operator. Make sure to add generic regex wildcards (.*) before and after the expression to account for "everything else" in the body. Learn more.
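The need for the surrounding wildcards can be illustrated with a short sketch. The helper names below (wrap_for_exact_match, body_query) and the sample body are hypothetical. The sketch assumes an exact-match regex has to cover the entire field value, which Python's re.fullmatch imitates, and it uses the services.http.response.body field from the earlier wildcard example.

```python
import re

def wrap_for_exact_match(fragment: str) -> str:
    """Surround a regex fragment with .* so it can sit anywhere in the value."""
    return f".*{fragment}.*"

def body_query(fragment: str) -> str:
    """Build an exact-match regex query string (hypothetical helper)."""
    return f"services.http.response.body=/{wrap_for_exact_match(fragment)}/"

# Hypothetical single-line response body.
body = '<html><script src="jquery.min.js"></script></html>'
fragment = r"jquery\.min\.js"

# Without wildcards the fragment must equal the whole body, so it fails.
bare = re.fullmatch(fragment, body) is not None

# With wildcards the rest of the body is absorbed by ".*".
wrapped = re.fullmatch(wrap_for_exact_match(fragment), body) is not None

query = body_query(fragment)
print(bare, wrapped)
print(query)
```

The sample body is kept on one line on purpose: whether . also matches newlines depends on the regex engine, and this sketch makes no claim about how Censys handles that case.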
Popular certificate fields to write regular expressions for:
- parsed.names
The following example queries show what regular expressions make possible.
| Regular Expressions | Description | Link |
|---|---|---|
| | HTTP responses originating from behind a proxy. | |
| | HTTP sites with an executable file listed in their index file. | |
| | Certificates that contain an eTLD+4-formatted subdomain of | |
| | Hosts with a service serving a Hook URL. | |
Use this official regex syntax reference to see how to construct regular expressions.