Differences Between Search 2.0 and Internet Intelligence Platform Datasets
Censys introduced a new data model for describing hosts, certificates, and web properties on the internet in the Internet Intelligence Platform (IIP). This new data model is different from the one available in Search 2.0 and features many improvements and additions.
As a result, queries and workflows that were used in Search 2.0 will need to be modified to return similar results in the IIP. For experienced Search 2.0 users, one of the most tangible differences between Search 2.0 and the IIP is that every parsed data field name now begins with a host, cert, or web prefix (for example, host.services.protocol). These prefixes indicate whether the field was detected on a host, certificate, or web property asset.
This document explains how Censys IIP models hosts and web properties in contrast with the data model available in Search 2.0.
The certificate data model available in IIP did not significantly change from that available in Search 2.0. However, all certificate-related fields are now prepended with cert.
Learn more about the new query language used in the IIP here. A complete list of all data fields available in the IIP is accessible from within the IIP web interface.
Deprecation of virtual hosts and introduction of web properties
The Search 2.0 dataset includes hosts and virtual hosts. In Search 2.0, hosts are identified by an IP address. Virtual hosts are identified by a name and an IP address.
Virtual hosts are not present in the IIP dataset. Instead, internet assets that respond to hostname-based scans are classified as web properties.
In IIP, web properties are identified by a hostname and a port. Hostnames can be name-based (such as app.censys.io) or IP-based (such as 104.18.10.85). Example web property names include the following:
- app.censys.io:443
- 104.18.10.85:8880
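The hostname-plus-port identifier can be sketched in Python. This is a minimal, hypothetical helper (WebPropertyKey and parse_web_property are our own names, not a Censys API) that splits a name on its last colon and uses the standard ipaddress module to tell IP-based names from name-based ones. Accepting bracketed IPv6 literals such as [2001:db8::1]:443 is an assumed notation, not something the IIP documentation specifies.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class WebPropertyKey:
    """Hostname and port that identify a web property (hypothetical helper)."""
    hostname: str
    port: int

    @property
    def is_ip_based(self) -> bool:
        # IP-based records use a literal IPv4 or IPv6 address as the hostname.
        try:
            ipaddress.ip_address(self.hostname)
        except ValueError:
            return False
        return True

    def __str__(self) -> str:
        # Bracket IPv6 literals so the port separator stays unambiguous.
        host = f"[{self.hostname}]" if ":" in self.hostname else self.hostname
        return f"{host}:{self.port}"


def parse_web_property(name: str) -> WebPropertyKey:
    """Parse a 'hostname:port' string such as 'app.censys.io:443'."""
    hostname, sep, port_text = name.strip().rpartition(":")
    if not sep or not hostname:
        raise ValueError(f"expected 'hostname:port', got {name!r}")
    hostname = hostname.strip()
    if hostname.startswith("[") and hostname.endswith("]"):
        hostname = hostname[1:-1]  # assumed notation for IPv6, e.g. [2001:db8::1]:443
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return WebPropertyKey(hostname=hostname, port=port)


print(parse_web_property("app.censys.io:443").is_ip_based)   # False
print(parse_web_property("104.18.10.85:8880").is_ip_based)   # True
```

Splitting on the last colon keeps the helper simple while still allowing colons inside a bracketed IPv6 hostname.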
Web properties provide a more accurate, up-to-date, and comprehensive view of name-based assets than virtual hosts. Classifying and presenting web assets as web properties enables Censys data to:
- Better identify and showcase name-based assets, linking domain names and related services directly to their underlying infrastructure.
- Support data from multiple HTTP endpoints.
- Make searching for name-based assets feel like using a web browser.
- Dramatically improve the freshness of web data. Web properties are refreshed daily.
- Provide deeper visibility into application scans and the true footprint of internet assets, enhancing precision in asset discovery and threat monitoring.
How web properties differ from host services
Web properties offer insight into HTTP services beyond layer 7 while abstracting away HTTP protocol semantics.
- Web properties contain all records that correlate to HTTP-based scans.
- A web property can have one or more endpoints, which serve as distinct entry points to different resources or functionalities within the web property.
- Each endpoint may be associated with specific applications, services, or APIs and can be individually monitored and analyzed.
- Web properties support deep scan information for HTTP-based scanners.
When to search across web properties instead of host services
Search web properties when:
- You want results that include hostnames.
- You are targeting software that runs on top of HTTP, such as WordPress, pprof, Kubernetes, or Elasticsearch.
- You are targeting software that serves HTTP, such as Apache or nginx.
- You need HTTP body information.
- You need data from endpoints other than /.
Do not use web properties when:
- You want results that include IP addresses.
- You are searching for DNS data, whois data, geolocation data, or routing data.
- You are searching for hosts serving HTTP as well as non-HTTP protocols.
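The rules of thumb above can be encoded as a small checklist. The sketch below is hypothetical (SearchNeeds and choose_dataset are illustrative names, not part of any Censys tooling). When a request mixes needs from both lists it returns "both", a design choice reflecting that neither dataset alone covers that request.

```python
from dataclasses import dataclass


@dataclass
class SearchNeeds:
    """Hypothetical checklist mirroring the guidance above."""
    # Needs that point toward web properties
    wants_hostnames: bool = False
    targets_http_software: bool = False   # e.g. WordPress, Kubernetes, Apache, nginx
    needs_http_body: bool = False
    needs_non_root_endpoints: bool = False
    # Needs that point toward host services
    wants_ip_addresses: bool = False
    needs_dns_whois_geo_or_routing: bool = False
    mixes_http_and_non_http: bool = False


def choose_dataset(needs: SearchNeeds) -> str:
    """Return 'web properties', 'hosts', 'both', or 'either'."""
    web = (needs.wants_hostnames or needs.targets_http_software
           or needs.needs_http_body or needs.needs_non_root_endpoints)
    host = (needs.wants_ip_addresses or needs.needs_dns_whois_geo_or_routing
            or needs.mixes_http_and_non_http)
    if web and host:
        return "both"        # conflicting needs: neither dataset alone covers the query
    if host:
        return "hosts"
    if web:
        return "web properties"
    return "either"          # the checklist gives no signal


print(choose_dataset(SearchNeeds(needs_http_body=True)))  # web properties
```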
Previous limitations of virtual hosts in the Search 2.0 data model
Censys has historically scanned hosts with HTTP services running on them. There were several limitations with this approach:
- In the Search 2.0 data model, scanning hosts with HTTP services typically targets a single HTTP endpoint per host. By default, Censys aims to scan the root path (/) on HTTP and HTTPS ports (e.g., 80, 443) for each host. This approach makes it challenging to extract information from multiple endpoints like /wp-admin or /login.
- Additionally, endpoints beyond the root may contain dynamic or context-specific content that varies with parameters such as session state, cookies, or user-agent headers. Scanning multiple or application-specific paths requires explicit scan configuration, which limits systematic exploration of deeper or secondary paths and can cause critical information hosted on those endpoints to be missed.
- If a host had an HTTP service on port 443 and Censys also identified a Cobalt Strike application running on that HTTP service, the Search 2.0 data model could not show both services, because both run on port 443. This led to difficult decisions about how to prioritize scan results, which didn't feel aligned with our mission.
- Some users struggle with understanding when and whether they should include or exclude virtual hosts as targets of their search queries.
- Data on virtual hosts could be up to 45 days stale, making them less useful.
Modified data field names
In the IIP dataset, some field names have changed, moved, or been removed. The following are examples of popular fields that moved between Search 2.0 and IIP. A complete list of fields that have moved or changed is available here.
| Search 2.0 field | IIP field |
| --- | --- |
| services.service_name | host.services.protocol |
| services.http.response.body | host.services.endpoints.http.body |
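A migration script might start from a lookup table like this one. The sketch below (translate_field and KNOWN_RENAMES are hypothetical names) only knows the two renames listed above. For any other field it guesses by prepending the asset prefix and flags the result as unconfirmed, because fields can also move or be removed and the complete field list remains the authority.

```python
from __future__ import annotations

# Renames copied from the table above; this is NOT the complete list.
KNOWN_RENAMES = {
    "services.service_name": "host.services.protocol",
    "services.http.response.body": "host.services.endpoints.http.body",
}

VALID_PREFIXES = ("host", "cert", "web")


def translate_field(search2_field: str, asset: str = "host") -> tuple[str, bool]:
    """Map a Search 2.0 field to a candidate IIP field.

    Returns (candidate, confirmed). confirmed is True only for fields listed in
    KNOWN_RENAMES; otherwise the candidate is a naive guess made by prepending
    the asset prefix and must be checked against the full field list.
    """
    if asset not in VALID_PREFIXES:
        raise ValueError(f"asset must be one of {VALID_PREFIXES}, got {asset!r}")
    if search2_field in KNOWN_RENAMES:
        return KNOWN_RENAMES[search2_field], True
    return f"{asset}.{search2_field}", False


print(translate_field("services.service_name"))  # ('host.services.protocol', True)
```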
Context on hosts and web properties
The data model featured in the Censys IIP surfaces new, actionable context on what exactly mapped services and devices are, how they are configured, whether they are vulnerable, whether they are malicious, and where they live.
This new context allows users to:
- Quickly find a specific type of host, web property, or service.
- Take advantage of uniform metadata fields like CPEs, software, hardware, and labels.
- Quickly and easily conduct investigations and incorporate Censys data into automation workflows.
The following new context types are available on hosts and web property records. Not all context will be available to all users.
- Hardware
- Software
- Operating System
- Threats
- Vulnerabilities
- Exposures
- Labels
- Misconfigurations
Hardware
By extracting hardware into its own object, the IIP dataset is able to provide details about the hardware itself (such as a Juniper Router) while also listing any known information about the hardware’s components, such as the processor type and firmware. This provides a structure to give more context about the kinds of devices that may be vulnerable.
Software
By extracting software into its own object, Censys data captures specific details about the software itself (such as Microsoft SQL Server) while also including any relevant information about its components, like version numbers on hosts and web properties.
This structured approach allows IIP to offer more context on software configurations and versions that may present vulnerabilities or security concerns, as well as highlight associations with other software or hardware dependencies, helping users understand the potential exposure or compatibility across diverse systems.
| Field name | Description |
| --- | --- |
| .services.software.components | Represents individual components within the software, allowing for detailed specification of submodules that the primary software relies on |
| .services.software.components.vendor | Vendor or organization responsible for creating or maintaining the software component |
| .services.software.components.cpe | Common Platform Enumeration (CPE) identifier for the software component |
| .services.software.components.version | Version number of the software component |
| .services.software.components.part | Specifies the type of software component, such as web-server, proxy-server, botnet-server, and so on |
| .services.software.components.product | Product name of the software component |
| .services.software.components.update | Describes the update version for the software component, such as major, minor, or patch |
| .services.software.components.edition | Specifies the edition of the software component, such as Standard, Enterprise, and so on |
| .services.software.components.life_cycle | Lifecycle details of a software component, such as release and end-of-life dates, providing insight into support and maintenance periods |
| .services.software.components.life_cycle.end_of_life_date | Date on which support for the software component officially ended |
| .services.software.components.life_cycle.release_date | Initial release date of the software component |
| .services.software.components.life_cycle.end_of_life | Indicates whether the software component has reached its end-of-life status (true if support is discontinued) |
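To show how the component fields fit together, the following sketch walks a host record and reports components whose life_cycle.end_of_life flag is true. The nesting (services, then software, then components) is inferred from the field paths in the table and is an assumption, as is the function name end_of_life_components.

```python
from __future__ import annotations

from typing import Any, Iterator


def end_of_life_components(host: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a summary of each software component flagged as end-of-life.

    Assumed nesting (inferred from the field paths, not confirmed):
    host["services"][i]["software"][j]["components"][k]["life_cycle"]["end_of_life"]
    """
    for service in host.get("services", []):
        for software in service.get("software", []):
            for component in software.get("components", []):
                life_cycle = component.get("life_cycle") or {}
                # Only an explicit True counts; missing data is not treated as end-of-life.
                if life_cycle.get("end_of_life") is True:
                    yield {
                        "port": service.get("port"),
                        "vendor": component.get("vendor"),
                        "product": component.get("product"),
                        "version": component.get("version"),
                        "end_of_life_date": life_cycle.get("end_of_life_date"),
                    }
```

Returning a generator keeps memory flat when scanning many records; wrap it in list() when you need all results at once.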
Threats
In the IIP data model, Censys defines “threat” using NIST’s primary definition of “cyber threat”:
Any circumstance or event with the potential to adversely impact organizational operations (including mission, functions, image, or reputation), organizational assets, or individuals through an information system via unauthorized access, destruction, disclosure, modification of information, and/or denial of service. Also, the potential for a threat-source to successfully exploit a particular information system vulnerability.
Threats are new objects and are limited in scope; they will be made available starting in 2025. The fields in the threat object provide a comprehensive view of threat information detected on services running on hosts and web properties. Each field contributes to the detailed profiling and validation of potential threats, using both Censys proprietary methods and industry-standard practices.
| Field name | Description |
| --- | --- |
| .services.threats.confidence | Shows Censys's level of confidence in the threat detection, with values indicating how likely it is that the identified threat is accurate |
| .services.threats.evidence.negative | Logs evidence suggesting a threat might not be present, helping to minimize false positives |
| .services.threats.evidence.proprietary | Shows evidence obtained from Censys proprietary methods, adding further context to the detection |
| .services.threats.evidence.regex | Pattern-matching expressions used to detect signs of threats through specific sequences in data |
| .services.threats.evidence.semver_expression | Verifies vulnerabilities using software versioning rules based on known patterns |
| .services.threats.evidence.data_path | Pinpoints the location within the data structure where threat evidence was detected |
| .services.threats.evidence.exists | Indicates whether key evidence supporting the presence of a threat was found |
| .services.threats.evidence.found_value | Displays the exact value supporting the threat detection |
| .services.threats.evidence.literal_match | Confirms an exact match for a known threat indicator, ensuring precise identification |
| .services.threats.names | Provides a list of all known names and aliases for the detected threat, so users can recognize it regardless of naming variations |
| .services.threats.source | Identifies the origin of the threat information, whether from external intelligence sources or Censys analysis |
| .services.threats.tactic | Categorizes the threat according to its objective (e.g., reconnaissance, lateral movement) based on standardized threat behavior models like MITRE ATT&CK |
| .services.threats.type | Categorizes the threat type (e.g., c2-server, phishing-server) to help identify the nature of the threat |
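The following sketch filters threats by confidence. It assumes, hypothetically, that confidence is a number between 0 and 1 and that threats sit in a threats list on each service, as the .services.threats field paths suggest. The function name and the 0.8 default threshold are illustrative only.

```python
from __future__ import annotations

from typing import Any


def high_confidence_threats(host: dict[str, Any],
                            min_confidence: float = 0.8) -> list[dict[str, Any]]:
    """Collect threats whose confidence meets a threshold (sketch only).

    Assumes confidence is numeric in [0, 1]; threats with a missing or
    non-numeric confidence are skipped rather than guessed at.
    """
    hits = []
    for service in host.get("services", []):
        for threat in service.get("threats", []):
            confidence = threat.get("confidence")
            numeric = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
            if numeric and confidence >= min_confidence:
                hits.append({
                    "port": service.get("port"),
                    "type": threat.get("type"),
                    "names": list(threat.get("names", [])),
                    "confidence": confidence,
                })
    # Highest-confidence detections first.
    return sorted(hits, key=lambda t: t["confidence"], reverse=True)
```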
Changes to labels between Search 2.0 and IIP
In the Search 2.0 dataset, labels are used for multiple purposes, ranging from indicating software manufacturers to describing records using descriptors like “network.device” or “login-page.”
There are fewer label values in IIP than in Search 2.0. This is partly because Search 2.0 "labels" that actually carried unstructured software, hardware, or operating system data (for example, jquery or bootstrap) have been moved to the appropriate component fields in IIP.
The table below lists a few of the labels available in IIP:
| Label name | Description |
| --- | --- |
| IPV6 | Entity identified as an IPv6 host |
| login-page | Entity has an HTTP service that appears to host a login page |
| open-dir | Web server with an exposed directory listing |
| suspicious-open-dir | Web server with a suspicious open directory |
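Labels can appear both on a record and on its individual services. The sketch below gathers both levels into one set so a check such as "open-dir" in labels is easy to write. Label entries are assumed to be either plain strings or objects with a value key; the real IIP label schema may differ, and collect_labels is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def collect_labels(record: dict[str, Any]) -> set[str]:
    """Gather label values from a record and from each of its services.

    Assumption: each label entry is a string or a dict with a "value" key.
    """
    def values(entries: list[Any]) -> set[str]:
        out: set[str] = set()
        for entry in entries:
            if isinstance(entry, str):
                out.add(entry)
            elif isinstance(entry, dict) and "value" in entry:
                out.add(str(entry["value"]))
        return out

    labels = values(record.get("labels", []))
    for service in record.get("services", []):
        labels |= values(service.get("labels", []))
    return labels
```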
IIP and Search 2.0 data model examples
IIP host data model example
The following is an example of a host record in the IIP data model. A host represents strictly host-level information about an IP address. This includes its location, routing, and IP whois enrichments. All context included about a host is derived from service scan data.
```
{
  "dns": {
    // DNS names and forward DNS data
  },
  "ip": <ipv4 or ipv6 address>,
  "service_count": <count of services running on host>,
  "truncated": <true or false>,
  "location": {
    // Location details such as continent, country, country_code, city, postal_code, timezone, and coordinates
  },
  "routing": {
    // Routing information such as ASN, BGP prefix, BGP name, and country
  },
  "services": [
    {
      "port": <port #>,
      "protocol": <protocol name>,
      "transport_protocol": <transport protocol: TCP, UDP, or QUIC>,
      "misconfigs": [],
      "exposures": [],
      "vulns": [],
      "software": [],
      "hardware": [],
      "operating_systems": [],
      "threats": [],
      "labels": [],
      "ip": <ipv4 or ipv6 address>,
      "scan_time": "2024-10-20T20:07:50.000Z",
      "banner": "",
      "banner_hash_sha256": "",
      "<service specific details>": {
        // Details about the service, repeated for each identified service
      }
    }
  ],
  "whois": {
    // WHOIS data about the host, including name, CIDRs, organization, and contacts
  },
  "labels": []
}
```
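Because a host record is host-level data plus an array of services, a common first step is to flatten it into a port-to-protocol map. This sketch uses the field names from the host example (ip, service_count, truncated, and each service's port, transport_protocol, and protocol); summarize_host is a hypothetical name, and it makes no assumption about what truncated means beyond passing it through.

```python
from __future__ import annotations

from typing import Any


def summarize_host(host: dict[str, Any]) -> dict[str, Any]:
    """Flatten an IIP-style host record into a compact summary (sketch only)."""
    services = host.get("services", [])
    return {
        "ip": host.get("ip"),
        # Key each service as "port/transport", e.g. "443/TCP", mapped to its protocol name
        "services": {
            f'{svc.get("port")}/{svc.get("transport_protocol")}': svc.get("protocol")
            for svc in services
        },
        # Keep the record-level counters next to the number of services actually listed
        "listed": len(services),
        "reported": host.get("service_count"),
        "truncated": bool(host.get("truncated")),
    }
```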
Search 2.0 host data model example
The Search 2.0 data model provides cohesive records about individual IPv4 hosts, decoupled ports, and protocol data. This 2021 model presents top-level information about hosts (IP address, location, and routing information), followed by an array of statically defined services.
In the Search 2.0 data model, Censys combined all information about a specific protocol in a single record.
For example, in Search 2.0, the host 8.8.8.8 is presented as follows:
```
{
  "ip": "8.8.8.8",
  "services": [
    {
      "port": 80,
      "service_name": "HTTP",
      "http": {
        "title": "Hello World!"
      }
    }
  ],
  "location": {
    "city": "Ann Arbor",
    ...
  },
  ...
}
```
If, for example, this had been a MySQL service, there would be a “mysql” subrecord instead of an “http” one.
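The Search 2.0 convention of naming the subrecord after the protocol can be illustrated with a short lookup. This sketch assumes each Search 2.0 service carries its protocol in service_name (the Search 2.0 field listed in the field mapping table) and that the subrecord key is that value in lowercase; protocol_subrecord is a hypothetical name. In IIP, by contrast, HTTP data sits under endpoints.http, as in host.services.endpoints.http.body.

```python
from __future__ import annotations

from typing import Any


def protocol_subrecord(service: dict[str, Any]) -> dict[str, Any] | None:
    """Return the protocol-specific subrecord of a Search 2.0 service.

    Assumes the subrecord key is service_name in lowercase,
    e.g. "HTTP" -> "http", "MYSQL" -> "mysql". Returns None if absent.
    """
    name = service.get("service_name")
    if not name:
        return None
    return service.get(name.lower())
```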
IIP web property data model example
The following is an example of a web property presented in the IIP data model.
```
{
  "hostname": <host name>,
  "port": <port number>,
  "endpoints": [ // repeat this object for each endpoint on the asset
    {
      "hostname": <host name>,
      "port": <port number>,
      "path": <path name>,
      "endpoint_type": "HTTP",
      "transport_protocol": "TCP",
      "scan_time": "2024-10-12T12:22:18.167Z",
      "banner": <banner details>,
      "banner_hash_sha256": <banner sha256 hash>,
      "http": {
        "supports_http2": false,
        "uri": <endpoint URI>,
        "protocol": "HTTP/1.1",
        "status_code": <status code of endpoint>,
        "status_reason": <status reason>,
        "headers": {
          // Location and location headers
          // Content length and headers
          // Server headers
          // Date headers
          // Cache-control headers
        },
        "html_tags": [
          // List of HTML tags
        ],
        "body_size": <size of body>,
        "body": <body information>,
        "favicons": [
          // List of favicons
          // Includes size, name, hash_sha256, hash_md5
        ],
        "body_hash_sha256": <body sha256 hash>,
        "body_hash_sha1": <body sha1 hash>
      }
    }
  ],
  "exposures": [],
  "hardware": [],
  "labels": [],
  "misconfigs": [],
  "operating_systems": [],
  "scan_time": "2024-10-12T12:23:13.926Z",
  "software": [],
  "threats": [],
  "vulns": []
}
```
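Since a web property can expose many endpoints, a quick overview of paths and status codes is often the first thing to extract. The sketch below reads the endpoints array using the field names from the web property example (path, http.status_code, http.body_size); endpoint_overview is a hypothetical name, and the defaults for missing values ("/", None, 0) are arbitrary choices.

```python
from __future__ import annotations

from typing import Any


def endpoint_overview(web_property: dict[str, Any]) -> list[tuple[str, int | None, int]]:
    """List (path, status_code, body_size) for each endpoint of a web property."""
    rows = []
    for endpoint in web_property.get("endpoints", []):
        http = endpoint.get("http") or {}
        rows.append((
            endpoint.get("path", "/"),
            http.get("status_code"),      # None if the endpoint has no status code
            http.get("body_size") or 0,   # treat a missing size as 0
        ))
    # Sort by path only, so "/" comes before deeper paths such as "/wp-admin".
    return sorted(rows, key=lambda row: row[0])
```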