Dark Web vs Deep Web OSINT Investigations

And how to incorporate each within OSINT Investigations

Many consider the internet to be one monolithic structure accessible via search engines like Google, Yahoo and Bing. In reality, quintillions of bytes are created on the internet each day, but this data is dispersed through 3 different parts of the internet — the surface web, deep web and dark web.

A popular analogy for the anatomy of the internet is an iceberg — only a small percentage of it is visible when using a typical search engine. Whilst obtaining precise figures for the proportion of the internet attributable to either the surface, deep or dark web is difficult, most estimates place the proportion of the internet formed by the deep web at around 93%.¹

The dark web is even harder to measure. It is likely to occupy less than 0.1% of the internet, with the remainder (around 6%) consisting of the surface web, which is indexed and accessible by standard search engines.

There are various open source intelligence (OSINT) techniques that investigators can use to understand deep and dark web activity, allowing connections to be mapped across these three distinct components of the internet. This article breaks down the dark web, deep web and surface web, and looks at the opportunities and limitations each presents for OSINT investigations.

Suggested reading: To learn more about open source investigation best practices, check out our recent publication — The OSINT Handbook

What is the surface web?

The surface web is an information system formed by the world wide web and accessible to the general public via search engines. It is crawled by bots, or spiders, which follow links (URLs) and analyse web content by reading the code on each page. The process has two parts: crawling, in which links are followed to discover web pages, and indexing, in which the content of those pages is analysed and stored. Indexed pages can then appear on search engine results pages (SERPs).

In summary:

The surface web refers to the indexed world wide web: the most popular information system used to access content over the internet.
Google, Bing, Yahoo and other search engines employ web crawlers or web spiders, which crawl the web for indexable content. This content is organised in the search engine for retrieval via search.
The surface web contains publicly accessible web information that is indexable (i.e. it is not placed behind a subscription wall, private login or paywall, and is not labelled to prevent indexing).
Whilst search engines are not particularly equitable, as search engine optimisation (SEO) dictates the order of the SERPs, it’s still theoretically possible to find any indexed web page using a search engine.
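
The two-stage process described above (crawling to discover pages, then indexing their content) can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular search engine works: the pages live in an in-memory dictionary instead of being fetched over HTTP, and the URLs, page text and "indexable" flag are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links, indexable?)
PAGES = {
    "site.example/": ("welcome to the forum", ["site.example/news", "site.example/login"], True),
    "site.example/news": ("forum news and public reports", ["site.example/"], True),
    "site.example/login": ("members only area", [], False),  # asks not to be indexed
    "site.example/orphan": ("unlinked page", [], True),      # nothing links here
}

def crawl(seed):
    """Stage 1: follow links breadth-first from a seed URL to discover pages."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        yield url
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(urls):
    """Stage 2: map each word to the set of indexable pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        text, _links, indexable = PAGES[url]
        if indexable:
            for word in text.split():
                index[word].add(url)
    return index

index = build_index(crawl("site.example/"))
print(sorted(index["forum"]))   # pages a search for "forum" would return
print(index.get("members"))     # None: the login page is never indexed
print(index.get("unlinked"))    # None: the crawler never discovers the orphan page
```

The last two lookups show the two simplest ways a page can fall outside the surface web in this toy model: it is reachable but excluded from the index, or it is never discovered because no crawled page links to it.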

The surface web in OSINT investigations

The surface web is the first port of call for almost any OSINT research process. While the data accessible through the surface web is small compared to that of the deep web, it represents a well-structured and well-organised source of information that contextualises the publicly visible components of networks, events and connections.

Opportunities and Uses

Within the context of OSINT investigations, the surface web provides investigators with an opportunity to trace publicly visible names and identifiers to kickstart investigations. News events, public reports, and public forums provide solid starting points. Poor or lazy surface web habits, like a user having the same username on a public forum or YouTube as they do on a dark web marketplace, can allow investigators to start building connections.

Publicly available and accessible social media is an important component of the surface web, enabling OSINT researchers to draft networks and their connections. Social media intelligence (SOCMINT) tools can be deployed to optimise the task of surface web network mapping, allowing investigators to generate initial leads, visualise connections and discover further avenues for exploration.
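
As a rough sketch of what network mapping means in practice, the Python below builds a connection graph from a list of public interactions and lists every account within a given number of hops of a starting account. The account names and interactions are invented for illustration, and real SOCMINT tooling would work from far richer data than simple pairs.

```python
from collections import defaultdict

# Hypothetical public interactions (e.g. replies or mentions) between accounts.
INTERACTIONS = [
    ("alice_92", "market_bob"),
    ("market_bob", "carol.k"),
    ("carol.k", "dave_x"),
    ("alice_92", "erin"),
]

def build_graph(edges):
    """Treat each interaction as an undirected link between two accounts."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def within_hops(graph, start, max_hops):
    """Return the accounts reachable from `start` in at most `max_hops` steps."""
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}

graph = build_graph(INTERACTIONS)
print(sorted(within_hops(graph, "alice_92", 2)))
```

Widening `max_hops` surfaces more distant accounts, which is one way initial leads turn into further avenues for exploration.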

Limitations and Challenges

While the surface web provides a powerful means to explore and contextualise public information, the potency of OSINT investigations is multiplied by the deep and dark web. The surface web is limited — it’s explicitly designed for easy search and navigation by the public. It’s possible to get quite far with just a surface web investigation, but researchers are likely to hit a brick wall at some point.

In addition to this, the extent to which search engines dominate the surface web limits its usefulness within OSINT investigations. Search engines are designed for consumers, not investigators, and as a result search engines bring back results they think the user wants to see. This can be useful for consumers searching for a product, but very unhelpful for investigators seeking out unbiased information. SEO further undermines the effectiveness of the surface web in OSINT investigations, making results more reflective of marketing spend and strategy than relevance or quality.

What is the deep web?

The deep web is often conflated with the dark web in public discourse, but they are not the same. Both are de facto ‘invisible’ to search engines, which are unable to crawl them. The difference is that much of the deep web is not intentionally hidden from public access, whereas dark web content is deliberately obscured.

Often, search crawlers can’t index the deep web because websites instruct them not to, or because the content requires authentication to access. Any webmaster can place a plain-text file called robots.txt at the root of their website to instruct web crawlers not to crawl certain URLs.
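
To make the robots.txt mechanism concrete, the sketch below uses Python’s standard-library urllib.robotparser module to check which URLs a well-behaved crawler would skip. The file contents, the forum.example host and the ExampleBot user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real file is served from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/news/report.html", "/members/profile/42"):
    allowed = parser.can_fetch("ExampleBot", "https://forum.example" + path)
    print(path, "->", "crawl" if allowed else "skip")
```

Note that robots.txt is advisory: it tells compliant crawlers what to skip but does not restrict access, which is why content behind it can still be reachable by anyone who has the URL.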

Deep web sources include:

Grey literature, which includes corporate and working papers, white papers, reports, evaluations, and unpublished academic data.
Database material which is not indexed by search crawlers, but is instead indexed internally, and is therefore not directly accessible using surface web browsers.
Paywall and password-protected content from academic, corporate, governmental, legal, financial, NGO and medical/public health sources.
Data contained on private intranets or in cloud storage such as OneDrive and Dropbox.
Emails and messages sent using messaging platforms and web apps.

The deep web in OSINT investigations

The deep web is colossal, perhaps 500 times larger than the surface web, and much of it is considered open source — the fact it isn’t indexed and readily accessible by commercial search engines is irrelevant.²

Opportunities and Uses

Deep web grey literature provides a powerful means to discover links and discrepancies between unindexed records, leaked information and public filings. OSINT researchers can use the deep web to map networks using both publicly accessible social media information and social media data contained within the deep web, including images, video and metadata.

Deep web OSINT provides data obtained from behind logins, e.g. publicly available forums that require membership, corporate records and sanctions lists. In these instances, the information is intended and available for public consumption but requires user authentication, unlike the surface web.

Limitations and Challenges

The primary challenge of using the deep web arises from the fact that standard search engines do not index it in the same way they do the surface web, making it far more difficult to navigate. As a result, relationships and considerable expertise are essential to access the full range of data sources across the deep web.

What is the dark web?

The dark web is often defined as an extension of the deep web, which is true in that the dark web is also hidden from surface web indexing. However, the dark web is hidden by intent, and designed with specific technologies to protect user anonymity. Numerous prominent surface websites, including Facebook and the New York Times, host mirrors of their content on the dark web for this exact reason, as it allows political dissent and freedom of expression in authoritarian countries without fear of identification.³

The dark web uses cryptographic methods to partially anonymise users. This is done primarily through onion routing: traffic is wrapped in multiple layers of encryption and relayed through a series of nodes on the Tor (The Onion Router) network, usually accessed via Tor Browser. This obfuscates IP addresses and other identifiers, hiding the user’s requests and communications. The network infrastructure is dynamic and randomised, making connections difficult to trace.
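
The layering idea behind onion routing can be illustrated with a toy Python model. This is a sketch under heavy assumptions, not Tor’s actual protocol: the cipher is a deliberately simple XOR keystream that is not secure, the keys are hard-coded rather than negotiated, and the relay names and destination are hypothetical. What it does show is the core principle that each relay can remove only its own layer and learns only the next hop.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream.
    Illustration only; it is not secure and is not what Tor uses."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Hypothetical three-relay circuit; the client shares one key with each relay.
KEYS = {"entry": b"key-entry", "middle": b"key-middle", "exit": b"key-exit"}
PATH = ["entry", "middle", "exit"]

def wrap(message: bytes, path, destination: str) -> bytes:
    """Client side: build the onion from the inside out, one layer per relay."""
    next_hop, cell = destination, message
    for relay in reversed(path):
        cell = toy_cipher(KEYS[relay], next_hop.encode() + b"|" + cell)
        next_hop = relay
    return cell

def peel(relay: str, cell: bytes):
    """Relay side: remove one layer, revealing only the next hop and an opaque payload."""
    plain = toy_cipher(KEYS[relay], cell)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner

hop, cell = "entry", wrap(b"GET /index.html", PATH, "site.example")
while hop in KEYS:            # pass the cell along the circuit
    hop, cell = peel(hop, cell)
print(hop, cell)              # the destination and the original request
```

In this model the entry relay sees who sent the cell but not where it is ultimately going, while the exit relay sees the destination but not the sender.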

Key dark web facts:

The dark web has hosted criminal activity since the late 1990s, including black markets that sell everything from drugs and firearms to stolen data and illegal services.
As well as facilitating the trade in drugs and firearms and financial crime, the dark web is used by terrorist groups from around the world.
Following the 2015 Paris terrorist attacks, many ISIS propaganda websites and archives were unearthed on the dark web.⁴ There is evidence that terrorist groups use the dark web for fundraising, communications, and the purchase of weaponry.
The dark web also has a history of use for internal corporate or governmental risk discussions, e.g. preceding an expected data leak or whistleblowing event.

The dark web in OSINT investigations

The dark web provides a rich source of illicit data. However, it’s estimated that only 6.7% of users access Tor specifically for illegal or illicit purposes.⁵ That is still roughly 1 in 15, a very large proportion compared with those who use the surface web for the same ends.

Opportunities and Uses

OSINT investigators can form links between the surface and dark web by exploiting users’ own poor anonymity measures, as users often leak personal information while communicating with others via forums. One prominent example is drug dealer Carl Stewart, who was successfully prosecuted after police analysed his palm and fingerprints from a photo of him holding a block of Stilton cheese, which he had posted to an encrypted messaging service.⁶

Everything from usernames to forum signatures and captions can be linked between the surface and dark web. Researchers can even use natural language processing (NLP) techniques to correlate how users use written language to communicate with their networks.
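
One simple way to compare how two accounts write is to measure the overlap in their character n-grams. The Python sketch below does this with cosine similarity over character trigrams. It is a minimal illustration of the idea, not a description of any specific tool: the posts are invented, and a real stylometric comparison would need far more text and more robust features.

```python
import math
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors (0 = disjoint, 1 = identical mix)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical posts: one from a public forum, one from a dark web market, one unrelated.
surface_post = "cheers mate, shipping is always stealth, msg me 4 details"
dark_post = "cheers mate, stealth shipping as always, msg me 4 a price list"
unrelated = "The quarterly report will be published on the corporate website."

s1 = cosine_similarity(char_ngrams(surface_post), char_ngrams(dark_post))
s2 = cosine_similarity(char_ngrams(surface_post), char_ngrams(unrelated))
print(f"forum vs market: {s1:.2f}   forum vs unrelated: {s2:.2f}")
```

A high score is a lead to investigate further, not proof of common authorship; short texts and shared slang can easily inflate it.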

Limitations and Challenges

The challenges of utilising dark web data revolve around dark web access, which requires specialised tools and network configurations to remain anonymous and not expose the researcher’s identity. The dark web is also highly unstructured, and is not indexed in the same way as the surface web. This makes navigating it and finding information relevant to investigations very difficult. Other considerations include the risk of exposure to malware and to illegal or distressing content.

Many of these challenges can be navigated with specialised dark web OSINT tools. These enable safe, anonymous access to the dark web, allowing researchers to interrogate and analyse information in a careful, granular way, converting the dark web into a safe resource for investigations.

Investigations require multiple web sources

The modern internet consists of three layers: the surface web, deep web and dark web. Whilst these exist as independent entities, each complements the others, and OSINT researchers can use the vertical links between them to further their investigations.

Here at Blackdot, our goal is to enable the safe and secure gathering of open source data (OSD) from the surface, deep and dark web. Modern OSINT investigations require seamless access to various OSINT sources — any missing source might mean a missing piece of the puzzle. Using the dark web safely and efficiently represents the final frontier for OSINT, providing a powerful means to switch between each layer of the web, mapping connections and following threads between disparate information.

Utilising the numerous data points that can be pulled from across the dark, deep, and surface web from a single platform requires a powerful OSINT solution. That’s why Videris was created, providing OSINT investigators with:

Data collection, analysis and visualisation in a single platform.
Anonymity and data security.
Intelligent automation to inform human decision making.

If you’re interested in finding out what Videris can do for your OSINT investigations, book a demo with us today.

FAQs

What is dark web OSINT?

Dark web OSINT is the practice of collecting and analysing publicly available information from the dark web to uncover illicit activity, emerging threats and hidden networks. It often involves safely navigating unindexed, unstructured forums and marketplaces using OSINT tools to identify links to surface web identities through leaked personal details, reused identifiers or linguistic patterns.

How can you safely access the dark web for investigations?

OSINT tools like Blackdot’s Videris allow investigators to access the dark web without compromising their organisation’s security or their personal data, and help ensure that they aren’t exposed to illegal or distressing content, including imagery.

Is using OSINT legal?

Yes, OSINT is legal because it involves collecting and analysing information that is publicly available, but how you gather and use it is also important. Accessing restricted systems, bypassing logins or paywalls without permission, violating platform terms or using OSINT for harassment, fraud or unlawful surveillance can cross legal and ethical boundaries.

Is the deep web OSINT?

Yes, much of the deep web can be considered OSINT because it often contains open-source information that is publicly available but not indexed by search engines. This can include grey literature, databases or content behind logins. Not everything in the deep web is OSINT, however. Private intranets, personal emails and other restricted data aren’t open source unless you have lawful access and permission.

¹What is the dark web? How to access it and what you’ll find

²White Paper: The Deep Web: Surfacing Hidden Value

³Who’s Afraid of the Dark? Hype Versus Reality on the Dark Web

⁴Terrorist Migration to the Dark Web

⁵Who Commits Crime On TOR? A New Analysis Has A Surprising Answer

⁶Cheese photo leads to Liverpool drug dealer’s downfall