OSINT Sources: What Are The Different Types of Open Source Data?

By Blackdot Solutions


Organisations within both the public and private sectors are placing increased emphasis on open source data (OSD) when embarking on intelligence-gathering operations. Open source intelligence (OSINT) is a structured approach to extracting meaningful insights from OSD that involves cataloguing, sorting and prioritising data within an intelligence-led framework. This goes far beyond understanding the different data sources. However, the basics start with identifying the types of open source information available. 

The sheer scale of open source data is its strength as well as its weakness, and it’s only with the advent of modern open source intelligence tools that OSD has become usable in a wider commercial context. Most OSD is available online, but much of it is unindexed and cannot be found using standard search engines. This article explores the main types of OSD in detail, setting out the opportunities and challenges each source presents within OSINT investigations.

Suggested reading: For a thorough explanation of OSINT and its practical uses, check out our article What is OSINT? If you want to go beyond simply understanding the source types and build an effective OSINT strategy, check out our recent publication, The OSINT Handbook.

Type 1: News and Media 

News and media comprises mass media publications, radio and TV broadcasts, content from media aggregators, books and other forms of print or traditional media. News and media data is published both in front of and behind paywalls, and is distributed at international, national, regional and local scales. Even local news can be of pivotal importance, providing intelligence analysts with a rich account of events occurring in close proximity to the source.

Challenges posed to investigators

With millions of pieces of content being published each day in numerous languages, the news and media are in a state of constant movement. Navigating this source using conventional search techniques and manual data handling is virtually impossible, particularly when mapping connections between news content and data from other OSD sources, e.g. financial records and company filings.

Since news content is often repurposed across multiple networks in many different formats, identifying duplicate content is also a major issue without the assistance of purpose-built OSINT tools. Moreover, distinguishing reliable news from ‘fake news’ or other non-reputable media is simpler when investigators can properly understand the origins of news and media sources.
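
To illustrate the duplicate-content problem described above, here is a minimal sketch of near-duplicate detection using word shingles and Jaccard similarity. It is one simple technique among many, not a description of how any particular OSINT tool works. The article IDs, sample texts, shingle size and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_near_duplicates(articles, threshold=0.5):
    """Greedily group articles whose shingles overlap with a group's first member."""
    groups = []  # each entry: (shingles of first member, list of article ids)
    for article_id, text in articles.items():
        current = shingles(text)
        for representative, members in groups:
            if jaccard(current, representative) >= threshold:
                members.append(article_id)
                break
        else:
            groups.append((current, [article_id]))
    return [members for _, members in groups]


# Hypothetical sample data: a wire story, a lightly edited local copy,
# and an unrelated finance story.
ARTICLES = {
    "wire-001": "Police arrested three people after a protest outside "
                "the city council offices on Monday.",
    "local-017": "Police arrested three people after a protest outside "
                 "the city council offices on Monday, officials said.",
    "finance-204": "The central bank held interest rates steady for the "
                   "third month in a row.",
}

print(group_near_duplicates(ARTICLES))
# [['wire-001', 'local-017'], ['finance-204']]
```

At real news volumes, comparing every article against every group would be too slow, so a production system would more likely rely on hashing-based approximations; the sketch only shows the underlying idea.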

OSINT use cases

OSD collected from the news and media provides historical context to a wide range of events occurring at multiple levels. As a result, news and media has three clear use cases within the context of OSINT investigations:

  • Revealing networks: News and media can pertain to individuals, organisations and events, and can often reveal new networks or correlate those unearthed from other sources, such as social media posts. This can help investigators understand events and individuals better. For example, news reports might identify individuals present in or related to the story, whether that be a crime, protest, or something else, either by name, image or video. These identifiers can be cross-referenced with additional data to map networks and connections.
  • Mapping chronology: News and media OSD is particularly useful for mapping the chronology of news stories and events, helping investigators react to emerging stories that threaten someone or something’s integrity or security. 
  • Integrity screening: Adverse news investigations are also highly useful when screening prospective clients for risk or integrity issues, regardless of whether they are individuals, businesses, organisations or government departments. 
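
The first two use cases above, revealing networks and mapping chronology, can be sketched in a few lines. The example below assumes that names have already been extracted from each report (in practice this step is the hard part) and uses invented records; the field names `published`, `headline` and `names` are hypothetical.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical records: entity names are assumed to be extracted already.
REPORTS = [
    {"published": date(2023, 3, 2),
     "headline": "Company X wins port contract",
     "names": ["Person A", "Company X"]},
    {"published": date(2023, 1, 15),
     "headline": "Person A joins Company X board",
     "names": ["Company X", "Person A"]},
    {"published": date(2023, 4, 20),
     "headline": "Person B questioned over port contract",
     "names": ["Person B", "Company X"]},
]


def co_mentions(reports):
    """Count how often each pair of names appears in the same report."""
    pairs = Counter()
    for report in reports:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for a, b in combinations(sorted(set(report["names"])), 2):
            pairs[(a, b)] += 1
    return pairs


def timeline(reports, name):
    """Return (date, headline) tuples mentioning a name, oldest first."""
    ordered = sorted(reports, key=lambda report: report["published"])
    return [(r["published"], r["headline"]) for r in ordered if name in r["names"]]


print(co_mentions(REPORTS).most_common())
print(timeline(REPORTS, "Person A"))
```

A co-mention is only a lead: two names appearing in the same story does not establish a relationship, so each pair still needs to be checked against other sources.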

Type 2: Grey Literature

Grey literature includes all manner of publicly available non-media private and public sector policy information. This includes documents and reports from charities, NGOs, inter-governmental institutions and think tanks, as well as crime statistics, census data (e.g. from the ONS) and information contained in academic databases, journals and reports. 

Grey literature also includes annual business reports, filing data, and leaked reports. Examples of this include data leaks compiled by reputable organisations, such as the Organized Crime and Corruption Reporting Project (OCCRP) or the International Consortium of Investigative Journalists (ICIJ), who recently exposed an offshore financial system used by numerous world leaders, heads of state, celebrities, and business leaders in the Pandora Papers [1]. These OSINT sources are densely populated with well-researched data that is often hard to quantify.

Challenges posed to investigators

A key challenge is that this information commonly sits behind a paywall or requires login details to access. For example, some 42% of global health research is currently published behind paywalls [2]. As a result, grey literature largely exists in what is known as the deep web, the part of the internet that is not discoverable on standard search engine results pages (SERPs).

Many OSINT tools offer advanced browser capabilities that allow intelligence analysts to extend their search into parts of the deep web that cannot be discovered using standard search engines. A further challenge is that the storage and distribution systems behind grey literature are notoriously disparate and poorly structured, complicating the process of locating and connecting related data points between different sources. In these cases, OSINT tools can offer visualisation capabilities that help investigators to understand the data they have collected from these sources.

OSINT use cases

Grey literature is often used to distribute and disseminate both quantitative and qualitative data between businesses. It is, therefore, data-rich by virtue of its design, thereby allowing investigators to obtain critical investigatory context. 

Corporate records are a key focus of OSINT gathering, providing information about business transactions, filings, and network connections between various business stakeholders and related organisations. Policy documents and reports can be correlated to financial records to expose discrepancies. Such information can be compared against leaked financial records to enrich and cross-reference corporate records, making grey literature most useful in money laundering and asset tracing investigations.


Type 3: Social Media

Social media covers the entire spectrum of long-form content (e.g. Reddit posts, long-form blog posts, Quora answers) and short-form content (e.g. Tweets, LinkedIn updates, Instagram captions), as well as photographs, tags, and both first- and second-degree connections. The metadata associated with social media content is central to OSINT, assisting in network visualisation and understanding the chronology of interlinked events. Social media data is also highly visual, and involves a mass of images and videos often created in close proximity to their subject matter.

Note: Some social media is public, and other social media is not. It’s important to point out that OSINT only covers public social media. Targeting information hidden behind privacy settings is not only unethical, it isn’t strictly OSINT or OSD. 

Challenges posed to investigators

Over half the world’s population now use some form of social media, and as a result, it has immense depth and volume, making manual exploration exceptionally laborious. Much of the most useful social media information is unindexed and resides in the deep web, rendering standard search engines insufficient. Fortunately, specialist OSINT tools help investigators to analyse this data and highlight connections at speed, reducing much of the manual legwork.

Furthermore, retaining anonymity is crucial for any social media investigator. Investigations into individuals or networks must remain under the radar and not trigger any signals that might alert the subject to an ongoing investigation.

OSINT use cases

The vast amount of data that is available on social media platforms makes it a powerful source for OSINT investigations. It can be applied in a range of investigatory settings, including:

  • Threat intelligence: The level of person-to-person data available via social media is not available anywhere else, and provides a non-superficial means of monitoring information about a threat actor’s recent activities, locations or communications – information that might provide insights into previous or planned hostile activity.
  • Visualising personal connections: Intrinsic to social media are the connections formed between accounts and their respective posts and content — these assist investigators in visualising connections within and between networks.

Because it is so well suited to threat intelligence and visualising connections, social media can be applied within a variety of OSINT investigatory contexts. These include investigations into insider threats, bribery and corruption, where social media allows investigators to uncover unexpected connections that a member of staff or customer might have to a malicious actor, and that would otherwise remain hidden.
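
One way to picture how an unexpected connection might surface is as a shortest-path search over publicly visible links between accounts. The sketch below uses a plain breadth-first search; the account names and the edge list are invented for illustration, and the approach is an assumption rather than a description of any specific tool.

```python
from collections import deque


def shortest_connection(edges, start, target):
    """Return the shortest chain of accounts linking start to target, or None."""
    # Build an undirected adjacency map from the (account, account) pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or target not in graph:
        return None

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sorting keeps the traversal order deterministic.
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical public connections between accounts.
EDGES = [
    ("staff_member", "account_b"),
    ("account_b", "account_c"),
    ("account_c", "flagged_actor"),
    ("staff_member", "account_d"),
]

print(shortest_connection(EDGES, "staff_member", "flagged_actor"))
# ['staff_member', 'account_b', 'account_c', 'flagged_actor']
```

Breadth-first search is used here because it finds the chain with the fewest intermediaries first, which is usually the connection an investigator would want to review before any longer ones.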

Type 4: Dark Web

The dark web, which is considered part of the deep web, refers to web pages that are not indexed and that require specialised software to access. On the dark web, users and operators are difficult to trace, and as a result, it is a rich source of data relating to criminal networks, their activities and connections. User names, addresses and other signals and identifiers are invaluable in forming cross-connections with surface or deep web information, assisting intelligence analysts in identifying connections between accounts and profiles.

Challenges posed to investigators

The dark web is a unique data medium that is wholly unindexed by standard search engines. Intelligence analysts must take care not to expose their own identities or give away their investigation, whilst also avoiding malware and exposure to illegal media. Collecting relevant information from the dark web whilst ensuring regulatory and legal due diligence is critical, and requires specialist OSINT tools that allow investigators to interrogate information safely. These tools also help investigators to avoid unnecessary exposure to potentially traumatic illegal material whilst browsing the dark web.

OSINT use cases

The dark web provides intelligence analysts with a key opportunity for locating direct connections between criminal activities and their associated user names, addresses and other identifiers. These findings can be cross-referenced and checked against data found on the surface web, allowing law enforcement and threat analysts to observe and identify criminal networks, gather evidence and monitor communication. 
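
The cross-referencing step described above can be sketched as matching normalised identifiers across sources. The handles, source labels and normalisation rule below are all hypothetical; the rule simply lower-cases each handle and strips punctuation, which is an assumption and far cruder than what a real investigation would need.

```python
def normalise(handle):
    """Lower-case a handle and drop everything except letters and digits."""
    return "".join(ch for ch in handle.casefold() if ch.isalnum())


def cross_reference(dark_web_handles, surface_profiles):
    """Map each dark web handle to surface web profiles with the same normalised form."""
    index = {}
    for source, handle in surface_profiles:
        index.setdefault(normalise(handle), []).append((source, handle))

    matches = {}
    for handle in dark_web_handles:
        hits = index.get(normalise(handle))
        if hits:
            matches[handle] = hits
    return matches


# Hypothetical identifiers collected from different sources.
DARK_WEB_HANDLES = ["Night_Owl77", "gr3yfox"]
SURFACE_PROFILES = [
    ("forum", "nightowl77"),
    ("social", "Grey.Fox"),
    ("social", "night.owl77"),
]

print(cross_reference(DARK_WEB_HANDLES, SURFACE_PROFILES))
# {'Night_Owl77': [('forum', 'nightowl77'), ('social', 'night.owl77')]}
```

A matching handle is a lead rather than proof that two accounts belong to the same person, so any match would still need corroboration from other identifiers.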

All this makes the dark web suitable for use within a number of different types of OSINT investigation. These include investigations into the sale of illegal products, such as weapons, and the tracing of drug and wildlife trafficking networks.

Use multiple OSINT sources in your investigations

OSINT sources are complex, and conventional research processes are largely insufficient for investigators looking to maximise the potential of open source data. A repeating theme here is the sheer volume of data available. Much of this data is poorly structured or unindexed by conventional search engines, making manual exploration laborious at best, impossible at worst.

The best OSINT tools should allow investigators to seamlessly transition between these complex data sources, mapping connections and documenting network relationships using Intelligent Automation (IA). That’s why at Blackdot, we’ve developed our Videris software. 

Videris transforms processes in areas such as anti-money laundering, anti-financial crime, corporate due diligence, law enforcement and threat analysis, corruption, and illicit trade investigations by uncovering multiple layers of connectivity. This increases the efficiency of OSINT and allows it to take place at a scale not previously possible using manual handling and processing methods. It also professionalises investigations, ensuring that all necessary tasks and processes can be dealt with in the same seamless workflow.

Book a demo today, and see for yourself how Videris can optimise the outcomes of your open source investigations.

[1] Pandora Papers
[2] Open to the public: paywalls and the public rationale for open access medical research publishing