OSINT Sources: What Are The Different Types of Open Source Data?
By Blackdot Solutions
Organisations within both the public and private sectors are placing increased emphasis on open source data (OSD) when embarking on intelligence-gathering operations.
Open source intelligence (OSINT) is a structured approach to extracting meaningful insights from OSD that involves cataloguing, sorting and prioritising data within an intelligence-led framework.
While the extent of OSINT goes beyond understanding different sources of data, it has to start with identifying the types of open source information available.
The sheer scale of open source data is both its strength and its weakness, and it's only with the advent of modern open source intelligence tools that OSD has become usable in a wider commercial context. Most OSD is available online, but much of it is unindexed and cannot be found using standard search engines.
This article will explore the various types of OSD in detail, breaking down each source and its respective opportunities and challenges within OSINT investigations.
Type 1: News and Media
News and media comprises mass media publications, radio and TV broadcasts, content from media aggregators, books and other forms of print or traditional media. News and media data is published both in front of and behind paywalls, and is distributed at international, national, regional and local scales. Even local news can be of pivotal importance, providing intelligence analysts with a rich account of events occurring in close proximity to the source.
Challenges posed to investigators
With millions of pieces of content being published each day in numerous languages, the news and media are in a state of constant movement. Grasping this movement and distinguishing reliable news from ‘fake news’ or other non-reputable media is simpler when investigators can properly understand the origins of these sources.
There are key questions to ask when assessing the reliability of an open source:
- Is there reason to believe that the source is biased in any way? For example, is it backed by a political party or a state-controlled media outlet?
- Does it have a long and reputable track record as an established publication?
- Can the source’s findings be found elsewhere? Are these other sources potentially more reliable?
However, navigating sources using conventional search techniques and manual data handling is immensely time-consuming, particularly when mapping connections between news content and data from other OSD sources, e.g. financial records and company filings.
Additionally, as news content is often repurposed across multiple networks in many different formats, deduplicating content can eat into efficiency without the assistance of purpose-built OSINT tools.
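To illustrate the deduplication problem described above, here is a minimal sketch of near-duplicate detection. It is not how any particular OSINT tool works: it assumes each article is available as plain text, compares overlapping three-word sequences ("shingles") using Jaccard similarity, and uses an arbitrary threshold of 0.6. The function names and the threshold are illustrative assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams in the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles: list, threshold: float = 0.6) -> list:
    """Keep the first copy of each story and drop later near-duplicates."""
    kept = []  # list of (text, shingle set) pairs already accepted
    for text in articles:
        signature = shingles(text)
        if all(jaccard(signature, other) < threshold for _, other in kept):
            kept.append((text, signature))
    return [text for text, _ in kept]


if __name__ == "__main__":
    # Made-up headlines, used purely for illustration.
    sample = [
        "Police arrest three men after bank robbery in Leeds city centre",
        "POLICE arrest three men after bank robbery in Leeds city centre today",
        "Council approves new cycling lanes across the city",
    ]
    for headline in deduplicate(sample):
        print(headline)
```

Comparing every article against every kept article is fine for a few hundred items, but it does not scale to the volumes discussed here. Larger collections would typically need an approximate technique such as MinHash.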
OSINT use cases
OSD collected from the news and media provides historical context to a wide range of events that can be factored into your investigations. As a result, news and media has three clear use cases within the context of OSINT investigations:
- Due diligence: Adverse news screening is highly useful when assessing prospective clients for risk or integrity issues, regardless of whether they are individuals, businesses, organisations or government departments.
- Revealing networks: News and media can pertain to individuals, organisations and events, and can often reveal new networks or correlate those unearthed from other sources, such as social media posts. This can help investigators understand events and individuals better. For example, news reports might identify individuals present in or related to the story, whether that be a crime, protest, or something else, either by name, image or video. These identifiers can be cross-referenced with additional data to map networks and connections.
- Mapping chronology: News and media OSD is particularly useful for mapping the chronology of news stories and events, helping investigators react to emerging stories that threaten someone or something’s integrity or security.
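The network and chronology use cases above can be sketched in a few lines. This example assumes that the names mentioned in each report have already been extracted (by hand or by a separate entity-extraction step, which is not shown). The record layout and function names are hypothetical.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical record layout: (publication date, headline, names mentioned).
Report = tuple


def timeline(reports: list) -> list:
    """Order reports from oldest to newest to reconstruct a chronology."""
    return sorted(reports, key=lambda report: report[0])


def co_mentions(reports: list) -> Counter:
    """Count how often each pair of names appears in the same report."""
    pairs = Counter()
    for _, _, names in reports:
        # Sort the names so that (A, B) and (B, A) count as the same pair.
        for first, second in combinations(sorted(set(names)), 2):
            pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    # Made-up data, used purely for illustration.
    reports = [
        (date(2023, 5, 2), "Protest organisers named", ["A. Smith", "B. Jones"]),
        (date(2023, 4, 1), "Directors linked to firm", ["B. Jones", "A. Smith", "C. White"]),
    ]
    for published, headline, _ in timeline(reports):
        print(published.isoformat(), headline)
    for pair, count in co_mentions(reports).most_common():
        print(pair, count)
```

A pair that is mentioned together repeatedly is only a lead for further investigation. It does not, on its own, establish a relationship between the two people.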
Suggested reading: Our blog How reliable is Open Source Intelligence? goes more in-depth on why OSINT sources are legitimate value-adds to investigations.
Type 2: Grey Literature
Grey literature includes all manner of publicly available non-media private and public sector policy information. This includes documents and reports from charities, NGOs, inter-governmental institutions and think tanks, as well as crime statistics, census data (e.g. from the ONS) and information contained in academic databases, journals and reports.
Grey literature also includes annual business reports, filing data, and leaked reports. Examples of this include data leaks compiled by reputable organisations, such as the Organized Crime and Corruption Reporting Project (OCCRP) or the International Consortium of Investigative Journalists (ICIJ), who recently exposed an offshore financial system used by numerous world leaders, heads of state, celebrities, and business leaders in the Pandora Papers. These OSINT sources are densely populated with well-researched data that is often unstructured and hard to quantify.
Challenges posed to investigators
A key challenge is that this information commonly sits behind a paywall or requires login details in order to gain access. For example, some 42% of global health research is currently published behind paywalls. This is because grey literature largely exists in what is known as the deep web — a part of the internet that is not discoverable on standard search engine results pages (SERPs).
Many OSINT tools offer advanced browser capabilities that allow intelligence analysts to extend their search into results that are non-discoverable using standard web browsers. Moreover, the storage and distribution systems behind grey literature are notoriously disparate and poorly structured, complicating the process of locating and connecting related data points between different sources. In these cases, OSINT tools can offer visualisation capabilities that help investigators to understand the data they have collected from these sources.
OSINT use cases
Grey literature is often used to distribute and disseminate both quantitative and qualitative data between businesses. It is, therefore, data-rich by virtue of its design, allowing investigators to obtain critical investigation context.
Corporate records are a key focus of OSINT gathering, providing information about business transactions, filings, and network connections between various business stakeholders and related organisations.
Grey literature is useful across a range of investigations, especially those where a detailed understanding of corporate networks and finances is beneficial, such as anti-money laundering and asset tracing investigations.
Suggested reading: For a thorough explanation of OSINT and its practical uses, check out our article What is OSINT?
Type 3: Social Media
Social media covers the full spectrum of content, from long-form (e.g. Reddit posts, long-form social media blog posts, Quora answers) to short-form (Tweets, LinkedIn updates, Instagram captions), as well as photographs, tags and both first- and second-degree connections.
The metadata associated with social media content is central to open source intelligence techniques, assisting in network visualisation and understanding the chronology of interlinked events. Social media data is also highly visual, and involves a mass of images and videos often created in close proximity to their subject matter.
Note: Some social media data is public data, and other social media data is not. It’s important to point out that OSINT only covers public social media.
Challenges posed to investigators
Over half the world’s population now use some form of social media and, as a result, it has immense depth and volume, making manual exploration exceptionally laborious. Much of the most useful social media information is unindexed and resides in the deep web, rendering standard surface web browsers insufficient. Fortunately, specialist OSINT tools help investigators to analyse this data and highlight connections at speed, reducing much of the manual legwork.
Furthermore, retaining anonymity is crucial for any social media investigator. Investigations into individuals or networks must remain under the radar and not trigger any signals that might alert the subject to an ongoing investigation.
OSINT use cases
The vast amount of data that is available on social media platforms makes it a powerful source for OSINT investigations. It can be applied in a range of investigation settings, including:
- Threat intelligence: The level of person-to-person data available via social media is not available anywhere else, and provides a non-superficial means of monitoring information about a threat actor’s recent activities, locations or communications – information that might provide insights into previous or planned hostile activity.
- Visualising personal connections: Intrinsic to social media are the connections formed between accounts and their respective posts and content — these assist investigators in visualising connections within and between networks.
As a result of its usefulness for threat intelligence and visualising connections, social media can be applied within a variety of OSINT investigatory contexts. These include insider threats, bribery and corruption, as social media allows investigators to uncover unexpected connections that a member of staff or customer might have to a malicious actor and that would otherwise remain secret.
Type 4: Dark Web
The dark web is the part of the deep web made up of non-indexed web pages that require specialised software to access. Users and operators on the dark web are difficult to trace, and as a result, it is a rich source of data relating to criminal networks, their activities and connections. User names, addresses and other signals and identifiers are invaluable in forming cross-connections with surface or deep web information, assisting intelligence analysts in identifying connections between accounts and profiles.
Challenges posed to investigators
The dark web is a unique data medium that is wholly unindexed by standard search engines. Intelligence analysts must exercise care to not expose their own identities or give away their investigation, whilst also remaining distanced from malware or exposure to illegal media. Collecting relevant information from the dark web whilst ensuring regulatory and legal due diligence is critical, and requires specialist OSINT tools and techniques that allow investigators to interrogate information safely without exposing themselves to risks.
Furthermore, investigators should also look to use these OSINT tools in order to avoid unnecessary exposure to potentially traumatic illegal material whilst browsing the dark web.
OSINT use cases
The dark web provides intelligence analysts with a key opportunity for locating direct connections between criminal activities and their associated user names, addresses and other identifiers. These findings can be cross-referenced and checked against data found on the surface web, allowing law enforcement and threat analysts to observe and identify criminal networks, gather evidence and monitor communication.
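As a rough illustration of the cross-referencing described above, the sketch below looks for user names that appear in more than one collected data set. It assumes the identifiers have already been gathered lawfully and safely into simple lists. The source labels, handles and function names are hypothetical, and matching is limited to exact matches after basic normalisation.

```python
from collections import defaultdict


def normalise(handle: str) -> str:
    """Lowercase a handle and strip whitespace and any leading '@'."""
    return handle.strip().lstrip("@").lower()


def shared_identifiers(sources: dict) -> dict:
    """Map each normalised handle to the sources it appears in,
    keeping only handles seen in more than one source."""
    seen = defaultdict(set)
    for source, handles in sources.items():
        for handle in handles:
            seen[normalise(handle)].add(source)
    return {handle: found for handle, found in seen.items() if len(found) > 1}


if __name__ == "__main__":
    # Made-up data, used purely for illustration.
    collected = {
        "dark_web_forum": ["@NightOwl77", "trader_x"],
        "surface_social": ["nightowl77 ", "alice"],
    }
    for handle, found in shared_identifiers(collected).items():
        print(handle, sorted(found))
```

A shared user name suggests a possible link, not a confirmed one. Common handles are reused by unrelated people, so any match would need corroborating with other identifiers.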
All this makes the dark web suitable for use within a number of different types of OSINT investigation. These include investigations into the sale of illegal products, such as weapons, and the tracing of drug and wildlife trafficking networks.
Use multiple OSINT sources in your investigations
OSINT sources are complex, and conventional research processes are largely insufficient for investigators looking to maximise the potential of open source data. A recurring theme here is the sheer volume of data available. Much of this data is poorly structured or unindexed by conventional search engines, making manual exploration laborious at best and impossible at worst.
The best OSINT tools should allow investigators to seamlessly transition between these complex data sources, mapping connections and documenting network relationships using Intelligent Automation (IA).
That’s why we at Blackdot have developed Videris (powered by ShadowDragon©). Videris uncovers multiple layers of connectivity and assists in transforming processes in areas such as:
- Anti-money laundering
- Anti-financial crime
- Corporate due diligence
- Law enforcement and threat analysis
- Illicit trade investigations
This increases the efficiency of OSINT and allows it to take place at a scale not previously possible using manual handling and processing methods. It also ensures that all necessary tasks and processes can be dealt with in the same seamless workflow.
Book a demo today, and see for yourself how Videris can optimise the outcomes of your open source investigations.
What are the different types of open source data?
- News and media
- Grey literature, including corporate records
- Social media
- Dark web
What are the challenges of using open source data?
- Finding the right information amongst constantly changing content and growing volumes of data.
- Keeping track of findings when collecting information from multiple data sources.
- Accessing the right data: information can sit behind a paywall or require login details for access. The dark web is also not indexed by standard search engines.
- Understanding the data: much of the data resulting from OSINT investigations is unstructured, which can make it difficult to interpret or to spot patterns in.
What are the use cases for open source data?
- Anti-money laundering
- Enhanced due diligence
- Counter-terrorism or serious and organised crime
- Bribery and corruption
- Corporate security or insider threats
How do I choose the right OSINT tool for the data I’m using?
1. Explore and understand what’s available to you. There are multiple OSINT tools to choose from depending on your requirements.
2. Book a demo. Get a feel for the technology and how it could benefit you in the long term.
3. Have a discussion. After your demo, we can answer any questions you have about the software and how your organisation could benefit.