In this episode of From the Source, host Matthew Stibbe interviews Henri Beek, a lead analyst at Data Expert in the Netherlands. They discuss the integration of agentic AI in OSINT, the evolution of Henri's career in the field, and the challenges posed by synthetic data and ethical considerations surrounding breached data. Henri shares insights on the fragmentation of the internet and its implications for OSINT investigations, as well as resources for professionals in the field.
Matthew Stibbe (00:01) Hello and welcome to From the Source, the Blackdot podcast. I'm your host Matthew Stibbe and today I'm talking to Henri Beek who is lead analyst at Data Expert in the Netherlands. Good morning, great to have you on the show.
Henri (00:16) Matthew, thank you for having me on the show.
Matthew Stibbe (00:19) And what I want to start with is a question I ask all my guests. What are you geeking out about at the moment in the world of OSINT and not just LEGO and Star Wars?
Henri (00:31) Well, of course, Lego and Star Wars, as you mentioned, but currently I'm dabbling a lot with agentic AI and trying to see if there is a possibility to build something like an OSINT agent that could help you out during your investigations and leave all the boring stuff to the agent so I can focus on the more serious stuff myself, for example.
Matthew Stibbe (00:58) And can you give me an example of what you think might be possible with agentic AI and OSINT? What sort of use cases are there for that?
Henri (01:09) Of course you have to be careful with PII, personally identifiable information. But for example, currently in my role, I do a lot with domains, IP addresses, hash values.
And I would love to have an agentic AI, feed it some of the IP addresses I come across and just let the agent go out and shop for the information that is relevant about those IP addresses, for example. Instead of what Opera Browser currently is doing... let it buy 12 pairs of white socks for you, for example.
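As a rough illustration of the triage Henri describes, the sketch below sorts raw indicators (IP addresses, domains, hash values) into buckets before anything is handed to an agent for enrichment. It uses only the Python standard library. The function names `classify_indicator` and `triage`, the bucket labels, and the simplified domain pattern are assumptions made for this example and do not come from the conversation.

```python
import ipaddress
import re

# Hex-digest lengths for common hash types (MD5, SHA-1, SHA-256).
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}

# Simplified hostname pattern (an assumption; it ignores internationalised names).
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def classify_indicator(value: str) -> str:
    """Return a coarse label for one indicator string."""
    text = value.strip().lower()
    try:
        ip = ipaddress.ip_address(text)
        # Private addresses are labelled separately because looking them
        # up in public sources is not meaningful.
        return "private_ip" if ip.is_private else "public_ip"
    except ValueError:
        pass
    if re.fullmatch(r"[0-9a-f]+", text) and len(text) in HASH_LENGTHS:
        return HASH_LENGTHS[len(text)]
    if DOMAIN_RE.match(text):
        return "domain"
    return "unknown"


def triage(values):
    """Group indicators by label so each bucket can be enriched separately."""
    buckets = {}
    for value in values:
        buckets.setdefault(classify_indicator(value), []).append(value.strip())
    return buckets
```

Calling `triage(["8.8.8.8", "example.com"])` groups the address under `public_ip` and the hostname under `domain`; the enrichment step itself is deliberately left out, since which sources an agent may query depends on the investigation.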
Matthew Stibbe (01:49) And in my world of marketing, we're also exploring how you could use AI to do some of that, maybe finding companies that are using HubSpot software or companies in a particular target sector. And I've seen quite a lot of people using AI in combination with software like Make.com, Zapier at a trivial level
to sort of build these workflows and pipelines, so you know, you do something over here, you get a spreadsheet, you put it into another. Is that something that is of interest in the world of OSINT? Is that something you're looking at?
Henri (02:25) Yeah, I think that's quite interesting. Again, the problem we always have with OSINT, it's always classified to a certain level. It's about PII, so you don't want to have your procedure running around several platforms to get to the end and have the data dropped everywhere. So I'm trying to see if it's possible to run something locally, maybe on my own
laptop or like a server to put the info in there, let it go out on its own outside and then come back with the relevant data but still process all the stuff on the server alone.
Matthew Stibbe (03:03) Yes.
Hearing this a lot. Local AI, local LLMs or, I mean, in a different context, Apple running... claiming to be running Siri or ChatGPT in a little, local sort of private cloud for you. I think this is going to be a key thing, isn't it, just privacy and LLMs. I know you are online as K2S OSINT and you've got a GitHub repo and content there and I know you do a little bit of
Python and scripting and things. Does AI have a role to play in helping you write those sorts of scripts? Can it debug things? I'm not a developer anymore, so I'm just curious about this.
Henri (03:47) Yeah, of course it helps. Although I must say, some of the courses we also teach here, we try to stay away, in the beginning, from AI because it's also very important to understand what you are doing. I mean, I've been doing JavaScript, HTML, I think for over 10 years now. And of course, if you want to write something like a bookmarklet, you can ask AI to do it for you. But when it messes up,
it sometimes messes up badly and you don't have a clue with AI how to fix it. Then, it helps if you have some knowledge yourself. So yes, it helps. I always see AI as a sort of companion, like the Yoda on the back of your shoulder, who can support you with the stuff you're doing, but not in the lead role, basically more like support.
Matthew Stibbe (04:44) Code you will write, yes. So, I agree. I think we have had a client who went on a one day AI course and came back and said, look, I've built a website. Why am I paying you money for websites? And when you actually look at it, superficially it looked like a website, but there was all kinds of problems with it. It was very heavy code and it wasn't very well optimised... it had used... AI had just picked randomly some out of date libraries.
And, with great respect to her because she was astonished and amazed and it's extraordinary what you can do, quickly, but would you put your hundred million pound business on a website that had been badly coded by AI? I'm not sure, I don't think so. So the fundamental skills are still there. Do you think there are other risks around the use of AI in OSINT? I mean, there are opportunities obviously. What do professionals in the OSINT community need to
beware of, be careful of?
Henri (05:44) Yeah, well, I've heard about people writing reports using AI, but then again, putting all of the information from their investigation into an online AI. And of course, nowadays you have like the memory function where you can say, hey, this is just like private navigation in your browser. This conversation is temporary. And after that, you have to forget it. But recently, I know there are some lawsuits against some of those
LLM companies that are forcing them to keep the data even if the user... Exactly. So I think that's quite a big risk and I think for OSINT investigators it's better to just take it step by step. One of the presentations I gave a couple of months ago is...
Matthew Stibbe (06:20) every conversation, even the temporary ones. Yes, I saw that too.
Henri (06:40) I mean, we all know Python and how to build a little web scraper to get some pages with information back to us, and that's something you can still do. You can still use just basic Python to scrape a web page. And what I did for the analysis part is I used an LLM, like an old copy of Mistral, but I ran it locally. So the processing and the information are still local and not online out there. And I think if you look at
it in steps, so like a process. This step can be done with AI, and this step shouldn't be done with AI, I think that's a more powerful and more secure use of AI in your process than just throwing your whole intel cycle into AI, press the button and hopefully the answer rolls out, right?
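The step-by-step split described above could look roughly like the following sketch: plain Python fetches the page and reduces it to text, and only the analysis step is handed to a model. It is a minimal illustration and not the code from Henri's presentation. The names `fetch_page`, `html_to_text` and `run_pipeline` are hypothetical, and the `analyse` argument is a placeholder for whatever callable wraps a locally hosted model, so that nothing in the pipeline itself sends the collected text to an online service.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Step 1 (no AI): download one page with the standard library."""
    request = Request(url, headers={"User-Agent": "osint-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def html_to_text(html: str) -> str:
    """Step 2 (no AI): reduce the HTML to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def run_pipeline(html: str, analyse: Callable[[str], str]) -> dict:
    """Step 3 (AI): pass the text to a caller-supplied local analysis function."""
    text = html_to_text(html)
    return {"text": text, "analysis": analyse(text)}
```

Keeping the model behind a single callable makes it easy to see, and to audit, which step touches AI and which steps do not.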
Matthew Stibbe (07:32) Yeah, I agree. I think that's good advice for anyone experimenting with AI right now. So Henri, I want to move on a little bit and learn about you. So can you tell us a little bit about your career and journey to OSINT? How did you end up in this world?
Henri (07:49) Oh, that's a long while ago nowadays, but I think I've now been about 17 years professionally into OSINT.
It started out basically with gaming, multiplayer gaming, trying to find out on the internet, well, in the stages of the internet when you still had a modem with a phone line, just to find out who's the adversary and see if you can find stuff that you can shout at him during a multiplayer match. So they would...
slip up and you would win the match, right? And then I started to dabble a little bit into scripting and we had some nice social media platforms here in the Netherlands where people put a lot of their data publicly available online. I ended up writing some scripts on that and I got the attention of a private investigation firm who were busy working with OSINT, but in that day it was still called desk research.
So, their way of OSINT was basically a print of the local Chamber of Commerce, maybe the land register and, if you were lucky, Google search. And that was their OSINT procedure. And it was fine for that time. I mean, we still had phone books on paper, right?
Matthew Stibbe (09:07) Yeah.
Henri (09:09) But when I wrote that script and I found that and I had some connections as well within that firm, I got accepted in the role and basically started to find out more about OSINT and that it really was a profession.
And, just try to roll with what the world of OSINT was going to. And at that time, we're talking about like 2008, 2009, there weren't a lot of training courses or things you could find online that could help you do OSINT or...
find a methodology or a way of reporting. I think there was one in the Netherlands, one company that provided it, but nothing else. So it was really a journey of discovering and, like those cartoons on Cartoon Network back in the day, 'ooh, what does this button do?'
Finding out, breaking stuff and trying to figure out if you could find some more information on things. Yeah, basically rolling from the private investigation firm to a digital forensics investigation firm. And now to my role within Data Expert, first as a trainer, training law enforcement, Department of Defense, other government bodies and companies
how to utilise OSINT within their types of investigation. And, currently, now my role is more into cyber threat intel. So, instead of looking at people and companies and due diligence and stuff like that, I'm more into risk analysis, indicators of compromise.
So yeah, OSINT is a very dynamic landscape with all kinds of niches. So it's nice, every once in a while, to shake up your own role and find something new and interesting to dive into.
Matthew Stibbe (10:57) Mmm.
And Data Expert, tell me what that business does. Introduce that for our listeners.
Henri (11:10) Yeah, we're a company that's been over here for 35 years now in the Netherlands, starting out as a company that sold forensic solutions to make forensic copies of hard drives or phones
but during these years it expanded with an analytical branch selling analytical products, selling OSINT products. Later on it also branched out with an academy, which I have been a part of for over five years, teaching companies, government bodies, about cybercrime, OSINT, cryptocurrency investigations. And lastly, now we have a
cybersecurity branch where we have an incident response team, a security operations centre, and also we have a fraud solution for banks that monitors transactions for banks and can add fraud indicators.
Some transactions can be flagged for banks as being not quite a good transaction. So basically, Data Expert started as a small company. I think now we have expanded to the Nordics and Poland. I think it was number 40 or 42 or something when I started five and a half years ago. And now we have more than 200 colleagues. Yeah, business is good.
Matthew Stibbe (12:39) Yeah, well, that's an amazing story. And I like that it started with computer games. That's also where my journey started, but that's another story in another podcast. So, we were discussing before we started recording a few things. And one of the things that you were telling me about was synthetic data. And this is a new phrase to me. So explain to me what it means and how you can use it or what you can do with it.
Henri (13:08) Well, basically it's more of a pitfall we are going to see more and more within OSINT. Synthetic data is AI-generated data that's being thrown out there. So you can think, obviously, of like the deep fake videos or the deep fake pictures we are seeing. But nowadays complete websites with news content that is fake...
fake persona on social media, of course, that was already a thing but now it's even easier to put out a sock puppet or a fake persona online.
On one hand, it offers opportunities because we don't have to manage our sock puppets as much as we nowadays have to do. But, on the other hand, it makes our investigations also more complicated because we have to try to differentiate between what's real data, what's AI-generated data, and we have to scale it in... how much can we trust it?
Is it good enough to help in our investigation or will it be a larger problem in the coming years where we have to sift through the mud basically to find those gems of real data?
Matthew Stibbe (14:24) Is it fundamentally
Is it a filtering issue or a scale and volume issue? I mean...
Henri (14:32) Yeah, I think a bit of both. I mean, if you talk about scaling, just look at the basic reviews on a website or ⁓ comments on an Amazon store article. Half of those comments below it are AI-generated. And 'half' is my opinion, that's not a fact, by the way. But,
basically, when I want to buy something on a store online, I now have to sift through those comments to see... are these really good reviews or are those all AI-generated? You can even sometimes see the phrases that are used by LLMs. If you filter on that, you can just basically filter out all of those LLM comments. And also detecting it, or, I mean, I'm using Midjourney, for example, as an image generator.
If I want to generate someone like a Dutch male person sitting in Amsterdam with a cup of coffee on the table, actually my colleague is very good at that kind of generation as well.
The quality of the images is improving so much that it's getting harder and harder to see if it's really a generated image. And since the legislation is not in place yet about how we are going to...
watermark AI-generated content or other technical solutions, there's still no consensus about it. I think that will be one of the issues we're going to face as OSINT investigators, how to sift through the mud again to find those gems of real data while we see all that AI-generated synthetic data popping up in search engine results or in databases or whatever.
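The phrase-based filtering Henri mentions for reviews can be sketched in a few lines of Python. This is only an illustration of the idea: the phrase list below is an assumption made up for the example, not a validated set of LLM signatures, and a match is a reason to look closer rather than proof that a comment is generated. The names `suspect_hits` and `split_reviews` are hypothetical.

```python
import re

# Illustrative phrases only (an assumption, not a validated list).
SUSPECT_PHRASES = (
    "as an ai language model",
    "i hope this helps",
    "in conclusion,",
    "delve into",
)


def suspect_hits(text, phrases=SUSPECT_PHRASES):
    """Return the suspect phrases found in one comment (case-insensitive)."""
    # Collapse runs of whitespace so line breaks do not hide a phrase.
    normalised = re.sub(r"\s+", " ", text.lower())
    return [phrase for phrase in phrases if phrase in normalised]


def split_reviews(reviews, phrases=SUSPECT_PHRASES, min_hits=1):
    """Split reviews into (kept, flagged) by the number of phrase hits."""
    kept, flagged = [], []
    for review in reviews:
        hits = suspect_hits(review, phrases)
        (flagged if len(hits) >= min_hits else kept).append(review)
    return kept, flagged
```

Raising `min_hits` trades recall for fewer false flags; as the conversation goes on to note, detection of this kind remains unreliable, so flagged items still need human judgement.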
Matthew Stibbe (16:28) At the moment, I'm very interested in copywriting because of the work I do at Articulate Marketing, and it's reasonably easy to see when something's being generated by ChatGPT. There are sort of tells, words, phrases, signatures, tone of voice, but it's...
I think there's going to be a need for some sort of advanced reverse Turing test for those photos from Midjourney, and I'm not talking about right now, but like, next year and the year after it might be really much harder to go 'yeah that's AI slop' or you know 'that's computer-generated' or... It's a challenging thing. Is there any research or any sort of tools out there that might help people detect it or identify it?
Other than instinct.
Henri (17:18) Yeah, there are
There are some Chrome extensions that at least say they can detect AI-generated content. Also, like plagiarism detectors, for example. But, yeah, the output of it is doubtful. Sometimes it's good, but...
nine out of ten times it will give a result that it's AI-generated. Either I am possibly AI because I can write that well, but even if I write it myself it will still say it's AI-generated, for example. And what you said about tone of voice...
I also use Claude, for example, or 'Claude' it's called more in France, of course. It can mimic my tone of voice if I feed it like 10 or 20 articles, and I can just say write an article about this stuff. No worries, I still write my own articles. But it tends to pick up that tone of voice quite well and
if I look at Claude versus ChatGPT, Claude is way better at writing articles.
Matthew Stibbe (18:33) Yeah, better
for copy. We did some testing with... there are some sites and services online that you can use to say, this copy, is this article AI-generated or not? And it gives you a sort of a percentage. And we wrote some things and we knew exactly how much AI we had used from absolutely none at all, all the way through to auto-generated. And there's a little bit where we will use
Grammarly or something like that, which is an AI sort of spell checker, grammar checker and you know, it will change some things and it was reasonably good. I mean it was...
I wouldn't bet money on the outcome but it was, you know, the proportion was about right. Anyway, it was interesting, but I suspect that's going to become harder and harder. So anyway, let's move on. Another thing that I think
is really fascinating, you were telling me earlier about how OSINT professionals are sort of grappling with information that has become available because of a data breach and the ethical and the legal issues. And I wondered if you could talk a little bit about that.
Henri (19:47) Yeah, what we see more and more nowadays within OSINT tooling and OSINT sources and, of course, sometimes even just based on information that's freely available, is the ethical and legal implications of using breached data, for example. And, what I see throughout the landscape
within Europe but also globally... what you see is that some law enforcement teams are allowed to use it. Some cybersecurity firms are allowed to use it, but it's also based on the country, based on the legislation. But, as an OSINT professional, it's sometimes...
Yeah, it's troublesome to see if you can use it, although it's a very good source of information. It still is stolen data. So what I'm very interested to see is in the next coming months, years, if there will be like a general consensus on are we allowed to use breached data and to what kind of degree are we allowed to use breached data for OSINT investigations? And maybe, of course, also depending on the type of investigation, is it like
a digital footprint or do we want to find the whereabouts of someone that's gone missing, for example, and there are some rules and regulations about it already but to my knowledge there's still not a general consensus or general guideline about using it within the OSINT community. So, for me it's very interesting to see, yeah, what kind of
legislation or guidelines are we going to get and I hope that there are some organisations who are going to pick up that ball and are going to work with it but I don't see a lot of movement currently going on.
Matthew Stibbe (21:40) As an outsider I don't know what governing bodies or professional organisations exist in the OSINT world, but where would such guidelines or ethical guidelines come from? Who might be producing those?
Henri (21:55) Well, I can imagine that if you are working in law enforcement, they have a large, of course, governmental body watching what they are doing, like the DA, for example, to see if it's allowed to use that kind of information.
And throughout some European police agencies, or a lot actually, there is a consensus on which teams are allowed to use it and in what way. But you, of course, have also other kinds of organisations that do investigations because of due diligence or know your customer or otherwise, where there still isn't a general consensus or any
government body saying, well, hey, you should or should not use it or are allowed to use it. And I think it's a great source of information, as long as you haven't bought it, of course, that we could use in an investigation, even if it's just for verification of some of the information.
But, again, I still see so many question marks around the usage of it between different countries or continents around the world. I'm very curious to see where it goes.
Matthew Stibbe (23:16) I'm observing as I'm exploring this world with the guests on this podcast, a lot of this ethical resolution seems to come from individuals' professional integrity, especially perhaps outside government and law enforcement. And I'm interested if there are professional bodies that set standards or set expectations or if there are common...
commonly understood professional guidelines.
Henri (23:48) Well, if you look at the OSINT community, you have like an organisation like OSMOSIS, for example, a US-based organisation that tends to set guidelines. They also have like a training course, with an exam, where part of the exam also talks about professional guidelines and ethics.
But, since I am an outsider to the US, I also did the exam. But for me, the legal differences between one state and the other in the US, it was like a living nightmare to get that in my head.
But you see they are trying to set up professional guidelines, for example, and of course there are some other organisations throughout the world on cyber security as well like SANS, for example, or ISC that tried to set up professional guidelines as well and how to behave. But again it's like splintered bodies trying to set up those guidelines and...
it would be nice if there was a general consensus. And, if I look at it ethically for myself, I don't tend to use like data breach data in every investigation. It simply has to be, how do you say, professionally and ethically used in my investigation, and nine out of 10 times I have consent of, for example, a candidate I am investigating because we're doing pre-employment screening.
Then, they know I'm going to use it and then it's okay for the investigation. Or, if it's like a missing person case, if we would assist with that, then I could understand that you have to use it. But, I don't think you should use it for just every investigation where a person is involved. But again, is there like a universal guideline for this? Not one that I'm aware of.
Matthew Stibbe (25:45) You've talked about the fragmentation of regulation or guidelines. There's another kind of splintering, Balkanisation, fragmentation that we were discussing that I think is very interesting, which is the splintering of the web, in your phrase. What does that mean to you and what do you think the impact of that is going to be on OSINT investigations?
Henri (26:07) Well, back in the day, so this is grandpa Henri telling how the internet worked back in the day, but, you know, when we still had phone lines and modems using phone lines to connect to the World Wide Web, you had like a general World Wide Web where information was freely available. And there was...
not that much of a border between countries on the World Wide Web. And what you currently see with all the geopolitical tension is that countries tend to block certain content, disallow the usage of VPN, or even block entire websites or even disconnect themselves from the World Wide Web to keep information in,
away from other users, and also keep other information out from their own citizens, I reckon. And I think that's also a problem we're going to see as OSINT professionals, that for some countries, it will become harder and harder to get information from really public sources because simply the websites will be...
out of reach even with a VPN to collect the information.
Matthew Stibbe (27:33) I remember the early days of the internet as well and dial-up and things. At that time, big Silicon Valley promise, I suppose, or the internet pioneer promise was the internet will route itself around censorship. It will be this clear. And of course, that's great, except if you are the government of such and such a country, you can just go and turn off the interconnects between your country's internet. They're not millions.
They're tens, dozens, hundreds. And I think this is something that the early internet pioneers hadn't quite borne in mind. And do you think that this splintering might come in other ways too? For example, companies setting up walled gardens or, you know, paywalls or things like that?
Henri (28:21) Yeah, of course you see in the news media, for example, that paywalls are more common now. And, you know, I can relate to that. I mean, there are a lot of journalists doing hard work and they also need to get paid. They also have a mortgage or rent they have to pay. So I understand that a lot of news media are using paywalls to make sure that at least they're getting paid for the content they are making. But it doesn't help to...
get your reliable information sources to your investigation. Especially if you are like a starting OSINT investigator, usually your budget is tight for tooling. You're already glad you have like your own company and you want to set up your investigations. Then it's hard to get reliable
information through those kinds of sources. So either you invest in tooling that has access to those sources, or you try to make do with publicly available sources. And since it's so much easier now to set up a website with fake news or news from a certain perspective, it will be harder to verify the information from more than one source.
Matthew Stibbe (29:39) And, also, search engines, Google, you know, the answers that they give now. The first thing is an AI summary, which, I'm not even sure Google knows how they've put that together, right? That's not explainable. That's just come from an AI machine. Anyway, interesting. We're almost out of time and I'd like to close with one last question. Are there blogs or podcasts or other resources that you've been...
been looking at recently that you find interesting and that you would recommend to your fellow professionals?
Henri (30:13) Yeah, well if you are into reading then I think Deep Dive by Rae Baker is a very nice book if you want to start with OSINT. She's also very good with maritime OSINT. So if you are into tracking ships and shipping containers, it's a very good book to read. You really know that that's her expertise. You can really read that in that book. There are some nice Discord servers out there from
Bellingcat and Kase Scenarios, where you can hook up with a lot of people. The OSMOSIS Association also has a Discord server. And podcasts, of course this podcast, but I think also The Pivot is nice.
Trace Labs has a nice podcast, and if you're a little bit more into cybersecurity and CTI, Microsoft has a great threat intel podcast, SANS of course, and Darknet Diaries is one of my personal favourites when I'm in the car. So yeah, I think there are a lot of nice resources out there and otherwise if someone wants some more resources, they are always welcome to reach out to me and I can show them some more resources, no problem.
Matthew Stibbe (31:28) Fantastic.
Indeed, as we bring this episode to a close, if you want to learn more about Data Expert, you can go to dataexpert.nl in the Netherlands or dataexpert.eu for the rest of the world. And Henri, I think we can find you at K2S OSINT. If we Google that, I found your GitHub repo. So is there any other way that we can get in touch with you or find your resources online?
Henri (31:56) Yeah, well, I'm on the usual suspects like LinkedIn, Bluesky and GitHub and otherwise, of course, through the website of Data Expert, I'm always reachable.
Matthew Stibbe (32:06) Fantastic. Well, and that brings this episode to a close. Henri, thank you so much for being a wonderful guest. Very interesting conversations today.
Henri (32:14) Thank you so much for having me.
Matthew Stibbe (32:16) And, if you're listening to this and you'd like to know more about OSINT, Blackdot, or their product Videris, please visit blackdotsolutions.com. Thank you very much for listening and until next time, goodbye.