The Ultimate Guide to OSINT and Google Dorking

If you're looking for a comprehensive guide to OSINT and Google dorking, you've come to the right place. In this blog post, we'll discuss what OSINT is and how to use Google dorks to find sensitive information online. Stay tuned, because by the end of this post, you'll be a master at using OSINT techniques.

What is Google Dorking?

Google dorking is a technique that uses Google's advanced search operators to locate valuable data or hard-to-find content. It is also known as "Google hacking."

Researchers found that they could combine the wide variety of extra search options Google offers to find things you would not expect, and to narrow a search down to very specific targets. Examples include finding vulnerabilities in systems, mapping a target's attack surface, and generating lists of potential usernames. Google is an incredibly powerful search engine, more powerful than many people realise.
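
The "extra options" here are search operators such as site:, intitle: and filetype:, all of which come up in the podcast transcript further down. As a rough sketch of how they combine, the Python below assembles a query string and the matching search URL. The helper names build_dork and search_url, and the company name Example Corp, are illustrative assumptions and not part of any real tool.

```python
from urllib.parse import quote_plus


def build_dork(terms="", **operators):
    """Join operator:value pairs and free-text terms into one query string."""
    parts = []
    for name, value in operators.items():
        # Wrap multi-word values in quotes so they are matched as a phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Only results from one website, as in the bbc.com example from the podcast.
print(build_dork(site="bbc.com"))
# LinkedIn pages whose title mentions a (hypothetical) company name.
print(build_dork(site="linkedin.com", intitle="Example Corp"))
# PDFs that mention the same hypothetical company.
print(build_dork('"Example Corp"', filetype="pdf"))
print(search_url(build_dork(site="bbc.com")))
```

Keeping the query builder separate from the URL builder makes it easy to review the exact search string before anything is sent to Google.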

What Are Some Examples of This?

One example is LinkedIn. LinkedIn is deliberately difficult to scrape compared with many other websites, because it does not want people harvesting names, employment histories and similar details in bulk. Google, however, is crawling and indexing all of the time, so a search restricted with the site: and intitle: operators can return the public profiles of people who work at a given company.

Another example is documents. Files hosted on a company's website often contain usernames hidden in their metadata, and the first step is getting hold of those files. Searching for the company name with the filetype: operator returns them, and the metadata can then be extracted with freely available tools, which can lead to very sensitive information.

This also overlaps with social media. By cross-referencing how a person presents themselves on social media sites with Google Maps, it is possible to work out where they live. This can happen quickly: in one case it took about 45 to 50 minutes to go from a name all the way to a residential address.
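
To make the document example concrete, here is a minimal sketch of checking what a PDF gives away before you publish it. It is a naive stand-in for a proper metadata tool, not a replacement for one: it only finds plain-text entries such as /Author (jsmith) in an uncompressed Info dictionary, and it ignores compressed objects, hex or UTF-16 strings and XMP metadata. The function name pdf_info_fields and the sample bytes are assumptions made for illustration.

```python
import re

# Matches entries such as "/Author (jsmith)" in an uncompressed Info dictionary.
# The value group allows backslash-escaped characters inside the parentheses.
_INFO_FIELD = re.compile(
    rb"/(Author|Creator|Producer|Title)\s*\(((?:\\.|[^\\)])*)\)"
)


def pdf_info_fields(data):
    """Return the first value found for each metadata field in raw PDF bytes."""
    fields = {}
    for name, value in _INFO_FIELD.findall(data):
        fields.setdefault(name.decode("ascii"), value.decode("latin-1"))
    return fields


# Hypothetical fragment of a PDF file, as it might appear on disk.
sample = (
    b"1 0 obj\n"
    b"<< /Author (jsmith) /Creator (Microsoft Word) /Producer (Acrobat) >>\n"
    b"endobj\n"
)
print(pdf_info_fields(sample))
```

In practice you would read the bytes with open(path, "rb").read() and treat an Author value that looks like a login name as something to strip before the file goes online.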

How Serious is Finding Someone's Residential Address?

This is an interesting debate. Some may think it is not a huge concern, since plenty of people already know where they live, but they often forget how many times a business has authenticated them over the phone by asking for the first line of their address. Worst-case scenarios include being swatted, where someone falsely reports a gun crime at an address to trigger an armed police response. Small pieces of personal information like this can be damaging in the wrong hands. People need to be aware of what is out there and get it removed where possible.

Unfortunately, with a lot of OSINT, you can't just switch it off. You need to understand what's out there and the risk it poses, from addresses to usernames. Once it's out there, it's hard to get it back.

How Can We Make People More Aware?

First of all, you can make people aware that this is now publicly available information and that it shouldn't be used as a factor of authentication. Where lists of usernames for your business can be found, implementing multi-factor authentication on publicly available login portals and cloud environments helps to mitigate the risk of that information being out there.

Conclusion

To conclude, this blog, alongside the podcast, is the ultimate guide to OSINT and Google hacking (or Google dorking), with the goal of protecting you from having sensitive information leaked online. It highlights how sensitive information ends up exposed and the precautions you can take going forward. Here at DarkInvader, we discover, mitigate and remove public-facing threats on the public and private web. Check out our OSINT service to discover more.

Audio Transcript

Hello and welcome to the first of many DarkInvader OSINT deep dives. I'm joined by our technical director Gavin Watson, and one of the research team Liam Follin to discuss the magic that is Google Dorking. Without further ado, Gavin, what is Google Dorking?

So Google Dorking, it's sometimes called Google hacking as well, and it's using the Google search engine in a more advanced way. Basically, it's using some of the extra syntax options that Google offers. Now, this can be really, really simple, and I think a lot of people are aware of some of the extra little commands you can put into a Google search box. So for example, if you wanted to search for something on the BBC website, and you wanted to make sure that you only received results from the BBC website and nothing else, you can type site colon bbc.com. Now, what people found was that they could take advantage of the wide variety of these extra options to find things via Google that you really wouldn't expect, and to really narrow the search down to very, very specific things. So, for example, it's possible to find vulnerabilities in people's systems, it is possible to map an attack surface for a target, and you can use these options to generate lists of potential usernames. And these are all really, really useful types of information for an attacker or, in terms of us, for a pen tester or a security consultant.

Liam, do you have anything to add on to that?

No, that was very eloquently put. Effectively, Google is an incredibly powerful search engine, more powerful than I think a lot of people realise, and all we're doing is leveraging that power slightly more efficiently, to allow us to find out information about people or to attack services, coming from the security side, or, coming from the research side, just trying to find out information about businesses that could then be leveraged to cause harm and damage. I think there are some pretty good examples of this. I know you've worked on certain tools that use Google Dorking, specifically around LinkedIn.

Absolutely. So LinkedIn is very difficult to scrape, relatively speaking, compared to a lot of other websites out there. And that's on purpose. LinkedIn don't want people to be able to scrape names and addresses because, at the end of the day, LinkedIn contains not particularly sensitive information, but if it could be scraped en masse, then that could be quite an issue: getting hold of huge amounts of names and employment history and people's likes and endorsements, things like that. You know, they don't want people to scrape that for a variety of reasons. But Google is searching and indexing all of the time, and so you can scrape LinkedIn, to an extent, via Google. And how this works is very, very simple. Just going back to what I said previously, you can refine the search with site colon LinkedIn, and then you can use some of the other syntax, like intitle, for example, so that the results you get back from Google, every single one is an individual LinkedIn user for the company you are interested in. And that's quite a powerful thing. You can't scrape LinkedIn, but you kind of have scraped the names for your target company via Google. And then with a little bit of coding, you can change that output into a valid list of names, or usernames. And this is half the battle. If you're going to attack a company, if you're going to brute force a login portal, you need usernames. And with a little bit more coding, you can change that first name and that last name into different email conventions, like first dot last, or first initial and then the whole surname, because you might not know what convention the company uses. And so with a very simple Google hack, or Google Dork, if you will, and a very little bit of coding, you can generate a huge, huge list of potential usernames.
And then it's just a case of picking a few common passwords, Password1, Password123, and it's a numbers game too. Depending on how many you're sending in, it only takes that one individual to have a weak password and then potentially you're into a system. And then another example is the documents as well. Stripping out metadata from documents is a really common strategy for getting information, because documents hosted on people's websites often contain usernames, and they don't realise those usernames are hidden in those documents. But the first step is getting hold of those documents, and how do you do that? Again, with simple Google hacks, you can accomplish this. So you can search for a company name, and then just put something simple like filetype colon PDF, and all the results you get back will be PDFs for that company. Then you can download them all manually, if you want, and then extract that metadata using the various freely available tools like ExifTool and things like that. And there are tools out there that automate this process, like the old tool Metagoofil, for example. But the point is, you can do it very easily, manually, yourself, just with Google. I think there have been quite a lot of instances recently, war stories, if you will, where we've used these kinds of Google hacks and we've found really quite sensitive information.
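
The "little bit of coding" mentioned above, turning a first name and a last name into different email conventions, might look something like the sketch below. It covers only the two conventions named in the conversation (first dot last, and first initial followed by the whole surname). The function names, the sample names and the example.com domain are assumptions for illustration, and this is not the tooling discussed in the podcast.

```python
def username_candidates(full_name):
    """Return lower-case username guesses for a 'First Last' style name."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return []  # Not enough information to apply either convention.
    first, last = parts[0], parts[-1]
    return [
        f"{first}.{last}",    # first dot last
        f"{first[0]}{last}",  # first initial, then the whole surname
    ]


def email_candidates(names, domain):
    """Expand every name into one address per convention, without duplicates."""
    seen = []
    for name in names:
        for user in username_candidates(name):
            address = f"{user}@{domain}"
            if address not in seen:
                seen.append(address)
    return seen


# Hypothetical names and domain; run this against your own staff list to see
# how predictable your login names are.
print(email_candidates(["Jane Doe", "John Smith"], "example.com"))
```

Middle names are ignored here by taking only the first and last word of each name, which is a simplification.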

Absolutely. And, again, it's leveraging that powerful, powerful search engine to try and find things. We've got some really good examples, given that we've also used it to find people's residential addresses, by piecing together from various social media sites this image of a person, how they normally present themselves on social media. And we were then able to use that information, cross-referencing with things like Google Maps, for example, and we were actually able to find where somebody lived, simply based on, again, using these Dorking strings as a way of interacting with Google. And again, it was in a reasonably short amount of time: it only took about 45 to 50 minutes for the researcher in question to go from finding a name all the way through to having this residential address. And naturally, I'm sure most people don't really want their residential addresses to be publicly known online, or to be found by the sorts of people that are going to be doing this research. And there are other examples as well (Greg gave a talk on this) where it has revealed really quite sensitive information, not perhaps as serious as exactly where somebody lives. But you can use it, for example, to find subdomains. Or, more specifically, you can use it to find S3 buckets that have been crawled by Google. And once you've found that S3 bucket, there's a very common misconfiguration with S3 buckets, leaving them open to the public, and that lists all the files, or the first 1,000 files, strictly speaking. And one of the researchers actually managed, again using these very simple site and inurl Google Dorking, Google hacking strings, to find a quite extensive list of driving licences and passports that had just been left in an S3 bucket, presumably by some HR team for one of these businesses. And naturally, again, that's a very simple technique.
But damage can be caused as a result of that, or there is at least the potential for damage, from somebody just using these very simple techniques.
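
For anyone wanting to check their own buckets for the misconfiguration described above: an open bucket answers a plain request with an XML document whose root element is ListBucketResult, holding up to 1,000 keys per response. The sketch below only parses such a response, so that you can tell a public listing from an access-denied error. The function name summarise_listing, the bucket name and the file names in the sample are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarise_listing(xml_text):
    """Return keys and truncation flag for a bucket listing, else None."""
    root = ET.fromstring(xml_text)
    if _local(root.tag) != "ListBucketResult":
        return None  # For example an <Error> document: listing is not public.
    keys = []
    truncated = False
    for element in root.iter():
        name = _local(element.tag)
        if name == "Key":
            keys.append(element.text or "")
        elif name == "IsTruncated":
            # "true" means there are more keys than this response shows.
            truncated = (element.text or "").strip().lower() == "true"
    return {"keys": keys, "truncated": truncated}


# Hypothetical response body from a bucket that allows public listing.
sample = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-hr-bucket</Name>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>scans/passport-001.pdf</Key></Contents>
  <Contents><Key>scans/licence-002.pdf</Key></Contents>
</ListBucketResult>"""

print(summarise_listing(sample))
print(summarise_listing("<Error><Code>AccessDenied</Code></Error>"))
```

A result of None for your own bucket is the outcome you want; any list of keys means the contents are visible to anyone, including search engine crawlers.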

The residential address is quite an interesting one. I mean, some people might think, well, so what, you know, I tell loads of people where I live. But if you think about how many times you've been on a phone call to a business and they've authenticated you by asking for the first line of your address, you know, that can be quite serious. And in the worst-case scenarios you hear about people being swatted, where someone calls in a gun crime at a particular residential address to call out the SWAT team, and things like that. So, you know, little bits of personally identifiable information like that, in the wrong hands, can be really quite damaging. So it's important to be aware that it is out there and, where possible, to get it removed.

Absolutely. Which kind of nicely segues into, you know, what can you do about this? All right, so we've expanded over the last couple of minutes on what you can find out there. But now we have found something out, what are we going to be able to do about it? And unfortunately, with a lot of this open source intelligence, you can't just kind of switch it off. You need to understand what's out there, and you also need to understand the risk that information poses, and that will be very different for, say, a residential address or a username. But once the information is out there, as I said, it's very hard to get it removed. That's kind of not really how the internet works. But you can put in place a series of compensatory controls. All right, so a residential address is out there. That's fine. What are we going to be able to do about it? Well, first of all, you can try and make people aware that that's now publicly available information, and that it shouldn't be used as a factor of authentication. In an extreme case, you can also move. Obviously, that's probably slightly over the top, but again, that might be necessary. If, for example, it's possible to use Google Dorking to find lists of usernames for your business, then implementing things like multi-factor authentication on publicly available login portals or cloud environments would, again, help to mitigate against the risk of having this information out there.

Thank you both for talking us through the first of many OSINT deep dives. Join us next week.