If you’re in charge of a domain that others visit regularly, you want users to keep trusting that domain. After all, your domain is an extension of your brand, which is your business. So you want to keep phishers and attackers from creating fakes and copies of it to lure your customers away or tarnish your brand with false information.
Extracting a list of URLs from a PDF can be difficult. One of the tools that makes it a lot easier is pdf-parser by Didier Stevens.
For now we’ll focus on pdf-parser for extracting URLs in PDFs. This is a powerful tool with many uses, the full scope of which is beyond this brief tutorial.
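To give a feel for what "extracting URLs from a PDF" means under the hood, here is a minimal Python sketch. It is not pdf-parser and does not reproduce how that tool works. It is a simplified stand-in that scans raw bytes for `/URI (...)` entries, which is how link targets are typically stored in PDF action dictionaries. The names `URI_PATTERN` and `extract_uris` and the sample bytes are made up for illustration. The sketch assumes the objects are uncompressed and that the URL contains no escaped parentheses, which is exactly the kind of limitation a real parser handles for you.

```python
import re

# Matches a /URI key followed by a literal string in parentheses.
# Assumption: the string has no nested or escaped parentheses.
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")


def extract_uris(pdf_bytes: bytes) -> list[str]:
    """Return link targets found in uncompressed PDF objects."""
    return [match.decode("latin-1") for match in URI_PATTERN.findall(pdf_bytes)]


# Hypothetical fragment of a PDF link annotation, for illustration only.
sample = b"<< /Type /Action /S /URI /URI (https://example.com/login) >>"
print(extract_uris(sample))
```

On a real file you would read the bytes with `open(path, "rb").read()`. Links stored inside compressed object streams will not show up this way, which is one reason a dedicated tool is the better choice for actual analysis.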
I’ll start off this post by saying that I DO NOT claim to be a “RegExpert” (Regular Expression Expert). I claim to be an expert in only a couple of things in life:
- Knowing how to give good doggos belly rubs.
- Not thinking of anything in particular.
But because neither of these things pays the bills (yet), I’ve found that a little knowledge of regular expressions is a good thing to have. Better still, having a few resources in your back pocket can be useful.
I’ll show a couple of small examples here, and I’ll also point to some resources that can help with the trickier situations where regex is needed.
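As a taste of the kind of small example I mean, here is a minimal Python sketch that pulls URLs out of a blob of text. The pattern is deliberately simple and is my own assumption of what "good enough" looks like, not a complete URL grammar. `URL_PATTERN` and `find_urls` are illustrative names.

```python
import re

# Match http:// or https:// followed by any run of characters that are not
# whitespace, angle brackets, quotes, or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def find_urls(text: str) -> list[str]:
    """Return every substring of text that looks like an http(s) URL."""
    return URL_PATTERN.findall(text)


text = "Visit https://example.com/docs or http://test.example.org/page?id=1 today."
print(find_urls(text))
```

One caveat: punctuation that directly follows a URL, such as a period at the end of a sentence, gets swallowed into the match. Handling that cleanly is one of those tricky situations where a reference or a regex tester earns its keep.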