The Art of Knowing Everything Before You Hack Anything part 1

by Yousef Elsheikh · ~15 min read · April 5, 2026

Introduction

Recon isn't just "googling the target." It's the art and science of quietly building a complete, high-resolution map of your target's entire digital footprint before you ever send a single packet.

In 2026, companies run hybrid cloud environments, microservices, partner integrations, and thousands of forgotten subdomains. The attack surface has never been bigger… or more hidden.

In this series, we'll walk through a complete, modern reconnaissance methodology: from passive OSINT and infrastructure mapping to advanced cloud hunting, JavaScript analysis, automation pipelines, and turning raw data into easy, high-impact bugs. Whether you're a beginner bug bounty hunter or an experienced red teamer, you'll leave with practical techniques, battle-tested tools, and a repeatable workflow that actually scales.

Here's the agenda for this part: understanding your target, acquisitions, attack surface expansion, and infrastructure fingerprinting.

1. Understanding Your Target

Before you run a single tool, you need to actually understand what you're looking at. This step is boring, nobody talks about it, and skipping it is the reason many hunters miss easy bugs.

Check the scope first.

Read the program's rules carefully: not just the asset list, but the fine print. What's out of scope? What vulnerability types do they exclude? What are their rules of engagement? Missing this step can get you disqualified or, worse, land you in legal trouble. Watch for details like "any API endpoint ending with /self-service-* is out of scope"; catching a line like that early saves you hours of wasted effort.

Browsing time.
Open the web app, the mobile app, and any other asset or service the target offers. Use it like a normal user. Sign up for an account. Go through every flow: registration, login, password reset, profile editing, payment, settings. Understand how it works, what data it handles, and where the complexity lives. The more features a product has, the more places things can break.

This isn't busywork. When you browse first, you notice things that tools never catch: a forgotten subdomain linked in the footer, a debug parameter in the URL, a different tech stack on the mobile API, an error message that leaks internal paths. You're building mental context that makes everything else faster.

Quick research round:

Check if they have a blog or YouTube channel. Engineering blogs often reveal the tech stack, infrastructure decisions, and internal tooling.

Follow their LinkedIn page and note who works there.

Collect employee accounts on GitHub and GitLab. Look at their projects, repos, and commits, and especially anything accidentally pushed and then deleted (it's still in the Git history).

Gather email formats and addresses. These help with password spraying, phishing simulations, and OSINT later.

Never forget to take notes on everything. If you didn't write it down, it didn't happen.

2. Acquisitions

This is one of the most valuable steps in recon, especially when the target is a large company with a wild portfolio. Many hunters skip it entirely, and it's an absolute gold mine.

Everyone knows Crunchbase; unfortunately, most of its useful data is paid.

The Most Accurate Free Method

After some digging, I found that Wikipedia is surprisingly effective for this. Just search for the company you want. Let's say Google: https://en.wikipedia.org/wiki/Google

Wikipedia lists the companies and assets the target has acquired, usually in an acquisitions section or a dedicated list page. Clean, organized, and free.
A Hidden Gem

While searching for free methods to get accurate acquisition data, I found this treasure: an AI chatbot (a custom GPT) created by Jason Haddix, built specifically for acquisition and recon lookups:

https://chatgpt.com/g/g-3GwxLih5t-arcanum-acquisition-and-recon-bot

Feed it a company name, and it pulls acquisition data for you. Impressive and practical.

Now that you know what the target owns (their main assets and everything they've acquired), it's time to map what's actually running. Let's expand the attack surface.

3. Attack Surface Expansion

In this section, we'll cover three key areas: ASN → CIDRs → IPs, TLS/SSL certificates, and automation.

ASN / CIDRs / IPs

What is an ASN?

Meta owns AS32934. That single number maps to the IP ranges Meta announces under it.

An ASN (Autonomous System Number) is a unique identifier assigned to a network, or group of IP ranges, that operates under a single routing policy. Every major company that owns its own IP space has one (or more). Think of it as the company's "network ID" on the internet.

Why does it matter? If you can find a target's ASN, you can map it to the CIDR blocks announced under it, which gives you the IP ranges they own (look for additional ASNs too, since large companies often have more than one). Many of those IPs host assets that never show up in DNS or subdomain enumeration: forgotten staging servers, internal tools, admin panels, legacy apps. That's your gold.

Quick note: CIDR (Classless Inter-Domain Routing) is a compact way to describe a block of IP addresses. Instead of listing thousands of IPs one by one, you write 192.168.1.0/24, where the number after the slash is how many leading bits of the address are fixed. The smaller the number, the bigger the block: an IPv4 /n block holds 2^(32 - n) addresses, so /24 is 256 IPs, /16 is 65,536, and /8 is over 16 million (16,777,216).

How to find it (the workflow I actually use):

First, search on bgp.he.net: just type the company name. This gives you the ASN and its associated CIDR blocks in seconds. It's free and usually my first stop.
Second, validate with whois lookups on known IPs to trace them back to the ASN and confirm ownership (the grep keeps only IPv4 lines, in case the name resolves through a CNAME first):

```
whois -h whois.cymru.com " -v $(dig +short example.com | grep -E '^[0-9.]+$' | head -1)"
```

Third, automate with asnmap to pull all CIDRs programmatically:

```
asnmap -d facebook.com
```

This resolves the domain to its ASN, then expands that ASN into every CIDR block announced under it, ready to pipe into other tools.

Once you have the ASN, expand it into CIDRs, then scan those IP ranges for live hosts, open ports, and hidden services. This is how you find assets that are completely invisible to traditional recon.

How to validate the ownership

You have thousands of IPs from the ASN. But which ones actually belong to the target? Two key Shodan filters answer two different questions:

org: "Who owns this IP?"
This searches by the organization name in WHOIS/BGP records. It returns IPs that Meta actually owns: registered to their ASN, allocated to their network. This is the more accurate filter for validating IP ownership. If Shodan shows org:"Meta Platforms, Inc." for an IP, that IP is registered to Meta.

ssl.cert.Subject.cn: "Who is this IP serving?"
This searches by the Common Name on the SSL certificate. It returns any IP on the internet that serves a certificate for *.facebook.com, regardless of who owns the IP. That includes CDN nodes, cloud-hosted services, third-party partners, and misconfigured servers.
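Before spending Shodan credits, a cheap local first pass is plain CIDR arithmetic. The minimal sketch below uses only Python's standard-library ipaddress module; `block_size`, `expand`, and `in_target_blocks` are hypothetical helper names. It computes a block's size from its prefix length, expands a block into individual addresses, and checks whether candidate IPs fall inside a list of target blocks. The blocks used here are RFC 5737 documentation ranges standing in for real ones; in practice you would paste in what bgp.he.net or asnmap returned.

```python
import ipaddress


def block_size(cidr: str) -> int:
    """Addresses in a CIDR block; for IPv4 that's 2 ** (32 - prefix length)."""
    return ipaddress.ip_network(cidr).num_addresses


def expand(cidr: str) -> list[str]:
    """Every address in the block, including the network and broadcast addresses."""
    return [str(ip) for ip in ipaddress.ip_network(cidr)]


def in_target_blocks(ip: str, cidrs: list[str]) -> bool:
    """True if the IP falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    for cidr in ("192.168.1.0/24", "10.0.0.0/16", "10.0.0.0/8"):
        print(f"{cidr}: {block_size(cidr):,} addresses")

    # Hypothetical target blocks (RFC 5737 documentation ranges, not real allocations)
    target_blocks = ["192.0.2.0/24", "198.51.100.0/24"]
    print(expand("192.0.2.0/30"))
    for candidate in ("192.0.2.77", "203.0.113.5"):
        print(candidate, in_target_blocks(candidate, target_blocks))
```

Running it prints 256, 65,536, and 16,777,216 addresses for the three example blocks. Keep in mind that membership only tells you an IP sits inside announced space; the Shodan filters and certificate checks are what tell you whether the target is actually using it.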
When you know the ASN (tight, precise):

```
# The precise approach: ASN + cert match
asn:AS32934 ssl.cert.Subject.cn:facebook.com
```

When you don't know the ASN and want to cast a wider net:

```
# The exclusion approach: cert match minus CDNs
ssl.cert.Subject.cn:facebook.com -org:"Cloudflare" -org:"Amazon" -org:"Akamai"
```

Extract just the IPs (pipe-friendly):

```
shodan search --fields ip_str "asn:AS32934 ssl.cert.Subject.cn:facebook.com"
```

Download the full dataset, then parse locally (better for large results, and you can re-parse the file as often as you like without spending more query credits):

```
shodan download meta_origins "asn:AS32934 ssl.cert.Subject.cn:facebook.com"
shodan parse --fields ip_str,port,org,ssl.cert.subject.CN meta_origins.json.gz
```

Count before you commit:

```
shodan count "asn:AS32934 ssl.cert.Subject.cn:facebook.com"
```

TLS/SSL Certificates

SSL certificates leak information that targets don't always realize is public. Certificate fields like the CN (Common Name) and SANs (Subject Alternative Names), together with TLS handshake fingerprints such as JARM and JA3, can reveal subdomains, internal hostnames, and infrastructure details that were never meant to be exposed.

The main function of a TLS certificate

A TLS certificate does two things. First, it proves identity: when your browser connects to facebook.com, the certificate says "yes, this server is really Facebook, and here's a trusted authority (like Let's Encrypt or DigiCert) vouching for it." Second, it anchors the encryption: the certificate carries the server's public key, and during the handshake the server proves it holds the matching private key. That's what lets your browser set up an encrypted session with the real server, so nobody in between can read the traffic.

From a recon perspective, these certificates are gold because they're public by design. The server hands them to anyone who connects. Every field in that cert (the CN, the SANs, the issuer, the expiration date) is information the target is broadcasting to the entire internet, whether they realize it or not.

SSL/TLS Protocol Versions

Six versions: four are broken or deprecated, two are safe.

SSLv2 (1995) and SSLv3 (1996): both completely broken.
SSLv3 is vulnerable to the POODLE attack. Never use either. If you find a server still running these, that's a finding.

TLSv1.0 (1999) and TLSv1.1 (2006): deprecated, formally so by RFC 8996 in 2021. Both depend on MD5 and SHA-1 in the handshake and predate modern AEAD cipher suites, and TLSv1.0 is additionally exposed to the BEAST attack. Major browsers have dropped support for both. Finding these in production is a reportable misconfiguration in many programs.

TLSv1.2 (2008): currently the minimum acceptable standard. Supports perfect forward secrecy and modern cipher suites. Still widely used and considered secure when configured properly.

TLSv1.3 (2018): the gold standard. Faster handshake, stronger encryption, and legacy insecure features removed entirely. This is what everything should be running.

How this is useful in bug bounty / red teaming:

If you scan a target's ASN with tlsx and find servers still running TLSv1.0 or TLSv1.1, that server is likely old, neglected, and unmaintained, which often means it has other, more serious vulnerabilities too. The outdated protocol is your signal that nobody is watching that box. It's also a valid finding on its own in many programs (weak TLS configuration).

Scanning for TLS Misconfigurations with tlsx

What is tlsx?

tlsx is a fast, configurable TLS grabber built by ProjectDiscovery; think of it as a Swiss army knife for TLS reconnaissance. Point it at any host, and it grabs the SSL/TLS certificate and extracts everything useful: CN, SAN, issuer, expiration date, cipher details, and JARM/JA3 fingerprints. It also flags misconfigurations along the way. Install it from github.com/projectdiscovery/tlsx.

Why is this useful for ASNs, CIDRs, and IP validation?

Here's the problem: you find a target's ASN, expand it to CIDRs, and now you have thousands of IPs. But which ones actually belong to the target's active infrastructure? WHOIS tells you who registered the IP. tlsx tells you who's actually using it right now, by looking at which certificate each IP is serving.
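tlsx does this at scale, but the core idea (connect, pull the certificate, read the CN and SANs) fits in a few lines of standard-library Python. The sketch below is a simplified illustration, not how tlsx works internally, and `grab_cert`, `names_from_peercert`, and the script name grab_cert.py are hypothetical. One limitation to keep in mind: Python's `getpeercert()` only returns parsed fields for certificates that verify against the local trust store, so self-signed or expired certificates come back as a verification error instead. For an IP you're mapping, that failure is a clue in itself.

```python
import socket
import ssl
import sys
from typing import Optional


def grab_cert(ip: str, port: int = 443, sni: Optional[str] = None,
              timeout: float = 5.0) -> dict:
    """Open a TLS connection to ip:port and return the parsed peer certificate.

    Chain verification stays on because getpeercert() only returns parsed
    fields for verified certificates; hostname checking is off because we
    dial raw IPs rather than names.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # server_hostname=None sends no SNI, so you see the server's default cert
        with ctx.wrap_socket(sock, server_hostname=sni) as tls:
            return tls.getpeercert()


def names_from_peercert(cert: dict) -> tuple[Optional[str], list[str]]:
    """Return (subject CN, DNS SAN entries) from a getpeercert() dictionary."""
    cn = None
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                cn = value
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return cn, sans


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python grab_cert.py <ip> [sni-name]
    target_ip = sys.argv[1]
    sni_name = sys.argv[2] if len(sys.argv) > 2 else None
    try:
        cn, sans = names_from_peercert(grab_cert(target_ip, sni=sni_name))
        print(f"{target_ip} CN={cn} SANs={sans}")
    except ssl.SSLCertVerificationError as err:
        # Self-signed, expired, or otherwise untrusted certs end up here
        print(f"{target_ip} untrusted certificate: {err.verify_message}")
    except OSError as err:
        print(f"{target_ip} connection failed: {err}")
```

Run it as `python grab_cert.py 192.0.2.10 facebook.com` (the IP is a placeholder) to print the CN and SANs that IP serves for that name; omit the second argument to see the default certificate the server returns without SNI.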
If you scan an IP on Meta's ASN and tlsx pulls back a certificate with CN=*.facebook.com and SANs listing instagram.com and whatsapp.com, that IP is confirmed Meta infrastructure, live and serving their domains. If it pulls back a cert for some unrelated company or a self-signed default, that IP might be reassigned, decommissioned, or shared hosting. tlsx gives you ground truth that WHOIS alone can't.

Three Commands, Three Different Jobs

These three commands look similar, but each one answers a completely different question:

```
asnmap -d facebook.com | tlsx -san -cn
asnmap -d facebook.com | tlsx -jarm
asnmap -d facebook.com | tlsx -ex -ss -mm -re
```

All three start the same way: asnmap -d facebook.com resolves Facebook's domain to its ASN, then expands that into CIDR blocks. The difference is what tlsx does with those IPs.

tlsx -san -cn: "What domains does this IP serve?"
This is your discovery command. It grabs the SSL certificate from each IP and extracts the CN (Common Name) and SANs (Subject Alternative Names). The output tells you what each IP is serving: domain names, subdomains, internal hostnames. Use it to map which domains live on which IPs, find hidden subdomains, and validate ownership.

tlsx -jarm: "What infrastructure is this IP running?"
This is your fingerprinting command. It doesn't look at the certificate content at all. Instead, it sends a series of crafted TLS Client Hello messages to the server and hashes the responses into a JARM fingerprint. That fingerprint characterizes the server's TLS stack: the server software, its version, and its configuration all shape it. Use it to group servers by infrastructure, distinguish CDN edge nodes from origin backends, and spot IPs that look different but run the same TLS configuration.

tlsx -ex -ss -mm -re: "Is this server misconfigured?"
This is the vulnerability scanner mode. Each flag targets a specific misconfiguration:

-ex (expired): finds certificates past their expiration date. Neglected servers tend to have other, worse problems.
-ss (self-signed): finds certificates the server signed itself. Self-signed certs scream "internal tool": staging environments, admin panels, dev servers that were never meant to be public.

-mm (mismatched): finds certificates whose CN and SANs don't match the host being checked. This can point to an origin server sitting behind a CDN that you've reached directly instead of through the edge.

-re (revoked): finds certificates the CA has revoked but the server still serves. The admin either doesn't know or doesn't care.

Two more misconfigurations worth knowing about (not covered by the flags above): wildcard certificates (*.target.com), where compromising one subdomain may hand you a certificate, and potentially a private key, that covers every other subdomain too; and weak cryptography (SHA-1 signatures, 1024-bit RSA keys), rare on modern infrastructure but still present on legacy systems.

4. Infrastructure Fingerprinting (Red Team Gold)

Here's where it gets interesting. You can hash a certificate or a server's TLS responses (that's what a JARM fingerprint is) and compare the result across hosts to distinguish CDN-fronted servers from origin servers, which helps confirm you've found the real backend behind the proxy.

The tlsx pipeline: ASN to origin in one command:

```
echo AS32934 | asnmap | tlsx -san -cn -jarm -silent | grep "facebook.com"
```

ASN → CIDRs → TLS scan → filter by domain. What comes out the other end are IP:port entries that live on the target's own network and serve their certificate: your likely origin servers.

You can go deeper by comparing JARM hashes:

```
# Get the JARM hash of the CDN-fronted domain
echo facebook.com | tlsx -jarm -silent
# Compare against the IPs you found
cat origin_candidates.txt | tlsx -jarm -silent
```

A JARM that differs from the CDN edge's means a different TLS stack, which is consistent with having found the origin behind the proxy.

Finding the Origin IP: Effective Methods

Most modern targets sit behind a CDN or WAF: Cloudflare, Akamai, Fastly, AWS CloudFront. You hit the domain, but you're talking to an edge node, not the actual server.
The origin IP is hidden behind that proxy, and finding it often means you can bypass the WAF entirely and talk to the backend directly. Here are some proven ways to uncover it.

Favicon Hash Matching

Most web apps serve a favicon.ico. That tiny icon has a hash, and Shodan indexes it. If the origin server serves the same favicon as the CDN-fronted domain, you can find it by hash.

The workflow: download the favicon, base64-encode it, calculate the MurmurHash3 (mmh3) of that encoding, and search Shodan with the http.favicon.hash: filter. Filter out CDN nodes; what's left is likely the origin.

```python
import codecs

import mmh3
import requests

# Fetch the favicon from the CDN-fronted site
response = requests.get("https://target.com/favicon.ico", timeout=10)
response.raise_for_status()

# Shodan hashes the base64 text with a newline every 76 characters,
# which is what the "base64" codec produces
favicon_b64 = codecs.encode(response.content, "base64")
favicon_hash = mmh3.hash(favicon_b64)

print(f"Favicon hash: {favicon_hash}")
print(f"Shodan dork: http.favicon.hash:{favicon_hash}")
```

Or just run this curl one-liner in your terminal:

```
curl -s '/favicon.ico' | base64 | python3 -c 'import mmh3,sys;print(mmh3.hash(sys.stdin.buffer.read()))'
```

Then search Shodan with the resulting favicon hash.

FOFA Favicon Hash + ZoomEye

These are Shodan alternatives, and honestly they're underrated. Most hunters only use Shodan, which means the same origin IPs get reported over and over. FOFA and ZoomEye index different parts of the internet and update at different intervals, so they sometimes catch servers that Shodan missed or hasn't re-scanned yet.

FOFA (fofa.info): a Chinese search engine with a massive index. The favicon search syntax is:

```
icon_hash="hash_value"
```

ZoomEye: the favicon search syntax is:

```
iconhash:"hash_value"
```

SPF Records: The Origin IP Hiding in Plain Sight

This one is embarrassingly simple, and yet most hunters walk right past it. SPF (Sender Policy Framework) records are DNS TXT records that tell email servers which IPs are authorized to send email for a domain. Companies set these up for email deliverability, but in doing so they sometimes list the IP address of their origin server directly in the DNS record.
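An SPF record lists authorized senders as mechanisms such as ip4: (a literal address or CIDR block) and include: (another domain's SPF policy), so pulling out the explicit addresses is plain string parsing. The minimal Python sketch below assumes you already have the TXT record text (for example, the output of `dig +short TXT target.com`); `spf_mechanisms` and `spf_ip4_networks` are hypothetical helper names. It only collects ip4:, ip6:, and include: values. It does not follow include: targets or redirect= modifiers, which would need further DNS lookups, and it ignores mechanisms like a and mx that point at hostnames rather than literal IPs.

```python
import ipaddress


def spf_mechanisms(txt: str) -> dict[str, list[str]]:
    """Collect ip4:, ip6: and include: values from a v=spf1 TXT record."""
    # dig prints long TXT records as several quoted chunks; glue them back together
    record = txt.strip().replace('" "', "").replace('"', "")
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    found: dict[str, list[str]] = {"ip4": [], "ip6": [], "include": []}
    for term in terms[1:]:
        # Strip an optional qualifier (+ pass, - fail, ~ softfail, ? neutral)
        if term[0] in "+-~?":
            term = term[1:]
        name, sep, value = term.partition(":")
        if sep and name.lower() in found:
            found[name.lower()].append(value)
    return found


def spf_ip4_networks(txt: str) -> list[ipaddress.IPv4Network]:
    """The ip4: entries as networks; a bare address becomes a /32."""
    return [ipaddress.IPv4Network(value, strict=False)
            for value in spf_mechanisms(txt)["ip4"]]


if __name__ == "__main__":
    # Made-up record using documentation ranges
    sample = '"v=spf1 ip4:203.0.113.10 ip4:198.51.100.0/28 include:_spf.example.net ~all"'
    print(spf_mechanisms(sample))
    print([str(net) for net in spf_ip4_networks(sample)])
```

For the made-up sample record, it reports 203.0.113.10 (as a /32) and 198.51.100.0/28 as candidate IPs, plus one include: domain that you would resolve separately.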
Look for the v=spf1 record. Inside it, you'll see ip4: entries; those are explicit IP addresses. You might also see include: directives pointing to other domains that resolve to more IPs.

Use MXToolbox (mxtoolbox.com) for a clean visual breakdown: paste the domain, run an SPF lookup, and the origin IP might be staring right at you. If the company runs their mail server on the same infrastructure as their web app, that ip4: entry can be the origin.

Censys: Finding Exposed Origin Servers with Certificate Data

Censys continuously scans the internet and indexes the SSL certificates it finds. That means if the origin server has port 443 open and serves a certificate for target.com, Censys has likely already found it, even if no DNS record points there.

Here's the workflow:

Visit search.censys.io and enter the target domain.

Look at the associated IP addresses in the results.

Cross-reference them with known CDN IP ranges; anything that's not Cloudflare, Akamai, or Amazon is a candidate.

Verify with curl:

```
curl -v https:/// -H 'Host: target.com' -k
```

If the server responds with the target's actual content (a redirect to the real domain, the homepage HTML, or a 301 Moved Permanently to https://target.com), you've confirmed that the IP knows how to serve the target's site. That's your origin.

The key insight from Christophe Tafani-Dereeper's research on this: the protection offered by CDNs relies entirely on the origin server being accessible only through the CDN. But many companies never configure their origin to reject direct connections. They assume hiding the IP is enough, and it isn't, because Censys has probably already found it.

VirusTotal: Passive DNS That Does the Work for You

VirusTotal isn't just a malware scanner.
Its domain report feature aggregates passive DNS data from multiple sources, showing you the IPs a domain has been observed resolving to, including pre-CDN records and short-lived DNS changes that other tools miss:

```
https://www.virustotal.com/vtapi/v2/domain/report?apikey=YOUR_KEY&domain=target.com
```

Or just search the domain on the VirusTotal website and check the Relations tab. You'll see historical DNS resolutions, associated subdomains, and IP relationships. It's free, passive, and often surfaces IPs that SecurityTrails and ViewDNS don't have.

hakoriginfinder: Slow, Loud, and Accurate

This is hakluke's tool, and the approach is clever. It doesn't rely on passive databases at all. You give it your candidate IPs, and it sends HTTP requests to each one with the Host header set to the target domain. If a candidate IP responds with content that matches the CDN-fronted site, that IP is the origin.

```
prips 1.1.1.0/24 | hakoriginfinder -h https://target.com -p 80,443,8080,8443
```

prips expands a CIDR range into individual IPs.

The tool compares responses automatically. If an IP returns the target's actual page content, it flags it. This is the most reliable confirmation method because you're not trusting stale data; you're directly proving the IP serves the target. Accuracy: 9/10. The downside: it's noisy (you're sending requests to every IP) and slow. Use it as the final confirmation step, not the discovery step.

Install it from:

```
github.com/hakluke/hakoriginfinder
```

CloudFlair: Automated Censys + Verification

CloudFlair automates the entire Censys approach in one command. It searches Censys for certificates matching your target domain, finds all IPs presenting those certificates, filters out known Cloudflare ranges, then tests each candidate against the live site to confirm origin matches.

```
python cloudflair.py target.com --censys-api-id
```

It handles the full pipeline: certificate search → IP extraction → CDN filtering → response comparison.
If it finds a match, that's your confirmed origin. Install it from: github.com/christophetd/CloudFlair

A note on ethics: once you find the origin IP, you still need to confirm it's in scope. Just because you bypassed the CDN doesn't mean the program allows direct origin testing. Check the rules. And if the origin IP belongs to a shared hosting provider, tread carefully: other people's infrastructure might be on the same box.

End of Part 1

In this part, we went from a company name to a full infrastructure map: ASNs, CIDR blocks, validated IPs, TLS misconfigurations, JARM fingerprints, and origin servers hiding behind CDNs. All of it before sending a single exploit.

That's what separates hunters who find P1s from hunters who collect duplicates: the more you know before you attack, the less you have to guess.

In Part 2, we'll go wider and deeper: subdomain enumeration at scale, cloud asset discovery, JavaScript analysis, content fuzzing, and building automated recon pipelines that do the heavy lifting while you sleep.

Part 1 was about building the map. Part 2 is about reading it.

Follow me so you don't miss it. Happy hunting.

#bug-bounty #red-team #reconnaissance #bugbounty-writeup #information-security