The GraphQL Goldmine: How to Reverse Engineer APIs for Scraping

norvilis.com · zilton · 5 days ago · security
DevBlog by Zil Norvilis

#ruby #graphql #scraping #webdev

The “One Endpoint” Revolution

In the old REST world, if you wanted to scrape a user’s profile, their posts, and their comments, you might have to hit three different endpoints: `/users/1`, `/users/1/posts`, and `/comments?post_id=5`.

In GraphQL, there is usually only one door: `/graphql`.

And the best part? The website tells the server exactly what data it wants in a structured query language. If you can intercept that message, you can ask for the data yourself - and often, you can ask for more data than the website is showing.

Here is the 4-step process for cracking a GraphQL API.

Step 1: Spotting the Target

Open Chrome DevTools (Network tab) and refresh the page. Filter by Fetch/XHR.

You aren’t looking for a dozen different requests. You are looking for a single request, usually named:

- graphql
- api
- query

Click it and look at the Payload tab. If you see a JSON object with keys like `operationName`, `query`, and `variables`, you have struck gold.

The payload looks like this:

```json
{
  "operationName": "GetProductDetails",
  "variables": {
    "slug": "awesome-sneakers-v2"
  },
  "query": "query GetProductDetails($slug: String!) { product(slug: $slug) { id name price stockLevel } }"
}
```

Step 2: The “Introspection” Cheat Code

This is one of the most common GraphQL misconfigurations. Introspection is a standard GraphQL feature, and developers often forget to turn it off in production. It allows you to ask the API: “Tell me everything you know.”

How to test it:

1. Copy the URL of the GraphQL endpoint.
2. Download a GraphQL client like Altair or Insomnia.
3. Paste the URL and click “Reload Docs” (or “Schema”).

If it works: You will see a documentation sidebar appear on the right.
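Under the hood, clients like these send an introspection query, which is an ordinary GraphQL query against the built-in `__schema` field. Here is a minimal sketch in Ruby, assuming the endpoint accepts the same POST body shape as the payload from Step 1. The helper names (`introspection_payload`, `fields_by_type`) and the sample response are invented for illustration and do not come from any real API.

```ruby
require 'json'

# A trimmed-down introspection query: every type name plus its field names.
INTROSPECTION_QUERY = <<~GRAPHQL
  query IntrospectTypes {
    __schema {
      types {
        name
        kind
        fields {
          name
        }
      }
    }
  }
GRAPHQL

# Hypothetical helper: the JSON body you would POST to the GraphQL endpoint.
def introspection_payload
  {
    "operationName" => "IntrospectTypes",
    "query" => INTROSPECTION_QUERY,
    "variables" => {}
  }
end

# Hypothetical helper: turn a parsed introspection response into
# { "TypeName" => ["field", ...] }. Built-in types (names starting with "__")
# and types without fields (scalars, enums) are skipped.
def fields_by_type(parsed_response)
  types = parsed_response.dig("data", "__schema", "types") || []
  types.each_with_object({}) do |type, map|
    next if type["name"].to_s.start_with?("__")
    fields = type["fields"]
    next if fields.nil? || fields.empty?
    map[type["name"]] = fields.map { |field| field["name"] }
  end
end

# Invented sample data, shaped like a real introspection result.
SAMPLE_RESPONSE = JSON.parse(<<~SAMPLE)
  {
    "data": {
      "__schema": {
        "types": [
          { "name": "Product", "kind": "OBJECT",
            "fields": [{ "name": "id" }, { "name": "name" }, { "name": "exact_inventory_count" }] },
          { "name": "String", "kind": "SCALAR", "fields": null },
          { "name": "__Type", "kind": "OBJECT", "fields": [{ "name": "kind" }] }
        ]
      }
    }
  }
SAMPLE

fields_by_type(SAMPLE_RESPONSE).each do |type, fields|
  puts "#{type}: #{fields.join(', ')}"
end
```

If the server has introspection disabled, expect an error response instead of a `__schema` object, in which case `fields_by_type` simply returns an empty hash.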
You can now browse every type and field that the API’s schema exposes.

- UI shows “In Stock”? The schema might reveal a field like `exact_inventory_count`, which could return a value such as 542.
- UI shows “User Name”? The schema might reveal fields like `email`, `created_at`, or `last_login`.

Whether the server actually lets you read those fields depends on its authorization checks, but you don’t need to guess endpoints anymore. You have the map.

Step 3: Replicating the Request in Ruby

Scraping GraphQL is just making a POST request with a specific JSON body.

```ruby
require 'http'
require 'json'

endpoint = "https://api.example.com/graphql"

# 1. The Query (Copy this from the Network Tab Payload)
query_string = <<~GRAPHQL
  query GetProducts($category: String, $cursor: String) {
    products(category: $category, first: 20, after: $cursor) {
      pageInfo {
        hasNextPage
        endCursor
      }
      edges {
        node {
          id
          name
          price {
            amount
            currency
          }
        }
      }
    }
  }
GRAPHQL

# 2. The Variables (This is what we change to paginate!)
# String keys (=>), so that variables["cursor"] = ... updates this same entry later
variables = {
  "category" => "electronics",
  "cursor" => nil
}

# 3. Send it
response = HTTP.headers(
  "Content-Type" => "application/json",
  "User-Agent" => "Mozilla/5.0..." # Always mimic a browser
).post(endpoint, json: { query: query_string, variables: variables })

data = JSON.parse(response.body.to_s)

# 4. Extract Data (edges is nil if the server returned errors instead of data)
(data.dig("data", "products", "edges") || []).each do |edge|
  puts edge.dig("node", "name")
end
```

Step 4: Pagination (The Infinite Scroll)

GraphQL pagination is usually more predictable than REST pagination. Most GraphQL APIs use Cursors (an opaque pointer to a specific record) rather than Pages (Page 1, Page 2).

Look at the `pageInfo` object in the response:

```json
"pageInfo": {
  "hasNextPage": true,
  "endCursor": "OPQ123=="
}
```

To get the next batch of data, you simply update your Ruby variables hash and repeat until `hasNextPage` is false:

```ruby
cursor = data.dig("data", "products", "pageInfo", "endCursor")
variables["cursor"] = cursor
# ... make the request again ...
```

Advanced: Defeating “Persisted Queries”

Sometimes, you look at the payload and you don’t see a query string.
Instead, you see this:

```json
{
  "operationName": "GetProduct",
  "extensions": {
    "persistedQuery": {
      "version": 1,
      "sha256Hash": "a3f89..."
    }
  }
}
```

This is a feature called Persisted Queries, often used as a security measure. The server has stored the query strings and expects a specific hash ID (`sha256Hash`) to execute them.

How to beat it:

- If the server only accepts known hashes, you cannot modify the query (add/remove fields).
- You can still modify the variables.
- Just copy the `sha256Hash` from the Network tab and send that in your Ruby payload instead of the query string.
- You can still iterate through pagination by changing the variables.

Summary

GraphQL is a scraper’s dream.

1. Check for Introspection: It might give you the keys to the kingdom.
2. Copy the Query: Don’t write GraphQL by hand; copy it from the Network tab.
3. Loop the Cursor: Pagination follows a common convention and is easy to loop.
4. Extract: Enjoy your neatly structured JSON data, shaped by the schema’s types.

Have you ever found exposed private data via GraphQL introspection? Share your war stories (anonymously!) in the comments. 👇
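The cursor loop from Step 4 and the persisted-query workaround can be combined into one small sketch. It assumes the response has the `products` / `pageInfo` / `edges` shape from Step 3 and that the persisted operation accepts the same `cursor` variable. `build_payload`, `fetch_all_nodes`, and the fake pages are hypothetical names and data for illustration; the hash is the elided placeholder from the payload above, not a real value.

```ruby
# Build a request body: a persisted-query hash if given, else a full query string.
def build_payload(variables:, query: nil, sha256_hash: nil, operation_name: nil)
  payload = { "variables" => variables }
  payload["operationName"] = operation_name if operation_name
  if sha256_hash
    payload["extensions"] = {
      "persistedQuery" => { "version" => 1, "sha256Hash" => sha256_hash }
    }
  else
    payload["query"] = query
  end
  payload
end

# Follow cursors until hasNextPage is false (or max_pages is reached).
# fetch_page is any callable that takes a payload Hash and returns parsed JSON.
def fetch_all_nodes(fetch_page, variables:, max_pages: 50, **payload_opts)
  vars = variables.dup
  nodes = []
  max_pages.times do
    data = fetch_page.call(build_payload(variables: vars, **payload_opts))
    products = data.dig("data", "products")
    break if products.nil? # errors or an unexpected response shape

    nodes.concat((products["edges"] || []).map { |edge| edge["node"] })
    page_info = products["pageInfo"] || {}
    break unless page_info["hasNextPage"]

    vars = vars.merge("cursor" => page_info["endCursor"])
  end
  nodes
end

# Invented two-page dataset, keyed by the cursor that requests each page.
FAKE_PAGES = {
  nil => { "data" => { "products" => {
    "pageInfo" => { "hasNextPage" => true, "endCursor" => "OPQ123==" },
    "edges" => [{ "node" => { "name" => "Laptop" } }]
  } } },
  "OPQ123==" => { "data" => { "products" => {
    "pageInfo" => { "hasNextPage" => false, "endCursor" => nil },
    "edges" => [{ "node" => { "name" => "Phone" } }]
  } } }
}.freeze

# Stand-in for the network call, so the sketch runs offline.
FAKE_FETCH = ->(payload) { FAKE_PAGES.fetch(payload.dig("variables", "cursor")) }

NAMES = fetch_all_nodes(
  FAKE_FETCH,
  variables: { "category" => "electronics", "cursor" => nil },
  sha256_hash: "a3f89...", # placeholder; copy the real hash from the Network tab
  operation_name: "GetProducts"
).map { |node| node["name"] }

puts NAMES.inspect
```

To run this against a real endpoint, swap `FAKE_FETCH` for a lambda that posts the payload with the `http` gem, as in Step 3, and parses the response body. The `max_pages` cap is a safety net in case a server keeps returning `hasNextPage: true`.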