Show HN: Search 7,500 MCP servers across NPM, PyPI, and the official registry

api.rhdxm.com · c5huracan · 1 day ago · security
I crawled 7,500+ MCP servers. Here's what I found.

I've been experimenting with agents, trying to understand how they actually find and use tools. What surprised me wasn't the agents themselves. It was the infrastructure gap underneath them.

There are thousands of MCP servers out there. The problem is they're spread across GitHub awesome-lists, npm, PyPI, and the official MCP registry, and none of these sources know about each other. If your agent needs a database connector, you're searching each source manually, comparing what you find, and hoping you haven't missed something better in a source you didn't check.

I wanted to know: how many MCP servers actually exist? So I built crawlers for all four sources and started with over 10,000 candidates. Then the interesting part: a significant chunk were dead. Deleted repos, archived projects, URLs that just 404'd. The MCP ecosystem is growing fast, but a meaningful portion of what's listed doesn't exist anymore. After filtering out the dead entries, the duplicates, and the abandoned projects, about 7,500 live servers survived.

Here's what stood out.

The language split is closer than you'd expect

Python has a slight edge over TypeScript. I assumed TypeScript would dominate given that Anthropic's MCP SDK is TypeScript-first, but the numbers tell a different story. JavaScript is a distant third, then Go and Rust. The ecosystem is more diverse than the tooling would suggest.

The long tail is the whole problem

This is the part that matters. The majority of servers have fewer than 10 GitHub stars. Fewer than 200 have over 1,000. The server you need probably exists. Someone almost certainly built exactly what you're looking for. But you'll never find it by browsing, because it's buried under thousands of others with no way to surface it. That's the core discovery problem, and it only gets worse as the ecosystem grows.

Some names you wouldn't expect

Microsoft's markitdown sits near the top with 90k stars. Context7: 48k.
Chakra UI and Mantine both have MCP servers, which I had no idea about until the crawler surfaced them. Same for Netdata, MindsDB, and OpenBB. These are well-known projects that have quietly added MCP support, and unless you were watching their repos, you'd never know.

What I built

What struck me is that this is essentially search for a brand new surface. MCP servers are a new kind of thing to find, the way web pages were in 1998 or mobile apps were in 2009. Only this time we know more about how to do it, and the tools let you go further and faster than ever before.

I turned all of this into a search engine called Meyhem. One query, ranked results across every source. The ranking uses a blend of community signals and relevance, so it surfaces quality rather than just whatever happens to match a keyword.

You can try it right now:

    import httpx

    httpx.post(
        "https://api.rhdxm.com/find",
        json={"query": "web scraping", "max_results": 3},
    ).json()

That returns the top web scraping servers, ranked by quality and community signal.

It's also an MCP server itself, so your agent can use it to find other MCP servers. Add this to Claude Code, Cursor, or your favorite MCP client:

    {
      "mcpServers": {
        "meyhem": {
          "url": "https://api.rhdxm.com/mcp/"
        }
      }
    }

Then just ask your agent to find an MCP server for whatever you need. It's also listed on awesome-mcp-list, one of the larger community-maintained MCP directories.

What agents are actually searching for

I didn't do much promotion. Published to the MCP Registry, listed on a couple of skill marketplaces, mentioned it in a few places. Then I looked at the query logs. 2,000+ searches across 66 agent IDs. The topics are all over the place. Government regulatory filings. Meme coins. Military news. Rust framework performance. Supplement industry M&A. Someone is using it to compare conflict coverage across international news outlets.
The queries read like a list of things that have nothing to do with each other, and that's the point. I didn't build this for any of these use cases. People found it and bent it to whatever they needed. The largest single bucket, 639 queries under one generic default ID, turned out to be at least 20 distinct users when I clustered the queries by topic. Nobody changed the default, so the real unique-user count is significantly higher than what the stats show.

What's interesting is how agents search. They don't browse. They repeat narrow queries on a schedule, drill into one domain for weeks, switch languages mid-session. This is what search looks like when the searcher has no curiosity, only a task. It changes what "good results" means, and the existing discovery infrastructure isn't built for it. Most of this usage is organic. Nobody asked permission.

How it was built

What started as a few experiments turned into all of this, pair-programming with Claude over the past few weeks. Crawlers, a DuckDB index with full-text search, FastAPI, the whole lot.

Each source had its own quirks. npm's registry API was the most straightforward. PyPI needed filtering to separate real MCP servers from the noise. The awesome-lists required markdown parsing plus GitHub API enrichment to pull star counts and metadata. The official registry had 9,000+ entries but only about 2,700 unique repos after deduplication.

The cleanup took as long as the crawling. Stripping emoji from descriptions, fixing broken URLs, filtering out abandoned projects so they don't pollute results. Building a good index is less about what you put in and more about what you keep out.

It's all running at api.rhdxm.com. Let me know if you try it, especially if something breaks.
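The deduplication step described above (9,000+ registry entries collapsing to about 2,700 unique repos) can be sketched roughly like this. This is a minimal illustration, not the actual Meyhem code: the normalization rules and the `normalize_repo` / `dedupe` helper names are my assumptions about one reasonable way to do it.

```python
from urllib.parse import urlparse

def normalize_repo(url: str) -> str:
    # Hypothetical normalization: reduce a repo URL to a
    # canonical host/owner/name key so that variants like
    # "github.com/Acme/foo.git" and "https://github.com/acme/foo"
    # collapse to the same entry.
    parsed = urlparse(url if "://" in url else "https://" + url)
    path = parsed.path.strip("/").removesuffix(".git")
    return f"{parsed.netloc.lower()}/{path.lower()}"

def dedupe(entries: list[dict]) -> list[dict]:
    # Keep the first entry seen for each unique repo key.
    seen, unique = set(), []
    for entry in entries:
        key = normalize_repo(entry["repo"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique

entries = [
    {"name": "foo-mcp", "repo": "https://github.com/acme/foo-mcp"},
    {"name": "foo-mcp (npm)", "repo": "github.com/Acme/foo-mcp.git"},
    {"name": "bar-mcp", "repo": "https://github.com/acme/bar-mcp"},
]
print(len(dedupe(entries)))  # 2 unique repos
```

In practice you would dedupe across all four sources at once and keep the entry with the richest metadata rather than the first one seen, but the key-by-normalized-URL idea is the same.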