Full Anatomy of Mercor's Data Breach

share.jotbird.com · chirau · 4 days ago · view on HN · news
Anatomy of Mercor's Data Breach ¶

A technical analysis of a complete operational data loss (production database, user & customer data) ¶

Disclaimer: All personally identifiable information (PII) in this document has been obfuscated. Names are partially masked (e.g., T** O****), emails redacted (e.g., e****[email protected]), phone numbers truncated (e.g., +4479571****), bank details masked (e.g., 000**-***), financial identifiers hidden (e.g., acct_1Rc*****), IP addresses truncated (e.g., 71.194.*.*), and MAC addresses partially redacted (e.g., 1C:93:7C:**:**:**). This analysis is conducted for educational and security research purposes.

Note on source material: This entire analysis is based on two small sample files made publicly available by Lapsus$: a database schema sample and a database export containing table structures with example rows, plus partial Airtable workspace exports. These files were shared after Mercor allegedly paid a ransom to have the data removed from the group's leak site, a fact confirmed to us directly by Lapsus$. Despite receiving payment, the group continues to share samples and is actively engaged in selling the full dataset to private bidders. Together these two files represent a fraction of a percent of the claimed 211GB production database. We did not have access to the full database, the 939GB of source code, the 3TB of cloud storage, the Slack exports, or the Tailscale VPN data. Everything documented in this report (every bank routing number, every Apple Foundation Model output, every Persona KYC session token, every desktop screenshot URL) was found in these two small files alone. If the attackers' claims are accurate, the full breach is orders of magnitude larger. What follows is the tip of the iceberg.
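The masking conventions listed in the disclaimer above can be approximated in a few lines. The following is a minimal sketch, not the tooling actually used for this report: the function names are hypothetical, the name rule follows only the "T** O****" pattern, and the input values are placeholder data from reserved documentation ranges rather than anything in the samples.

```python
def mask_name(name: str) -> str:
    """Keep the first letter of each word, star out the rest (pattern: T** O****)."""
    return " ".join(w[0] + "*" * (len(w) - 1) for w in name.split())


def mask_phone(phone: str, hidden: int = 4) -> str:
    """Replace the last `hidden` characters of a phone number with asterisks."""
    return phone[:-hidden] + "*" * hidden


def mask_ipv4(ip: str) -> str:
    """Keep the first two octets of an IPv4 address (pattern: 71.194.*.*)."""
    first, second, _, _ = ip.split(".")
    return f"{first}.{second}.*.*"


def mask_mac(mac: str) -> str:
    """Keep the vendor prefix (first three octets) of a MAC address."""
    octets = mac.split(":")
    return ":".join(octets[:3] + ["**"] * 3)


if __name__ == "__main__":
    # Placeholder inputs only (hypothetical name, documentation IP/MAC ranges).
    print(mask_name("Ada Lovelace"))
    print(mask_phone("+15555550123"))
    print(mask_ipv4("192.0.2.44"))
    print(mask_mac("00:00:5E:00:53:AF"))
```

Note that even the masked forms retain some identifying signal (a MAC vendor prefix, the first half of an IP address), which is why the report treats them as illustrative formats rather than safe-to-share data.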
Table of Contents ¶

- Executive Summary
- Why This Breach Is Serious
- Why AI Training Data Is Worth Billions
- The Extent - What Data Was Exposed
- The Scope - Who Is Affected
- The Scale - Mercor Client Ecosystem
- The Airtable Export - 84 Workspaces, 1055 Files
- Customer and Third-Party Platform URLs
- The Screenshot Problem
- Platform Overview
- Evidence - The Database Layer by Layer
- Part I - User and Identity Layer
- Part II - Identity Verification and Fraud Detection
- Part III - The Hiring Pipeline
- Part IV - Interviews and Assessments
- Part V - Work Trials and Onboarding
- Part VI - Projects and AI Task Management
- Part VII - Time Tracking and Productivity Surveillance
- Part VIII - Payments and Financial Infrastructure
- Part IX - Communications and Outreach
- Reverse Engineering - Architecture and Infrastructure
- Part X - Infrastructure and DevOps
- Part XI - Analytics and ML Layer
- Part XII - Reference Data Layer
- Exposed Surface Area Summary
- Technical Architecture Reverse-Engineered
- Grounds for Legal Action
- I. Client Company Claims - Loss of Proprietary AI Training Data and Trade Secrets
- II. Contractor Class Claims
- III. Statutory Violations
- IV. Negligence
- V. Third-Party Claims
- Conclusion - What Happens Now
- The Data Is Still in Circulation
- The Ongoing Threat
- The Case for Radical Transparency
- A Structural Critique
- Appendix A - Complete Table Inventory

Executive Summary ¶

This document presents a systematic technical analysis of a small sample from a database export from Mercor, an AI-powered talent marketplace that connects software engineers, AI data labelers, and knowledge workers with companies seeking contract labor. As reported by the Wall Street Journal, Mercor has rapidly become one of the key intermediaries in the AI industry, placing contractors inside organizations like Meta, OpenAI, Google DeepMind, Anthropic, Apple, and Amazon to perform AI training, data labeling, software engineering, and other knowledge work.
What we analyzed are two small sample files shared by Lapsus$ after Mercor allegedly paid a ransom to have the breach data removed. Despite that payment, the group continues to distribute samples and is actively selling the full dataset to private bidders. Together these files represent a tiny sliver of the claimed 211GB production database. Yet even these small samples contain over 250 table schemas with sample data rows exported from Mercor's Aurora MySQL production environment, plus Airtable workspace exports containing actual AI training data and model evaluation records. The samples cover every operational dimension of the platform, from contractor signup through identity verification, AI-conducted interviews, job placement, real-time work surveillance, and payment disbursement.

If these samples, containing just one or two rows per table, already expose full bank routing numbers, government ID verification tokens, desktop screenshot URLs, signed legal documents, and proprietary AI model outputs from Apple and Amazon, then the full 211GB database presumably contains the same categories of data for every contractor and every transaction Mercor has ever processed.

Scope of This Article and the Full Scale of the Breach ¶

Important: This article analyzes only two small sample files from the production database, shared by Lapsus$ after Mercor allegedly paid a ransom. The full production database is 211GB, which is itself a fraction of the claimed 4-terabyte breach. Every finding documented below was derived from these small samples alone. The full database would contain the complete records for every contractor, every transaction, every screenshot, and every payment Mercor has ever processed.

The Breach at a Glance ¶

Mercor's official account attributes the breach to a supply-chain attack on the open-source Python package LiteLLM, a widely used AI proxy library estimated to be present in 36% of cloud environments.
On March 27, 2026, using a maintainer's compromised credentials, the TeamPCP hacking group published two malicious PyPI package versions (1.82.7 and 1.82.8) that were available for download for approximately 40 minutes. The reported attack chain: the poisoned dependency landed in Mercor's development environment, swept the machine for SSH keys, AWS tokens, Kubernetes secrets, and .env files, deployed privileged containers across Mercor's Kubernetes clusters, and used the stolen credentials to begin exfiltrating data through Mercor's Tailscale VPN.

However, there are reasons to question whether LiteLLM was the sole or even primary attack vector. Exfiltrating 4 terabytes of data (production databases, 939GB of source code repositories, 3TB of cloud storage including video recordings and screenshots, plus Slack, Airtable, and Tailscale exports) is not a fast operation. At typical egress speeds, this would have taken days to weeks of sustained data transfer. A 40-minute window of malicious package availability seems insufficient to establish the deep, persistent access required to systematically exfiltrate this volume of data across this many distinct systems (Aurora MySQL, S3 buckets, GitHub repositories, Airtable, Slack, Tailscale). It is entirely possible that Mercor was already compromised through other means, whether through prior credential exposure, an insider threat, or a separate vulnerability, and that the LiteLLM incident was coincidental or merely one of multiple entry points. Mercor's characterization of itself as "one of thousands of companies" affected by LiteLLM may be an attempt to deflect from deeper, more embarrassing security failures.

The Lapsus$ group subsequently claimed responsibility for the breach, posting samples of the allegedly stolen data. Lapsus$ confirmed to us directly that ransom negotiations with Mercor took place and that Mercor paid.
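The transfer-time argument above can be checked with back-of-the-envelope arithmetic. The sketch below assumes decimal terabytes and a handful of illustrative sustained throughputs; the rates and the `transfer_days` helper are assumptions made for illustration, not measurements from the incident.

```python
def transfer_days(size_tb: float, throughput_mbps: float) -> float:
    """Days needed to move size_tb (decimal terabytes) at a sustained rate in megabits/s."""
    bits = size_tb * 1e12 * 8                # terabytes -> bits
    seconds = bits / (throughput_mbps * 1e6)  # bits / (bits per second)
    return seconds / 86_400                   # seconds -> days


# Assumed sustained egress rates (hypothetical, for illustration only).
ASSUMED_RATES_MBPS = [50, 100, 1000]

if __name__ == "__main__":
    for rate in ASSUMED_RATES_MBPS:
        print(f"4 TB at {rate} Mbps: {transfer_days(4, rate):.1f} days")
```

Under these assumptions, the claimed 4 TB takes about a week at 50 Mbps, under four days at 100 Mbps, and well under a day at 1 Gbps. The "days to weeks" estimate therefore depends on sustained throughput staying in the tens to low hundreds of megabits per second, which is plausible for throttled or stealthy exfiltration but is not established by the sample files.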
Despite the ransom payment, the group continues to distribute samples and is actively selling the full dataset to private bidders. Mercor confirmed the security incident but characterized itself as "one of thousands of companies" affected by the LiteLLM compromise. The company declined to answer whether any customer or contractor data had been accessed, exfiltrated, or misused.

Security researcher Archie Sengupta noted it was a "very big breach." Y Combinator president Garry Tan was more direct: "Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value and a major national security issue."

What Was Taken - The Full 4TB ¶

The attackers claim to have exfiltrated the following assets. This article analyzes only the first item, the production database. The remaining categories are not covered in this analysis but are described here to convey the full scale of exposure.

| Asset | Size | Contents |
|---|---|---|
| Production Database | 211 GB | The subject of this article. 250+ Aurora MySQL tables containing candidate profiles (resumes, work history, skills, education), PII (names, emails, phones, addresses, dates of birth, possibly SSNs and government ID documents), interview recordings/transcripts and AI assessment scores, employer/client data (companies, contracts, pricing), and internal user accounts and credentials. |
| Source Code | 939 GB | The complete contents of Mercor's GitHub organization, including the mercor-monorepo and all associated repositories. This exposes proprietary AI/ML models for candidate matching and evaluation, the full platform backend and frontend code, API keys, secrets, and internal service credentials embedded in repositories, and all infrastructure-as-code (Terraform/Terragrunt deployment configs, CI/CD pipelines, cloud architecture). |
| Cloud Storage Buckets | ~3 TB | The actual files referenced by the S3 URLs found in the database. Organized into three categories. Video: AI interview recordings of candidates (the ai-interviewer-recordings and dailyco-recordings S3 buckets), containing face and voice biometric data. GCF-Source: Google Cloud Function source code, representing additional serverless application logic beyond the main repositories. FME Review & Verification: identity verification documents including passports, driver's licenses, and facial recognition/biometric data used in the Persona KYC flow (the mercor-background-check-photos, certn-api-s3-certn-images, certn-api-s3-one-id-images, and certn-api-s3-certn-rcmp-documents buckets). Also included: every Insightful desktop screenshot ever captured from contractor machines (the mercor-insightful-screenshots-production bucket), and signed legal documents (offer letters, CIIAs, NDAs). |
| Tailscale VPN Data | Included | Internal network topology and routing configurations, device certificates and authentication keys, access paths to internal services, dashboards, and admin tools. This is effectively a map of Mercor's internal network. |
| Slack Export | Included | A full export of Mercor's enterprise Slack workspace (mercor.enterprise.slack.com) and potentially client-specific workspaces like project-mega.slack.com and glowstone-mli-rubrics.slack.com. Slack exports include every message, file upload, DM, and channel history: candid internal discussions, client communications, incident response threads, and operational decisions. |
| Airtable Export | Included | Complete exports of all Airtable workspaces used for annotation and project management (6+ distinct workspace IDs found in the database). This exposes task definitions, contractor submissions, quality review data, and client project configurations, which is effectively the work product of Mercor's annotation pipeline. |
| Google Workspace | Unknown | It is unclear whether the attackers obtained a full export of Mercor's Google Workspace. Even the small sample analyzed here contains 30+ Google Doc URLs, 10+ shared Drive folder URLs, Google Sheets, and Google Forms. The full database would contain vastly more. If the Workspace was also exfiltrated, it would include all internal documents, email (Gmail), calendar entries, and shared drives. |

Why This Matters Beyond Mercor ¶

The database analyzed in this report is merely the index: the structured metadata that describes, catalogs, and points to the stolen assets. Think of it as the card catalog for an entire stolen library:

- The source code reveals how the system works: every algorithm, every API endpoint, every security mechanism, and every hardcoded credential.
- The Slack export reveals what was said about it internally: incident responses, client negotiations, and operational discussions.
- The cloud storage contains the actual files: the screenshots of contractor screens showing client systems, the video interviews showing candidates' faces and voices, the passport scans and government IDs submitted for verification.
- The Airtable export contains the work product itself: the annotation data, task submissions, and quality reviews that Mercor's clients (including frontier AI labs) paid for.
- The Tailscale VPN data provides a map to anything that was missed: the internal network topology that could enable further unauthorized access if credentials haven't been fully rotated.

As Garry Tan noted, the AI training data alone (the prompts, responses, evaluations, and RLHF annotations produced by Mercor's contractors for organizations like OpenAI, Meta, and Google DeepMind) represents potentially billions of dollars in value. If this data reaches competitors, whether domestic rivals or labs in other countries, it would allow them to shortcut years of investment. The source code for Mercor's proprietary ranking algorithms (MercorScore, the Bradley-Terry tournament system, the Bayesian fraud model) adds further competitive intelligence value.
Together, this represents one of the most comprehensive corporate breaches in recent memory: not a single database table or a handful of credentials, but the complete digital footprint (code, data, communications, files, network maps, and work product) of an organization entrusted with some of the most sensitive work in the AI industry.

Why This Breach Is Serious ¶

Why AI Training Data Is Worth Billions ¶

To understand why this breach is significant and not just another corporate data leak, it helps to understand what AI training data is and why companies like OpenAI, Anthropic, Apple, Amazon, Meta, and Google pay enormous sums to produce it.

Modern AI models like GPT-4, Claude, and Gemini are not programmed; they are trained. The raw intelligence comes from pre-training on internet text, but the ability to follow instructions, reason carefully, and refuse harmful requests comes from a second phase that depends entirely on human-generated data. This is the data Mercor's contractors produce. It falls into several categories, all of which are present in the breach:

- Supervised Fine-Tuning (SFT) data: Humans write high-quality responses to prompts, demonstrating how the model should behave. The TASKS and TASK_VERSIONS tables across Mercor's 84 Airtable workspaces contain these prompt-response pairs, organized by domain (legal, medicine, finance, coding, etc.). A single SFT dataset covering a specialized domain can cost millions of dollars to produce because it requires experts (lawyers, doctors, engineers) writing at $95/hour for months.
- Reinforcement Learning (RL) preference data: Humans compare two model outputs and judge which is better. This is the core of RLHF (Reinforcement Learning from Human Feedback), the technique that transformed GPT-3 into ChatGPT. The API_PREFERENCE workspaces, PHASE_1_TASKS (Amazon), and the GPT-4 vs Claude Evaluation project all contain this data, complete with the prompts, both model responses, and the human preference judgment. This data teaches models what humans actually want, which is the hardest and most expensive part of AI development.
- RL rubrics and evaluation criteria: Before humans can judge model outputs, someone must define what good looks like. The CRITERIA, RUBRIC_VERSIONS, QA_SPECS, and LLM_CALL_CONFIGURATION tables across 60+ Airtable workspaces contain these rubrics. They encode the evaluation methodology itself: the scoring frameworks, the edge cases, the quality thresholds. This is proprietary intellectual property that defines how each AI lab measures progress. A competitor with access to these rubrics doesn't just get the training data; they get the recipe.
- RL environments and Chain-of-Thought data: The AMAZON_LLM_COT_EVALUATION workspace contains full Chain-of-Thought traces, the step-by-step reasoning that models produce before giving a final answer. The ACADEMIC_REASONING_SFT workspace contains a COT table explicitly for reasoning supervision. The Panacea — Consulting RL Envs project built reinforcement learning environments. This data teaches models how to think, not just what to say.
- Benchmark evaluation data: The ATHENA_HLE workspaces (likely Humanity's Last Exam) and AIME_RUBRICS (AIME math competition) contain evaluation data for some of the most important AI benchmarks. The MODEL_RESPONSES and AWAITING_REVIEW_METRICS tables contain graded model outputs against these benchmarks. If this data is used to train future models, it contaminates the benchmarks: the models will appear to perform better than they actually do, undermining the entire AI evaluation ecosystem.
- Pre-release model outputs: The APPLE_ENDPOINT_SANDBOX workspace contains actual outputs from Apple's unreleased Foundation Models (afm-text-083, afm-model-086).
  These responses reveal the model's capabilities, limitations, safety alignment, and failure modes before Apple has publicly launched them. For a competitor, this is the equivalent of obtaining a rival's product prototype.

Why this data is so expensive to reproduce: Each data point requires a skilled human, often a domain expert, spending minutes to hours crafting, evaluating, or comparing model outputs. At Mercor's reported average rate of $95/hour across 30,000+ contractors, the annual cost of data production runs into hundreds of millions of dollars. OpenAI, Anthropic, and the other labs have each spent years and billions of dollars building these datasets incrementally, refining their rubrics, and developing their evaluation methodologies.

The breach doesn't just expose data. It exposes the methodology: the rubrics, the evaluation criteria, the domain taxonomies, the quality control processes, and the scoring frameworks that each lab has spent years developing. Any competitor with access to this material, domestic or foreign, could replicate years of alignment research in months, at a fraction of the cost, by simply adopting the proven evaluation frameworks and training on the stolen preference data.

This is why Garry Tan called it "billions and billions of value." The data in these Airtable workspaces is not supplementary. It is the core competitive advantage of the AI labs that produced it, and it is now for sale.

The Extent - What Data Was Exposed ¶

The breadth of personally identifiable information (PII) in this breach is staggering. The following inventory documents every category of sensitive data present in the database dump, with specific column names, source tables, and, where available, the format of the exposed data as observed in sample records. This inventory is intended to serve as a factual reference for affected individuals, regulators, and legal counsel.

1. Personal Identity Information ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Full legal name | name, first_name, last_name | MercorUsers_New, MercorUserFinancials (embedded in Stripe JSON) | T** O****, H****i A****a (full plaintext names) |
| Personal email address | email | MercorUsers_New, Candidates, LinkedinWarmIntros, UserReferences, MLExperimentsJobPerformanceReviews | e****[email protected], a*****[email protected], a*****[email protected] (full plaintext) |
| Phone number with country code | phone | MercorUsers_New | +4479571**** (full international format) |
| Date of birth | birthday | UserMetadata, Candidates, WorkAuthorization_Audit | Date field: exact DOB for each contractor |
| Physical home address | physicalLocation, residenceCity, residenceState, residenceZipCode | UserMetadata, UserLocation, Candidates | City, state, zip code, and country of residence |
| Profile photograph | profilePic | MercorUsers_New | URL to stored profile image |
| Country of residence | residenceCountry, countryOfResidence | UserLocation, UserMetadata, Candidates | USA, United Kingdom |
| LinkedIn profile URL | linkedinUrl, url | Candidates, LinkedinWarmIntros, LinkedinUsers | https://www.linkedin.com/in/s**-s**-s******-d***** (full URL with real name) |

2. Government Identity Documents and Biometrics ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Government ID verification outcome | governmentIdStatus | IDVerificationChecks | not_applicable, passed, failed |
| Liveness detection result | livenessStatus | IDVerificationChecks | Binary pass/fail; confirms a live facial scan was performed |
| Facial comparison thumbnail | thumbnail_key (in providerResponse JSON) | IDVerificationChecks | intr_AAABnNOWs0wnj7Tmg0hBQpL5_thumbnail.jpg (a stored facial image key) |
| Persona KYC session token | sessionId, sessionToken | IDVerificationChecks | face_baseline_intr_AAABnNOWs0wnj7Tmg0hBQpL5 (replayable session ID) |
| Persona account identifier | persona_account_id (in providerResponse JSON) | IDVerificationChecks | act_QMTuQh33A4QU23J8ECPSd32BBKb4 |
| Address verification status | addressStatus | IDVerificationChecks | Confirms whether home address was verified against government records |
| Verification attempt count | attemptNumber, maxAttempts | IDVerificationChecks | Tracks repeated identity verification attempts |

Note: The cloud storage buckets (mercor-background-check-photos, certn-api-s3-one-id-images, certn-api-s3-certn-rcmp-documents) reportedly contain the actual document images (passports, driver's licenses, and RCMP criminal record documents) referenced by these database records.

3. Financial and Banking Data ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Bank name | bank_name (in accountDetails JSON) | MercorUserFinancials | BANK OF M******* (plaintext) |
| Bank routing number | routing_number (in accountDetails JSON) | MercorUserFinancials | 000**-*** (full routing number in plaintext) |
| Bank account last 4 digits | last4 (in accountDetails JSON) | MercorUserFinancials | 07** |
| Bank account holder name | account_holder_name (in accountDetails JSON) | MercorUserFinancials | H****i A****a (full legal name on bank account) |
| Stripe Express account ID | providerMethodId, stripeAccountId | UserPaymentMethods, MercorUsers_New | acct_1Rc***** |
| Full Stripe account JSON | accountDetails | MercorUserFinancials | Complete Stripe API response including all fields above plus charges_enabled, payouts_enabled, default_currency, TOS acceptance timestamp, and external account details |
| Wise transfer & quote IDs | wiseTransferId, wiseQuoteId | WiseDisbursements | Transfer identifiers for international payments |
| Payment amounts | totalPayableAmount, totalBillableAmount, totalAmount | PaymentLineItems, MoneyOut_Audit, WiseDisbursements | Amounts in cents (e.g., 250000 = $2,500.00) |
| Pay rates | payableRate, billableRate | Jobs, Jobs_Audit | Exact hourly/monthly compensation: both what the contractor earns and what the client pays |
| Tax form status | tax_form | Jobs | Tax filing status per contractor |
| Stripe subscription ID | stripeSubscriptionId | Jobs | Billing subscription identifier |
| Payout schedule and currency | schedule.interval, default_currency (in JSON) | MercorUserFinancials | daily payout with 7 day delay, currency cad |
| Payment failure reasons | dispatchFailureReason, failureReason | PaymentLineItems, MoneyOut_Audit, WiseDisbursements | Structured failure codes revealing payment issues |

The MercorUserFinancials.accountDetails field is particularly egregious: it stores the complete Stripe Connect API response as a JSON blob, which includes the contractor's full legal name, personal email, bank name, routing number, last four digits of the account, account holder name, country, currency, and TOS acceptance details. This is not a reference or a token. It is the raw financial identity of each contractor stored in a single database column.

4. Employment and Performance Records ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Employment contract terms | payableRate, billableRate, commitment, expected_hours, startDate, expiresAt | Jobs, Jobs_Audit | Full contract terms including pay rate, hours, and duration |
| Signed offer letters | offerLetter | Jobs, WorkTrial_Audit | S3 key or base64 encoded signed legal document |
| Digital signatures | signature | Jobs, WorkTrial_Audit, WorkAuthorization_Audit | Contractor's digital signature on legal agreements |
| CIIA/NDA agreements | ciiaa_direct, ciiaaPassthrough | Jobs, WorkTrial_Audit | Confidentiality and IP assignment agreements |
| Terms of work | tow | Jobs, WorkTrial_Audit | Full terms of engagement |
| Safety waiver | safety_waiver | Jobs | Safety waiver acceptance |
| Dismissal date and reason | dismissalDate, dismissalReason, dismissalFlag | Jobs, JobPerformanceReviews_New | Date of termination and categorized reason |
| Offboarding reason | Offboarding Reason | MLExperimentsJobPerformanceReviews | Plaintext offboarding justification |
| Performance scores | score, Quality of Work, Engagement, performanceScore | JobPerformanceReviews_New, MLExperimentsJobPerformanceReviews, ContractorPerformance_New | Numeric ratings with text justifications |
| Performance review text | reviewNotes, Justification for rating, performanceSummary, jobPerformanceSummary | JobPerformanceReviews_New, MLExperimentsJobPerformanceReviews | Free-text evaluations of individual contractors |
| Reviewer identity | reviewedBy, Reviewer | JobPerformanceReviews_New, MLExperimentsJobPerformanceReviews | Named Mercor staff who wrote the review (e.g., A*** K*****) |
| Client project name | Account, Project, projectName | MLExperimentsJobPerformanceReviews, JobPerformanceReviews_New | OpenAI, Apertus - Elephant (links contractor performance to a specific client) |

The MLExperimentsJobPerformanceReviews table is especially damaging: it contains the contractor's full name, email, client company name (e.g., OpenAI), project name, reviewer's name, quality score, engagement score, offboarding reason, and a free-text justification, all in a single row. Sample: A***** D****, a*****[email protected], OpenAI, Apertus - Elephant, reviewed by A*** K*****, rated 4 - Redefines Expectations.

5. Criminal Background and Adverse Media Checks ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Criminal background check status | status | BackgroundCheck, BackgroundCheck_New | clear / consider (whether criminal history was flagged) |
| Adverse media check status | adverseMediaCheckStatus | BackgroundCheck | Whether negative news/media was found about the individual |
| Background check package | package | BackgroundCheck | e.g., tasker_pro (defines which checks were run) |
| RCMP criminal record documents | Referenced via S3 bucket | certn-api-s3-certn-rcmp-documents-ca-central-1-production | Royal Canadian Mounted Police criminal record check documents |
| External candidate ID at Checkr/Certn | externalCandidateId, backgroundCheckId, reportId | BackgroundCheck | Cross-references to external background check providers |
| Work location for check | workLocation | BackgroundCheck | Country/jurisdiction of background check |

6. Work Authorization and Immigration Status ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Work authorization status | workAuthorizationStatus | UserMetadata, Candidates, WorkAuthorization_Audit | Whether individual is authorized to work in a given country |
| Physical country vs. residence country | physicalCountry vs. residenceCountry | UserLocation, WorkAuthorization_Audit | Mismatch between these fields is flagged as fraud, revealing who may be working from an unauthorized location |
| Location attestation with signature | agreedToLocation, signature, attestedAt | WorkAuthorization_Audit | Signed attestation of physical work location |

Work authorization status can reveal citizenship or immigration status, which several US state privacy laws classify as sensitive data and which may warrant heightened protection under GDPR. Its exposure, combined with physical location data and location mismatch fraud flags, could be used to identify individuals working from countries where they lack authorization, creating potential immigration enforcement risk.

7. Device Fingerprints, Network Identifiers and Surveillance Data ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| IP address | ip | InsightfulScreenshots | 71.194.*.* (full IPv4 address, geolocatable) |
| MAC address | gateways | InsightfulScreenshots | ["1C:93:7C:**:**:**"] (unique hardware identifier) |
| Hardware fingerprint (HWID) | hwid | InsightfulScreenshots | 8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e (persistent device ID) |
| Computer hostname | computer | InsightfulScreenshots | desktop-ue2kgro |
| Operating system & version | os, osVersion | InsightfulScreenshots | win32, 10.0.19045 |
| Application file path | appFilePath | InsightfulScreenshots | C:\Program Files\Google\Chrome\Application\chrome.exe |
| Active window title | windowTitle | InsightfulScreenshots | Full window title revealing document/conversation content |
| Browser URL visited | browserUrl | InsightfulScreenshots | Full URL being viewed at time of screenshot |
| Desktop screenshot image | storageUrl | InsightfulScreenshots | Direct S3 URL to actual screenshot image file |
| Productivity score | externalProductivityScore | InsightfulScreenshots | Numeric productivity rating per screenshot interval |
| Timezone | timezone | InsightfulScreenshots, Timelog | America/Chicago (reveals approximate geographic location) |
| Session duration | duration, timeStart, timeEnd | Timelog | Exact milliseconds worked per session |
| Pay deduction reason | reasonForDeduction, appName | Deductions | Why money was subtracted from pay, linked to specific application |

The combination of IP address + MAC address + HWID creates a triple device fingerprint that identifies not just the person but the specific physical machine they used. Under GDPR, online and device identifiers of this kind can constitute personal data. Under CCPA, unique device identifiers constitute personal information.

8. Fraud Profiling and Algorithmic Decision-Making ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| Fraud probability score | posteriorProbability, modelScore | FraudEvents, FraudSignalAuditLog | Bayesian probability (0.0–1.0) that individual is fraudulent |
| Fraud decision | currentDecision, status | FraudStates, FraudCheck | APPROVE / ESCALATE / REJECT (algorithmic verdict on individual) |
| LLM-generated fraud reasoning | currentReasoning, manual_review_rational | FraudStates, FraudCheck | AI-written paragraph explaining why individual was flagged: "The primary concern is a maximum location mismatch score of 1.0, indicating the user's IP address is entirely inconsistent with their stated profile location..." |
| Fraud signal inventory | currentKeySignals, flag_reasons | FraudStates, FraudCheck | ["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"] |
| HaveIBeenPwned result | email_is_pwned (in signals) | FraudStates | Whether contractor's email was found in known data breaches |
| VPN/Tor detection | Referenced in fraud signals | FraudStates, FraudSignalAuditLog | Whether VPN or Tor usage was detected |
| Cheating detection | isCheating, cheatingProbability, signs | CheatingDetection | Whether individual was flagged for cheating during interviews |
| Duplicate account detection | userIdList | DuplicateGroups | Groups of accounts believed to belong to the same person |

Automated fraud decisions directly impacted individuals' ability to earn income through the platform.
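The fraud-scoring fields in the table above (a posterior probability, a list of key signals, and an APPROVE / ESCALATE / REJECT decision) suggest a pipeline along the following lines. This is a minimal sketch of a naive log-odds Bayesian update, written only to illustrate how such fields could relate to each other. Only the signal string format comes from the sample; the prior, the per-signal weights in `ASSUMED_LOG_LR`, and the decision thresholds are hypothetical and are not taken from the sample data or from Mercor's code.

```python
import math

# Hypothetical log-likelihood-ratio weight per unit of signal strength.
ASSUMED_LOG_LR = {
    "location_mismatch": 2.0,
    "email_diff": 1.0,
    "email_is_pwned": 0.5,
}


def parse_signals(raw: list[str]) -> dict[str, float]:
    """Turn strings like 'location_mismatch: 1.0' into {name: strength}."""
    signals = {}
    for item in raw:
        name, value = (part.strip() for part in item.split(":", 1))
        if value in ("True", "False"):
            signals[name] = 1.0 if value == "True" else 0.0
        else:
            signals[name] = float(value)
    return signals


def posterior(signals: dict[str, float], prior: float = 0.05) -> float:
    """Add weighted evidence to the prior log-odds, then map back to a probability."""
    log_odds = math.log(prior / (1.0 - prior))
    for name, strength in signals.items():
        log_odds += ASSUMED_LOG_LR.get(name, 0.0) * strength
    return 1.0 / (1.0 + math.exp(-log_odds))


def decide(p: float, escalate_at: float = 0.2, reject_at: float = 0.8) -> str:
    """Map a posterior probability to one of the three decisions (assumed thresholds)."""
    if p >= reject_at:
        return "REJECT"
    if p >= escalate_at:
        return "ESCALATE"
    return "APPROVE"


if __name__ == "__main__":
    sample = ["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]
    p = posterior(parse_signals(sample))
    print(f"posterior={p:.3f} decision={decide(p)}")
```

The point of the sketch is that a single strong signal, such as a location mismatch caused by travel or a VPN, can move a contractor across a decision threshold on its own, which is why the exposure of these scores and their reasoning matters for the individuals concerned.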
Under GDPR Article 22, individuals have the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. The exposure of the complete fraud reasoning, including the LLM-generated explanations, reveals the inner workings of an automated decision-making system that determined whether people could work and earn money.

9. Communications and Third-Party PII ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| In-platform message content | content | Comms, CommsSent | Full text of messages between contractors, recruiters, and clients |
| Outreach email content | subject, content, messageTemplate | EmailTemplates, OffPlatformCampaignSteps | Full email templates with subject lines |
| Phone call logs | Call metadata | AircallComms | Aircall VoIP call records |
| Professional reference PII | name, email, company, relationship | UserReferences | Third parties' names, emails, and employers (people who did not sign up for Mercor) |
| LinkedIn profiles of non-users | linkedinUrl, email | LinkedinWarmIntros | Full LinkedIn URLs and email addresses of people contacted for warm intros |
| Voucher/endorser PII | voucherUserId, candidateEmail, candidateName, candidateLinkedinId | CandidateVouches | Names, emails, and LinkedIn IDs of both vouchers and vouched-for candidates |
| Recruiter notes | noteBody, notesForCandidate | ListingNotes, Candidates | Candid internal commentary about individuals |

The exposure of third-party PII is particularly significant for legal liability. UserReferences contains the names, email addresses, employers, and relationships of professional references: individuals who never created Mercor accounts and never consented to having their data stored in Mercor's production database. LinkedinWarmIntros contains LinkedIn URLs and emails of people contacted for recruitment outreach. These third parties had no contractual relationship with Mercor and no opportunity to consent to or opt out of data collection.

10. PostHog Behavioral Analytics De-Anonymized ¶

| Data Element | Database Column(s) | Source Table(s) | Format Observed in Sample |
|---|---|---|---|
| User email linked to analytics session | userEmail | PosthogAnalytics | Personally identified analytics sessions (defeating anonymization) |
| Company context | company | PosthogAnalytics | Which company the user was associated with during the session |
| Session timing | startTimeUtc, endTimeUtc | PosthogAnalytics | Exact session start/end times |
| Active/inactive time | activetime, inactivetime | PosthogAnalytics | How long the user was actively engaged vs. idle |
| Entry URL | startUrl | PosthogAnalytics | The URL the user was on when the session started |

PostHog sessions are typically anonymous or pseudonymous. The PosthogAnalytics table explicitly links userEmail to session data, effectively de-anonymizing behavioral analytics and creating a personally identifiable record of how each contractor and company user navigated the platform.

Legal Significance of This PII Inventory ¶

Any single category above would trigger breach notification obligations under most privacy laws.
The combination creates exposure across multiple overlapping regulatory regimes:

| Regulation | Applicable Data | Key Provisions |
| --- | --- | --- |
| GDPR (EU/UK) | All categories; Mercor processes data of EU/UK contractors (sample shows United Kingdom, Harrow residence) | Articles 5, 6, 9 (special categories), 13-14 (transparency), 22 (automated decisions), 33-34 (breach notification: 72 hours to the supervisory authority, and to affected individuals without undue delay) |
| CCPA/CPRA (California) | Personal identity, financial, employment, device identifiers, behavioral analytics | Right to know, right to delete, right to opt-out of sale/sharing, private right of action for data breaches resulting from failure to maintain reasonable security |
| Illinois BIPA | Facial geometry scans from Persona liveness detection, facial comparison thumbnails stored as image keys | $1,000–$5,000 per violation statutory damages, private right of action, no harm requirement |
| FCRA (Federal) | Background check results, adverse media checks, fraud decisions used for employment decisions | Requires permissible purpose, adverse action notices, accuracy obligations, private right of action |
| ECPA / Wiretap Act | Desktop screenshots capturing communications, browser URLs, window titles | Consent requirements for interception of electronic communications |
| State Data Breach Notification Laws (all 50 US states) | Name + financial account number, name + SSN, name + government ID | Mandatory notification to affected individuals, typically within 30-60 days |
| PIPEDA (Canada) | All categories; sample shows Canadian contractor (country: CA, BANK OF M*******, routing_number: 000**-***) | Breach notification to Privacy Commissioner and affected individuals |
| SOX / PCI-DSS | Financial account data, payment card information if present, bank routing numbers | Compliance obligations for financial data handling |

The exposed data supports claims for:

- Negligence: Failure to implement reasonable security measures for highly sensitive personal data
- Breach of contract: Violation of privacy commitments made to contractors in terms of service and privacy policies
- Breach of fiduciary duty: Mishandling of financial and identity data entrusted to Mercor as an employment intermediary
- Violations of specific statutes: BIPA (facial geometry scans from Persona KYC liveness detection), FCRA (background check data used in employment), CCPA (failure to maintain reasonable security), GDPR (multiple articles)
- Unjust enrichment: Mercor profited from collecting and processing this data without adequately protecting it
- Third-party claims: Professional references, LinkedIn contacts, and vouching parties whose data was collected without direct consent

The Scope - Who Is Affected ¶

The breach affects multiple distinct populations, each with different legal standing:

- Contractors (Primary Class): Every person who signed up, completed an interview, or performed work through Mercor has their full PII exposed: full legal name, personal email, phone number, date of birth, home address, government ID verification status, bank name and routing number, employment terms with exact pay rates, performance reviews with dismissal reasons, and in many cases desktop screenshots of their computer screens while working. The MercorUserFinancials table alone contains sufficient information for bank account fraud: the bank name, routing number, last four digits of account number, account holder name, and country are all stored in plaintext JSON.
- Client Companies: Companies that hired through Mercor have their project names (including OpenAI, Apertus - Elephant), internal tooling references, billing details, hiring criteria, candidate evaluation notes, Slack workspace URLs, Okta SSO group configurations, and annotation platform URLs exposed. These include some of the most valuable and secretive AI organizations on the planet.
- Mercor Employees: Internal staff are identifiable through the IacDeploymentRuns table (GitHub usernames as actor fields), CatfishAuditLog (Slack user IDs and real names), DATABASECHANGELOG (migration author names), MLExperimentsJobPerformanceReviews (reviewer names like A*** K*****), and the IAM table (users with ghost role assignments within client companies).
- Third Parties Who Never Consented: Professional references (UserReferences) provided their name, email, employer, and relationship to the contractor. LinkedIn contacts (LinkedinWarmIntros) had their profile URLs and email addresses stored. Vouching parties (CandidateVouches) provided detailed relationship information. These individuals had no direct contractual relationship with Mercor, likely received no privacy notice, and had no opportunity to consent to or opt out of data collection. Their data was collected incidentally through the contractors they were associated with.

The Scale - Mercor Client Ecosystem ¶

What elevates this breach from a typical startup data leak to an industry-wide crisis is who Mercor's clients are. Meta, OpenAI, and Google DeepMind are among Mercor's publicly known clients, as reported by the Wall Street Journal, but even our small sample reveals direct evidence of engagements with at least six major technology companies, plus numerous additional clients identifiable through project codenames and Airtable workspace names.

Confirmed Client Engagements Found in the Sample ¶

The sample file contains not just the production database tables but also an ./EXPORTS/ directory with full Airtable workspace dumps, organized by client name. These exports contain the actual work product: prompts, model responses, evaluation rubrics, and contractor submissions.
The client names appear directly in the directory structure:

| Client | Evidence in Sample | What Was Exposed |
| --- | --- | --- |
| Apple | Airtable workspace: AIRTABLE_APPLE_ENDPOINT_SANDBOX_APP3PG4U42BALES9K containing tables: TEXT, DEEP_L, TEXT_ORCHESTRATOR, RUBRIC_AUTO_GEN | Apple's proprietary AI model outputs. The TEXT table contains prompt-response pairs from Apple Foundation Models (afm-text-083, afm-model-085, afm-model-086), Apple Intelligence's internal language models. Sample: model afm-text-083 responding to user prompts with temperature=0.7, top_p=0.9. The DEEP_L table shows translation evaluation (text→Spanish). The TEXT_ORCHESTRATOR table shows orchestrator model (afm-model-086) being tested. This is pre-release Apple Intelligence evaluation data. |
| Amazon | Airtable workspace: AIRTABLE_AMAZON_LLM_COT_EVALUATION___UPDATED_APP0JM1SJ4XOHMAQC containing tables: DOMAINS, PHASE_1_TASKS, PHASE_1_REVIEWS, TALENT | Amazon's LLM Chain-of-Thought evaluation data. The DOMAINS table shows evaluation categories (math, stem). The PHASE_1_TASKS table contains full model A vs. model B comparison data with complete Chain-of-Thought reasoning traces, final responses, and preference judgments. Tasks are claimed by named Mercor staff (email addresses redacted). This exposes Amazon's internal model evaluation methodology and scoring rubrics. |
| OpenAI | Performance review record: Account: OpenAI, Project: Apertus - Elephant, reviewed by named staff. Feather platform URL: feather.openai.com/campaigns/998855ab-... . Project codename in Projects_Audit. | Named contractor (A***** D****, a*****@gmail.com) rated 4 - Redefines Expectations on OpenAI project work. Direct URL to OpenAI's internal Feather annotation platform with campaign UUID. |
| Anthropic | Airtable workspace: AIRTABLE_API_PREFERENCE containing PROMPTS, RESPONSES, ROLES, DOMAINS tables. Project: GPT-4 vs Claude Evaluation comparing GPT-4 and Claude 3.5 Sonnet. AgentSandboxes table shows agentType: claude. | LLM preference evaluation data comparing Anthropic's Claude 3.5 Sonnet against GPT-4 across use cases. AI coding agent sandbox sessions running Claude. Exposes model comparison methodology and evaluation criteria. |
| Meta | Publicly confirmed client per WSJ. Project references in Projects_Audit and ProjectIntegrations. | Contractor work product, project configurations, Slack workspace integrations. |
| Google DeepMind | Publicly confirmed client per WSJ. | Contractor work product and project data in the full database. |

Airtable Workspace Inventory ¶

The sample file reveals 25+ distinct Airtable workspaces that were exported as part of the breach. Each workspace name follows a pattern that often includes the client name or project identifier. Beyond the named clients above, the Airtable exports include:

| Airtable Workspace | Domain | Notable Tables |
| --- | --- | --- |
| APEX_LEGAL | APEX benchmark - Legal | TASKS, CRITERIA, TALENT, LLM_CALL_CONFIGURATION |
| APEX_INSURANCE | APEX benchmark - Insurance | TASKS, CRITERIA, TALENT, IMPORTED_TABLE |
| APEX_DATA_SCIENCE | APEX benchmark - Data science | TASKS, CRITERIA, TALENT, LLM_CALL_CONFIGURATION |
| APEX_MECHANICAL_ENGINEERING | APEX benchmark - Engineering | TASKS, HELPER, FAILURE_ANALYSIS, TALENT |
| APEX_DIY | APEX benchmark - DIY/consumer | TASKS, CRITERIA, TALENT |
| ATHENA_HLE___RUBRICS | Athena HLE (Humanity's Last Exam) rubrics | TASKS, MODEL_RESPONSES, AWAITING_REVIEW_METRICS |
| ATHENA_HLE__STEM_ | Athena HLE STEM evaluation | ATHENA_STEM_V_1, QA_SPECS |
| BEAR_MEDICINE | Medical domain tasks | DISCIPLINES, REVIEWER_ASSESSMENT, WRITER_DAILY_ACTIVITY, BONUS_PAYOUTS, PODS |
| AIME_RUBRICS | AIME (math competition) rubrics | TEAMS, TASKS, USERS |
| ARXIV_Q_A (multiple versions) | Academic paper Q&A generation | WORK_QUEUE, DOUBLE_BLIND, LEAD_AUDIT_QA, TESTING_ARXIV_LINKS |
| AUTO_REVIEWER | Automated review system | SUBMISSIONS, LLM_CALL_CONFIGURATIONS, PROJECTS |
| 09_29_CAND_MODEL_EVAL | Candidate model evaluation (IB1, IB2, CML) | IB_1, IB_2, CML, CML_DEPRECATED_ |
| API_PREFERENCE | API preference evaluation | PROMPTS, RESPONSES, ROLES, DOMAINS, PROMPT_TEMPLATES |
| APEX_EXPANSION_WEBSITE_TASKS | Website-related expansion | CRITERION, FILE, TASK |
| APEX_EVALS | General evaluation framework | EVALUATION_RESULTS |
| APEX_V1_REVISION | Apex V1 revision | EXPERT, RUBRIC, CRITERION, ROLE |

The ATHENA_HLE workspaces are particularly significant: "HLE" likely refers to Humanity's Last Exam, a high-profile AI benchmark designed to test frontier model capabilities. The MODEL_RESPONSES table in the rubrics workspace suggests Mercor contractors were grading AI model outputs against this benchmark, and the AWAITING_REVIEW_METRICS table indicates an active review pipeline. If this data reached adversarial actors, it could be used to game or contaminate one of the most important AI evaluation benchmarks.

The BEAR_MEDICINE workspace reveals medical domain annotation work with DISCIPLINES, REVIEWER_ASSESSMENT, and WRITER_DAILY_ACTIVITY tables, indicating Mercor contractors were creating or evaluating medical AI training data, adding healthcare data to the breach's sensitivity profile.

Evidence from Named Projects in the Database ¶

Beyond the Airtable exports, the production database tables contain additional project references:

| Project Codename | Domain | Evidence Source |
| --- | --- | --- |
| Apertus - Elephant | AI model evaluation (OpenAI-linked) | MLExperimentsJobPerformanceReviews: Account: OpenAI |
| Project Mega | Large-scale annotation (dedicated Slack workspace: project-mega.slack.com) | ProjectIntegrations, ActionsQueue |
| Panacea - Consulting RL Envs | Reinforcement learning environments | Projects_Audit, 400+ billable hours |
| Agentic Code Final QC Audit | AI code generation quality control (GitHub issue solving) | TaskDefinitions |
| GPT-4 vs Claude Evaluation | LLM preference ranking (GPT-4 vs Claude 3.5 Sonnet) | Airtable export: AIRTABLE_AIRTABLE_AI_AGENT_DEMO |
| Creative Writing Evals | Creative content evaluation | Projects_Audit |
| arXiv Q&A | Academic paper Q&A generation (multiple Airtable versions incl. Snowflake integration) | Airtable exports (3+ copies with dates) |
| Queensland (litigation) | Legal domain | Projects_Audit |
| FP&A / Corporate Finance | Finance domain | Projects_Audit |
| Obsidian | Human data client (billingModel: "invoice", tagged humandataclient) | Company |

The Magnificent Seven, Frontier AI Labs, and the Competitive Fallout ¶

Mercor is not a niche startup. According to Big Think and TechCrunch, Mercor has signed deals with six of the seven "Magnificent Seven" tech giants (Apple, Microsoft, Alphabet, Amazon, Meta, and Nvidia) plus frontier model developers OpenAI and Anthropic. The company works with over 30,000 contractors, pays an average rate of $95/hour, and reached a $500 million annual revenue run rate within 17 months of launch. It is valued at $10 billion.

This means the stolen data (the 211GB database, the 939GB of source code, the 3TB of cloud storage, and the 84 Airtable workspaces documented below) contains the operational records, AI training data, and work product for engagements touching nearly every major AI program in the Western world. The small sample analyzed in this report already confirms direct evidence of work for Apple (Foundation Model outputs), Amazon (LLM Chain-of-Thought evaluation), OpenAI (Feather platform, Apertus project), Anthropic (Claude evaluation), and Meta (multimedia annotation templates). The full 211GB database, which we have not seen, would contain the complete records for all six Magnificent Seven clients plus the frontier labs.

The competitive implications are severe:

- The training data itself is the prize. The leaked RLHF annotations, model evaluation data, and preference rankings produced by Mercor's contractors represent billions of dollars in training data investment. This data, now in the hands of Lapsus$ and available to any buyer, could be used by any competitor to accelerate their own model development without incurring the cost of generating it. As Y Combinator president Garry Tan noted: "Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value."
- Apple Foundation Model outputs are in the dump. The AIRTABLE_APPLE_ENDPOINT_SANDBOX workspace contains actual afm-text-083 and afm-model-086 model responses, which are pre-release Apple Intelligence outputs. These provide direct insight into Apple's model capabilities, safety alignment approach, and weaknesses before public release. Any competitor, whether a Silicon Valley rival or a lab in Beijing, London, or Tel Aviv, now has access to Apple's unreleased model behavior.
- Amazon's Chain-of-Thought evaluation methodology is exposed. The AIRTABLE_AMAZON_LLM_COT_EVALUATION workspace reveals how Amazon evaluates LLM reasoning quality, including the full prompts, complete Chain-of-Thought traces, and preference rubrics. The methodology itself is as valuable as the data: it reveals what Amazon considers "good reasoning" and how they measure it.
- The Anthropic/Claude evaluation data could inform adversarial attacks. The preference evaluation data comparing Claude 3.5 Sonnet against GPT-4, including the exact prompts, response pairs, and preference reasoning, could be used to identify weaknesses in Claude's alignment or to train models that specifically exploit those weaknesses.
- Mercor's global contractor base spans dozens of jurisdictions. With 30,000+ contractors across many countries, Mercor's database contains work authorization records, physical location data, and IP-based geolocation. The platform's fraud detection system flags contractors whose physical IP doesn't match their declared residence, meaning the database contains a map of which contractors may be working from undisclosed locations.
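The location-mismatch flag mentioned in the last point is recorded, in the fraud table earlier in this report, as a plain string inside a signal list such as ["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]. The sketch below shows how little effort it takes to turn that format into structured values, which is part of why its exposure matters. The function names are hypothetical, and the "name: value" layout is inferred only from the single sample quoted in this report.

```python
def parse_signal(entry):
    """Split one 'name: value' string into (name, typed value).

    The 'name: value' layout is an assumption based on the one
    sample list quoted in this report.
    """
    name, _, raw = entry.partition(":")
    name, raw = name.strip(), raw.strip()
    if raw in ("True", "False"):
        return name, raw == "True"
    try:
        return name, float(raw)
    except ValueError:
        # Unrecognized value: keep the raw text rather than guess.
        return name, raw


def parse_signals(entries):
    """Turn a list of signal strings into a dict of typed values."""
    return dict(parse_signal(entry) for entry in entries)


signals = parse_signals(
    ["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]
)
```

Values that are neither booleans nor numbers are kept as raw strings, since the report does not document the full range of signal types.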
Beyond the companies confirmed in the data, multiple sources, including former Mercor employees, claim that Mercor also maintains engagements with Chinese AI laboratories, including companies developing frontier models that compete directly with the labs whose training data is now in the breach. If true, this means Mercor was a single point of compromise connecting competing labs on opposite sides of the global AI race, with training data, evaluation methodologies, model outputs, and contractor talent pools for all of them sitting in the same breached infrastructure.

Even setting aside the question of direct Chinese client relationships, the stolen data (RLHF annotations, preference rankings, model evaluation rubrics, and Chain-of-Thought traces produced for OpenAI, Anthropic, Apple, Amazon, Meta, and Google) is now available on the black market. Given that Lapsus$ is actively auctioning the data, this material will reach whoever is willing to pay for it.

The TaskDefinitions table also references autograder configurations using openai/gpt-4.1 and openai/gpt-5 as scoring models, and task rubrics include constraints like "LLMs other than ChatGPT are prohibited", rules that only make sense when the work product is destined for a specific model vendor's training pipeline.

The scope of client engagements extends far beyond AI companies. The Airtable workspaces alone span legal, insurance, data science, mechanical engineering, medicine, academic research, and mathematics, suggesting Mercor's contractor workforce touches data and systems across a wide range of industries. Any attacker with access to the full dump could enumerate every active client engagement by cross-referencing the Company, Projects_Audit, ProjectIntegrations, and Listings_New tables with the complete Airtable export directory.

The Airtable Export - 84 Workspaces, 1055 Files ¶

A separate directory tree from the breach (EXPORTS/) reveals the full structure of the exfiltrated Airtable data. The export contains 84 unique Airtable workspaces totaling 1,055 JSONL files, each file containing the complete contents of one Airtable table. This is not a sample. It is the complete export of every Airtable base connected to Mercor's Fivetran data pipeline.

The directory structure reveals how Airtable sits at the center of Mercor's operation. It is used as:

- The annotation task management system: Every domain-specific project has its own Airtable base with a standardized schema: TASKS, TASK_VERSIONS, CRITERIA, DOMAIN, SUBDOMAIN, TALENT, QA_SPECS, WORKFLOW, LLM_CALL_CONFIGURATION, CONTROL_PANEL, and FILES. This is a fully industrialized annotation pipeline.
- The work product repository: Tables like PHASE_1_TASKS (Amazon), TEXT (Apple), PROMPTS / RESPONSES (API Preference), and MODEL_RESPONSES (Athena HLE) contain the actual task inputs and outputs: the prompts sent to AI models, the model responses, and the human evaluations. This is the training data itself.
- The talent and compensation ledger: TALENT tables appear in nearly every workspace, tracking which contractors worked on which tasks. CALCULATED_BONUSES, BONUS_PAYOUTS, TIMELOG, and CLAIMS tables track compensation. WRITER_STATS, REVIEWER_STATS, and WRITER_DAILY_ACTIVITY tables (in BEAR_MEDICINE) track individual productivity.
- The QA and audit system: QA_SPECS, LEAD_AUDIT_QA, DOUBLE_BLIND, and REVIEWER_ASSESSMENT tables track quality control processes.
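The layout described above (one directory per workspace, one JSONL file per table) lends itself to a quick inventory, for example when an affected organization needs to scope what an export of its own data contained. The following is a minimal sketch under assumptions: the function names are hypothetical, and the idea that each workspace is a direct subdirectory holding `<TABLE>.jsonl` files is inferred from the description rather than confirmed by the source. `STANDARD_SCHEMA` repeats the table names listed above.

```python
from pathlib import Path

# Table names from the standardized annotation schema listed above.
STANDARD_SCHEMA = {
    "TASKS", "TASK_VERSIONS", "CRITERIA", "DOMAIN", "SUBDOMAIN", "TALENT",
    "QA_SPECS", "WORKFLOW", "LLM_CALL_CONFIGURATION", "CONTROL_PANEL", "FILES",
}


def inventory_export(root):
    """Map each workspace directory under root to its set of table names.

    Assumes (hypothetically) one subdirectory per workspace and one
    <TABLE>.jsonl file per table; anything else is ignored.
    """
    inventory = {}
    for workspace in sorted(Path(root).iterdir()):
        if workspace.is_dir():
            inventory[workspace.name] = {p.stem for p in workspace.glob("*.jsonl")}
    return inventory


def summarize(inventory):
    """Return (workspace count, total file count, workspaces with the full schema)."""
    total_files = sum(len(tables) for tables in inventory.values())
    full_schema = sorted(
        name for name, tables in inventory.items() if STANDARD_SCHEMA <= tables
    )
    return len(inventory), total_files, full_schema
```

Run against an export matching the description in this report, `summarize` would be expected to return counts in line with the 84 workspaces and 1,055 files cited, though that depends on the layout assumption holding.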
The named workspaces can be organized into categories that reveal the full breadth of Mercor's operations:

Client-Named Workspaces (Direct Client Evidence):

| Workspace | Client | Content |
| --- | --- | --- |
| APPLE_ENDPOINT_SANDBOX | Apple | Apple Foundation Model outputs (afm-text-083, afm-model-086), translation testing (DEEP_L), orchestrator testing (TEXT_ORCHESTRATOR), rubric auto-generation |
| AMAZON_LLM_COT_EVALUATION (2 versions) | Amazon | LLM Chain-of-Thought evaluation: DOMAINS, PHASE_1_TASKS, PHASE_1_REVIEWS, MODEL_A_STRENGTHS |
| AAIE___META_MULTIMEDIA_TEMPLATE_COMMAND_CENTER | Meta | Meta multimedia annotation template with OVERALL_META, PROJECTS, FORMS, and TEMPLATE tables. Workspace name explicitly says "META" and "USE META_X_MULTIMEDIA_SPL_AIRTABLE_TEMPLATE" |
| API_PREFERENCE / API_PREFERENCE_V2 / API_PREFERENCE__COPY__FOR_BRENDAN / API_PREF___KANIX | Anthropic/Multi-vendor | LLM API preference evaluation: PROMPTS, RESPONSES, ROLES, DOMAINS, PROMPT_TEMPLATES, QA. Multiple versions and personal copies for named staff |

APEX - Mercor's AI Benchmark Suite (Compromised):

The APEX_ prefix identifies Mercor's proprietary suite of AI benchmarks: domain-specific evaluation frameworks used to measure AI model performance across verticals. Each APEX benchmark has its own Airtable workspace with a standardized schema: TASKS, TASK_VERSIONS, CRITERIA, DOMAIN, SUBDOMAIN, QA_SPECS, WORKFLOW, LLM_CALL_CONFIGURATION, and CONTROL_PANEL. The complete APEX suite spans 15+ domains:

| Workspace | Benchmark Domain | Notable Tables |
| --- | --- | --- |
| APEX_LEGAL | Legal reasoning | Standard APEX schema |
| APEX_INSURANCE | Insurance domain | Standard APEX + IMPORTED_TABLE |
| APEX_FINANCE | Financial services | Standard APEX + HELPER |
| APEX_ACCOUNTING | Accounting | Standard APEX |
| APEX_CONSULTING | Management consulting | Standard APEX + TEST_HEX_TABLE |
| APEX_DATA_SCIENCE | Data science | Standard APEX |
| APEX_MECHANICAL_ENGINEERING | Engineering | Standard APEX + FAILURE_ANALYSIS, HELPER |
| APEX_MEDICINE | Medical/healthcare | Standard APEX |
| APEX_FOOD | Food industry | Standard APEX + DELIVERIES |
| APEX_GAMING | Gaming | Standard APEX |
| APEX_RETAIL___E_COMMERCE | Retail & e-commerce | Standard APEX + DOMAIN_QC |
| APEX_SALES___MARKETING | Sales & marketing | Standard APEX |
| APEX_SHOPPING_STYLISTS | Personal shopping | Standard APEX |
| APEX_DIY (2 versions) | DIY/consumer | Standard APEX |
| APEX_WEBSITE_TASKS / APEX_EXPANSION_WEBSITE_TASKS | Web content | CRITERION, FILE, TASK |

The exposure of the complete APEX benchmark suite, including all tasks, criteria, scoring rubrics, and LLM_CALL_CONFIGURATION, renders these benchmarks untrustworthy. Any AI model trained on the leaked APEX data can appear to perform well on these benchmarks without genuinely possessing the evaluated capabilities. This is benchmark contamination at scale. Unless Mercor rebuilds the entire APEX suite from scratch with new tasks, new criteria, and new evaluation data, every APEX benchmark result produced after this breach is suspect. The EVALS workspace, which contains APEX_RESULTS, BOREALIS_RESULTS, and LUCIUS_RESULTS, further confirms that APEX was actively used to evaluate and compare models, making the contamination risk concrete and immediate.

Other AI Benchmark and Evaluation Workspaces:

| Workspace | Purpose | Notable Tables |
| --- | --- | --- |
| ATHENA_HLE___RUBRICS | Humanity's Last Exam rubric grading | MODEL_RESPONSES, AWAITING_REVIEW_METRICS, CLAIMS |
| ATHENA_HLE__STEM_ (4 versions incl. July 3, 2025 dated copy) | HLE STEM vertical evaluation | ATHENA_STEM_V_1 |
| APEX_HLE_BASED_RUBRICS | HLE-derived rubric system | CRITERIA, LLM_CALL_CONFIGURATION |
| APHRODITE__SEARCH_HLE | Search-based HLE evaluation | HLE search variant |
| ACADEMIC_REASONING_SFT | Supervised fine-tuning for academic reasoning | COT (Chain-of-Thought), ROLES, TALENTS |
| AIME_RUBRICS | AIME math competition rubrics | TEAMS, USERS, TASKS |
| EVALS / EVALS__COPY_ | General evaluation framework | APEX_RESULTS, BOREALIS_RESULTS, LUCIUS_RESULTS, _09_04_HLE_RUBRICS |
| 09_29_CAND_MODEL_EVAL (5 versions) | Candidate model evaluation (IB1, IB2, CML) | Iterative model comparison datasets |

Medical Domain Workspaces:

| Workspace | Purpose | Notable Tables |
| --- | --- | --- |
| BEAR_MEDICINE | Medical annotation | DISCIPLINES, REVIEWER_ASSESSMENT, ASSESSMENT, WRITER_DAILY_ACTIVITY, REVIEWER_STATS, WRITER_STATS, ALL_TIME_TOP_5, BONUS_PAYOUTS, CLAIM_LOCK, AHT_STATS, ASSESSMENT_ANALYSIS, PODS |
| BEAR_RADIOLOGISTS | Radiology-specific annotation | Radiologist-specific tasks |
| BANKERS | Financial/banking domain | Banking-specific tasks |

Aircall Integration (complete phone system export):

The export also includes a full Aircall directory (Mercor's VoIP phone system) containing 27 tables: CALL, CALL_TRANSCRIPTION, CALL_TRANSCRIPTION_CONTENT_UTTERANCE, CALL_SENTIMENT, CALL_SENTIMENT_PARTICIPANT, CALL_SUMMARY, CALL_ACTION_ITEM, CALL_TAG, CALL_TOPIC, CONTACT, CONTACT_EMAIL, CONTACT_NUMBER, USERS, USER_AVAILABILITY, and more. This represents the complete call history including full transcriptions, sentiment analysis, AI-generated summaries, and contact information for every recruiter phone call.

What the Airtable Export Means:

The Airtable export transforms this breach from a database leak into a complete AI training data theft. The database tables documented in the rest of this article provide the metadata: who worked on what, when, and how much they were paid.
The Airtable export contains the actual work product: every prompt, every model response, every human evaluation, every rubric score, every Chain-of-Thought trace, and every preference judgment that Mercor's contractors produced for Apple, Amazon, OpenAI, Anthropic, Meta, and dozens of other clients.

The iterative versioning visible in the workspace names (e.g., APEX_RUBRICS with 12+ dated copies from August 7, 2025 through January 23, 2026) reveals that this export captured the complete historical evolution of Mercor's benchmark and evaluation pipeline: not just a snapshot, but the full development history of rubrics, task definitions, and evaluation criteria across months of refinement. For the APEX benchmarks specifically, this means every iteration of every benchmark task is now public; an attacker can study how the benchmarks evolved and craft model training data that targets the final versions.

Customer and Third-Party Platform URLs Found in the Dump ¶

Beyond project codenames, the dump contains direct URLs to customer platforms, internal tools, and third-party services, embedded in configuration fields, JSON blobs, onboarding documents, and metadata columns across dozens of tables. An exhaustive search of the file reveals 1,800+ unique URLs. The most sensitive are catalogued below.

Client Annotation and Work Platforms ¶

These are URLs to the actual platforms where Mercor contractors perform work for clients. Each one identifies a specific client engagement and, in many cases, a specific campaign or task within that client's systems:

| URL / Domain | What It Reveals | Source Table |
| --- | --- | --- |
| feather.openai.com/campaigns/998855ab-60e7-4aed-9f08-5fccd56fe53e | OpenAI's internal Feather annotation platform: a specific campaign UUID, confirming Mercor contractors work directly inside OpenAI's tooling | Projects_Audit (annotationPlatform) |
| alabaster-studio.com/project/abacus/conversation/7c9facb4-... | A client project management / collaboration platform, captured as the live browser URL during a monitored work session | InsightfulScreenshots (browserUrl) |
| glowstone-mli-rubrics.slack.com (channels: C0994P7BH2N, D09969QHV62) | A client-specific Slack workspace for MLI rubric development, likely a client or partner organization's dedicated workspace | ProjectIntegrations, ActionsQueue |
| project-mega.slack.com | A dedicated Slack workspace for a single large-scale annotation project | ProjectIntegrations |
| 6 distinct Airtable workspace IDs (appX7l7xADlyFD3nL, appEzeshKTIKSrvBV, app9DBchZKUj2auMZ, appCZwMqiIUkP7KIQ, appLmn3266lQsaUXK, appYFQOZicXUoO2yz) | Airtable used as an annotation and project management platform; each app ID is a distinct workspace, likely per-client or per-project | Projects_Audit, OnboardingDocument |
| ta-01km6j8ztpd4vttvzb7ctgqteh-8080-ms3c95f46vnxcii7cwsi84ago.w.modal.host | A Modal.com serverless deployment, indicating Mercor or a client runs ML model inference on Modal | AgentSandboxes or service configuration |

Mercor Internal Infrastructure URLs ¶

These URLs expose Mercor's own internal architecture, allowing an attacker to map the entire operational surface:

| URL / Domain | What It Reveals | Source Table |
| --- | --- | --- |
| work.mercor.com | Primary contractor work portal (100+ URLs with job IDs like /create/job_AAABm...) | Comms, ActionsQueue |
| team.mercor.com | Company-facing team portal | Comms, EmailTemplates |
| talent.docs.mercor.com/how-to/okta-access | Internal documentation portal; includes onboarding guides for Okta and Insightful setup | ActionsQueue |
| api.mercor.com | API gateway endpoint | Configuration fields |
| dev.coil.mercor.com | Development webhook endpoint for the coil microservice | ProjectIntegrations |
| coil.mercor.com | Production coil service endpoint | ProjectIntegrations |
| c-mercor.okta.com | Okta SSO instance: the identity provider for all contractor and staff authentication | ActionsQueue, UserMetadata |
| linear.app/mercor | Mercor's Linear issue tracker; exposes internal engineering project management | Configuration metadata |
| pic-gen.r2.mercor.com | Cloudflare R2 image generation service | Asset URLs |
| ddcd-2601-642-4c01-5a8d-...ngrok-free.app | An ngrok development tunnel: a temporary public URL exposing a local dev server, including the developer's IPv6 address embedded in the subdomain | Webhook configurations |

AWS S3 Buckets ¶

Each S3 bucket below contains files that are directly addressable via URL if the bucket permissions are misconfigured.
The bucket names alone reveal the categories of stored data:

| S3 Bucket | Contents |
| --- | --- |
| mercor-insightful-screenshots-production | Every screenshot captured from contractor desktops during monitored work |
| mercor-background-check-photos | Background check identity documents and photographs |
| ai-interviewer-recordings | Audio/video recordings of AI-conducted interviews |
| dailyco-recordings | Daily.co video call recordings |
| production-pdx-5557735*****-web-recordings | Production call recordings (AWS account ID 5557735***** is embedded in the bucket name) |
| kite-uhn-brain-injury.s3.ca-central-1.amazonaws.com | Medical documents; bucket name references brain injury records at UHN (University Health Network), a major Canadian hospital system |
| certn-api-s3-certn-images-ca-central-1-production | Certn identity verification images |
| certn-api-s3-certn-rcmp-documents-ca-central-1-production | RCMP (Royal Canadian Mounted Police) criminal record check documents |
| certn-api-s3-one-id-images-ca-central-1-production | OneID government identity verification images |

The S3 bucket kite-uhn-brain-injury is particularly alarming: it suggests that either Mercor or a client project involved handling protected medical records, and the bucket name alone leaks the nature of the data and the institution involved.

Google Workspace Documents ¶

The dump contains direct URLs to 30+ Google Docs, 2+ Google Sheets, 2+ Google Forms, and 10+ shared Google Drive folders used for project onboarding, task instructions, rubric definitions, and team coordination:

- docs.google.com/document/d/1111XpiZ9eZvH8X_... : Onboarding materials
- docs.google.com/document/d/1770ZnTy0_Yt-U-U7W... : Project documentation
- docs.google.com/spreadsheets/d/10LWCzAD1e-J8W7v... : Tracking spreadsheets
- docs.google.com/forms/d/e/1FAIpQLSdLnOJ9DZoq... : Assessment/intake forms
- drive.google.com/drive/folders/14eFptQgb2FjWoFh... : 10+ shared project folders

Many of these Google Docs likely remain live and accessible if the sharing permissions are set to "anyone with the link", a common practice for contractor onboarding materials.

Communication and Collaboration Evidence ¶

| Platform | Evidence | Count |
| --- | --- | --- |
| Slack | 4 distinct workspaces: mercor.enterprise.slack.com, project-mega.slack.com, glowstone-mli-rubrics.slack.com, 6385b64336a9545.slack.com | 4 workspaces, 5+ named channels |
| Google Meet | Meeting room codes: deo-ixih-ivt, cae-eois-jwn, hhr-erjm-svp, pmi-ogrs-aap, szd-qvcr-hfp, zoz-shgt-epy | 6+ meeting rooms |
| LinkedIn | Contractor profile URLs with full names | Multiple profiles |
| Aircall | Call recordings via media-web.aircall.io and assets.aircall.io | Recruiter phone call audio |
| Ashby HQ | Job postings at jobs.ashbyhq.com and app.ashbyhq.com | Hiring platform |
| Certn | Background check portals: mercor.certn.co/hr/applications/{uuid}/, enrollment at certn.trustmatic.ws/web-enrolment/ | Identity verification flows |

What This URL Inventory Means ¶

An attacker with this data does not need to guess what Mercor's clients are or what systems contractors access. The URLs are already in the database.
Specifically: OpenAI's Feather platform URL with a campaign UUID gives an attacker a direct entry point to probe OpenAI's annotation infrastructure S3 bucket names allow targeted enumeration attacks — checking whether buckets are publicly accessible or brute-forcing object keys based on the naming patterns visible in the dump Google Docs and Drive folders may still be live and accessible if shared via link — giving an attacker access to project rubrics, onboarding materials, and task instructions Slack workspace identifiers enable social engineering against teams working on specific projects The ngrok tunnel URL embeds a developer's IPv6 address, adding another vector for targeting Mercor engineering staff The AWS account ID ( 5557735***** ) embedded in the S3 bucket name enables targeted cloud reconnaissance The Screenshot Problem ¶ The most dangerous element of this breach is the Insightful time-tracking screenshot system — and the danger compounds with every client Mercor serves, every platform URL catalogued above, and every S3 bucket of screenshots that can be systematically correlated. Mercor requires contractors to install the Insightful (formerly Workpuls) monitoring agent on their computers. This agent captures a screenshot of the contractor's desktop every few minutes while they are clocked in. 
Each screenshot is uploaded to mercor-insightful-screenshots-production.s3.amazonaws.com and indexed in the InsightfulScreenshots table with rich metadata: The full screenshot image (stored at a direct, addressable S3 URL — e.g., https://mercor-insightful-screenshots-production.s3.amazonaws.com/screenshots/[employeeId]/[timestamp]_[uuid].png ) The application open at the time ( appName , appFileName , appFilePath ) The window title (which often contains document names, code file paths, or chat conversations) The browser URL being visited (which can include feather.openai.com , client Airtable workspaces, or any of the platform URLs catalogued above) The contractor's IP address , MAC address (via gateways ), and hardware fingerprint ( hwid ) The contractor's timezone , OS version , and Insightful agent version A sample screenshot record from the dump shows a contractor working in Google Chrome on alabaster-studio.com/project/abacus/conversation/... — with their IP ( 71.194.*.* ), MAC address ( 1C:93:7C:64:**:** ), hardware ID, and full filesystem path to Chrome all recorded. Here is why this is catastrophic in context: The database contains all the ingredients for a systematic visual intelligence operation. 
An attacker can join tables to correlate screenshots with client projects and platform URLs: Which client project a contractor was assigned to (from ProjectIAM and Jobs ) Which annotation platform that project uses (from Projects_Audit.annotationPlatform — e.g., feather.openai.com , specific Airtable workspace IDs) Every screenshot taken while the contractor worked on that project (from InsightfulScreenshots filtered by contractorId and projectId ) The exact URLs, window titles, and application contents visible in those screenshots — cross-referenced against the known client platform URLs to confirm which client's systems are shown This means an attacker doesn't just get a list of Mercor's clients — they get a visual archive of what contractors saw inside those clients' systems . If the project was for OpenAI, the screenshots show OpenAI's Feather annotation interface, the prompts being graded, and the evaluation criteria. If the project was for Meta, the screenshots show Meta's internal tooling. If the project involved reinforcement learning environments, the screenshots show the RL training data and reward models. 
The scope of what these screenshots can reveal includes:

- Proprietary client code and architecture visible in IDE windows, terminal sessions, and browser tabs
- Annotation platform interfaces showing the exact tasks, rubrics, and datasets used to train frontier AI models
- Internal Slack channels and email threads visible in background windows. The `ProjectIntegrations` table confirms contractors are added to client Slack workspaces (`project-mega.slack.com`, `mercor.enterprise.slack.com`)
- Authentication tokens, API keys, and session cookies potentially visible in browser URL bars, developer tools, or terminal output
- Unreleased product features, research results, and trade secrets visible in dashboards or documents
- Other contractors' work and personal information if collaborative tools were open on screen

Perhaps most critically, the screenshots create an involuntary record of contractor misconduct. The Wall Street Journal has reported on growing concerns around AI training data supply chains, and contractors in these roles often have privileged access to sensitive client systems. If any contractor was engaged in unauthorized data exfiltration (copying proprietary datasets, screenshotting confidential research, leaking model weights, or otherwise violating their employment agreements), that activity was captured frame by frame by the monitoring system and is now available to anyone with the dump. The monitoring system that was designed to protect Mercor's clients has become a comprehensive, timestamped, visually indexed archive of everything those clients wanted to keep secret.

This creates a cascading breach. Mercor's data exposure is not just a breach of Mercor; it is a proxy breach of every client organization whose internal systems, annotation platforms, Slack workspaces, and proprietary tooling were visible on a contractor's screen during monitored work sessions.
The number of indirectly breached organizations equals the number of clients Mercor has ever served. Platform Overview ¶ Mercor presents itself publicly as an AI-powered hiring platform. The database tells a more complete story: it is a full-stack labor marketplace and employment management system that spans acquisition, vetting, matching, contracting, surveillance, and payment. The platform operates across at least three distinct product surfaces: Talent Portal — Where contractors create profiles, complete interviews, apply to listings, and track their work Company Portal — Where client companies post listings, review candidates, manage projects, and receive invoices Godmode / Internal Admin — An internal dashboard ( GodmodeCompanies , GodmodeArbitraryCells ) used by Mercor staff for operations The backend is a microservices architecture with at least 13 named services: coil , site_fe , team_fe , work_fe , mercor_go , mercor_api , mercor_api_nginx , celery , workflow , db_trigger_consumer , steve , woz , and payments_temporal_worker . These are deployed on AWS ECS and managed via Terraform/Terragrunt in the mercor-monorepo GitHub repository. The primary database is Aurora MySQL (AWS), with the analytics warehouse being Snowflake (evidenced by dbt model tables like DbtFirmSchoolRank and DbtSchoolRankings ). Schema migrations are managed by Liquibase (evidenced by DATABASECHANGELOG and DATABASECHANGELOGLOCK tables). Evidence - The Database Layer by Layer ¶ The following sections present a systematic walk through every domain of the exposed database, with obfuscated sample records drawn directly from the dump. This is the evidence base for the claims made above. Part I - User and Identity Layer ¶ The Contractor Profile ¶ At the core of Mercor's data model is the contractor. The MercorUsers_New table stores the primary user record, while MercorUsers_New_backup appears to be a historical snapshot. 
A sample (obfuscated): Field Value userId 7d10d057-0c11-438a-ace1-9a9c8a50c925 email e****[email protected] name T** O**** phone +44795718**** location United Kingdom, Harrow createdAt 2025-08-30 09:49:20 lastLogin 2025-09-20 09:16:33 insightfulId wesvspdyd5m3zg2 stripeAccountId NULL isDeleted 0 The insightfulId field is particularly significant — it links this user to their Insightful (formerly Workpuls) monitoring agent, meaning every screenshot taken of this person while working is tied to this identifier. The MercorUsers_New table extends the backup with additional fields: phoneVerificationStatus , phoneVerifiedAt , phoneOptIn — indicating ongoing additions to the user data model. The authType field suggests support for multiple authentication providers (Firebase, Google OAuth, email/password). Location and Residence Data ¶ UserLocation stores both declared residence and physical presence: Field Value residenceCountry USA physicalCountry USA residenceState NULL physicalState NULL The distinction between residence and physical country is central to Mercor's fraud detection logic — a mismatch between declared location and actual IP-derived location is one of the primary fraud signals. UserMetadata enriches the contractor record with: workAuthorizationStatus — eligibility to work in specific countries birthday — date of birth physicalLocation — freeform address field contractorMail — a Mercor-provisioned email address (e.g., @mercor.com ) oktaUserId / oktaAccountState — SSO integration maxContracts — cap on concurrent engagements fraudStatusEnum — a denormalized fraud verdict UserAvailability_Audit captures declared working hours: maxWeeklyHours , desiredWeeklyHours , expectedStartOffset , and timezone — allowing Mercor to understand contractor bandwidth and scheduling preferences. Referral and Social Vouching ¶ CandidateVouches is a comprehensive social trust mechanism. 
When a voucher endorses a candidate, they fill out a structured questionnaire: How did you know this person? (social platforms, working together, studying together, other) Why are they qualified? (skills, education, employer, expertise, other) Each field has a paired *Detail text field. This creates a rich graph of professional and social relationships. UserReferences stores professional references with names, companies, relationships, and contact emails — conventional hiring data now sitting in an exposed database. UserState tracks lifecycle metrics: resumeUploaded , interviewsCompletedCount , jobApplicationsCount , totalMillisWorked . Part II - Identity Verification and Fraud Detection ¶ The KYC Layer ¶ Mercor uses Persona as its identity verification provider. The IDVerificationChecks table records each check with: provider : persona source : e.g., interview-face-comparison sessionId : the Persona interview session token verificationStatus , governmentIdStatus , livenessStatus , addressStatus fraudDecision : NULL / escalated / approved providerResponse : full JSON blob from Persona's API A sample Persona response shows: { "type": "baseline", "interview_id": "intr_AAABnNOWs0wnj7Tmg0hBQpL5", "thumbnail_key": "intr_AAABnNOWs0wnj7Tmg0hBQpL5_thumbnail.jpg", "persona_account_id": "act_QMTuQh33A4QU23J8ECPSd32BBKb4" } The thumbnail key references a stored facial image from the verification session. BackgroundCheck and BackgroundCheck_New record criminal background and adverse media checks (via Checkr or Certn ): Field Example externalCandidateId Checkr candidate UUID workLocation USA package tasker_pro status clear / consider adverseMediaCheckStatus clear ScreeningPackage defines what checks are bundled per company engagement, including checkConfig (JSON with individual check types) and graceDays (how many days a contractor has to complete checks before being blocked). 
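The `graceDays` rule described for `ScreeningPackage` can be illustrated with a small date calculation. This is a minimal sketch, not Mercor's logic: the function names, the choice of an assignment date as the starting point, and the decision to treat the deadline day itself as still within the grace period are all assumptions made for illustration.

```python
from datetime import date, timedelta


def screening_deadline(assigned_on: date, grace_days: int) -> date:
    """Last day on which outstanding checks may still be completed (assumed inclusive)."""
    return assigned_on + timedelta(days=grace_days)


def is_blocked(assigned_on: date, grace_days: int, today: date, checks_complete: bool) -> bool:
    """A contractor is blocked only if checks are incomplete and the deadline has passed."""
    if checks_complete:
        return False
    return today > screening_deadline(assigned_on, grace_days)


# Hypothetical example: a package with a 7-day grace period assigned on 1 September 2025.
assigned = date(2025, 9, 1)
print(screening_deadline(assigned, 7))                  # 2025-09-08
print(is_blocked(assigned, 7, date(2025, 9, 9), False))  # True
```

Keeping the deadline computation separate from the blocking decision makes the inclusive-versus-exclusive choice easy to change in one place.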
The Fraud Pipeline ¶

Mercor operates a multi-stage fraud pipeline that is one of the most sophisticated components in the database. It runs at four stages: `profile`, `interview`, `post-interview`, and `on-project`.

`FraudStates`: the current fraud verdict per user, maintained as a state machine:

| Field | Example Value |
|---|---|
| `userId` | `000087ef-2296-445c-b355-9d5e600e0af2` |
| `currentStage` | `profile` |
| `currentDecision` | `ESCALATE` |
| `currentConfidence` | `medium` |
| `currentReasoning` | "The primary concern is a maximum location mismatch score of 1.0, indicating the user's IP address is entirely inconsistent with their stated profile location..." |
| `currentKeySignals` | `["location_mismatch: 1.0", "email_diff: 0.125", "email_is_pwned: False"]` |

The reasoning field contains LLM-generated natural language explanations, almost certainly from Vertex AI / Gemini based on the signal schema.

`FraudCheck`: the central fraud queue:

- `stage`, `interviewId`, `jobId`: context of the check
- `process_status`, `retryCount`: pipeline execution state
- `flag_reasons`, `automatedReasons`, `manual_review_rational`, `manual_review_signs`
- `assigned_to`, `assigned_on`: human reviewer assignment
- `splReview`: special review flag

`FraudSignalAuditLog`: every individual signal evaluated:

- `signalType`: e.g., `location_mismatch`, `email_is_pwned`, `vpn_detected`
- `modelName`: which ML model produced the score
- `modelScore`: numeric confidence
- `status`: accepted / rejected

`FraudEvents`: Bayesian belief updates per event:

- `priorAlpha`, `priorBeta`, `priorProbability`, `priorStatus`
- `posteriorAlpha`, `posteriorBeta`, `posteriorProbability`, `posteriorStatus`
- `evidence`: JSON describing what caused the update

This is a textbook Beta-Binomial Bayesian fraud model: prior beliefs updated with evidence to produce posterior fraud probability estimates.
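To make the Beta-Binomial reading concrete, the following sketch shows the standard conjugate update that column names such as `priorAlpha`, `priorBeta`, and `posteriorProbability` imply. It is an illustration of the general technique only: the `Belief` class, the `update` function, and the example counts are hypothetical, and how Mercor actually converts evidence into counts is not visible in the sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """A Beta(alpha, beta) belief about the probability that an account is fraudulent."""
    alpha: float  # pseudo-count of fraud-consistent observations
    beta: float   # pseudo-count of benign observations

    @property
    def probability(self) -> float:
        """Mean of the Beta distribution, used here as the point estimate."""
        return self.alpha / (self.alpha + self.beta)


def update(prior: Belief, fraud_like: int, benign_like: int) -> Belief:
    """Conjugate update: observed counts are simply added to the prior parameters."""
    return Belief(prior.alpha + fraud_like, prior.beta + benign_like)


# Hypothetical numbers: a weak prior of 10% fraud, then two fraud-consistent signals.
prior = Belief(alpha=1.0, beta=9.0)
posterior = update(prior, fraud_like=2, benign_like=0)
print(prior.probability, posterior.probability)  # 0.1 0.25
```

Because the update is just addition, each event row only needs to record the parameters before and after the evidence, which matches the prior/posterior column pairs in `FraudEvents`.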
ProductionFraudState — Final fraud disposition: fraudModality — type of fraud (identity, time, quality) source — automated / manual productionModelId — versioned model that made the call OnProjectFraudWindows — Time-based on-project fraud analysis: fraudType , flags , flagMetadata , windowMetadata , screenshotMetadata Analyzes patterns within work sessions using screenshot data CheatingDetection / CheatingDetection_Audit — Interview cheating detection: isCheating , cheatingProbability , signs Tracks whether candidates used external resources during AI interviews QAReviewLog — Manual fraud review outcomes: stage , signalType , decision , comments Assigned to specific reviewerId for human-in-the-loop adjudication AutoFraudChecks — Automated rule-based checks triggered on a schedule or event. DuplicateGroups — Groups of user IDs believed to be the same person ( userIdList ), with merge tracking ( mergedIntoGroupId ). Part III - The Hiring Pipeline ¶ Listings ¶ Listings_New is the job posting table. 
A Mercor listing is considerably more structured than a typical job board entry:

| Field | Description |
|---|---|
| `title` | Job title |
| `description` | Full job description |
| `rateMin` / `rateMax` | Pay rate range |
| `hoursPerWeek` | Expected commitment |
| `payRateFrequency` | hourly / monthly |
| `workArrangement` | Remote / hybrid |
| `eligibleLocation` | Which countries can apply |
| `ineligibleResidenceLocation` | Explicitly excluded countries |
| `listingType` | Job category |
| `evaluationCriteria` | JSON rubric for ranking candidates |
| `automatedCommsOn` | Boolean: auto-send rejection emails |
| `automaticRejectionsOn` | Boolean: auto-reject below threshold |
| `timeToAutoReject` | Days until auto-rejection fires |
| `goalNumHires` | Target headcount |
| `referralBoost` | Bonus multiplier for referred candidates |
| `isExploreAlways` | Always appear on public explore page |
| `disableApplications` | Freeze new applications |

`EvaluationCriteria` stores the per-listing scoring rubric used during candidate ranking. Each criterion has `shortCriteria`, `type` (hard filter or soft score), a `hardFilter` boolean, and `position` for display ordering.

`ListingNotes` stores internal recruiter notes per listing, including candid operational commentary. A sample (obfuscated):

"33 leads confirmed on sheet by B***** to send offers — @N*** to staff RM for conversion"

This reveals that Mercor staff are managing candidate pipelines directly, with named individuals responsible for conversions.
Candidates ¶ Candidates / Candidates_Audit tracks every application: Field Description status applied / shortlisted / offered / rejected listingStepConfigId Which step in the hiring funnel notesForCandidate Recruiter notes visible to candidate birthday Date of birth at application time physicalLocation Where they were when applying workAuthorizationStatus Work eligibility rejectionReason Categorized rejection reason starred Recruiter-starred flag automaticRejectAt Scheduled auto-rejection timestamp numCommsSent / lastCommSentAt Outreach tracking referralId Linked referral if any CandidateMatchScores provides ML-generated match scores: matchScore — numeric compatibility score contextualSummary — LLM-generated natural language explanation of why this candidate fits this listing MercorScores stores the tournament-based ranking scores: mScoreRaw / mScoreNormalized — the MercorScore numComparisons — how many pairwise comparisons informed the score contextualSummary — LLM narrative on the candidate's standing aggregateFeatureScore — combined feature vector score PairwiseComparisons stores individual A/B comparisons: winnerResumeId / loserResumeId winnerUserId / loserUserId reasoning — LLM explanation of why candidate A beat candidate B This implements a Bradley-Terry tournament ranking model — candidates are repeatedly compared in pairs, with each comparison updating relative ranking scores. TalentViewSearchUsers and SharableTalentViewConfig enable companies to create saved talent searches and share curated candidate shortlists with colleagues. SharableTalentViewConfigUsers adds per-candidate evaluation data including likeCount , dislikeCount , and free-text feedback . Part IV - Interviews and Assessments ¶ AI Interview System ¶ Mercor's interview process is AI-conducted and rubric-graded. 
The Forms_Audit table reveals the full interview configuration: items — JSON array of interview questions evaluationCriteria — per-question scoring criteria assessmentRubricId — linked rubric allowCopyPaste — flag to restrict copy-paste (cheating prevention) allowFormRetakes / maxRetakeAttempts — retry policy prep — pre-interview preparation materials shown to candidate feedbackConfig — how/whether to share scoring feedback AssessmentRubrics defines the grading framework: title , instructions — rubric metadata sumScores , sumSquareScores , countScores — aggregate statistics across all uses of this rubric passThreshold — minimum score to pass AssessmentRubricItems_Audit stores individual rubric criteria: criteria — the evaluation criterion text shortName — abbreviated label points — maximum points format — scoring format (binary, scale, etc.) webSearch — whether web search context is provided to the grader smartScoring — whether AI auto-scoring is enabled type — criterion type FormSubmissions records every interview submission: responseStatus — submitted / abandoned / in_progress activeTimeSeconds — actual time spent on the form posthogSessionIds — linked PostHog analytics session assessmentVersionId — which version of the assessment was taken AssessmentEvalState tracks the grading pipeline: assessmentType , jobType , status retryCount , reason triggerSource , triggeredByUserId durationMs — how long grading took InterviewEvals stores scored results: communicationScore , technicalScore qaPairScores — per question-answer pair scores InterviewIssues records reported problems during interviews: issue — issue type (technical problem, suspected cheating, content issue) source — who reported it (candidate, system, reviewer) startPosition / endPosition — timestamp positions within the interview reportedBy — user ID of reporter InterviewScores provides the final aggregate score per interview. 
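The three aggregates stored on `AssessmentRubrics` (`sumScores`, `sumSquareScores`, `countScores`) are sufficient to recover a running mean and standard deviation without retaining individual scores. The sketch below shows that arithmetic; whether Mercor uses the columns this way is an assumption, and the function name and example values are hypothetical.

```python
import math
from typing import Optional, Tuple


def rubric_stats(sum_scores: float, sum_square_scores: float,
                 count_scores: int) -> Optional[Tuple[float, float]]:
    """Return (mean, population standard deviation) from running aggregates.

    Uses Var[X] = E[X^2] - (E[X])^2. Returns None when no scores were recorded.
    """
    if count_scores == 0:
        return None
    mean = sum_scores / count_scores
    # Clamp at zero: floating-point rounding can make the difference slightly negative.
    variance = max(sum_square_scores / count_scores - mean ** 2, 0.0)
    return mean, math.sqrt(variance)


# Hypothetical rubric that has been used three times with scores 6, 8 and 10:
# sum = 24, sum of squares = 36 + 64 + 100 = 200, count = 3.
stats = rubric_stats(24.0, 200.0, 3)
print(stats)
```

Aggregates of this kind make it cheap to compare a new submission against the historical distribution for a rubric, for example when calibrating `passThreshold`.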
Part V - Work Trials and Onboarding ¶ Work Trial Contracts ¶ WorkTrial_Audit captures the structured trial engagement contract: Field Description payableAmount Amount payable to contractor (cents) billableAmount Amount charged to company (cents) ciiaaDirect Confidentiality agreement (direct) ciiaaPassthrough Confidentiality agreement (passthrough) tow Terms of work offerLetter S3 key or base64 of signed offer letter signature Digital signature string startDate / endDate Trial period projectId Linked project billingAccountId Billing target The presence of offerLetter and signature fields indicates that signed legal documents are stored directly in the production database. WorkTrialConfig defines reusable work trial templates per company: emailTemplateSubject / emailTemplateBody — invitation email content emailTemplateSubjectExtension / emailTemplateBodyExtension — offer extension emails interviewIds , formIds — prerequisite steps before trial activation isUnified — whether trial is shared across listings Onboarding Pipeline ¶ OnboardingState defines the onboarding funnel steps: Field Example shortName interview_completed name Interview Completed threshold 1 order 0 OnboardingDocument stores the per-project onboarding materials (links, instructions, or document content) shown to newly hired contractors. TierProgress tracks contractor progress through Mercor's internal tier/certification system — mapping contractors to planId , tierId , status , and completedAt . PlanAssignments assigns contractors to specific plans with defined startDate , endDate , userHours allocation, and tasksCompleted tracking. 
Part VI - Projects and AI Task Management ¶ Project Structure ¶ Projects_Audit reveals the full project configuration: Field Description companyId Client company name Internal project name screenshotEnabled Whether Insightful monitoring is active userGroupEmail Google Group for project members projectType Project category annotationPlatform e.g., Scale AI, Label Studio annotationPlatformIDs External platform project identifiers ssotLink Single source of truth document URL taskMetricsDatastore Where task data is stored status active / archived notes Internal notes offerExtendedText Custom text in offer letters for this project ProjectIAM / ProjectIAM_Audit defines role-based access: each record maps a userId to a roleId within a projectId , with status and assignedBy for audit purposes. ProjectIntegrations is particularly revealing — it links each project to: oktaGroupId / oktaOwnerGroupId / oktaEPMGroupId — Okta SSO groups googleGroupId — Google Workspace group slackChannelId / workspaceNotificationChannel — Slack notification channels projectShortId — human-readable project identifier This table effectively maps every production project to its Slack workspace and Okta group, providing a complete picture of Mercor's organizational structure. 
AI Task System ¶ TaskDefinitions / TaskDefinitions_Audit define the structure of AI training tasks: Field Description rubric JSON grading rubric for this task type autograder Autograding configuration (model, prompts) task_schema JSON Schema defining the task response format metadata Additional task configuration TaskAudits records individual task submissions for review: Field Description taskDefinitionId Which task definition was used recordId The submitted task record s3KeyPrefix S3 location of submission artifacts authorId Contractor who submitted auditorId Reviewer assigned status pending / approved / rejected outcome Final grading outcome autoOutcome Automated grading result dispute Dispute information if challenged disputedBy Who filed the dispute TaskAssignments maps tasks to specific jobs and users, with appliedBy tracking who made the assignment. DeliverableBatches groups deliverables for invoicing: uid , name — batch identifier invoiceLineItemId — linked invoice line taskCount , status metadata — additional batch configuration ProjectCustomColumns adds arbitrary metadata fields to projects, with sqlQuery indicating some columns are dynamically computed from database queries. ProjectCustomColumnValueHistory tracks changes to these values over time. ProjectArchetypes stores character/role descriptions for specific project types — suggesting Mercor operates AI roleplay or persona-based annotation tasks ( archetypeText , elements ). ProductivityProjectRules defines per-project productivity monitoring rules ( rules JSON, is_active , versioned). Part VII - Time Tracking and Productivity Surveillance ¶ The Insightful Integration ¶ This is the most invasive component of the exposed data. Mercor uses Insightful (formerly Workpuls) — a workforce monitoring agent installed on contractors' computers — to capture screenshots and activity data. 
`InsightfulScreenshots`: every screenshot record contains:

| Field | Example (obfuscated) |
|---|---|
| `storageUrl` | `https://mercor-insightful-screenshots-production.s3.amazonaws.com/screenshots/[id]/[timestamp]_[uuid].png` |
| `storageKey` | `screenshots/wmcw2pdyvenmluy/1767129970810_3b62edd1-...` |
| `screenshotTimestamp` | `1767129970810` (Unix ms) |
| `ip` | `71.194.*.*` |
| `gateways` | `["1C:93:7C:64:**:**"]` (MAC address) |
| `os` | `win32` |
| `osVersion` | `10.0.19045` |
| `agentVersion` | `7.9.3` |
| `computer` | `desktop-ue2kgro` |
| `hwid` | `8f9f16f0-1fb7-47e4-a2a1-209838aa5c5e` |
| `appName` | Google Chrome |
| `appFileName` | `chrome.exe` |
| `appFilePath` | `C:\Program Files\Google\Chrome\Application\chrome.exe` |
| `windowTitle` | Alabaster Studio - Google Chrome |
| `browserUrl` | `alabaster-studio.com/project/abacus/conversation/[uuid]` |
| `browserSite` | `alabaster-studio.com` |
| `isBlurred` | `0` |
| `externalProductivityScore` | `1` |

Every screenshot includes the contractor's IP address, MAC address (gateway), hardware fingerprint, operating system, the exact application open, the window title, and the URL being visited, all timestamped to the millisecond.

The `storageUrl` field contains direct S3 URLs to screenshot image files. The S3 bucket `mercor-insightful-screenshots-production` is referenced explicitly.

The `hwid` (hardware ID) field provides a persistent device fingerprint that can re-identify a contractor even if they change their email or create a new account.
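The `screenshotTimestamp` value is a Unix epoch in milliseconds, which is what gives each record millisecond resolution. As a small worked example, the sketch below converts the sample value shown above to a UTC datetime using only the standard library. The function name is hypothetical, and integer arithmetic is used so that the millisecond component is not disturbed by floating-point rounding.

```python
from datetime import datetime, timedelta, timezone


def parse_unix_ms(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


captured_at = parse_unix_ms(1767129970810)
print(captured_at.isoformat())  # 2025-12-30T21:26:10.810000+00:00
```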
Timelog ¶ Timelog / Timelog_Audit records every work session: Field Description externalId Insightful shift/session ID externalProjectId Insightful project ID employeeId Insightful employee identifier duration Session duration (ms) timeStart / timeEnd Session timestamps timezone Contractor's timezone taskId / taskName Task being worked on lineItemUid Linked payment line item adjustmentReason If hours were manually adjusted userId Mercor user ID isCompleted Whether session was completed normally linkFailReason If Insightful–Mercor link failed Deductions ¶ Deductions records time deducted from pay: Field Description durationToSubtractMs Milliseconds deducted appName Application that triggered the deduction reasonForDeduction Why the time was removed payoutCycleID Which pay cycle was affected approvedBy / approvedAt Approval chain appliedBy / appliedAt Application record This reveals that Mercor can and does subtract pay from contractors based on monitored activity, with an approval workflow for doing so. Part VIII - Payments and Financial Infrastructure ¶ Contractor Payment Methods ¶ UserPaymentMethods / UserPaymentMethods_Audit stores linked payment accounts: Field Example (obfuscated) provider stripe providerMethodId acct_1R0V**** (Stripe Express account) methodType express_account status onboarded countryCode USA US contractors use Stripe Express accounts. International contractors use Wise (evidenced by WiseDisbursements ). The metadata field includes context like "context": "backfill" — indicating historical payment method imports. 
MercorUserFinancials stores additional financial account details: paymentProvider — stripe / wise providerIdentifier — account identifier accountDetails — JSON with bank routing and account details lastFetchedOn — when the financial data was last synced Payment Line Items ¶ PaymentLineItems is the core payment ledger: Field Description cycleStartTs / cycleEndTs Pay period boundaries totalPayableAmount Amount owed to contractor (cents) totalBillableAmount Amount charged to company (cents) status pending / paid / failed jobUid Linked job contract timelogUid Linked timelog entry bonusUid Linked bonus if applicable referralUid Linked referral payment dispatchFailureReason Why a payment failed moneyOutId Linked outbound transfer PayoutCycles defines payment periods: cycleStartTs / cycleEndTs — date boundaries status — open / processing / completed configId / configVersion — which payout configuration governs this cycle PayoutConfigs stores payout rules: type — payment cadence (daily, weekly, etc.) 
configuration — JSON with limits, caps, and routing rules MoneyOut_Audit records every outbound payment: externalAccountId — contractor's Stripe or Wise account externalTransferId — transfer ID at the payment provider totalAmount — disbursed amount paymentMethod — stripe / wise status — pending / paid / failed failureReason — structured failure code WiseDisbursements records international transfers: wiseTransferId , wiseQuoteId — Wise API identifiers amount , currency sequenceNumber — ordering within a batch status , failureReason Company Billing ¶ BillingAccounts manages company-side billing: Multiple billing accounts per company Linked to Stripe customer IDs BillingConfigs defines billing rules: rules JSON — markup percentages, caps, billing model isLatestVersion — versioned configuration BillingRateCards defines per-contract rate structures: formulaType — e.g., markup_percentage , flat_rate rateRows — tiered rate table InvoiceLineItems records invoice line entries: rawAmount / adjustedAmount — pre- and post-adjustment amounts taskCount — number of tasks in this line sowId — Statement of Work identifier RevenueAdjustments records revenue corrections: amountCentsUsd , category , reason revenueRecognitionDate — accounting date formula , labels , aggregationFields attachments , invoices — supporting documentation Referral System ¶ Referrals / Referrals_Audit tracks the contractor referral program: Field Description referredUserId / referringUserId The parties totalEarned / totalEarningsPotential Referral payment amounts state Current state paidAt When the referral bonus was paid disputeStatus If disputed isGuaranteedReferral Whether guaranteed payment applies referral_cap Maximum referral earnings isPaymentBlocked Payment hold ReferralEligibility manages the conditions under which referral payments vest — including onboardingStateId requirements and criteriaId checks. 
GuaranteedReferralQuota manages quota-based guaranteed referral programs: quotaId , referringUserId , offPlatformUserId shortenedLink — the referral tracking URL weekStart , status Part IX - Communications and Outreach ¶ Internal Messaging ¶ Comms is the platform messaging table: Field Description commId Message identifier groupId Conversation thread senderId / receiverId Parties content Message body type Message type (system, human, etc.) triggerRef What triggered this message listingReferenceUID Associated listing CommsSent records delivery tracking — when messages were sent, to whom, via what channel. EmailTemplates stores company-specific email templates: subject , content — template body isGlobal — available to all companies isPersonal — creator-private tags — categorization External Outreach ¶ LinkedinWarmIntros manages LinkedIn outreach campaigns: Field Example (obfuscated) linkedinUrl https://www.linkedin.com/in/[username] email s**@homeinheritance.com referringUserId Internal user who made the intro commEvent WARM_INTRO/OUTREACH status sent OffPlatformCampaigns and OffPlatformCampaignSteps manage multi-step email/LinkedIn outreach sequences: campaignType — category stepNumber — sequence position subject / messageTemplate — email content with template variables scheduledAt — send time outreachedCandidateIds / failedCandidateIds — delivery tracking AircallComms records phone call logs from Mercor's Aircall integration — the VoIP platform used for recruiter outbound calls, with call metadata and outcomes. 
FirstTimeInvites tracks first-contact outreach to candidates:

- commEvent: invitation type
- contentType / subject: message details
- listingIdCount: how many listings the invite covers
- refListingUid: the originating listing

Notification Infrastructure ¶

AutomationTemplates defines automated workflow triggers:

- handler: which service handles this automation
- sourceType / sourceSql: SQL query that triggers the automation
- templateBody: notification content template
- cron: scheduled execution
- autoApprove: whether human approval is required
- triggerConfig, config: detailed trigger configuration

ProjectAutomations links automation templates to specific projects.

Reverse Engineering - Architecture and Infrastructure ¶

The database schema, table names, column conventions, and embedded metadata allow us to reverse-engineer Mercor's complete technical architecture, from microservice names to third-party integrations, purely from the contents of this dump.

Part X - Infrastructure and DevOps ¶

Deployment Pipeline ¶

IacDeploymentRuns is one of the most operationally sensitive tables:

| Field | Example |
| --- | --- |
| runType | plan / apply |
| environment | staging / production |
| status | success / failed |
| commitSha | 784cfd495ddfa3b67187433cb7cb66f2d27ad458 |
| branch | dacq/backend-v2 |
| actor | k*********77 (GitHub username) |
| githubRunId | 23520976410 |
| githubRunUrl | https://github.com/Mercor-io/mercor-monorepo/actions/runs/23520976410 |
| prNumber | 26645 |
| stacksAffected | ["iac/aws/envs/staging"] |
| resourcesAdded | 25 |
| resourcesChanged | 2 |
| resourcesDestroyed | 6 |
| summary | Full Terraform plan output (including deprecation warnings) |
| durationSeconds | 134 |

This table exposes:

- The full URL path to the private GitHub monorepo
- Individual engineer GitHub usernames (actors)
- Branch naming conventions
- Terraform variable names and deprecated configurations
- Number of AWS resources created/modified/destroyed per deployment
- Complete Terraform plan output in summary

Named Terraform service stacks include: talent-success-coil, referrals-coil, iac/aws/envs/staging.

ProductionDeployment records ECS production releases:

- releaseTag: semantic version tag
- buildHash: Docker image hash
- deployedAt: deployment timestamp
- deploymentIds: ECS deployment identifiers
- taskDefinitionArns: ECS task definition ARNs (include AWS account ID)

PreprodDeployment records pre-production (staging) releases:

- commitSha: exact commit deployed
- loadTestPassed: whether load testing passed
- releaseOwner: engineer responsible for the release

ProductionVersion maintains a single-row current version pointer:

- lastVersion, lastReleaseTag, lastBuildHash, updatedAt

RollbackExecution records emergency rollback events:

- services: which microservices were rolled back
- Details of 4-second rollback capability observed in the data

Database Schema Management ¶

DATABASECHANGELOG and DATABASECHANGELOGLOCK are Liquibase tables. DATABASECHANGELOG records every schema migration that has been applied:

- ID, AUTHOR, FILENAME, DATEEXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, LIQUIBASE

DATABASECHANGELOGLOCK is the companion lock table that Liquibase uses to keep two migration runs from executing at the same time. The changelog reveals the full history of schema changes, including the names of engineers who authored migrations, the migration scripts' filenames (revealing internal project structure), and the exact timestamp each change was applied to production.

Agent Sandboxes ¶

AgentSandboxes records AI coding agent sessions:

| Field | Description |
| --- | --- |
| agentType | Type of AI agent |
| status | active / stopped / expired |
| backendType | Compute backend |
| host | Sandbox hostname |
| stopReason | Why session ended |
| transcriptRawUrl | S3 URL of raw conversation transcript |
| transcriptConsolidatedUrl | S3 URL of consolidated transcript |
| acpSessionId | Agent control protocol session ID |
| sandboxToken | Authentication token for sandbox |
| claimedAt / expiresAt | Session lifecycle timestamps |

The sandboxToken field suggests that expired sandbox tokens are persisted in the database, a potential credential exposure if these tokens have long validity windows.
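One common way to limit the exposure described for sandboxToken is to persist only a hash of each token alongside its expiry and to purge expired rows. The sketch below is a minimal illustration under stated assumptions: the function names and the record fields tokenHash and expiresAt are hypothetical, and nothing in the dump indicates how Mercor actually issued or validated these tokens.

```python
import hashlib
import hmac
import secrets
import time


def issue_token(ttl_seconds, now=None):
    """Create a random token; return (plaintext_token, record_to_store).

    Only the record (hash + expiry) is meant to be written to the database.
    The plaintext token is handed to the client once and never stored.
    """
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    record = {
        "tokenHash": hashlib.sha256(token.encode("utf-8")).hexdigest(),
        "expiresAt": now + ttl_seconds,
    }
    return token, record


def verify_token(presented, record, now=None):
    """Accept a presented token only if it is unexpired and its hash matches."""
    now = time.time() if now is None else now
    if record is None or now >= record["expiresAt"]:
        return False
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented_hash, record["tokenHash"])


def purge_expired(records, now=None):
    """Drop records whose expiry has passed, so stale rows do not accumulate."""
    now = time.time() if now is None else now
    return [r for r in records if now < r["expiresAt"]]
```

With this layout, a database export would contain only SHA-256 digests of high-entropy random tokens, which cannot be replayed as credentials, and expired sessions leave no row behind at all.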
Part XI - Analytics and ML Layer ¶

School and Firm Rankings ¶

DbtFirmSchoolRank contains Mercor's proprietary employer prestige scores:

| Field | Example |
| --- | --- |
| firmId | 000013c1653de847e38d755ca1c310a5 |
| firmName | 75th ranger regiment, u.s. army |
| academicField | overall |
| nProfiles | 2 |
| avgSchoolRank | 90.00 |
| firmSchoolRank | 81723 |
| firmSchoolRankPercentile | 0.528839 |

This table represents a proprietary ranking of ~154,000 firms by the average educational prestige of their employees, effectively a derived signal used to score resumes. It is computed from the full contractor profile database using an empirical Bayesian model (ebPriorStrength, ebAvgSchoolRank).

DbtSchoolRankings ranks individual schools within academic fields:

- schoolName, academicField, schoolScore, schoolRank

Resume Evaluation ¶

UserResumeEvaluation stores ML-generated resume scores:

| Field | Description |
| --- | --- |
| workExperienceScore | Quality of work experience |
| yearsOfWorkExperience | Parsed years of experience |
| graduationYear | Estimated graduation year |
| mScore | Composite score |
| inferredRole | Predicted job function |
| educationScore | Academic credential score |
| awardScore | Competitive award weighting |
| rateAcademicCompetitions | Participation in academic competitions |
| rateCompetitiveProgramming | Competitive programming score |
| rateHackathonPerformance | Hackathon achievement score |
| technicalSkills | JSON list of detected skills |
| highestDegree | Parsed degree level |
| searchFlag, imageFlag, transcriptFlag | Data quality flags |

Behavioral Analytics ¶

PosthogAnalytics links PostHog behavioral sessions to user identity:

- userEmail: email address (PII)
- company: company context
- startTimeUtc / endTimeUtc: session boundaries
- activetime / inactivetime: engagement metrics
- startUrl: entry point URL

This directly links PostHog analytics sessions (which include click-level behavior) to user identity, a significant privacy concern as PostHog sessions are typically anonymized.
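For contrast with the raw userEmail column described above, a common mitigation is to key analytics rows on a pseudonym derived from the email with a secret key, so that an export of the analytics table alone does not reveal who each session belongs to. The sketch below is a minimal illustration, not a description of anything found in the dump; the function name and the key handling are assumptions.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for an email address.

    The key is assumed to live outside the analytics store (for example in a
    secrets manager), so the pseudonym cannot be recomputed from a table dump.
    """
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    demo_key = b"demo-key-not-for-production"  # hypothetical key for illustration
    print(pseudonymize_email("Someone@Example.com", demo_key))
```

The same email always maps to the same pseudonym under a given key, so sessions can still be grouped per user, while a different key (or no key) yields unrelated values.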
SearchAnalytics records search quality metrics:

- avg_relevance_score, avg_prestige_score
- p99_latency_ms: 99th percentile search latency
- position_weighted_relevance_score: ranking quality metric

ForecastMetrics stores ML forecast outputs:

- entity, id, dt, snapshot_dt
- modelVersion, predictedValue
- Used for capacity planning, fill rate forecasting, and contractor supply predictions.

ML Experiments ¶

MLExperimentsJobPerformanceReviews reveals the experimental ML pipeline:

| Column | Description |
| --- | --- |
| Date of review | Review date |
| Account | Client company |
| Project | Project name |
| Reviewer | Reviewer name (Mercor staff) |
| Work type | Category of work |
| Review type | Type of performance review |
| Name / Email | Contractor identity |
| Quality of Work | Score |
| Engagement | Score |
| Offboarding Reason | Why contractor was removed |
| Justification for rating | Free-text explanation |

This table contains raw performance review data used to train or evaluate ML models for automated contractor performance assessment, with staff names, contractor names, and qualitative judgments all stored in plaintext.

Part XII - Reference Data Layer ¶

Skills and Certifications ¶

Skills is the platform's skills taxonomy:

- skillId, name, description, type, parent: hierarchical skill tree
- CertificationPolicy: linked certification requirement

CertificationPolicies_Audit defines the rules for earning certifications:

- rules: JSON eligibility criteria
- isRevokable, requiresApproval
- icon, iconColor, showBadge, displayText: display configuration

Certifications_Audit records individual earned certifications:

- evidence: JSON array of qualifying events (e.g., {"id": "proj_...", "score": 88.84, "sourceType": "project_hours_worked"})
- status: AUTO_AWARDED / MANUALLY_AWARDED / REVOKED
- isCertified: current state

SkillCertifications_Audit and SkillCertificationsEvidence_Audit track per-skill certification with scores and source evidence.
ContractorEndorsements stores peer endorsements:

- endorsingUserId / endorsedUserId
- contents: endorsement text
- tags: skills endorsed
- sentiment: positive/negative
- source: where the endorsement originated

Company Data ¶

Company stores client company records:

- name, description, website, logo
- billingModel: pricing structure
- billingStartDay / billingEndDay: billing cycle configuration
- brandVisible: whether company name is shown to candidates
- universe: internal company segmentation
- externalName: display name if different from legal name

IAM / IAM_Audit manages company-level role assignments:

- roleId: e.g., ghost (internal Mercor staff), admin, member
- companyId, userId_v4, status

A sample IAM record shows a user with roleId: ghost being REMOVED from a company, revealing that Mercor's internal staff operated within client company contexts under a ghost role identity.

URL Management ¶

ShortenedUrls manages the platform's link shortening system:

- Used for referral tracking, campaign links, and onboarding flows

UrlClicks records every click on shortened URLs:

- urlId, clickedAt, ipHash, userId, country

Even with ipHash (rather than raw IP), the combination of userId, country, and timestamp enables click attribution across the contractor population.

Catfish Audit Log ¶

CatfishAuditLog is a security/compliance tool:

| Field | Description |
| --- | --- |
| slackUserId / slackUserName | Mercor staff member |
| targetEmail | Person being looked up |
| platform | Where the lookup happened |
| intent | Declared reason for the lookup |
| status | Success/failure |

This table records every time an internal Mercor employee looks up a user's information through an internal tool called "Catfish", indicating awareness that internal user lookup is an auditable, privacy-sensitive operation. Ironically, this audit log itself now sits in the exposed dataset.
Exposed Surface Area Summary ¶

| Domain | Tables | Sensitivity | Key Exposure |
| --- | --- | --- | --- |
| User & Identity | ~10 | Critical | PII (name, email, phone, location) for all contractors |
| Identity Verification & Fraud | ~12 | Critical | Government ID outcomes, facial comparison tokens, fraud verdicts |
| Hiring Pipeline | ~10 | High | Application status, rejection reasons, recruiter notes |
| Interviews & Assessments | ~15 | High | Interview responses, scores, cheating flags, rubrics |
| Work Trials & Onboarding | ~6 | High | Signed legal documents, offer letters, digital signatures |
| Projects & AI Tasks | ~15 | Medium-High | Client company projects, task definitions, AI training data |
| Time Tracking | ~4 | Critical | Per-minute screenshots, browser URLs, MAC addresses, hardware fingerprints |
| Payments & Finance | ~20 | Critical | Stripe account IDs, bank details, exact payment amounts, payout records |
| Communications | ~10 | Medium | Message content, outreach campaigns, phone call logs |
| Infrastructure & DevOps | ~10 | High | Commit SHAs, GitHub URLs, ECS ARNs, Terraform configs, sandbox tokens |
| Analytics & ML | ~10 | Medium | Resume scores, school rankings, PostHog identity links |
| Reference Data | ~15 | Medium | Skills taxonomy, certifications, endorsements, company configurations |

Technical Architecture Reverse-Engineered ¶

The following architecture is entirely reconstructed from database table names, column values, JSON blobs, and embedded metadata. No source code or documentation was available; everything below was inferred from the data alone.
Backend Services ¶

Based on the database content, Mercor's backend comprises at least 13 microservices:

| Service | Inferred Function |
| --- | --- |
| mercor_api | Primary API backend |
| mercor_api_nginx | API gateway / reverse proxy |
| mercor_go | Go-language service (likely performance-critical paths) |
| coil | Contractor-facing service (multiple instances by function) |
| site_fe | Public website frontend |
| team_fe | Company/team portal frontend |
| work_fe | Work/task frontend |
| celery | Async task queue |
| workflow | Workflow orchestration |
| db_trigger_consumer | Database event consumer |
| steve | Internal tool/admin service |
| woz | Fraud/ML pipeline service |
| payments_temporal_worker | Temporal.io worker for payments |

Frontend Portals ¶

- Public site: site_fe, routes handled by Next.js (inferred from URL patterns)
- Company portal: team_fe, for clients to manage listings and review candidates
- Work portal: work_fe, for contractors to find and complete tasks
- Internal admin: Godmode interface used by Mercor staff

Data Infrastructure ¶

- Primary DB: Aurora MySQL (AWS)
- Analytics warehouse: Snowflake (via Fivetran sync, evidenced by dbt models)
- Schema migrations: Liquibase
- Object storage: S3 (screenshots, offer letters, transcripts)
- Monitoring: Insightful agent on contractor machines
- Auth: Firebase + Okta (SSO)
- Analytics: PostHog
- Feature flags / A-B: Inferred from configVersion patterns

Third-Party Integration ¶

| Provider | Purpose |
| --- | --- |
| Persona | Identity verification (KYC) |
| Stripe | US contractor payments (Express accounts) |
| Wise | International contractor payments |
| Insightful | Workforce monitoring / screenshot capture |
| Okta | SSO for company and internal access |
| Aircall | Recruiter phone calls |
| PostHog | Product analytics |
| Vertex AI / Gemini | Fraud LLM reasoning |
| OpenAI (GPT-4.1 / GPT-5) | AI interview conductor and task autograder |
| Checkr / Certn | Background checks |
| HaveIBeenPwned | Email breach checking |
| Customer.io | Transactional email |
| GitHub Actions | CI/CD pipeline |
| Terraform / Terragrunt | Infrastructure as code |
| Temporal.io | Payments workflow orchestration |
| Liquibase | Database schema versioning |

Grounds for Legal Action ¶

The evidence documented throughout this report supports multiple independent legal claims by distinct plaintiff classes. This section consolidates the factual basis for each claim, cross-referencing the specific database tables, column names, and sample values that constitute the evidentiary foundation.

I. Client Company Claims - Loss of Proprietary AI Training Data and Trade Secrets ¶

This is the most consequential category of legal exposure. Mercor's client companies (Apple, Amazon, OpenAI, Anthropic, Meta, Google, and others) entrusted Mercor with their most valuable competitive assets: the data, methodologies, and evaluation frameworks that define how their AI models are built. All of it is now in criminal hands.

A. Trade Secret Misappropriation

Under the federal Defend Trade Secrets Act (DTSA) and state Uniform Trade Secrets Acts, a trade secret is information that derives economic value from not being generally known and is subject to reasonable efforts to maintain its secrecy. The breach exposes client trade secrets across three categories:

1. AI Training Data as Trade Secrets. The SFT data, RLHF preference rankings, and Chain-of-Thought traces produced by Mercor's contractors for each client constitute trade secrets. Each dataset represents millions of dollars of investment and years of iterative refinement. The TASKS, TASK_VERSIONS, and PHASE_1_TASKS tables across 84 Airtable workspaces contain the actual work product (prompts, model responses, and human evaluations) that each client paid to produce. Their value derives entirely from secrecy: once a competitor has access to another lab's RLHF preference data, they can train equivalent alignment without the cost.

2. Evaluation Methodology as Trade Secrets. How an AI lab evaluates its models (what rubrics it uses, what scoring thresholds it applies, how it structures domain-specific benchmarks) is core intellectual property.
The CRITERIA , RUBRIC_VERSIONS , QA_SPECS , and LLM_CALL_CONFIGURATION tables across 60+ workspaces expose this methodology in full. Amazon's Chain-of-Thought evaluation framework, Apple's endpoint testing rubrics, and the cross-model preference evaluation criteria are all now available to any buyer. This is not just data — it is the recipe for how each lab measures AI progress. 3. Pre-Release Model Capabilities as Trade Secrets. The APPLE_ENDPOINT_SANDBOX workspace contains actual outputs from Apple's unreleased Foundation Models ( afm-text-083 , afm-model-086 ). These responses reveal the model's capabilities, safety alignment, and failure modes before public launch. Under trade secret law, the unauthorized disclosure of pre-release product capabilities is a textbook misappropriation. Key legal point: Trade secret protection requires "reasonable efforts to maintain secrecy." Mercor's storage of this data — in plaintext, behind a flat network with no segmentation, accessible via a single VPN hop — likely fails this standard. Clients may argue that they maintained secrecy on their end but that Mercor's negligent security destroyed the trade secret status of the data. This creates a damages claim for the full economic value of the lost trade secrets. B. Breach of Confidentiality and NDA Violations The database confirms confidentiality agreements governed the relationship. The Jobs table contains ciiaa_direct , ciiaaPassthrough , confidentiality , and tow (terms of work) fields. The WorkTrial_Audit table contains signed CIIAs and offer letters. 
The exposure of:

- Apple: Foundation Model outputs (afm-text-083, afm-model-086), endpoint sandbox testing data, translation evaluation, orchestrator configurations
- Amazon: Complete LLM Chain-of-Thought evaluation framework with full reasoning traces, preference judgments, domain taxonomy (math, STEM), and named Mercor staff assignments
- OpenAI: Feather platform campaign UUIDs, Apertus - Elephant project data, contractor performance reviews naming OpenAI as the account
- Meta: Multimedia annotation template command center (AAIE___META_MULTIMEDIA_TEMPLATE), project configurations
- Anthropic: Claude 3.5 Sonnet evaluation data compared against GPT-4, preference reasoning, agent sandbox configurations running Claude

constitutes a breach of these confidentiality obligations. Each client has a separate breach of contract claim with damages measured by the economic harm caused by the disclosure.

C. Loss of Competitive Advantage

The breach doesn't just expose data; it destroys competitive moats. If a Chinese AI lab purchases the stolen data, they acquire:

- The exact prompts and rubrics that OpenAI uses to fine-tune its models
- The evaluation methodology that Amazon uses to measure Chain-of-Thought reasoning quality
- Apple's pre-release model outputs revealing capabilities and weaknesses
- The preference data that teaches Anthropic's Claude how to respond to contentious queries

Each client's AI training pipeline is now potentially replicable by any competitor with access to the stolen Airtable workspaces. The damages extend beyond the cost of producing the data: they include the competitive harm of having that data available to rivals.

D. Secondary Breach via Desktop Screenshots

The InsightfulScreenshots table creates a mechanism for visual intelligence extraction from client systems.
Screenshots captured during monitored work sessions show whatever was on the contractor's screen: client internal dashboards, Slack conversations, code repositories, proprietary tools, unreleased product interfaces. Mercor stored these screenshots on S3 with metadata linking each image to the specific projectId. An attacker can systematically extract visual intelligence about every client's internal systems by filtering screenshots by project. This constitutes a secondary breach of each client's confidential systems, for which Mercor bears direct liability.

E. APEX Benchmark Contamination

Mercor's proprietary APEX benchmark suite, covering 15+ domains from legal to medicine to mechanical engineering, is now compromised. All tasks, criteria, scoring rubrics, and evaluation data are exposed. Any client that relied on APEX benchmark results for vendor selection, model comparison, or procurement decisions now faces the risk that those results are unreliable. Models trained on the leaked APEX data will appear to perform well without genuinely possessing the evaluated capabilities. Clients may claim damages for decisions made in reliance on benchmarks that are now contaminated.

II. Contractor Class Claims ¶

A. Financial Data Exposure and Identity Theft Risk

The MercorUserFinancials table stores the complete Stripe Connect API response as plaintext JSON, including bank name, routing number, last four digits, account holder name, email, and country. This is sufficient for bank account fraud. Every contractor whose financial data is in this table faces ongoing risk of unauthorized transactions, account takeover, and identity theft. The UserPaymentMethods table adds Stripe Express account IDs and Wise transfer identifiers. The exposure of this data (unencrypted, untokenized, in a database accessible via a single VPN hop) constitutes negligence per se under multiple state data breach statutes.

B. Surveillance Overreach and Privacy Violations

The Insightful monitoring system captured far more than work activity:

- Full desktop screenshots every few minutes: not just the work application, but everything on screen
- Browser URLs for all tabs, including personal browsing
- IP addresses and MAC addresses from personal home networks
- Hardware fingerprints of personal devices

Contractors used personal computers for Mercor work (the data shows personal Chrome installations, personal hostnames like desktop-ue2kgro). The monitoring system captured personal activity on personal devices: personal emails, banking sessions, medical information, or other private content visible in background windows. All of this is now in criminal hands. Under ECPA and state wiretap laws, the capture of third-party communications visible in screenshots (Slack messages, emails, video calls) may constitute unlawful interception.

C. Wrongful Termination via Automated Fraud Decisions

The database reveals that automated fraud decisions directly determined whether contractors could earn a living:

- FraudStates.currentDecision = REJECT → contractor blocked from the platform
- FraudStates.currentReasoning contains LLM-generated explanations that were almost certainly never disclosed to affected contractors
- ProductionFraudState.status → final production fraud verdict with no apparent appeal mechanism

Under FCRA, if Mercor used these automated fraud scores or background check results (BackgroundCheck.status) to deny, suspend, or terminate contractor engagements without providing required adverse action notices, each instance is a separate violation. Under GDPR Article 22, EU/UK contractors have the right not to be subject to decisions based solely on automated processing.

D. Wage-Related Claims

The Deductions table records pay subtractions based on monitored activity: exact milliseconds deducted, which application triggered the deduction, and who approved it.
If deductions were applied using data from the now-compromised monitoring system, or if the breach reveals inconsistent application, contractors have wage theft claims in addition to privacy claims.

III. Statutory Violations ¶

A. CCPA/CPRA: Private right of action for data breaches resulting from failure to maintain reasonable security (Cal. Civ. Code § 1798.150). Plaintext bank routing numbers, unencrypted PII, and excessive data collection constitute failure to implement reasonable security. Statutory damages: $100–$750 per consumer per incident.

B. GDPR: EU/UK contractors confirmed in the data (sample: United Kingdom, Harrow). Violations include data minimization failure (Article 5(1)(c)), integrity/confidentiality failure (Article 5(1)(f)), automated decision-making without safeguards (Article 22), and breach notification delays (Article 33). Fines up to €20 million or 4% of annual global turnover.

C. Illinois BIPA: Persona's liveness detection requires a scan of face geometry, explicitly listed as a biometric identifier (740 ILCS 14/10). The IDVerificationChecks table confirms facial geometry scans were captured (livenessStatus), facial comparison performed (interview-face-comparison), and thumbnail images stored (thumbnail_key). Statutory damages: $1,000–$5,000 per violation, no harm requirement. (Note: MAC addresses and hardware fingerprints are not biometric identifiers under BIPA.)

D. FCRA: Background check results and automated fraud scores used in employment decisions without required adverse action notices. Per-violation damages.

E. ECPA / State Wiretap Laws: Desktop screenshots capturing third-party communications visible on screen. Per-interception damages.

F. PIPEDA: Canadian contractors confirmed (sample: country: CA, BANK OF M*******). Breach notification to Privacy Commissioner and affected individuals required.

IV. Negligence - Security Failures Evidenced in the Data ¶

The database structure itself constitutes evidence of systemic negligence:

- Plaintext financial data: Complete Stripe API responses with bank names, routing numbers, and account holder names stored as unencrypted JSON
- No field-level encryption: Names, emails, phones, DOBs, and addresses readable as-is in the export
- Excessive data collection: Full Stripe API responses when only an account ID was needed; desktop screenshots capturing vastly more than needed to verify work hours; HaveIBeenPwned results stored as fraud signals; Persona KYC session tokens persisted indefinitely
- Infrastructure failures: ngrok dev tunnels with developer IPv6 in production config; AWS account ID embedded in S3 bucket names; sandbox tokens persisted after session expiry; GitHub Actions URLs exposing the private monorepo

V. Third-Party Claims ¶

Individuals who never created Mercor accounts have their data exposed:

- UserReferences: Names, emails, employers, and relationships of professional references
- LinkedinWarmIntros: LinkedIn profile URLs and email addresses of people contacted for outreach
- CandidateVouches: Relationship details provided by vouchers

These individuals never consented to data collection and likely never received a privacy notice. Under GDPR Article 14, Mercor was required to notify them within one month. The breach exposes them to targeted social engineering using their real relationship data.
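The "excessive data collection" item in the negligence list above (full payment-provider responses stored when only an account ID was needed) can be made concrete with a small allow-list filter applied before anything is persisted. This is a minimal sketch under stated assumptions: the field names and the response shape are invented for illustration and are not Stripe's actual response format or Mercor's actual schema.

```python
# Hypothetical allow-list: the only fields the platform needs to keep.
ALLOWED_FIELDS = ("id", "country")


def minimize_payment_record(provider_response: dict) -> dict:
    """Keep only allow-listed keys from a provider response before storage."""
    return {
        key: provider_response[key]
        for key in ALLOWED_FIELDS
        if key in provider_response
    }


if __name__ == "__main__":
    # Invented example payload; values are placeholders, not real data.
    raw = {
        "id": "acct_demo",
        "country": "US",
        "bank_name": "DEMO BANK",
        "routing_number": "000000000",
        "account_holder_name": "Demo Person",
    }
    print(minimize_payment_record(raw))
```

An allow-list fails safe: any field the provider adds later is dropped by default rather than silently stored, which is the opposite of persisting the whole response.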
Summary - Combined Legal Exposure ¶

| Claim | Plaintiff Class | Key Evidence |
| --- | --- | --- |
| Trade secret misappropriation | Apple, Amazon, OpenAI, Anthropic, Meta, Google | Pre-release model outputs, evaluation methodologies, RLHF data, rubrics, CoT traces |
| Breach of confidentiality / NDA | All client companies | Signed CIIAs in database, client-named Airtable workspaces with proprietary data |
| Competitive harm | All client companies | Training data, evaluation frameworks, and benchmark data now available to rivals |
| APEX benchmark contamination | Companies relying on APEX results | Complete benchmark tasks, criteria, and scores exposed |
| Financial data negligence | 30,000+ contractors | Plaintext bank routing numbers, Stripe account details |
| Surveillance overreach | 30,000+ contractors | Desktop screenshots of personal devices, personal browsing, background windows |
| Automated adverse actions | Contractors denied/terminated | Fraud scores, LLM-generated reasoning, no disclosure or appeal |
| CCPA violations | 30,000+ contractors | Failure to maintain reasonable security |
| GDPR violations | EU/UK contractors | Data minimization, automated decisions, notification delays |
| BIPA violations | Contractors who completed Persona KYC | Facial geometry scans, liveness detection |
| Third-party privacy | References, LinkedIn contacts, vouchers | Data collected without consent, now in criminal hands |

The client claims are likely the largest in dollar terms: the economic value of the lost trade secrets (training data, evaluation methodologies, pre-release model outputs) runs into the billions. The contractor claims are the broadest in scope, affecting every individual who ever used the platform. Together, the total legal exposure is conservatively in the hundreds of millions of dollars before punitive damages.

Conclusion - What Happens Now ¶

The breach is not a past event. It is an ongoing situation with no clear resolution.
The Data Is Still in Circulation ¶ Mercor allegedly paid the attackers to have the data removed from the Lapsus$ leak site — a fact confirmed to us directly by Lapsus$ themselves. The data was taken down briefly. It reappeared. The group is now actively selling the full dataset to private bidders while continuing to distribute samples. The two files analyzed in this report were obtained after the ransom was paid. This is the predictable outcome of paying ransom for digital assets — there is no mechanism to verify deletion, no way to revoke copies already distributed, and every economic incentive for the attackers to continue monetizing the data through private sales, selective leaks, and derivative attacks. Mercor's ransom payment bought nothing except proof that they considered the data worth paying to suppress. The attackers now possess: The complete identity of every Mercor contractor — name, email, phone, date of birth, home address, bank routing number, government ID verification status, and a photographic record of their desktop activity The complete client map — which companies use Mercor, what projects they run, which annotation platforms they use, and what their internal Slack workspaces and Okta SSO groups are called Apple's pre-release Foundation Model outputs , Amazon's Chain-of-Thought evaluation methodology, OpenAI's Feather platform campaign UUIDs, and Anthropic's model comparison data The source code for Mercor's entire platform — including its fraud detection algorithms, MercorScore ranking system, and payment infrastructure — providing a complete blueprint for exploitation Tailscale VPN credentials and network topology — a map of Mercor's internal infrastructure that could enable further unauthorized access if credentials have not been fully rotated 939GB of code repositories that likely contain hardcoded API keys, database credentials, and third-party service tokens scattered across commit history This is not a dataset that loses value over time. 
The PII is permanent. The bank routing numbers don't expire. The government ID verification records don't reset. The signed legal documents don't un-sign. And the AI training data — the RLHF annotations, preference rankings, and rubric evaluations produced for frontier AI labs — retains its full value to any competitor seeking to accelerate their own model development. The Ongoing Threat ¶ With this data, the attackers (or any subsequent buyer) can: Launch targeted phishing campaigns against every Mercor contractor, using their real name, employer, project assignment, and pay rate to craft highly convincing social engineering attacks Commit financial fraud using the bank names, routing numbers, and account holder names stored in MercorUserFinancials Blackmail contractors whose desktop screenshots may reveal confidential client information, personal browsing activity, or employment at companies their current employer doesn't know about Attack Mercor's clients using the Slack workspace URLs, Okta SSO configurations, and annotation platform campaign IDs as entry points for further social engineering or credential stuffing Sell the AI training data — the prompts, responses, evaluations, and preference rankings — to competitors or foreign actors, undermining billions of dollars of investment by OpenAI, Anthropic, Apple, Amazon, Meta, and Google DeepMind Exploit the source code to identify vulnerabilities in Mercor's (and potentially its clients') systems that have not yet been patched Impersonate Mercor staff using the internal employee names, Slack IDs, and GitHub usernames found throughout the database to conduct supply-chain attacks against Mercor's clients and partners Each of these vectors becomes more dangerous the longer the data remains in circulation — and there is no indication it will stop circulating. The Case for Radical Transparency ¶ There is an uncomfortable truth that Mercor, its clients, and the affected contractors must confront: the data is out. 
It cannot be put back. The current trajectory — where the breach is acknowledged in vague corporate language, specific questions are deflected, and affected individuals receive minimal information about what was exposed — serves no one except the attackers. It creates an information asymmetry where the adversary has complete knowledge of what was taken, while the victims operate in the dark. Every contractor whose bank routing number is in MercorUserFinancials deserves to know — specifically — that their bank name, routing number, and account holder name were stored in plaintext JSON and are now in the hands of criminal actors. Every contractor whose desktop screenshots are in the mercor-insightful-screenshots-production S3 bucket deserves to know that their IP address, MAC address, browser history, and application usage during work sessions are exposed. Every client whose annotation platform URLs, Slack workspaces, and proprietary model outputs appear in the Airtable exports deserves to understand the exact scope of their secondary exposure. The alternative to transparency is prolonged paranoia. If Mercor does not disclose the specific contents of the breach, every contractor must assume the worst about what was taken. Every client must assume their internal systems were visible on a contractor's screen. Every reference, every LinkedIn contact, every vouching party must assume their personal information was collected without their knowledge and is now compromised. Perhaps the most constructive path forward — however counterintuitive — is full, detailed, public disclosure of exactly what the breach contained. Not the raw data itself, but a complete accounting: which tables, which fields, which categories of PII, which clients, which time periods. The world can adjust to a known breach. It cannot adjust to an unknown one. Sunlight remains the best disinfectant, and in the aftermath of a breach of this magnitude, the cost of silence far exceeds the cost of honesty. 
The contractors who built the AI training data that powers the world's most valuable models deserve at least that much. A Structural Critique - Youth Velocity and the Cost of Immaturity ¶ Mercor's three founders — Brendan Foody, Adarsh Hiremath, and Surya Midha — were 21 years old when they raised their Series A. They became the world's youngest self-made paper billionaires at 22 when their Series C valued the company at $10 billion. The average age of the Mercor team was reported at 22 years old . They are Thiel Fellows — college dropouts celebrated for building fast. They stored bank routing numbers in plaintext, ran a flat network where a single VPN hop reached everything, and let 4 terabytes walk out the door without anyone noticing. Perhaps Mercor is best understood as a phenomenon of hype and strong mimetic desire within the AI industry. Perhaps the AI labs got ahead of themselves too early. Perhaps researchers and vendor managers chose Mercor not because they evaluated the vendor thoroughly enough to handle critical workloads, but because OpenAI was already using it. The pattern is worth examining. OpenAI was one of Mercor's earliest major customers . The relationship began when Mercor's 20-year-old CEO cold-emailed OpenAI's head of human data operations, Shaun VanWeelden, and landed a contract to recruit Math Olympiad winners for model training. VanWeelden later left OpenAI to become Mercor's managing director . Two sitting OpenAI board members — Adam D'Angelo (Quora CEO) and Larry Summers (former U.S. Treasury Secretary) — invested in Mercor's earlier funding rounds. This is not without precedent. Much of the AI data infrastructure landscape has been shaped by proximity to OpenAI. Scale AI's Alexandr Wang was Sam Altman's roommate during the pandemic. Scale went through Y Combinator when Altman ran it. Altman and Wang later discussed an acquisition . With Mercor, the signal was unmistakable. OpenAI used them. OpenAI's board members invested in them. 
OpenAI's head of data operations joined them. Once that signal propagated, perhaps the other labs followed not because of independent evaluation, but because OpenAI had validated the choice for them. The $10 billion valuation, the press coverage, and the youngest-billionaires narrative reinforced what was already a foregone conclusion.

The Girardian irony is that this breach, the scapegoating event, may produce the same mimetic cycle in reverse. The labs may collectively abandon Mercor, collectively discover the next shiny vendor, and collectively onboard without asking the hard questions about security and privacy. The sacrifice of the scapegoat restores order. The community moves on, having learned nothing structural, only that this particular vendor was the wrong one.

Having reverse-engineered Mercor's complete operational architecture from its database schema (the annotation pipeline, the evaluation frameworks, the contractor management system, the payment infrastructure), it is clear that the underlying business is well-understood and replicable. For new entrepreneurs, the opportunity is straightforward: build the same platform, but treat security and privacy as foundational rather than an afterthought. The market for AI training data is not going away. The demand for a vendor that handles it responsibly has never been higher.

Appendix A - Complete Table Inventory ¶

All 149+ tables organized by functional domain, with column lists and sample data where present.

Domain 1 - User and Identity ¶

Table | Key Columns | Notes
MercorUsers_New | userId, email, name, phone, profilePic, createdAt, lastLogin, location, isWhiteListed, source, firebaseUID, authType, isAnonymous, insightfulId, stripeAccountId, customerId, isDeleted, phoneVerificationStatus, phoneVerifiedAt, phoneOptIn | Primary contractor user table.
Sample: e****[email protected], T** O****, +44795718****, United Kingdom,Harrow
MercorUsers_New_backup | userId, email, name, phone, profilePic, createdAt, lastLogin, location, isWhiteListed, source, firebaseUID, authType, isAnonymous, insightfulId, stripeAccountId, customerId, isDeleted | Historical backup snapshot of user table
UserLocation | userLocationId, userId, residenceCountry, residenceState, residenceCity, residenceZipCode, physicalCountry, physicalState, physicalCity, physicalZipCode, version, createdAt, updatedAt | Tracks declared residence vs. physical location. Used in fraud detection. Sample: residenceCountry=USA, physicalCountry=USA
UserLocation_Audit | All UserLocation columns + auditAction, auditTimestamp | Audit trail for location changes
UserMetadata | userMetadataId, userId, workAuthorizationStatus, birthday, physicalLocation, countryOfResidence, createdAt, updatedAt, maxHourCap, contractorMail, fraudStatus, oktaUserId, fraudStatusEnum, oktaAccountState, externalId, maxContracts, offPlatformEmail | Extended user metadata including Okta SSO ID and fraud status
UserState | id, userId, resumeUploaded, interviewsCompletedCount, jobApplicationsCount, totalMillisWorked, createdAt, updatedAt | Lifecycle counters: tracks user progression through platform
UserAvailability_Audit | availabilityId, version, userId, maxWeeklyHours, desiredWeeklyHours, expectedStartOffset, expectedStartOffsetUpdatedAt, earliestStartDateChoice, timezone, updatedAt, createdAt, auditAction, auditTimestamp | Declared working hours and timezone preferences
UserReferences | referenceId, email, name, company, relationship, userId | Professional references provided by contractors
WorkAuthorization_Audit | workAuthorizationId, userId, birthday, physicalCountry, workAuthorizationStatus, agreedToLocation, signature, attestedAt, source, version, createdAt, updatedAt, auditAction, auditTimestamp | Work authorization attestations with digital signatures
UserPlatformStatus | id, userId, status, action, source,
sourceDetail, isLatest, createdAt | Platform access status (active, suspended, banned)
LinkedinUsers | id, name, url, email, company, position, lastUpdated | LinkedIn profile cache used for warm intros and candidate sourcing
MembershipSnapshots | scopeType, scopeId, userId, createdAt | Point-in-time snapshots of group/project memberships

Domain 2 - Identity Verification and Background Checks ¶

Table | Key Columns | Notes
IDVerificationChecks | verificationCheckId, userId, candidateId, jobId, listingId, provider, source, sessionId, sessionToken, onboardingUrl, sessionStatus, verificationStatus, governmentIdStatus, livenessStatus, addressStatus, attemptNumber, maxAttempts, providerResponse, fraudDecision, flagReasons, manualReviewStatus, createdAt, updatedAt, completedAt | Persona KYC session records. providerResponse contains full JSON API response including facial thumbnail keys. provider=persona
BackgroundCheck | contractorID, externalCandidateId, workLocation, package, invitationId, invitationCreatedAt, invitationCompletedAt, backgroundCheckId, reportId, status, createdAt, updatedAt, adverseMediaCheckStatus | Criminal background check records (Checkr). Status: clear / consider
BackgroundCheck_New | Richer version of BackgroundCheck with additional fields | Updated background check schema
BackgroundCheckDetails | Detailed per-check results | Granular check outcomes
ScreeningPackage | id, companyId, name, isActive, lastUpdatedAt, checkConfig, graceDays | Per-company screening package configurations defining which checks are required

Domain 3 - Fraud Detection ¶

Table | Key Columns | Notes
FraudStates | userId, currentStage, currentDecision, currentConfidence, currentReasoning, currentKeySignals, currentTimestamp, previousStageDecision, createdAt, updatedAt | Current fraud state per user. currentDecision: APPROVE / ESCALATE / REJECT. LLM-generated reasoning.
Sample signal: location_mismatch: 1.0
FraudCheck | id, user_id, stage, interviewId, jobId, triggered_on, process_status, retryCount, flag_reasons, automatedReasons, status, priority, idVerificationStatus, manual_review_status, manual_review_rational, manual_review_signs, isMostRecent, assigned_to, assigned_on, splReview | Central fraud queue. Tracks automated and manual review states
FraudSignalAuditLog | id, userId, userVersionId, stage, signalType, modelName, triggeredOn, status, modelScore, createdAt | Per-signal audit trail. Every fraud signal evaluated is logged here
FraudEvents | id, eventId, userId, eventType, stage, priorAlpha, priorBeta, priorProbability, priorStatus, posteriorAlpha, posteriorBeta, posteriorProbability, posteriorStatus, evidence, createdAt, createdBy, notes | Bayesian belief update log. Each event updates prior→posterior fraud probability
ProductionFraudState | id, userId, status, fraudModality, source, sourceDetail, lastEvaluatedStage, productionModelId, userVersionId, isLatest, createdAt, updatedAt | Final production fraud verdict. fraudModality: identity / time / quality
AutoFraudChecks | Automated rule-based fraud check records | Scheduled fraud scans
OnProjectFraudWindows | id, employeeId, contractorId, projectId, scanDate, startTime, endTime, fraudType, fragmentCount, flags, flagMetadata, windowMetadata, screenshotMetadata, createdAt, updatedAt, userVersionId | On-project time fraud analysis windows.
Analyzes screenshot patterns
QAReviewLog | id, userId, reviewerId, bucketName, status, assignedOn, completedAt, isActive, lockKey, createdAt, updatedAt, comments, decision, userVersionId, stage, signalType, flags | Human QA reviewer assignments and decisions for fraud cases
CheatingDetection | annotationId, userId, interviewId, interviewConfigId, formResponseId, formId, isCheating, cheatingProbability, signs, notes, reportedBy, createdAt, updatedAt | Interview cheating detection results
CheatingDetection_Audit | All CheatingDetection columns + auditAction, auditTimestamp | Audit trail for cheating detection
DuplicateGroups | groupId, userIdList, mergedIntoGroupId, createdAt | Groups of suspected duplicate/sock-puppet accounts

Domain 4 - Hiring Pipeline ¶

Table | Key Columns | Notes
Listings_New | listingId, version, uid, companyId, title, description, commitment, referralAmount, createdAt, deletedAt, status, requiredInterviewConfigId, rateMin, rateMax, hoursPerWeek, location, formId, automatedCommsOn, payRateFrequency, isPrivate, autoRedirectToApply, evaluationCriteria, offersEquity, rejectionTemplateSubject, rejectionTemplateBody, campaignId, ownerIds, goalNumHires, goalDeadline, isExploreAlways, interviewSchedulingEnabled, interviewScheduleLink, disableApplications, isMostRecent, offerExtendedText, minHeadcount, maxHeadcount, referralBoost, timeToAutoReject, automaticRejectionsOn, computedExplorePageVisibility, workArrangement, eligibleLocation, ineligibleResidenceLocation, listingType | Primary job listing table.
Includes pay ranges, location eligibility, automation settings
Listings_New_Audit | All Listings_New columns + auditAction, auditTimestamp | Audit trail for listing changes
Candidates | candidateId, userId, companyId, listingUid, createdAt, deletedAt, status, notesForCandidate, birthday, physicalLocation, workAuthorizationStatus, responseId, version, uid, source, countryOfResidence, isMostRecent, listingId, listingStepConfigId, linkedinUrl, actionItem, lastSignificantUpdatedAt, rejectionReason, updatedBy, starred, appliedAt, goalId, automaticRejectAt, addedAt, referralId, isEligible, numCommsSent, lastCommSentAt | Per-application record. Tracks status, notes, scheduled auto-rejection, outreach counts
Candidates_Audit | All Candidates columns + auditAction, auditTimestamp | Audit trail for application changes
CandidateMatchScores | candidateId, listingId, matchScore, contextualSummary | ML-generated candidate-to-listing fit scores with LLM explanations
EvaluationCriteria | evaluationCriteriaId, listingId, criteria, shortCriteria, type, hardFilter, position, updatedAt, evalCriterionCritique, evalCriterionCritiquePass, status | Per-listing scoring rubric criteria
ListingNotes | listingNoteId, listingId, authorUserId, assigneeUserId, notificationStatus, createdAt, noteBody | Recruiter notes on listings.
Contains candid operational commentary
SavedListings | id, userId, listingId, listingUid, createdAt | Candidates who bookmarked a listing
ListingPipelines | Pipeline stage configurations per listing | Hiring funnel stage definitions
TalentViewSearchUsers | searchId, userId, score, addedAt, starredAt, deletedAt | Users surfaced in talent search results
SharableTalentViewConfig | viewId, name, description, userIds, userCount, maxCandidatesCount, createdAt, updatedAt, revokedAt, createdBy, expiryAt, viewCount, visibleSections, preferredTitle | Shareable talent shortlist configurations
SharableTalentViewConfigUsers | userId, viewId, workExperience, education, summary, createdAt, updatedAt, yearsOfExperience, interviews, forms, likeCount, dislikeCount, feedback | Per-candidate data within shared talent views
TalentViewUserEvaluations | criteriaId, userId, criteriaScore | Per-criteria scores for talent view candidates

Domain 5 - Interviews and Assessments ¶

Table | Key Columns | Notes
Forms_Audit | formId, companyId, listingId, title, description, guide, evaluationCriteria, assessmentRubricId, items, isArchived, isAuthed, numQuestions, isUnified, allowFormRetakes, maxRetakeAttempts, allowCopyPaste, version, createdAt, updatedAt, createdBy, auditAction, auditTimestamp, prep, assessmentVersionId, feedbackConfig | Interview/assessment form definitions. items contains full question list
FormSubmissions | formResponseId, formId, companyId, userId, responseStatus, formVersion, startedAt, submittedAt, activeTimeSeconds, posthogSessionIds, createdAt, updatedAt, attempt, isLatestSubmission, assessmentVersionId, feedbackSentAt | Every interview submission.
Tracks time spent (activeTimeSeconds)
AssessmentRubrics | assessmentRubricId, title, createdAt, instructions, sumScores, sumSquareScores, countScores, version, passThreshold | Scoring rubric definitions with aggregate statistics
AssessmentRubrics_Audit | All AssessmentRubrics columns + auditAction, auditTimestamp | Rubric change history
AssessmentRubricItems_Audit | assessmentRubricItemId, assessmentRubricId, criteria, shortName, points, position, format, relatedQuestionIds, version, auditAction, auditTimestamp, webSearch, smartScoring, type, config, createdAt, updatedAt | Individual rubric criteria with AI scoring configuration
AssessmentEvalState | id, submissionId, assessmentType, jobType, status, retryCount, createdAt, reason, triggerSource, triggeredByUserId, modalJobId, durationMs, operationId, assessmentId | Grading pipeline execution state
AssessmentVersions | Versioned assessment configurations | Assessment version tracking
AssessmentAudits | Assessment activity audit trail | Audit log for assessment operations
GradedRubricItems | Per-rubric-item graded scores | Individual rubric item scores per submission
GradedRubricItems_Audit | Audit trail for graded items | Score change history
InterviewEvals | interviewId, communicationScore, technicalScore, qaPairScores | Aggregate interview scores by dimension
InterviewScores | scoreId, userId, interviewId, interviewConfigId, points, createdAt | Final interview score per user
InterviewIssues | issueId, interviewId, issue, source, notes, startPosition, endPosition, reportedBy, createdAt, updatedAt | Technical and integrity issues reported during interviews
PairwiseComparisons | listingId, listingUid, interviewConfigId, winnerResumeId, loserResumeId, reasoning, winnerUserId, loserUserId | Bradley-Terry tournament comparisons for candidate ranking
MercorScores | candidateId, listingId, listingUid, resumeId, evaluationCriteria, interviewConfigId, mScoreRaw, mScoreNormalized, numComparisons, contextualSummary, userId, aggregateFeatureScore | Final MercorScore per
candidate per listing

Domain 6 - Work Trials and Onboarding ¶

Table | Key Columns | Notes
WorkTrial_Audit | workTrialId, userId, companyId, listingStepConfigId, status, payableAmount, billableAmount, ciiaaDirect, ciiaaPassthrough, tow, offerLetter, startDate, endDate, payout, payment, paymentMethod, signature, projectId, billingAccountId, createdAt, updatedAt, version, auditAction, auditTimestamp, updatedBy | Work trial contract records. Contains signed legal documents and pay amounts
WorkTrialConfig | workTrialConfigId, title, payableAmount, billableAmount, ciiaaDirect, ciiaaPassthrough, tow, endDate, emailTemplateSubject, emailTemplateBody, emailTemplateSubjectExtension, emailTemplateBodyExtension, interviewIds, formIds, createdAt, updatedAt, deletedAt, companyId, isUnified, projectId | Reusable work trial templates
OnboardingState | id, shortName, name, threshold, createdAt, updatedAt, order | Onboarding funnel steps. Sample: interview_completed threshold=1 order=0
OnboardingDocument | onboardingDocumentId, onboardingDocument, createdAt, projectId | Per-project onboarding materials
TierProgress | id, createdAt, updatedAt, userId, tierId, planId, status, completedAt, paidAt | Contractor tier/level progression tracking
PlanAssignments | id, createdAt, updatedAt, userId, planId, assignedBy, startDate, endDate, userHours, tasksCompleted, status | Assigns contractors to specific earning/task plans

Domain 7 - Projects and AI Task Management ¶

Table | Key Columns | Notes
Projects_Audit | projectId, name, createdAt, updatedAt, companyId, archivedAt, externalId, onboardingDocumentId, userId, screenshotEnabled, userGroupEmail, description, requireAvailabilityUpdates, skills, projectType, offerExtendedText, annotationPlatform, annotationPlatformIDs, ssotLink, status, notes, version, auditAction, auditTimestamp, taskMetricsDatastore | Full project configuration audit trail
ProjectIAM | id, projectId, userId, roleId, status, assignedBy, version, createdAt, updatedAt | Role assignments within projects
ProjectIAM_Audit | All ProjectIAM columns + auditAction, auditTimestamp | Project IAM change history
ProjectCustomColumns | id, projectId, name, dataType, position, createdBy, createdAt, updatedAt, deletedAt, sqlQuery, source | Dynamic metadata columns per project. Some computed via SQL
ProjectCustomColumnValueHistory | id, customColumnId, jobId, value, changedBy, createdAt | History of custom column values
ProjectArchetypes | archetypeId, projectId, archetypeText, createdAt, updatedAt, version, elements | Character/persona definitions for annotation projects
ProjectAttributeValues | Project attribute key-value pairs | Flexible project attribute storage
ProjectViewConfig | viewId, title, projectId, viewContext, createdByUserId, createdAt, updatedByUserId, updatedAt, deletedAt, roleId, viewType | Saved view configurations for project management
ProjectIntegrations | id, projectId, groupMail, autoProvision, createdAt, updatedAt, oktaGroupId, integrationsData, oktaOwnerGroupId, oktaEPMGroupId, latestGroupBatch, latestBatchMemberCount, projectShortId, workspaceNotificationChannel, ownerGwGroup, epmGwGroup, slackChannelId | Project integrations with Okta groups and Slack channels
ProjectAutomations | Project-specific automation configurations | Automation bindings per project
ProjectFunctions | id, name, description, createdAt, updatedAt | Named functions available in project automation
TaskDefinitions | taskDefId, projectId, rubric, autograder, version, createdAt, updatedAt, task_schema, metadata | AI task type definitions with grading rubrics
TaskDefinitions_Audit | All TaskDefinitions columns + auditAction, auditTimestamp | Task definition change history
TaskAudits | uid, taskDefinitionId, recordId, s3KeyPrefix, authorId, auditorId, status, outcome, autoOutcome, createdAt, updatedAt, dispute, disputedBy | Individual task submission reviews with dispute tracking
TaskAssignments | id, createdAt, updatedAt, jobId, taskId, userId, appliedBy | Maps tasks to jobs and users
DeliverableBatches | id, uid, name, projectId,
invoiceLineItemId, status, taskCount, version, isLatest, metadata, createdAt, updatedAt, createdBy | Grouped task deliverable batches for invoicing
Deliverables | deliverableId, jobId, userId, projectId, entityType, entityId, status, createdAt, updatedAt | Individual deliverable records
Deliverables_Audit | All Deliverables columns + isMostRecent, auditAction, auditTimestamp | Deliverable change history
ProductivityProjectRules | id, project_id, description, rules, created_by, is_active, version, created_at | Per-project productivity monitoring rule configurations

Domain 8 - Jobs and Contracts ¶

Table | Key Columns | Notes
Jobs | jobID, contractorID, companyID, status, payableRate, commitment, ciiaa_direct, ciiaa_passthrough, tow, payment, startDate, createdAt, updatedAt, expiresAt, tax_form, expected_hours, title, stripeSubscriptionId, billableRate, version, dismissalDate, insightful, paymentMethod, projectID, checkr, idVerification, uid, payout, offerLetter, listingUID, managerId, signature, backgroundCheck, isLatest, note, referralId, roleId, provisionIdpAccess, safety_waiver, sourceId, confidentiality, billingAccountID, backgroundCheckConfig | Core employment contract. Contains pay rates, legal agreements, Stripe subscription
Jobs_Audit | All Jobs columns + auditAction, auditTimestamp | Job contract change history
JobEvents | jobEventId, jobId, contractorId, actorId, actionType, metadata, createdAt | Events on job contracts (status changes, communications).
Sample: comm, Contract Reminder
JobEventsQueue | queueItemId, sourceType, sourceId, payload, renderedPreview, editedPreview, status, response, createdAt, resolvedBy, resolvedAt, jobEventId | Queued job events pending processing or review
JobEventReasonAssociations | jobEventId, reasonId, createdAt | Structured reasons associated with job events
JobTasks | Tasks linked to specific jobs | Job-task mapping
JobPerformanceMetrics_New | jobPerformanceMetricsId, jobId, performanceScore, standardError, jobPerformanceSummary, version, createdAt, updatedAt | ML-generated job performance metrics
JobPerformanceMetrics_Audit | jobPerformanceMetricsId, jobId, version, lvr, lvrReasoning, confidenceLevel, isFraud, wasDismissedEarly, jobSummary, auditAction, auditTimestamp, createdAt, updatedAt | Detailed performance metrics audit trail including fraud flags
JobPerformanceReviews_New | performanceReviewId, jobId, contractorId, companyId, projectName, taskId, score, reviewNotes, performanceReasons, dismissalFlag, dismissalReason, reviewedBy, createdAt, updatedAt, oldReviewId, feedBackFlag | Human-reviewed job performance assessments
WeeklyProjectFeedback | weeklyProjectFeedbackId, userId, jobID, weekStart, rating, feedbackText, submittedAt, updatedAt, createdAt | Weekly contractor feedback on their project experience
ContractorPerformance_New | contractorPerformanceId, contractorId, standardError, performanceScore, performanceSummary, version, createdAt, updatedAt | Aggregate contractor performance across all jobs
ContractorPerformance_New_Audit | All ContractorPerformance_New columns + auditAction, auditTimestamp | Contractor performance change history
PerformanceReviews | performanceReviewId, contractorId, reviewDate, performanceDetails, stars, taskDetails, reviewBy, createdAt, updatedAt, companyId | Company-authored contractor performance reviews with star ratings
MLExperimentsJobPerformanceReviews | Date of review, Account, Project, Reviewer, Work type, Review type, Name, Email, Quality of Work, Engagement, Offboarding
Reason, Justification for rating | Raw performance data for ML model training

Domain 9 - Time Tracking and Productivity Surveillance ¶

Table | Key Columns | Notes
InsightfulScreenshots | id, externalId, contractorId, projectId, storageBucket, storageKey, storageUrl, storageProvider, fileExtension, contentType, fileSizeBytes, vendorName, schemaVersion, vendorMetadata, externalIdentifiers, screenshotTimestamp, timestampTranslated, timezoneOffset, timezone, isBlurred, isOriginal, isRemoved, removedAt, externalProductivityScore, computer, hwid, os, osVersion, agentVersion, appName, appFileName, appFilePath, windowTitle, browserUrl, document, browserSite, ip, gateways, windowId, activityId, fragmentId, createdAt, updatedAt | Per-screenshot records with full device fingerprint (IP, MAC, HWID), application, URL, and S3 image link
Timelog | id, externalId, externalProjectId, employeeId, duration, timeStart, timeEnd, timezone, source, taskId, taskName, lineItemUid, adjustmentReason, uid, version, userId, isCompleted, linkFailReason, insightfulCreatedAt, insightfulUpdatedAt, createdAt, updatedAt | Work session records synced from Insightful
Timelog_Audit | All Timelog columns + audit metadata | Timelog change history
Deductions | id, contractId, contractorId, durationToSubtractMs, appName, reasonForDeduction, payoutCycleID, externalProjectId, externalEmployeeId, status, approvedBy, approvedAt, appliedBy, appliedAt, createdAt, createdBy, updatedAt | Pay deductions for non-productive time with approval chain

Domain 10 - Payments and Financial Infrastructure ¶

Table | Key Columns | Notes
UserPaymentMethods | id, userId, provider, providerMethodId, methodType, status, metadata, createdAt, updatedAt, version, countryCode | Contractor payment accounts.
Sample: stripe, acct_1R0V****, express_account, onboarded, USA
UserPaymentMethods_Audit | All UserPaymentMethods columns + auditAction, auditTimestamp | Payment method change history
MercorUserFinancials | id, userId, paymentProvider, providerIdentifier, accountDetails, lastFetchedOn, createdOn, updatedOn | Full financial account details including bank routing numbers
PaymentLineItems | id, version, cycleStartTs, cycleEndTs, totalPayableAmount, totalBillableAmount, status, createdAt, updatedAt, uid, jobUid, dispatchFailureReason, timelogUid, bonusUid, transferId, referralUid, companyId, projectId, contractorId, timeStamp, isLatestVersion, referralId, moneyOutId, eventTime, referralEligibilityId | Core payment ledger. Amounts in cents
PaymentLineItems_Audit | All PaymentLineItems columns + auditAction, auditTimestamp | Payment line item change history
PaymentLineItems_TransactionalAudit | Transactional-level payment audit | Fine-grained payment operation audit trail
MoneyOut_Audit | id, statementId, entityId, userId, entity, externalAccountId, externalTransferId, cycleStartTs, cycleEndTs, totalAmount, paymentMethod, status, createdAt, failureReason, payoutCycleId, auditTimestamp, auditAction, version | Outbound payment records
WiseDisbursements | id, moneyOutId, amount, currency, sequenceNumber, wiseTransferId, wiseQuoteId, status, failureReason, createdAt, updatedAt, accountId | International Wise payment records
PayoutCycles | cycleStartTs, cycleEndTs, id, status, configId, configVersion | Pay period definitions
PayoutRecords | Individual payout transaction records | Detailed payout ledger
PayoutConfigs | payoutConfigId, status, type, configuration, version | Payment configuration rules
InvoiceLineItems | id, name, companyId, invoiceId, sowId, taskCount, rawAmount, adjustedAmount, status, description, metadata, createdAt, updatedAt, createdBy | Company invoice line items
BillingAccounts | Company billing account definitions | Client billing account management
BillingConfigs | id, uid, version,
isLatestVersion, rules, projectId, createdAt, updatedAt, createdBy | Billing rule configurations (markup, caps)
BillingRateCards | billingRateCardId, uid, version, isLatestVersion, sowId, formulaType, rateRows, createdAt, updatedAt, createdBy | Per-SOW rate card definitions
RevenueAdjustments | id, companyId, projectId, attestationId, cancelledAdjId, amountCentsUsd, category, revenueRecognitionDate, reason, createdAt, creatorId, isCancellation, formula, labels, aggregationFields, attachments, invoices | Revenue adjustments and corrections
FinanceLabels | Finance label definitions | Labels for financial categorization
CompanyFinanceLabels | companyId, financeLabelId, createdAt, creatorId | Finance label assignments to companies
ReferralEligibility | id, createdAt, updatedAt, referralUid, campaignId, referrerAmount, refereeAmount, referrerLineItemId, refereeLineItemId, criteriaId, onboardingStateId, referralId, entity_id, entity_type, type, jobId, billingAccountId, toolingIdempotencyKey, creatorId | Referral payment eligibility and vesting conditions

Domain 11 - Referrals and Growth ¶

Table | Key Columns | Notes
Referrals | referralId, referredUserId, referringUserId, createdAt, version, uid, status, reason, listingId, campaignId, totalEarned, totalEarningsPotential, state, deleted_at, paidAt, disputeStatus, isActive, referral_cap, referralIdempotencyKey, isPaymentBlocked, isGuaranteedReferral | Core referral records with earnings tracking
Referrals_Audit | All Referrals columns + audit metadata | Referral change history
ReferralReminder | referralId, createdAt, lastSentAt | Referral reminder email tracking
GuaranteedReferralQuota | quotaId, referringUserId, offPlatformUserId, shortenedLink, weekStart, status, createdAt, updatedAt, isEmailSent | Guaranteed referral program quota management
ReferrerMeta | Referrer metadata and configuration | Additional referrer attributes
OffPlatformCampaigns | Campaign definitions for off-platform outreach | External recruitment campaign management
OffPlatformCampaignSteps |
campaignStepId, stepNumber, campaignId, campaignType, subject, messageTemplate, parameters, scheduledAt, status, outreachedCandidateIds, failedCandidateIds, createdAt, updatedAt | Multi-step outreach sequence steps
OffPlatformRecruitingManager | id, managerId, offPlatformUserId, listingId, createdAt, updatedAt, updatedBy | Off-platform recruiter assignments
OffPlatformUsersMapping | mappingId, userId, offPlatformUserId, createdAt, updatedAt, referringUserId, status | Mapping between platform and off-platform user identities

Domain 12 - Communications and Outreach ¶

Table | Key Columns | Notes
Comms | commId, groupId, senderId, receiverId, content, type, triggerRef, createdAt, listingReferenceUID | In-platform messaging with full message content
CommsSent | Communication delivery records | Message send tracking
EmailTemplates | emailTemplateId, companyId, subject, content, createdBy, createdAt, updatedAt, isGlobal, tags, isPersonal | Email template library
AircallComms | Phone call logs from Aircall VoIP integration | Recruiter call records
LinkedinWarmIntros | warmIntroId, linkedinUrl, email, referringUserId, listingId, commEvent, status, createdAt, updatedAt, sentAt | LinkedIn outreach campaign records
PartnerChatThreads | threadId, listingId, referralId, partnerId, createdAt | Chat threads with referral partners
FirstTimeInvites | commId, userId, listingId, createdAt, commEvent, refListingUid, contentType, subject, listingIdCount | First-contact invitations to candidates
AutomationTemplates | templateId, name, description, category, handler, sourceType, sourceSql, templateBody, paramsSchema, cron, idempotency, autoApprove, version, createdAt, updatedAt, deletedAt, triggerConfig, config | Automated notification/workflow templates
Feedback | id, user_id, question_text, question_response, rating, device, created_at, updated_at | In-app user feedback submissions

Domain 13 - Company and Access Management ¶

Table | Key Columns | Notes
Company | companyId, name, description, website, externalName, billingModel, logo,
brandVisible, billingStartDay, billingEndDay, aboutCompany, universe | Client company master records
IAM | roleId, companyId, status, userId_v4, id, version | Company-level role assignments
IAM_Audit | roleId, companyId, status, userId_v4, id, version, auditAction, auditTimestamp | IAM change history. Sample: roleId=ghost, REMOVED
IAMOutbox | id, resourceType, resourceId, relation, subjectType, subjectId, operation, requestedBy, requestedByService, createdAt, callerToken | IAM change outbox for event-driven propagation
GodmodeCompanies | companyId, createdAt, createdBy, includeInFillRate | Companies accessible via internal Godmode admin
GodmodeArbitraryCells | entityType, entityGmId, acKey, acValueNumber, acValueString, acValueFormula, userId, createdAt, acMetadata | Arbitrary Godmode data cells for internal operations
Audience | id, projectId, companyId, audienceType, slug, anchorType, anchorId, oktaGroupId, googleGroupId, slackGroupId, insightfulTaskId, createdAt, updatedAt, slackChannelId, query | Audience definitions linking projects to Okta/Slack/Insightful groups
AudienceTargetProviders | id, audienceId, name, externalId, type, createdAt, metadata | External providers linked to audiences
DrivePermission | id, driveId, googleGroupId, permissionLevel, googlePermissionId, createdAt, updatedAt | Google Drive access permissions for project documents

Domain 14 - Skills Certifications and Endorsements ¶

Table | Key Columns | Notes
Skills | skillId, name, description, CertificationPolicy, type, parent, createdAt | Hierarchical skills taxonomy
CertificationPolicies_Audit | certificationPolicyId, companyId, name, description, rules, isActive, isUnified, createdAt, icon, isRevokable, requiresApproval, version, auditAction, auditTimestamp, iconColor, showBadge, displayText | Certification program definitions
Certifications_Audit | certificationId, certificationPolicyId, userId, evidence, status, isCertified, earnedAt, note, createdAt, updatedAt, version, auditAction, auditTimestamp | Individual earned certifications.
evidence contains scoring proof
SkillCertifications_Audit | uid, userId, skillId, isCertified, version, lastEvaluatedAt, auditedAt, auditAction | Per-skill certification status
SkillCertificationsEvidence_Audit | uid, userId, skillId, isCertified, version, sourceType, sourceId, createdAt, updatedAt, auditedAt, auditAction, score, metadata | Evidence backing skill certifications
ContractorEndorsements | endorsementId, endorsingJobId, endorsedJobId, endorsingUserId, endorsedUserId, contents, tags, createdAt, updatedAt, source, sentiment | Peer endorsements with text content and sentiment
UserResumeEvaluation | evaluationId, workExperienceScore, yearsOfWorkExperience, graduationYear, mScore, inferredRole, workExperienceSkills, resumeEvalScore, awardScore, educationScore, rateAcademicCompetitions, rateCompetitiveProgramming, rateHackathonPerformance, sumScore, technicalSkills, normalisedSumScore, highestDegree, userId | ML resume evaluation scores
CandidateVouches | vouchId, voucherUserId, candidateUserId, candidateEmail, candidateLinkedinId, candidateName, resumeS3Key, resumeHash, howKnowSocialPlatform, howKnowSocially, howKnowWorkedTogether, howKnowStudiedTogether, howKnowOther, reasonSkills, reasonEducation, reasonEmployer, reasonExpertise, reasonOther, createdAt, updatedAt | Structured peer vouching with relationship details

Domain 15 - Analytics and ML ¶

Table | Key Columns | Notes
DbtFirmSchoolRank | firmId, firmName, academicField, nProfiles, avgSchoolRank, medianSchoolRank, priorMeanSchoolRank, ebPriorStrength, ebAvgSchoolRank, firmsInField, firmSchoolRank, firmSchoolRankPercentile | Employer prestige scores for ~154,000 firms.
Used in resume scoring DbtSchoolRankings academicField, schoolName, schoolScore, schoolRank School prestige rankings by field PosthogAnalytics uuid, userEmail, company, startTimeUtc, endTimeUtc, activetime, inactivetime, startUrl PostHog sessions linked to user email identity SearchAnalytics run_id, run_timestamp, avg_relevance_score, avg_prestige_score, p99_latency_ms, position_weighted_relevance_score, avg_relevant_prestige_score Search quality metrics over time ForecastMetrics entity, id, dt, snapshot_dt, modelVersion, predictedValue ML forecast outputs for capacity and fill rate planning MLExperimentsJobPerformanceReviews Date of review, Account, Project, Reviewer, Work type, Review type, Name, Email, Quality of Work, Engagement, Offboarding Reason, Justification for rating Raw performance review data for ML training TalentViewUserEvaluations criteriaId, userId, criteriaScore Structured per-criteria talent evaluations ProductivityProjectRules id, project_id, description, rules, created_by, is_active, version, created_at Per-project productivity monitoring rule definitions Domain 16 - Infrastructure and DevOps ¶ Table Key Columns Notes IacDeploymentRuns id, runType, environment, status, commitSha, branch, actor, githubRunId, githubRunUrl, prNumber, stacksAffected, resourcesAdded, resourcesChanged, resourcesDestroyed, summary, durationSeconds, startedAt, completedAt, createdAt Terraform deployment records. Exposes GitHub monorepo URLs, engineer usernames, Terraform plan output ProductionDeployment deploymentRecordId, releaseTag, buildHash, deployedAt, deploymentIds, taskDefinitionArns, status, createdAt, updatedAt ECS production deployment records. 
Contains AWS task definition ARNs PreprodDeployment id, releaseTag, commitSha, deployedAt, loadTestPassed, releaseOwner, status, createdAt, updatedAt Staging deployment records with load test results PreprodDeploymentTest id, test_message, created_at, updated_at Test table for pre-production deployment validation ProductionVersion id, lastVersion, lastReleaseTag, lastBuildHash, updatedAt Single-row pointer to current production version RollbackExecution Rollback event records including affected services Emergency rollback tracking DATABASECHANGELOG ID, AUTHOR, FILENAME, DATEEXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, LIQUIBASE Liquibase schema migration history. Reveals engineer names, migration filenames DATABASECHANGELOGLOCK Liquibase migration lock state Prevents concurrent schema migrations AgentSandboxes sandboxId, userId, title, agentType, status, backendType, host, stopReason, transcriptRawUrl, transcriptConsolidatedUrl, snapshotId, lastSnapshotId, snapshotStorageKey, acpSessionId, backendId, sandboxToken, claimedAt, expiresAt, createdAt, updatedAt, deletedAt AI coding agent sandbox sessions. 
transcriptRawUrl links to S3 conversation logs DrivePermission id, driveId, googleGroupId, permissionLevel, googlePermissionId, createdAt, updatedAt Google Drive permission records Domain 17 - Reference and Miscellaneous ¶ Table Key Columns Notes Country id, isoCode3, name, currency, psp, createdAt, updatedAt Country reference table with payment service provider per country TagAssignments_Audit tagAssignmentId, tagId, entityType, entityId, createdAt, updatedAt, version, auditAction, auditTimestamp Tag assignments to entities ShortenedUrls URL shortener records Shortened URL definitions UrlClicks id, urlId, clickedAt, ipHash, userId, country Click tracking on shortened URLs BeelineJobMapping External job platform mapping Maps Mercor jobs to Beeline external system UserManagement Internal user management records Admin user management UserManagementWorkflows User management workflow state Multi-step user management processes ActionsQueue Queued action records General purpose action queue GoldenReviewSample Golden reference samples for review calibration QA calibration data References Professional reference records Additional reference management CatfishAuditLog id, slackUserId, slackUserName, targetEmail, platform, environment, intent, status, errorMessage, slackChannelId, createdAt Internal user lookup audit. Records every time staff look up user data via "Catfish" tool CapacityApplicationLog id, capacityBudgetId, capacityLogId, projectId, actionsTakenJson, status, notes, createdAt Capacity budget application tracking OffPlatformCampaignSteps campaignStepId, stepNumber, campaignId, campaignType, subject, messageTemplate, parameters, scheduledAt, status, outreachedCandidateIds, failedCandidateIds, createdAt, updatedAt Off-platform outreach campaign step execution End of Appendix A Document prepared for security research and educational purposes. All PII has been obfuscated. Published with JotBird