data-quality

2 articles
sort: new top best
clear filter
0 7/10

A technical walkthrough of building a silent data loss detection system using BigQuery's native Storage Write API metadata and built-in ML anomaly detection (AI.DETECT_ANOMALIES), implemented as a dbt incremental model that monitors hundreds of tables without external infrastructure or custom ML training.

BigQuery Storage Write API WRITE_API_TIMELINE_BY_PROJECT AI.DETECT_ANOMALIES TimesFM Nordnet Pub/Sub dbt
robertsahlin.substack.com · boxer_shorts · 1 day ago · details · hn
0 5/10

DataForge is an open-source toolkit for generating deterministic synthetic datasets for LLM tool-calling fine-tuning, featuring 8,500+ lines of code with anti-template detection and quality gates. The accompanying NHA Epistemic Deliberations dataset provides 183 real multi-agent reasoning sessions from 3-7 different LLM providers with convergence measurement and adversarial challenges for training reasoning-focused models.

DataForge NotHumanAllowed Anthropic OpenAI Gemini DeepSeek Grok Qwen 7B PROMETHEUS CASSANDRA ATHENA Geth Consensus
nothumanallowed.com · senza1dio · 1 day ago · details · hn