mlops

2 articles
0 2/10

NVIDIA's AI Cluster Runtime is an open-source project that provides validated, reproducible Kubernetes cluster configurations for GPU-accelerated AI workloads through layered recipes, CLI tooling, and validation mechanisms. It enables consistent deployment across different cloud environments and hardware by capturing exact component versions, dependencies, and configuration parameters.

AI Cluster Runtime · NVIDIA · Kubernetes · Amazon EKS · Kubeflow Trainer · NVIDIA Dynamo · NVIDIA GPU Operator · NCCL · CNCF Certified Kubernetes AI Conformance Program · H100 · Blackwell · ArgoCD · Mark Chmarny · Nathan Taber
developer.nvidia.com · mchmarny · 1 day ago
0 5/10

SENTINEL is an MCP server that audits AI agent reasoning in real time, before high-stakes decisions execute. It uses a four-stage pipeline (signal fidelity, pattern classification, reliability scoring, authority gate) to detect reasoning failures, policy staleness, and accuracy drift. The system integrates with agentgateway for governance and with Datadog and Braintrust for monitoring. It is demonstrated in a healthcare use case in which an insurance claim agent's accuracy drifted from 84% to 44% without being detected.

SENTINEL · Andrew Espira · agentgateway · Solo.io · Claude · GPT · Datadog · Braintrust · Cleric · Aetna · UnitedHealthcare · MCP · CEL
espiradev.org · aespira · 2 days ago