Before you let AI agents loose, you'd better know what they're capable of

thenewstack.io · chhum · 18 hours ago · view on HN · security
Learn how to manage agentic AI risk through systems thinking, contract testing, and sandboxes to ensure autonomous systems stay predictable.

Mar 12th, 2026 1:22pm by Charles Humble

Naftiko sponsored this post.

For enterprises, agentic AI systems potentially allow staff responsibilities to shift from execution to judgment, oversight, and strategy. This creates new opportunities, but it is fraught with risk:

Loss of human oversight and control: Agentic systems can take sequences of autonomous actions, such as browsing the web, executing code, calling APIs, and managing files, often with minimal human checkpoints. This creates compounding risk: an early mistake can cascade into significant damage before anyone notices. Enterprises may struggle to audit what actions were taken, when, and why, making accountability and remediation difficult.

Security and prompt injection vulnerabilities: Agents that consume external data (web pages, emails, documents) are susceptible to prompt injection attacks, where malicious content in the environment hijacks the agent’s behavior.
In an enterprise context, a compromised agent with access to internal systems, databases, or APIs could exfiltrate sensitive data, escalate privileges, or execute destructive actions while appearing to operate normally.

Unpredictable and hard-to-reverse actions: Unlike chatbots that only generate text, agentic systems take real-world actions: sending emails, modifying records, making purchases, and deploying code. Mistakes may be difficult or impossible to undo. The combination of broad tool access, long task horizons, and ambiguous instructions creates scenarios in which an agent pursues a goal in a technically correct but operationally catastrophic way.

In addition, agentic use can lead to over-reliance on agent judgment for sensitive decisions, data privacy risks from agents ingesting confidential context, and third-party supply chain risk when agents call external services.

Because agentic AI is so new, we don’t have patterns or best practices to draw on. Instead, as IT professionals, we need to work out the risk mitigations among ourselves and, ideally, share what we learn publicly as we go.

Kin Lane, co-founder and chief community officer (CCO) of the open-source API company Naftiko, sees the mitigation of agentic risk in systems-thinking terms. His broad thesis is that testing and mocking are how an enterprise can confidently and safely prepare for agentic systems.

Behavior is the specification

One of the most well-known heuristics in systems thinking was coined by the late Stafford Beer, a British theorist, consultant, and professor at Manchester Business School. He frequently used the phrase, “The purpose of a system is what it does,” meaning that a system often functions in a way that is at odds with the intentions of those who design, operate, and promote it.
“There is, after all,” Beer observed, “no point in claiming that the purpose of a system is to do what it constantly fails to do.” If a system’s purpose is indeed revealed through its behavior rather than its design, then understanding what Donella Meadows refers to as the “and” (the relationships between components) requires being able to observe the system reliably.

Observability tools like Honeycomb use distributed tracing and high-cardinality event data to provide insights into system behavior in production, allowing engineers to query and explore arbitrary dimensions of telemetry (spans, traces, and structured log fields) in real time. But for agentic systems, we also need confidence ahead of production. Lane believes that can be found in the test suite; testing, he says, allows you to intentionally shape behavior and, in turn, the future of an API.

“People tend to see API testing as making sure it’s doing what it should,” he tells The New Stack. “But the combination of having a strong sandbox and using contract testing to get a strong feedback loop with your consumers allows iterating on what the future holds. Companies with good testing practice are much better at both inventing and dealing with the future. And they’re able to bring things to life in reliable, reproducible ways that producers and consumers agree on.”

It is this mentality that he brings to the work of defining business capabilities for his clients: essentially, discrete, composable units that describe what a system can do for a business (like ‘process payments’ or ‘manage inventory’). He builds sandboxes and mocks, thereby giving their AI agents a safe place to play. For his approach to work, the mocks need to be accurate representations of reality.
“I should be able to switch the URL of the mock to production and reliably get live responses with the same structure,” he says. “That acts as a test; can I move from synthetic mock to production without breaking?”

Done well, mocks and contract tests reinforce each other. Fragile, unrealistic mocks indicate poor or nonexistent contract testing, not a failure of mocking itself. Without shared visibility between API producers and consumers, mocks and real APIs evolve independently, and that can break trust. When providers publish formal specs and examples, and use them in their own contract tests, everyone stays aligned.

To achieve this, Lane prefers open-source tooling, specifically Microcks and OpenAPI, with Bruno for scripting.

Shared mocks, shared reality

Microcks is an open-source API mocking and testing platform built in Java that has been running for ten years and was donated to the Cloud Native Computing Foundation (CNCF) more than three years ago. Built for enterprise scale, it is language agnostic, supporting REST/OpenAPI, AsyncAPI, Kafka, MQTT, WebSockets, gRPC, GraphQL, and more. That breadth is invaluable since enterprise environments typically layer multiple API standards (which I referred to in a previous article in this series as API sediment) rather than standardize on one.

Without a tool like Microcks, “You end up with a dedicated tool for each type of protocol or spec, which is a nightmare,” Microcks co-founder Yacine Kheddache told me. “For mocking purposes, developers are creating their own dummy mocks that are not representative of the real business service.”

The platform is built around a contract-first philosophy. A primary artefact (such as an OpenAPI spec) is imported, and Microcks uses it to auto-generate mock endpoints with no code required.
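To make the contract-first idea concrete, here is a minimal Python sketch of how a mock response can be derived from the examples already documented in an OpenAPI document. It is an illustration only, not how Microcks is implemented: the `SPEC` document, the `/payments/{id}` path, its fields, and the `mock_response` helper are all hypothetical.

```python
from typing import Any, Optional

# Hypothetical, trimmed-down OpenAPI 3 document. The path, field names, and
# example values are invented for illustration.
SPEC: dict[str, Any] = {
    "openapi": "3.0.3",
    "paths": {
        "/payments/{id}": {
            "get": {
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "examples": {
                                    "settled": {
                                        "value": {
                                            "id": "p-1",
                                            "status": "settled",
                                            "amount": 42.5,
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
}


def mock_response(
    spec: dict[str, Any],
    method: str,
    path_template: str,
    status: str = "200",
    media_type: str = "application/json",
) -> Optional[Any]:
    """Return the first documented example for an operation, or None."""
    operation = spec.get("paths", {}).get(path_template, {}).get(method.lower(), {})
    content = operation.get("responses", {}).get(status, {}).get("content", {})
    examples = content.get(media_type, {}).get("examples", {})
    for example in examples.values():
        # The first named example wins; a real tool would match on the request.
        return example.get("value")
    return None
```

Calling `mock_response(SPEC, "GET", "/payments/{id}")` returns the documented `settled` example, while an operation with no examples yields `None`. The point of the sketch is that the mock is only as realistic as the examples the contract carries.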
To avoid bloating the primary spec with hundreds of examples, Microcks supports secondary artefacts: additional example sets sourced from Postman collections, HTTP archive recordings, an AI copilot, or hand-crafted YAML files. Bundling primary and secondary artefacts generates on-the-fly sandboxes that adopters have described as “sandbox as a service.”

A notable aspect is how Microcks encourages organization-wide collaboration. “It’s not only developers who are maintaining throwaway local mocks for unit tests. Teams build shared, versioned example datasets that everybody globally around the API service contributes to and reuses,” Kheddache says. This allows parallel development across microservices teams: developers mock their dependencies, work independently, and simply swap in the real service when it becomes available. Large adopters, including Amadeus, that use Microcks to shift their mocking and contract testing left have reported significant gains in development speed.

Microcks at BNP Paribas

The collaborative, shared ownership model Kheddache describes isn’t theoretical. At BNP Paribas, 32 squads across the French retail banking division are now using Microcks, with over 500 developers and testers actively on the platform, and more than 2.5 million API calls processed through it every week.

BNP had a massive legacy mainframe sitting at the heart of core banking operations, which every team had to interact with directly whenever they needed to build or test against it. That was slow and expensive, and it placed unnecessary strain on infrastructure that costs serious money to run. By mocking those mainframe-backed APIs in Microcks, BNP’s teams could develop and test in parallel without touching the mainframe, and only connect to real services when genuinely necessary. The result, according to the published case study, was that development and testing cycles were cut by two-thirds.
There was also an unexpected bonus: significantly reduced mainframe load translated directly into lower energy consumption, a meaningful contribution to the bank’s sustainability targets, and not the kind of outcome you’d typically associate with a testing tool. “The ability to deploy mock sandboxes everywhere, ready to use,” the BNP team noted, “was central to making that scale work.”

It’s the kind of outcome Kheddache had in mind when he described the broader promise of the platform. “When they adopt our approach, the wins are incredible,” he told me. “They are able to speed up development, mock all the dependencies, and everybody can work in parallel.”

For contract testing, Microcks can act as an API client, firing requests at a real endpoint and verifying it remains compliant with its contract across multiple versions. This is useful for catching breaking changes and certifying backward compatibility.

Microcks has also shifted toward supporting individual developers, not just centralized platform teams. A lightweight native binary (compiled via GraalVM) starts in under 200 milliseconds, and Testcontainers bindings exist for Java, Node, Go, and .NET (the last contributed by AXA Insurance). Developers can now run full integration tests locally, using the same shared example datasets as the central environment, closing the ‘works on my laptop’ gap.

Each Microcks simulation endpoint now also exposes a Model Context Protocol (MCP) link, making mock APIs accessible to LLMs and AI agents as tools. A second, currently unnamed, project is in development: a runtime proxy that translates OpenAPI, GraphQL, and gRPC specs into MCP servers for production use, with secret management and custom tool shaping to make APIs more reliably usable by language models.

AI fragments the feedback loop

Typically, in the context of testing and mocking, the more stakeholders using the sandbox, the more likely you are to achieve a comprehensive and reliable mock.
Add generative AI into the mix, however, and as with the act of programming itself, we are still trying to understand what a healthy feedback loop looks like. “Historically, healthy feedback came from having all the stakeholders in a workspace, a repo, or a meeting in the same room on the whiteboard. But AI further fragments things, so I’ve got to rethink that, and I don’t have a good answer yet,” Lane says. He is, however, certain that you should have mocks, not only for what you are producing but also for what you are consuming. “That menu of internal and external APIs is what you’re capable of as a company. If you don’t have that in a catalogue, a sandbox, or a registry, so you can test, play, and understand it, your teams aren’t going to know what you’re capable of.” Tools like Microcks make contract-driven mocking easy, but collaboration and shared ownership are what make it work. As agentic systems take on more autonomy, that shared understanding of what a system can, and should, do becomes the foundation on which everything else rests.
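Lane’s earlier test (switch the mock URL to production and expect the same structure) and the contract-testing idea described above can both be reduced to one check: does a payload still satisfy the agreed contract? The Python sketch below shows that check in its simplest form. The `PAYMENT_CONTRACT` mapping, the payloads, and the `contract_violations` helper are hypothetical, and a real contract test would validate against the full OpenAPI schema rather than a flat field-to-type map.

```python
from typing import Any

# Hypothetical contract: field name -> expected Python type.
PAYMENT_CONTRACT: dict[str, type] = {"id": str, "status": str, "amount": float}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """List every way a payload departs from the contract (empty means compliant)."""
    problems: list[str] = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    return problems


# Invented payloads: one as a mock might serve it, one as a drifting live
# service might return it (amount has silently become a string).
mock_payload = {"id": "p-1", "status": "settled", "amount": 42.5}
live_payload = {"id": "p-9", "status": "pending", "amount": "42.50"}

print(contract_violations(mock_payload, PAYMENT_CONTRACT))
print(contract_violations(live_payload, PAYMENT_CONTRACT))
```

Running the same check against both the mock and the live response is what exposes drift: if the mock passes and production does not (or the reverse), producer and consumer no longer share the same picture of the API.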
Charles Humble is a former software engineer, architect and CTO who has worked as a senior leader and executive of both technology and content groups. He was InfoQ’s editor-in-chief from 2014-2020, and was chief editor for Container Solutions from 2020-2023.... Naftiko sponsored this post. TNS owner Insight Partners is an investor in: Postman.