LLMs: Using a single Unix-style tool instead of multiple tools/function calling

old.reddit.com · drtse4 · 13 hours ago · view on HN · research
AI Summary

A former backend lead at Manus proposes replacing traditional function calling in LLM agents with a single Unix-style `run(command="...")` tool that leverages pipes and shell operators. He argues that LLMs are naturally aligned with the CLI patterns they have seen extensively in training data, and that a single tool reduces the cognitive load of tool selection while enabling composition.

Entities: Manus, Meta, Pinix, agent-clip, LocalLLaMA, MorroHsu
submitted to r/LocalLLaMA on 12 Mar 2026 · 1,567 points (96% upvoted)
# I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead.

Discussion (self.LocalLLaMA) · submitted 1 day ago by MorroHsu

> English is not my first language. I wrote this in Chinese and translated it with AI help. The writing may have some AI flavor, but the design decisions, the production failures, and the thinking that distilled them into principles are mine.

I was a backend lead at Manus before the Meta acquisition. I've spent the last 2 years building AI agents: first at Manus, then on my own open-source agent runtime (Pinix) and agent (agent-clip). Along the way I came to a conclusion that surprised me:

**A single `run(command="...")` tool with Unix-style commands outperforms a catalog of typed function calls.**

Here's what I learned.

## Why *nix

Unix made a design decision 50 years ago: everything is a text stream. Programs don't exchange complex binary structures or share memory objects; they communicate through text pipes. Small tools each do one thing well, composed via `|` into powerful workflows.
Programs describe themselves with `--help`, report success or failure with exit codes, and communicate errors through stderr.

LLMs made an almost identical decision 50 years later: everything is tokens. They only understand text and only produce text. Their "thinking" is text, their "actions" are text, and the feedback they receive from the world must be text.

These two decisions, made half a century apart from completely different starting points, converge on the same interface model. The text-based system Unix designed for human terminal operators (`cat`, `grep`, pipes, exit codes, man pages) isn't just "usable" by LLMs; it's a natural fit. When it comes to tool use, an LLM is essentially a terminal operator, one that's faster than any human and has already seen vast amounts of shell commands and CLI patterns in its training data.

This is the core philosophy of the *nix agent: **don't invent a new tool interface. Take what Unix has proven over 50 years and hand it directly to the LLM.**

## Why a single `run`

### The single-tool hypothesis

Most agent frameworks give LLMs a catalog of independent tools:

```
tools: [search_web, read_file, write_file, run_code, send_email, ...]
```

Before each call, the LLM must make a tool selection: which one? With what parameters? The more tools you add, the harder the selection, and accuracy drops. Cognitive load is spent on "which tool?" instead of "what do I need to accomplish?"

My approach: one `run(command="...")` tool, with all capabilities exposed as CLI commands.

```
run(command="cat notes.md")
run(command="cat log.txt | grep ERROR | wc -l")
run(command="see screenshot.png")
run(command="memory search 'deployment issue'")
run(command="clip sandbox bash 'python3 analyze.py'")
```

The LLM still chooses which command to use, but this is fundamentally different from choosing among 15 tools with different schemas. Command selection is string composition within a unified namespace; function selection is context-switching between unrelated APIs.
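To make the single-tool setup concrete, here is a minimal sketch (not Manus or Pinix code; the schema shape and the `run` helper are illustrative assumptions). The whole catalog collapses to one tool whose only parameter is a command string, executed in a shell with stdout, stderr, and the exit code folded back into text the model can read:

```python
import subprocess

# The entire tool catalog: one tool, one string parameter.
# (JSON-schema-style tool definition; shape is illustrative only.)
RUN_TOOL = {
    "name": "run",
    "description": "Execute a Unix-style command line. Supports pipes (|), && and ||.",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

def run(command: str, timeout: float = 30.0) -> str:
    """Execute the command in a shell and return text for the model:
    stdout on success, the exit code plus stderr on failure."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    if proc.returncode == 0:
        return proc.stdout
    return f"[exit {proc.returncode}] {proc.stderr}"
```

Every model action then arrives through the same door, e.g. `run(command="cat log.txt | grep ERROR | wc -l")`, and failures come back as text the model can react to.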
## LLMs already speak CLI

Why are CLI commands a better fit for LLMs than structured function calls? Because the CLI is the densest tool-use pattern in LLM training data. Billions of lines on GitHub are full of it:

```bash
# README install instructions
pip install -r requirements.txt && python main.py

# CI/CD build scripts
make build && make test && make deploy

# Stack Overflow solutions
cat /var/log/syslog | grep "Out of memory" | tail -20
```

I don't need to teach the LLM how to use the CLI; it already knows. This familiarity is probabilistic and model-dependent, but in practice it's remarkably reliable across mainstream models.

Compare two approaches to the same task: read a log file and count the error lines.

Function-calling approach (3 tool calls):

```
1. read_file(path="/var/log/app.log")     → returns entire file
2. search_text(text=…, pattern="ERROR")   → returns matching lines
3. count_lines(text=…)                    → returns number
```

CLI approach (1 tool call):

```
run(command="cat /var/log/app.log | grep ERROR | wc -l")  → "42"
```

One call replaces three, not because of special optimization, but because Unix pipes natively support composition.

## Making pipes and chains work

A single `run` isn't enough on its own. If `run` can only execute one command at a time, the LLM still needs multiple calls for composed tasks.
So I built a chain parser (`parseChain`) in the command routing layer, supporting four Unix operators:

- `|` (pipe): stdout of the previous command becomes stdin of the next
- `&&` (and): execute the next command only if the previous one succeeded
- `||` (or): execute the next command only if the previous one failed
- `;` (seq): execute the next command regardless of the previous result

With this mechanism, every tool call can be a complete workflow:

```bash
# One tool call: download → inspect
curl -sL $URL -o data.csv && cat data.csv | head -5

# One tool call: read → filter → sort → top 10
cat access.log | grep "500" | sort | head -10

# One tool call: try A, fall back to B
cat config.yaml || echo "config not found, using defaults"
```

N commands × 4 operators: the composition space grows dramatically. And to the LLM, it's just a string it already knows how to write. The command line is the LLM's native tool interface.

## Heuristic design: making the CLI guide the agent

Single-tool + CLI solves "what to use." But the agent still needs to know "how to use it." It can't Google. It can't ask a colleague. I use three progressive design techniques to make the CLI itself serve as the agent's navigation system.

### Technique 1: Progressive `--help` discovery

A well-designed CLI tool doesn't require reading documentation, because `--help` tells you everything. I apply the same principle to the agent, structured as progressive disclosure: the agent doesn't need to load all documentation at once; it discovers details on demand as it goes deeper.

**Level 0: Tool description → command list injection**

The `run` tool's description is dynamically generated at the start of each conversation, listing all registered commands with one-line summaries:

```
Available commands:
  cat     — Read a text file. For images use 'see'. For binary use 'cat -b'.
  see     — View an image (auto-attaches to vision)
  ls      — List files in current topic
  write   — Write file. Usage: write [content] or stdin
  grep    — Filter lines matching a pattern (supports -i, -v, -c)
  memory  — Search or manage memory
  clip    — Operate external environments (sandboxes, services)
  ...
```

The agent knows what's available from turn one, but doesn't need every parameter of every command; that would waste context.

Note: there's an open design question here: injecting the full command list vs. on-demand discovery. As commands grow, the list itself consumes context budget. I'm still exploring the right balance. Ideas welcome.

**Level 1: `command` (no args) → usage**

When the agent is interested in a command, it just calls it. No arguments? The command returns its own usage:

```
→ run(command="memory")
[error] memory: usage: memory search|recent|store|facts|forget

→ run(command="clip")
clip list                        — list available clips
clip <name>                      — show clip details and commands
clip <name> <command> [args...]  — invoke a command
clip pull [name]                 — pull file from clip to local
clip push                        — push local file to clip
```

Now the agent knows `memory` has five subcommands and `clip` supports list/pull/push. One call, no noise.

**Level 2: `command subcommand` (missing args) → specific parameters**

The agent decides to use `memory search` but isn't sure about the format? It drills down:

```
→ run(command="memory search")
[error] memory: usage: memory search [-t topic_id] [-k keyword]

→ run(command="clip sandbox")
Clip: sandbox
Commands:
  clip sandbox bash
```