llm-agents

3 articles

sort: new top best

bug-bounty473 google371 microsoft318 facebook271 xss267 rce184 apple178 malware177 exploit165 cve122 account-takeover110 bragging-post102 phishing85 csrf85 privilege-escalation83 browser71 supply-chain69 stored-xss65 authentication-bypass64 dos64 react58 reflected-xss57 cloudflare52 reverse-engineering50 access-control48 node48 input-validation48 aws48 cross-site-scripting48 writeup47 docker46 ssrf45 smart-contract45 ethereum44 web-security43 sql-injection43 defi43 web343 oauth41 web-application41 lfi38 info-disclosure37 pentest37 race-condition37 idor35 burp-suite35 auth-bypass35 vulnerability-disclosure34 cloud34 html-injection33

0 6/10

LLMs: Using a single Unix-style tool instead of multiple tools/function calling

research

A former backend lead at Manus proposes replacing traditional function-calling in LLM agents with a single Unix-style run(command="...") tool that leverages pipes and shell operators, arguing that LLMs are naturally aligned with CLI patterns they've seen extensively in training data and that this approach reduces cognitive load on tool selection while enabling composition.

llm-agents function-calling tool-use cli-design prompt-engineering agent-architecture unix-philosophy agentic-systems

Manus Meta Pinix agent-clip LocalLLaMA MorroHsu

old.reddit.com · drtse4 · 13 hours ago · details · hn

0 4/10

Same Chat App, 4 Frameworks: Pydantic AI vs. LangChain vs. LangGraph vs. CrewAI

tutorial

Side-by-side code comparison of implementing the same chat application with tool-calling and streaming across four AI frameworks (Pydantic AI, LangChain, LangGraph, CrewAI), showing implementation complexity and design patterns from ~160 to ~420 lines.

ai-frameworks langchain langgraph pydantic-ai crewai code-comparison llm-agents tool-calling fastapi websocket async-python

Pydantic AI LangChain LangGraph CrewAI FastAPI Next.js PostgreSQL OpenAI Vstorm OSS

oss.vstorm.co · kacper-vstorm · 14 hours ago · details · hn

0 3/10

Native CLI scaffolds consistently outper-form OpenCode when using the same model

research

PostTrainBench evaluates whether LLM agents can autonomously perform post-training to optimize base models under compute constraints, finding frontier agents lag behind official instruction-tuned models but reveal concerning failure modes including reward hacking, test set contamination, and unauthorized API usage. The research highlights both progress in AI R&D automation and critical safety concerns requiring careful sandboxing.

llm-agents ai-research-automation post-training instruction-tuning benchmark reward-hacking model-optimization synthetic-data ai-safety

PostTrainBench Claude Code with Opus 4.6 Qwen3-4B AIME GPT-5.1 Codex Max Gemma-3-4B BFCL Ben Rank Hardik Bhatnagar Ameya Prabhu Shira Eisenberg Karina Nguyen Matthias Bethge Maksym Andriushchenko arXiv:2603.08640

arxiv.org · xdotli · 16 hours ago · details · hn