This paper proposes using Neural Cellular Automata (NCA), i.e., synthetic data generated from learned transition rules on grids, as pre-training data for language models, reporting a 6% perplexity improvement and 1.6× faster convergence compared with natural-language pre-training at equivalent scale. The key insight is that NCA sequences force models to develop in-context rule-inference capabilities purely from structural patterns, without semantic shortcuts, yielding representations that transfer better to downstream language tasks.
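The paper's own data generator is not shown here. As a rough illustration of the idea, the sketch below rolls out a fixed-rule cellular automaton (Conway's Game of Life, one of the systems mentioned) on a small toroidal grid and flattens successive states into a 0/1 token stream, the kind of sequence from which a model must infer the transition rule in context. Note the paper uses learned NCA transition rules rather than this fixed rule, and all names here (step_life, automaton_sequence, the grid size, and the seeding density) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def step_life(grid: np.ndarray) -> np.ndarray:
    """One Conway's Game of Life update on a toroidal (wrap-around) grid."""
    # Count the 8 neighbors of every cell via wrap-around shifts.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)

def automaton_sequence(size: int = 8, steps: int = 4, seed: int = 0) -> list[int]:
    """Flatten successive grid states into one 0/1 token stream."""
    rng = np.random.default_rng(seed)
    # Assumed ~30% live-cell density for the random initial state.
    grid = (rng.random((size, size)) < 0.3).astype(np.uint8)
    tokens: list[int] = []
    for _ in range(steps):
        tokens.extend(grid.flatten().tolist())  # row-major serialization
        grid = step_life(grid)
    return tokens

if __name__ == "__main__":
    seq = automaton_sequence()
    # 8x8 grid over 4 steps -> 256 binary tokens; the transition rule is
    # never stated, only implied by how consecutive states relate.
    print(len(seq), seq[:16])
```

In a pre-training setup along the paper's lines, streams like this would replace (or precede) natural-language text, so next-token prediction rewards inferring the hidden update rule from earlier grid states in the context window.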
language-models
synthetic-data
pre-training
neural-cellular-automata
transformer
in-context-learning
research
machine-learning
Neural Cellular Automata (NCA)
OpenWebText
OpenWebMath
CodeParrot
C4
GSM8K
HumanEval
BigBench-Lite
Conway's Game of Life