Hugging Face Storage Buckets

huggingface.co · tamnd · 16 hours ago
AI Summary

Hugging Face introduces Storage Buckets, an AI-native object storage service using Xet's content-defined chunking for deduplication, offering per-TB pricing with built-in CDN and designed to streamline ML workflows without Git overhead.

Storage Buckets

AI-native object storage designed for scale, speed, and team workflows.

→ Create a Bucket or Get a Storage package

Storage built for AI teams

Store models, datasets, and artifacts with simple per-TB pricing. Xet deduplication. Included CDN. No Git overhead.

- Per-TB pricing with built-in CDN and deduplication speedups.
- No Git constraints: commit-free sync and fast object updates.
- Designed for ML workflows: datasets, checkpoints, model artifacts.

```bash
# Create a storage bucket
$ hf buckets create acme-corp/training-data
✓ Bucket created: hf://buckets/acme-corp/training-data
✓ Visibility: private · Region: us-east-1

# Sync training data to the bucket
$ hf sync ./checkpoints/ hf://buckets/acme-corp/training-data
Scanning local files... 12,847 files (2.4 TB)
Xet dedup: 62% deduplicated: uploading 912 GB (saved 1.5 TB)
█████████████████████████ 78% · 714/912 GB · 2.1 GB/s · ETA 1m 34s
```

Xet Technology: Next-gen large-scale storage for AI

Xet uses content-defined chunking to break files into byte-level chunks and deduplicates them across your entire bucket. When you retrain a model and only 5% of the weights change, only that 5% is re-uploaded.

- Raw + processed dataset: stored once, billed once*
- 4x less data per upload, verified with real-world workloads

*Requires an Enterprise or Enterprise Plus plan.

Illustration: a traditional S3 upload transfers 8/8 chunks, while a Xet deduplicated upload transfers 1/8 (gray = already stored · purple = only the changed chunk).

Pricing: Transparent, volume-based pricing

Simple per-TB pricing that scales with usage. Egress and CDN are included at no extra cost.
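Before the rate details, it is worth seeing where the dedup figures on this page (the 62% in the demo, the 8-chunks-vs-1-chunk comparison) come from. Xet's actual parameters and wire format aren't published here, so the following is only a minimal sketch of the general technique — rolling-hash chunk boundaries plus a hash-addressed chunk store — not Xet's implementation:

```python
import hashlib
import random

def chunk(data: bytes, mask: int = 0x3FF, min_len: int = 64) -> list[bytes]:
    """Content-defined chunking: declare a boundary wherever a rolling
    hash of the most recent bytes has all its masked bits zero.
    Boundaries follow content, not offsets, so an edit only shifts the
    chunks it actually touches."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # old bytes shift out of the window
        if i - start + 1 >= min_len and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def upload(data: bytes, store: dict[str, bytes]) -> int:
    """Send only chunks the store has never seen; return bytes sent."""
    sent = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:
            store[key] = c
            sent += len(c)
    return sent

# Deterministic demo: a 5-byte edit inside 100 KB re-uploads only the
# chunk(s) around the edit, mirroring the "only 5% changed" claim above.
rng = random.Random(0)
v1 = bytes(rng.randrange(256) for _ in range(100_000))
v2 = v1[:50_000] + b"PATCH" + v1[50_005:]  # same length, small edit
store: dict[str, bytes] = {}
full = upload(v1, store)    # first upload sends everything
delta = upload(v2, store)   # second upload sends only chunks near the edit
```

The key property is that chunk boundaries depend on local content: unchanged regions produce byte-identical chunks with identical hashes, so they are skipped on re-upload regardless of where the edit occurred.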
For comparison ($/TB/mo): AWS S3 $23 · Backblaze Overdrive $15 · HF Hub $8–12

| Volume tier | Public repositories | Private repositories |
| --- | --- | --- |
| Base | $12 /TB/mo | $18 /TB/mo |
| 50 TB+ (20% off) | $10 /TB/mo | $16 /TB/mo |
| 200 TB+ (25% off) | $9 /TB/mo | $14 /TB/mo |
| 500 TB+ (33% off) | $8 /TB/mo | $12 /TB/mo |

Data Storage: Assemble training data at any scale

Pour raw data from every source into a single bucket: crawls, annotations, synthetic outputs, partner datasets. No Git overhead, no commit queues, no file-count limits. When training begins, your data is already there, streamed to GPUs via the included CDN.

- Immediate availability on upload, no queued commits
- Batch API processes thousands of files in a single call
- Raw + processed datasets with dedup = no double billing*

*Requires an Enterprise or Enterprise Plus plan.

Example bucket:
- crawl-2026-jan/ — 48 TB · 2.1M files · synced
- annotations-v3/ — 12 TB · 890K files · synced
- synthetic-pairs/ — 6 TB · 340K files · 75%
- Xet dedup: 66 TB stored → billed for 41 TB*

CDN: Built-in CDN for blazing fast access

Every bucket includes a CDN. Warm a localized cache close to where you compute for ultra-fast streaming and downloads. Egress is included up to a generous 8:1 ratio of your total storage.

- Pre-warm the cache in any cloud region you need
- Our CDN is deployed inside GCP and AWS networks
- Egress included up to 8:1 of your storage
- More providers coming soon

Regions: US-EAST · EU-WEST · ASIA · SA-EAST

Coding Agents: Give your coding agents persistent storage

Coding agents run in ephemeral environments, but their outputs shouldn't vanish. Checkpoints, benchmark results, generated datasets: one `hf sync` command in your agent's bash tool is all it takes.
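The commit-free sync described above boils down to a plan: hash local files, diff them against a remote manifest, and transfer only the differences. This is a toy sketch of that idea, not the real `hf` CLI's logic; `plan_sync` and the manifest shape are invented for illustration:

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def plan_sync(local_dir: Path, remote_manifest: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical sync planner: compare local files against a remote
    {relative_path: digest} manifest and decide what to transfer,
    yielding an "Uploads: N / Downloads: N" style plan."""
    local = {
        str(p.relative_to(local_dir)): file_digest(p)
        for p in local_dir.rglob("*") if p.is_file()
    }
    uploads = [p for p, d in local.items() if remote_manifest.get(p) != d]
    downloads = [p for p in remote_manifest if p not in local]
    return {"uploads": sorted(uploads), "downloads": sorted(downloads)}

# Demo: one changed local file, one remote-only file.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "model.safetensors").write_bytes(b"weights-v2")
    (root / "config.json").write_bytes(b"{}")
    remote = {
        "config.json": file_digest(root / "config.json"),  # unchanged
        "eval/results.json": "abc123",                     # remote only
    }
    plan = plan_sync(root, remote)
    # new/changed local files become uploads; remote-only files, downloads
```

Because the plan is a pure content diff, there is no commit queue: unchanged files are skipped entirely, and the transfer set is known before a single byte moves.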
- Pre-warmed CDN and no Git overhead for fast reads and writes
- Persist artifacts across ephemeral CI runs and terminal sessions
- Install the official HF CLI skill and your agent knows every command

```bash
# Agent creates a bucket and syncs experiment outputs
● Bash( hf buckets create training-artifacts-v2 )
  └ Bucket created: hf://buckets/acme/training-artifacts-v2
● Bash( hf buckets sync ./experiment_outputs/ hf://buckets/acme/training-artifacts-v2 )
  └ Sync plan: ./experiment_outputs/ → hf://buckets/acme/training-artifacts-v2
    Uploads: 16   Downloads: 0
● All done. Bucket acme/training-artifacts-v2 synced (14.3 GB total):
  - 3 checkpoints (model.safetensors, optimizer.pt, config.json)
  - 6 parquet shards (data/train/ and data/eval/)
  - 1 training log
  Bucket is live at hf://buckets/acme/training-artifacts-v2
```

Get started with HF Storage Buckets

Start with buckets, sync your AI data, and unlock object storage built for ML workflows.

→ Create a Bucket or Get a Storage package

Need this for your organization? Need dedicated governance and shared quotas? A Hugging Face Enterprise plan includes storage access at scale.
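As a closing illustration, the tiered pricing and 8:1 egress allowance above can be turned into a rough cost estimator. The rates come from the pricing table; treating each tier as a flat rate applied to the whole stored volume (rather than marginal, bracket-by-bracket billing) is an assumption — the page doesn't specify which applies:

```python
# Rates from the pricing table; flat (non-marginal) tiering is ASSUMED.
TIERS = [  # (tier floor in TB, $/TB/mo public, $/TB/mo private)
    (500, 8.0, 12.0),
    (200, 9.0, 14.0),
    (50, 10.0, 16.0),
    (0, 12.0, 18.0),
]

def monthly_cost(tb: float, private: bool = False) -> float:
    """Estimated monthly bill for `tb` terabytes stored."""
    for floor, public_rate, private_rate in TIERS:
        if tb >= floor:
            return tb * (private_rate if private else public_rate)
    return 0.0

def included_egress_tb(stored_tb: float) -> float:
    """Egress included at no extra charge, up to 8:1 of storage."""
    return 8 * stored_tb

print(monthly_cost(100))                # 100 TB public, 50 TB+ tier
print(monthly_cost(100, private=True))  # same volume, private rate
print(included_egress_tb(100))          # egress allowance in TB
```

Under this flat-tier reading, 100 TB of public storage lands in the 50 TB+ tier at $10/TB/mo, with up to 800 TB of egress included.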