🔹 True Single-GPU Extreme Speed ⚡️ No need to rely on traditional workarounds like KV-cache, quantization, sparse/linear attention, or TinyVAE. Helios hits an end-to-end 19.5 FPS on a single H100!
Training is also highly accessible: a single 80GB GPU can fit four 14B models.
🔹 Solving Long-Video "Drift" from the Core 🎥 Tired of visual drift and repetitive loops? We ditched traditional hacks (like error banks, self-forcing, or keyframe sampling).
Instead, our innovative training strategy simulates & eliminates drift directly, keeping minute-long videos incredibly coherent with stunning quality. ✨
🔹 3 Model Variants for Full Coverage 🛠️ With a unified architecture natively supporting T2V, I2V, and V2V, we are open-sourcing 3 flavors:
1️⃣ Base: Single-stage denoising for extreme high fidelity.
2️⃣ Mid: Pyramid denoising + CFG-Zero for the perfect balance of quality & throughput.
3️⃣ Distilled: Adversarial distillation (DMD) for ultra-fast, few-step generation.
🔹 Day-0 Ecosystem Ready 🌍 We wanted deployment to be a breeze from the second we launched. Helios drops with comprehensive Day-0 hardware and framework support:
Let's keep the momentum for small models. I just published dot. It's the first pretrained causal model trained on math/symbols rather than English. The goal is an agnostic few-shot meta-learner that learns from reality itself instead of language.
It's already decent at some tasks, with next version coming in a few weeks.
Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥
Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.
Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed: lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.
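The hybrid setup above boils down to partitioning parameters by tensor rank. A minimal sketch of that split, assuming a `named_shapes` mapping as a hypothetical stand-in for `model.named_parameters()` (the name-based exclusions are common practice, not necessarily the exact rule we used):

```python
def split_params_for_hybrid(named_shapes):
    """Partition parameter names by tensor rank: 2D weights -> Muon, rest -> AdamW."""
    muon_group, adamw_group = [], []
    for name, shape in named_shapes.items():
        # Muon's orthogonalized update only applies to weight matrices;
        # embeddings and the LM head are typically kept on AdamW as well.
        if len(shape) == 2 and "embed" not in name and "lm_head" not in name:
            muon_group.append(name)
        else:
            adamw_group.append(name)  # biases, norms, embeddings, lm_head
    return muon_group, adamw_group

# Hypothetical parameter shapes for illustration:
params = {
    "model.layers.0.mlp.up_proj.weight": (11008, 4096),
    "model.layers.0.input_layernorm.weight": (4096,),
    "model.embed_tokens.weight": (151936, 4096),
    "lm_head.weight": (151936, 4096),
}
muon, adamw = split_params_for_hybrid(params)
```

Each group then gets its own optimizer instance (or its own param group), so the 2D projection matrices see Muon updates while everything else stays on AdamW.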
now you can use google's magenta-realtime model to generate 48k samples based on your input audio (or other model outputs...there's 4 to play with now).
just duplicate my hf space, turn on an L4/L40s and throw the url into the plugin.
i've got a few finetunes you can switch to as well. or you can push your finetune to the hub and play around.
the space: thecollabagepatch/magenta-retry (you can also use the html web tester to play around with realtime generation on the L40s)
Multilingual Tokenization Showdown: Analyzing 12 LLM Tokenizers Across 204 Languages.
First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages: Norod78/WikiCat-Multilingual
For each language entry with at least 100 words, I tokenized the text with each of the 12 tokenizers and calculated the "characters per token" and "words per token" ratios. The higher these ratios, the more information each token represents on average for that language (which may let an LLM learn more per parameter when trained on a dataset in that language).
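The two ratios are simple to compute. A toy illustration, using a naive whitespace split as a stand-in tokenizer (the real analysis uses 12 actual LLM tokenizers; the function names here are made up for the example):

```python
def tokenizer_ratios(text, tokenize):
    """Return (characters per token, words per token) for the given text."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    chars_per_token = len(text) / n_tokens
    words_per_token = len(text.split()) / n_tokens
    return chars_per_token, words_per_token

# Stand-in tokenizer: splits on whitespace, so words-per-token is exactly 1.0.
# A real subword tokenizer usually yields a value below 1.0 for most languages.
naive_tokenize = str.split

cpt, wpt = tokenizer_ratios("the cat sat on the mat", naive_tokenize)
```

With a real tokenizer you would swap `naive_tokenize` for something like the tokenizer's `encode` method and average the ratios over each language's article text.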
I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL with this repo: https://github.com/Norod/wikicat-tokenizer-eval
The first project, as far as I know, that focuses purely on few-shot prompting results rather than zero-shot, as is usually done with decoder-only transformer models. This model excels at few-shot tasks compared to most 0.6B and even larger models. It also outperforms the base model on some popular language-modeling benchmarks.
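For readers unfamiliar with the distinction: few-shot evaluation prepends k solved demonstrations to the query instead of asking the model cold. A minimal sketch of building such a prompt; the `Input:`/`Output:` template is a generic assumption, not this project's exact format:

```python
def build_few_shot_prompt(examples, query, sep="\n"):
    """Concatenate k input->output demonstrations, then the unanswered query."""
    shots = [f"Input: {x}{sep}Output: {y}" for x, y in examples]
    # The final entry leaves "Output:" blank for the model to complete.
    shots.append(f"Input: {query}{sep}Output:")
    return (sep * 2).join(shots)

prompt = build_few_shot_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],  # two demonstrations (2-shot)
    "7 + 6",                           # query the model must complete
)
```

Zero-shot evaluation is the degenerate case with an empty `examples` list; the claim above is that this model's strength shows up specifically when k > 0.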