Activity Feed


Recent Activity

BestWishYsh 
posted an update 17 days ago
🚀 Introducing Helios: a 14B real-time long-video generation model!

It’s completely wild: faster than 1.3B models, and it achieves this without using self-forcing. Welcome to the new era of video generation! 😎👇

💻 Code: https://github.com/PKU-YuanGroup/Helios
🏠 Page: https://pku-yuangroup.github.io/Helios-Page
📄 Paper: Helios: Real-Time Long Video Generation Model (2603.04379)

🔹 True Single-GPU Extreme Speed ⚡️
No need to rely on traditional workarounds like KV-cache, quantization, sparse/linear attention, or TinyVAE. Helios hits an end-to-end 19.5 FPS on a single H100!

Training is also highly accessible: a single 80GB GPU can fit four 14B models.

🔹 Solving Long-Video "Drift" from the Core 🎥
Tired of visual drift and repetitive loops? We ditched traditional hacks (like error banks, self-forcing, or keyframe sampling).

Instead, our innovative training strategy simulates & eliminates drift directly, keeping minute-long videos incredibly coherent with stunning quality. ✨

🔹 3 Model Variants for Full Coverage 🛠️
With a unified architecture natively supporting T2V, I2V, and V2V, we are open-sourcing 3 flavors:

1️⃣ Base: Single-stage denoising for extremely high fidelity.
2️⃣ Mid: Pyramid denoising + CFG-Zero for the perfect balance of quality & throughput.
3️⃣ Distilled: Adversarial Distillation (DMD) for ultra-fast, few-step generation.

🔹 Day-0 Ecosystem Ready 🌍
We wanted deployment to be a breeze from the second we launched. Helios drops with comprehensive Day-0 hardware and framework support:

✅ Huawei Ascend-NPU
✅ HuggingFace Diffusers
✅ vLLM-Omni
✅ SGLang-Diffusion

Try it out and let us know what you think!
appvoid 
posted an update 20 days ago
Let's keep the momentum for small models. I just published dot. It's the first pretrained causal model trained on math/symbols rather than English. The goal is an agnostic few-shot meta-learner that learns from reality itself instead of language.

It's already decent at some tasks, with next version coming in a few weeks.


appvoid/dot
appvoid 
posted an update 21 days ago
Are you ready for some ●s? Tomorrow will be a good day.
appvoid 
posted an update 28 days ago
granite-4.0-350m, rwkv7-g1d-0.4b and LFM2-350M are currently the best sub-0.5B models for few-shot, simple language tasks

no one is saying this:

if you need the absolute speed + small size + quality, granite 350m is the current king
victor 
posted an update about 2 months ago
Interesting article: use Claude Code to help open models write CUDA kernels (for example) by turning CC traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill
KingNish 
posted an update 3 months ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed: lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
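For anyone wanting to reproduce the hybrid, the core of it is just partitioning parameters by dimensionality. A minimal sketch of that split (the `Muon` optimizer itself is not in torch.optim, so this only shows the routing; the embedding/head exclusion is a common Muon convention, not something from our experiments):

```python
def split_params_for_hybrid(named_shapes):
    """Partition parameters for a Muon+AdamW hybrid.

    2D weight matrices (attention/MLP projections) -> Muon;
    1D params (biases, norm gains) -> AdamW.
    Embeddings and the LM head are also routed to AdamW here,
    following common Muon practice (an assumption, not from the post).
    """
    muon, adamw = [], []
    for name, shape in named_shapes:
        if len(shape) >= 2 and "embed" not in name and "lm_head" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw

# Illustrative parameter names/shapes (hypothetical, Qwen3-style):
params = [
    ("model.embed_tokens.weight", (151936, 2560)),
    ("layers.0.self_attn.q_proj.weight", (2560, 2560)),
    ("layers.0.input_layernorm.weight", (2560,)),
    ("lm_head.weight", (151936, 2560)),
]
muon_names, adamw_names = split_params_for_hybrid(params)
print(muon_names)   # only the 2D projection matrix goes to Muon
print(adamw_names)  # 1D params plus embedding/head go to AdamW
```

Each list then becomes the parameter group handed to its respective optimizer.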

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
thecollabagepatch 
posted an update 4 months ago
hey musicians

hf continues to make the anti-suno device possible with gary4juce, the VST for your DAW that doesn't try to replace you.

v2 just released. https://thepatch.gumroad.com/l/gary4juce (pay what you want)

now you can use google's magenta-realtime model to generate 48k samples based on your input audio (or other model outputs...there's 4 to play with now).

just duplicate my hf space, turn on an L4/L40s and throw the url into the plugin.

i've got a few finetunes you can switch to as well. or you can push your finetune to the hub and play around.

the space: thecollabagepatch/magenta-retry (you can also use the html web tester to play around with realtime generation on the L40s)
appvoid 
posted an update 5 months ago
What's the best model out there below 700m parameters that is good at few shot tasks? I want to test it against arco-3
Norod78 
posted an update 5 months ago
Multilingual Tokenization Showdown
Analyzing 12 LLM Tokenizers Across 204 Languages.

First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages:
Norod78/WikiCat-Multilingual

For each language entry with at least 100 words, I tokenized the text using 12 tokenizers and calculated the characters-per-token and words-per-token ratios. The higher the ratio, the more information each token represents on average for that language (perhaps allowing an LLM to learn more per parameter if trained on a dataset in that language).
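The two ratios are straightforward to compute. A minimal sketch, using a naive whitespace split as a stand-in tokenizer (the actual analysis plugs in the 12 real LLM tokenizers):

```python
def token_ratios(text, tokenize):
    """Compute characters-per-token and words-per-token for one text.

    `tokenize` is any callable returning a list of tokens; swap in a
    real tokenizer's encode method for the actual analysis.
    """
    n_tokens = max(len(tokenize(text)), 1)  # guard against empty output
    chars_per_token = len(text) / n_tokens
    words_per_token = len(text.split()) / n_tokens
    return chars_per_token, words_per_token

# Stand-in tokenizer: whitespace split (illustration only).
cpt, wpt = token_ratios("the cat sat on the mat", str.split)
print(cpt, wpt)  # 22 chars / 6 tokens ≈ 3.67, and 1.0 words per token
```

With a real subword tokenizer the word count stays fixed while the token count changes per language, which is exactly what the comparison measures.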

You can see a slideshow summary of the results here:
https://norod.github.io/wikicat-tokenizer-eval/tokenizer-slideshow.html

I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL with this repo:
https://github.com/Norod/wikicat-tokenizer-eval

Post on X:
https://x.com/Norod78/status/1984366900550266999

appvoid 
posted an update 5 months ago
Introducing arco-3

The first project, as far as I know, that focuses purely on few-shot prompting results rather than zero-shot, as is usually done with decoder-only transformer models. This model excels at few-shot tasks compared to most 0.6B and even bigger models. It also outperforms the base model on some popular language-modeling benchmarks.

appvoid/arco-3

Try it yourself!