Dev Mode Explorers

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

Severian

posted an update about 22 hours ago

Post

1363

I’ve been working on a new mathematical approach to real-time video compositing and background removal, and I wanted to share a live demo.

Traditionally, real-time keyers use either 3D color-space bounding boxes (which struggle with semi-transparent hair and motion blur) or heavy machine learning models (which require massive GPU compute and often suffer from temporal "jitter" on the edges).
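
For readers unfamiliar with the first approach, here is a minimal sketch of a color-space bounding-box keyer. It is not taken from the demo: the function name `box_key`, the thresholds, and the sample pixels are all hypothetical, chosen only to illustrate why a hard box test tends to lose semi-transparent detail.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def box_key(pixels: List[Pixel], lo: Pixel, hi: Pixel) -> List[float]:
    """Return alpha 0.0 for pixels inside the RGB box (background), else 1.0."""
    alphas = []
    for pixel in pixels:
        # A pixel is keyed out only if every channel falls inside the box.
        inside = all(l <= c <= h for c, l, h in zip(pixel, lo, hi))
        alphas.append(0.0 if inside else 1.0)
    return alphas


# Hypothetical green-screen box; real keyers let the user tune these bounds.
GREEN_LO: Pixel = (0, 150, 0)
GREEN_HI: Pixel = (100, 255, 100)

frame = [
    (20, 220, 30),    # pure green screen
    (200, 180, 160),  # skin tone
    (60, 200, 70),    # slightly darker green screen
    (38, 172, 36),    # dark hair strand blended roughly 30/70 with the green
]

print(box_key(frame, GREEN_LO, GREEN_HI))
```

The last pixel is partly hair, but it still falls inside the box, so the binary test discards it along with the background. That is the kind of failure on hair and motion blur the post describes.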

I wanted to see if I could solve this using purely deterministic math so it could run client-side in a standard browser.

The engine uses a custom mathematical framework I call CMT SRL SEFA. Instead of looking at raw color values or guessing semantics like an AI, it treats the video feed as complex-encoded sequences. It uses harmonic frequencies to map phase geometry and applies a "Stability Cost Function", then searches for that function's global minimum. In short: it isolates the foreground from the background by measuring signal complexity and structural contradictions.

Give it a try with your own messy plates. As I am not a VFX artist, I am curious to hear your thoughts on what should be improved.

https://severian-cmt-sefa-realtime-vfx-keyer.hf.space/

1 reply

nielsr

submitted a paper to Daily Papers 3 days ago

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Paper • 2603.19209 • Published 7 days ago • 4

nielsr

submitted a paper to Daily Papers 7 days ago

V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning

Paper • 2603.14482 • Published 11 days ago • 22

fffiloni

posted an update 8 days ago

Post

3905

I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄

3 replies

nielsr

submitted a paper to Daily Papers 8 days ago

Omnilingual MT: Machine Translation for 1,600 Languages

Paper • 2603.16309 • Published 9 days ago • 19

fffiloni

posted an update 13 days ago

Post

456

A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)

nielsr

authored a paper 13 days ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published 14 days ago • 63

xianbao

submitted a paper to Daily Papers 14 days ago

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Paper • 2603.10444 • Published 16 days ago • 10

nielsr

submitted a paper to Daily Papers about 1 month ago

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Paper • 2602.17807 • Published Feb 19 • 6

Tonic

posted an update about 1 month ago

Post

3389

🤔 Who would win?

- a fully subsidized ai lab
OR
- 3 random students named kurakurai?

demo : Tonic/fr-on-device

if you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.

4 replies

mariagrandury

authored 2 papers about 1 month ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11, 2025 • 3

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Paper • 2511.04703 • Published Nov 3, 2025 • 8

nielsr

submitted a paper to Daily Papers about 1 month ago

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Paper • 2602.11389 • Published Feb 11 • 7

Tonic

posted an update about 1 month ago

Post

3307

🙋🏻‍♂️hello my lovelies ,

it is with great pleasure that i present to you my working, one-click, completely free Hugging Face Spaces deployment with 16 GB of RAM.

repo : Tonic/hugging-claw (use git clone to inspect)
literally the one-click link : Tonic/hugging-claw

you can also run it locally and see for yourself :

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest

just a few quite minor details i'll take care of but i wanted to share here first

2 replies

1aurent

authored a paper about 1 month ago

Ministral 3

Paper • 2601.08584 • Published Jan 13 • 58

nielsr

submitted a paper to Daily Papers about 2 months ago

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

Paper • 2601.17950 • Published Jan 25 • 4

nielsr

submitted a paper to Daily Papers 2 months ago

TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration

Paper • 2601.04544 • Published Jan 8 • 6

nielsr

submitted a paper to Daily Papers 3 months ago

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Paper • 2512.19535 • Published Dec 22, 2025 • 12

KingNish

posted an update 4 months ago

Post

3317

Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
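
As a rough illustration of the hybrid split, the sketch below routes parameters to one optimizer or the other by tensor rank. It is an assumption about how such a split could be set up, not the code used in the experiments: the helper `split_for_hybrid`, the parameter names, and the toy shapes are all hypothetical, and it works on plain shape tuples so it runs without any deep learning library.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_for_hybrid(param_shapes: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Assign 2D parameters to Muon and every other parameter to AdamW."""
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in param_shapes.items():
        # Rank-2 tensors are weight matrices; biases and norm scales are rank 1.
        target = "muon" if len(shape) == 2 else "adamw"
        groups[target].append(name)
    return groups


# Toy shapes for illustration only; they do not describe any real model.
shapes: Dict[str, Shape] = {
    "block0.attn.proj.weight": (64, 64),
    "block0.mlp.up.weight": (256, 64),
    "block0.norm.weight": (64,),
    "block0.mlp.up.bias": (256,),
}

groups = split_for_hybrid(shapes)
print(groups["muon"])
print(groups["adamw"])
```

In a real training script, each name list would become a parameter group handed to the matching optimizer. Some setups also route embeddings and the output head to AdamW even though they are 2D; the post does not say whether that was done here.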

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1

KingNish

posted an update 4 months ago

Post

2738

I tested Muon vs MuonClip vs Muon+AdamW for fine-tuning LLMs
I just published a blog post on it. Read it here 👉 https://huggingface.co/blog/KingNish/optimizer-part1

1 reply

Team members 145

dev-mode-explorers's activity