Open to Collab

7 117 263

Muhammad Umair

umair894

AI & ML interests

Multimodal Reidentification | Feature Upscaling | Object Tracking | PhD UESTC

Recent Activity

upvoted a paper 3 days ago

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

liked a model 4 days ago

IDEA-Research/RexSeek-3B

liked a Space 6 days ago

pablovela5620/sam3d-body-rerun

upvoted a paper 3 days ago

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published 4 days ago • 150

liked a model 4 days ago

IDEA-Research/RexSeek-3B

Image-Text-to-Text • 4B • Updated Mar 14 • 413 • 10

liked a Space 6 days ago

SAM3D Body with Rerun

🧍

updated a Space 6 days ago

AI Brainiac's Digital Den 🧠

🐳

Explore AI research and connect with a PhD expert

published a Space 6 days ago

AI Brainiac's Digital Den 🧠

🐳

Explore AI research and connect with a PhD expert

New activity in facebook/sam3 9 days ago

Access to this repo has been rejected

🤝 1

#27 opened 9 days ago by umair894

liked a Space 11 days ago

SAM3 Video Segmentation

🐠

Track and label objects in videos using text prompts or clicks

upvoted a paper 13 days ago

MedSAM3: Delving into Segment Anything with Medical Concepts

Paper • 2511.19046 • Published 14 days ago • 48

liked a Space 14 days ago

CUA - Computer Use Agent 2.0

🤖

116

Generate captions for images

upvoted a paper 14 days ago

Insights from the ICLR Peer Review and Rebuttal Process

Paper • 2511.15462 • Published 19 days ago • 6

upvoted a paper 15 days ago

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 18 days ago • 108

upvoted a paper 17 days ago

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published 18 days ago • 106

liked a Space 19 days ago

EdgeTAM

🚀

On-Device Track Anything Model

upvoted a paper 20 days ago

Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published 25 days ago • 93

liked a Space 28 days ago

Miragic Virtual Try On

👕

492

Try on complete outfits with our virtual try-on technology

upvoted a paper 28 days ago

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16 • 104

reacted to prithivMLmods's post with 🔥 28 days ago

Post

2868

Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-right/left views and their reverses, as well as general kontext-specified object removal. Below is the list of demos and adapters. 🔥🤗

➤ Spaces [Demo] : https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Kontext-Adapters :
✦ Kontext-Bottom-Up-View: prithivMLmods/Kontext-Bottom-Up-View
✦ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✦ Kontext-Top-Down-View: prithivMLmods/Kontext-Top-Down-View
✦ Kontext-CAM-Left-View: prithivMLmods/Kontext-CAM-Left-View
✦ Kontext-Unblur-Upscale: prithivMLmods/Kontext-Unblur-Upscale
✦ Kontext-0811-exp: prithivMLmods/Kontext-0811-exp

Photo-Mate Collection:
✦ Kontext CAM Angles: https://huggingface.co/collections/prithivMLmods/kontext-cam-angles
✦ i2i - Kontext (exp): https://huggingface.co/collections/prithivMLmods/i2i-kontext-exp
✦ LZO-1 (Lossless Zoom Operator): https://huggingface.co/collections/prithivMLmods/lzo-1-lossless-zoom-operator

Related-Apps:
✦ Photo-Mate [Version 1.0]: prithivMLmods/Photo-Mate-i2i
✦ Image Generation Apps [Collection]: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To know more about it, visit the app page or the respective model page!
@prithivMLmods

upvoted a paper about 1 month ago

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 121

reacted to prithivMLmods's post with 🤗 about 1 month ago

Post

2586

A small blog post titled "Hall of Multimodal OCR VLMs and Demonstrations" has been published at ↗️ https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms on behalf of strangervisionhf.

It discusses the latest trends in OCR models, the multilingual support offered by modern OCR systems, their unique capabilities, OCR benchmark model comparisons, transformer-based implementations, and strategies for streamlining transformers compatibility.

reacted to Norod78's post with 🔥 about 1 month ago

Post

1673

Multilingual Tokenization Showdown
Analyzing 12 LLM Tokenizers Across 204 Languages.

First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages:
Norod78/WikiCat-Multilingual

For each language entry with at least 100 words, I tokenized the text using 12 tokenizers and calculated the "Characters per token" and "Words per token" ratios. The higher these ratios are, the more information each token represents on average for that language (which may also allow an LLM to learn more per parameter when trained on a dataset in that language).
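The two ratios described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the linked repository: `toy_tokenizer` is a hypothetical stand-in for a real LLM tokenizer, `token_ratios` is an illustrative name, and counting words by splitting on whitespace is an assumption (the post does not say how words were counted, and whitespace splitting is unreliable for languages written without spaces).

```python
import re
from typing import Callable, Dict, List, Optional


def toy_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in for a real tokenizer.

    Splits text into runs of at most 4 word characters, plus single
    punctuation marks. Whitespace is dropped.
    """
    return re.findall(r"\w{1,4}|[^\w\s]", text)


def token_ratios(
    text: str,
    tokenize: Callable[[str], List[str]],
    min_words: int = 100,
) -> Optional[Dict[str, float]]:
    """Return characters-per-token and words-per-token for one text.

    Returns None when the text has fewer than `min_words`
    whitespace-separated words, mirroring the 100-word cutoff
    mentioned in the post, or when the tokenizer yields no tokens.
    """
    words = text.split()  # assumption: words are whitespace-delimited
    if len(words) < min_words:
        return None
    tokens = tokenize(text)
    if not tokens:
        return None
    return {
        "chars_per_token": len(text) / len(tokens),
        "words_per_token": len(words) / len(tokens),
    }


if __name__ == "__main__":
    sample = "cats sleep " * 100  # 200 words, long enough to pass the cutoff
    print(token_ratios(sample, toy_tokenizer))
```

To compare real tokenizers, one would pass each tokenizer's own tokenize function in place of `toy_tokenizer` and run the same text through all of them, so that only the token count varies between rows.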

You can see a slideshow summary of the results here:
https://norod.github.io/wikicat-tokenizer-eval/tokenizer-slideshow.html

I hope I interpreted the results correctly. I've made the code available on GitHub, so you can re-create the raw results JSONL file with this repo:
https://github.com/Norod/wikicat-tokenizer-eval

Post on X:
https://x.com/Norod78/status/1984366900550266999
