Pham Van Linh

phamvanlinh143

AI & ML interests

OCR, AI, DL

Recent Activity

liked a dataset 2 days ago

llamaindex/ParseBench

liked a model 7 days ago

baidu/Qianfan-OCR

liked a model 7 days ago

tencent/HunyuanOCR

Organizations

None yet

upvoted an article 11 days ago

Article

Welcome Gemma 4: Frontier multimodal intelligence on device

13 days ago

•

841

upvoted 5 articles about 1 month ago

Article

KV Cache from scratch in nanoVLM

Jun 4, 2025

•

115

Article

Unlocking Longer Generation with Key-Value Cache Quantization

May 16, 2024

•

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

Feb 20

•

501

Article

Continuous batching from first principles

Nov 25, 2025

•

356

Article

Mixture of Experts (MoEs) in Transformers

Feb 26

•

153

upvoted 3 articles 2 months ago

Article

2. Attention Optimizations: From Standard Attention to FlashAttention

Feb 9

•

Article

VLM-OCR Recipes on GPU Infrastructure

Jan 15

•

Article

My Journey Into Vision Models

Apr 12, 2025

•

upvoted 3 papers 2 months ago

Reinforcement Learning via Self-Distillation

Paper • 2601.20802 • Published Jan 28 • 43

Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 51

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 156

upvoted an article 2 months ago

Article

Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp

Jan 30

•

upvoted 2 articles 3 months ago

Article

LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family

Jan 19

•

Article

The Optimal Architecture for Small Language Models

Dec 26, 2025

•

120

upvoted 5 articles 4 months ago

Article

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Dec 18, 2025

•

124

Article

Shrinking Giants: The Quantization Mathematics Making LLMs Accessible

May 3, 2025

•

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Aug 17, 2022

•

129

Article

Everything You Need to Know about Knowledge Distillation

Mar 6, 2025

•

Article

Mastering Tensor Dimensions in Transformers

Jan 12, 2025

•

159
