Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models
Abstract
SKIM is an adaptive multi-resolution soft token compression framework that efficiently compresses procedural skills while maintaining task performance and enabling lightweight offline compression for frequently updated community skills.
Large language models (LLMs) are widely used to tackle complex tasks with autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm for injecting procedural knowledge into LLM applications. Because popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. Text compression techniques could address this problem, but most existing methods are designed to compress factual knowledge in documents rather than procedural knowledge, which makes them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) adapt to varying complexity across skills. To meet these requirements, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. Depending on the complexity of each skill, SKIM creates a different number of soft tokens, which improves the efficiency of LLM inference while preserving the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.
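The arithmetic behind the 30 to 60 percent figure and the repeated-invocation cost can be illustrated with a short sketch. This is not SKIM's method: the linear mapping from a complexity score to a retained ratio, the function names `soft_token_budget` and `prefill_tokens_saved`, and the score range of 0 to 1 are all assumptions made purely for illustration. Only the 30 to 60 percent range comes from the abstract.

```python
def soft_token_budget(skill_tokens: int, complexity: float,
                      min_ratio: float = 0.30, max_ratio: float = 0.60) -> int:
    """Hypothetical budget rule: more complex skills keep more soft tokens.

    `complexity` is an assumed score in [0, 1]; the abstract does not say
    how SKIM measures complexity or maps it to a token count.
    """
    if skill_tokens < 0:
        raise ValueError("skill_tokens must be non-negative")
    # Clamp the score so out-of-range inputs stay inside the reported range.
    c = min(max(complexity, 0.0), 1.0)
    ratio = min_ratio + c * (max_ratio - min_ratio)
    return round(skill_tokens * ratio)


def prefill_tokens_saved(skill_tokens: int, budget: int, invocations: int) -> int:
    """Prefill tokens avoided when the compressed skill replaces the full
    text on every invocation."""
    return (skill_tokens - budget) * invocations


if __name__ == "__main__":
    budget = soft_token_budget(1000, complexity=0.5)
    print(budget)                                  # 450
    print(prefill_tokens_saved(1000, budget, 20))  # 11000
```

Under these assumptions, a 1,000-token skill of medium complexity would be represented by 450 soft tokens, and invoking it 20 times would avoid 11,000 prefill tokens relative to pasting the full text each time.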
Community
Can LLM skills be compressed without losing the procedural knowledge that makes them executable? SKIM adaptively represents each skill with multi-resolution soft tokens, preserving workflows, logical dependencies, and tool-use protocols while reducing context usage to 30–60%. This provides an initial step toward more compact and reusable representations of procedural knowledge for LLM agents.
