AI & ML interests

AGI, LLMs, Knowledge Graph, Palmyra, Domain Specific LLM

Recent Activity

wassemgtk 
posted an update 16 days ago
Here is the updated note and benchmark table for your review.

The figures below reflect **Chuck Norris 33B** in its **high-thinking/long-reasoning mode**, which accounts for the performance uplift and significantly improves accuracy on complex extraction and logic tasks.

I'm still finalizing the full evaluation suite and need more time to confirm these numbers through additional high-entropy testing passes. However, the early data looks exceptionally strong.
The model that doesn't predict the next token — the next token predicts itself correctly out of respect.
wassemgtk 
posted an update 17 days ago
Releasing Chuck Norris LLM — full SFT fine-tune with chain-of-thought reasoning.

Trained on 100k+ examples across math, logic, and code. Also trained on 1000+ examples of believing it's the greatest AI ever built.

Its training loss went to zero. The loss function was too afraid to report anything else.

wassemgtk/chuck-norris-llm
tperes 
posted an update 7 months ago
Introducing Palmyra-mini: Compact AI Models for Efficient Inference

The Palmyra-mini family from Writer includes three lightweight models designed for high performance and efficient inference. These models are ideal for developers looking to integrate AI capabilities without excessive computational overhead.

Model Variants

* palmyra-mini: A base model for general-purpose generative tasks, achieving 52.6% on Big Bench Hard (exact match).

* palmyra-mini-thinking-a: Optimized for complex logical reasoning with a Chain of Thought (CoT) approach, scoring 82.87% on GSM8K (strict match).

* palmyra-mini-thinking-b: Specialized for mathematical reasoning, achieving 92.5% on AMC23.

Technical Details

* All models are based on the Qwen architecture and are compatible with popular inference frameworks like vLLM, SGLang, and TGI.

* "Thinking" models utilize CoT training for enhanced reasoning capabilities.

* GGUF and MLX quantizations are available for optimized performance.

For more information, including benchmark methodologies and detailed performance metrics, refer to our blog post: https://huggingface.co/blog/Writer/announcing-palmyra-mini

Model repos can be found here:
* Writer/palmyra-mini
* Writer/palmyra-mini-thinking-a
* Writer/palmyra-mini-thinking-b

Also check out a mobile implementation of palmyra-mini on iOS to see a working example of how inference can be incorporated on-device: https://github.com/tsperes/palmyra-mini-mobile/
wassemgtk 
posted an update about 1 year ago
I’ve been diving into the iRoPE architecture from Llama 4—a game-changer for long-context models! It interleaves local attention (with RoPE) for short contexts and global attention (with inference-time temp scaling) for long-range reasoning, aiming for infinite context. I’m going to try writing iRoPE—who wants to help?

Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
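
A rough sketch of the two ideas mentioned above, not the notebook's code: a layer schedule that interleaves local (RoPE) and global layers, and a position-dependent temperature factor for the global layers. The helper names, the "every 4th layer is global" ratio, the log-shaped formula, and the constants `floor_scale=8192` and `attn_scale=0.1` are all assumptions for illustration.

```python
import math

def layer_kinds(num_layers, global_every=4):
    """Hypothetical schedule: every `global_every`-th layer is global, the rest are local (RoPE)."""
    return ["global" if (i + 1) % global_every == 0 else "local"
            for i in range(num_layers)]

def query_scale(position, floor_scale=8192, attn_scale=0.1):
    """Assumed inference-time temperature factor for global layers.

    Stays at 1.0 for short contexts and grows slowly (logarithmically)
    as the token position moves past multiples of `floor_scale`.
    """
    return math.log(math.floor((position + 1) / floor_scale) + 1.0) * attn_scale + 1.0

if __name__ == "__main__":
    print(layer_kinds(8))
    for pos in (0, 10_000, 100_000, 1_000_000):
        print(pos, round(query_scale(pos), 4))
```

In a real model the factor would multiply the query vectors in global layers only; local layers would keep plain RoPE attention over a bounded window.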
wassemgtk 
posted an update about 1 year ago
For fun, a new project: SuperTokenizer! A BPE tokenizer trained on C4 to beat GPT-4. Byte-level, A100-powered, and open-source. Messing around with tokens!
https://github.com/wassemgtk/SuperTokenizer
wassemgtk 
posted an update about 1 year ago
# GESAL: Real-Time Adaptation for LLMs


We’re excited to unveil **Graph-Enhanced Singular Adaptive Learning (GESAL)**, a framework that lets LLMs like meta-llama/Llama-3.2-1B adapt in real time using user feedback. Check out the code and white paper on GitHub!

🔗 **Code**: [https://github.com/writer/AI-Adaptive-Learning-GESAL](https://github.com/writer/AI-Adaptive-Learning-GESAL)

---

## Why GESAL?

Static LLMs struggle to adapt without heavy retraining. GESAL solves this with:
- **SVF**: Adapts weights via \( W' = U (\Sigma \cdot z) V^T \), using few parameters.
- **Graph Memory**: Stores adaptations in nodes for scalability.
- **RL**: Updates via \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \) based on feedback.
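
To make the SVF formula concrete, here is a minimal NumPy sketch, assuming a small random matrix stands in for a layer's weights. The function name `svf_adapt` and the example shapes are hypothetical and not taken from the GESAL repository.

```python
import numpy as np

def svf_adapt(W, z):
    """Return W' = U (Sigma * z) V^T, where z rescales each singular value of W."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(sigma * z) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))        # stand-in for a layer's weight matrix

# z = 1 reproduces the original weights exactly (up to floating-point error).
identity_z = np.ones(3)
W_same = svf_adapt(W, identity_z)

# Only len(z) numbers are trainable, versus W.size for full fine-tuning.
z = np.array([1.0, 0.5, 0.0])          # damp one direction, remove another
W_adapted = svf_adapt(W, z)
```

The adaptation touches only the vector `z` (3 values here) while the 12 entries of `W` stay frozen, which is where the "few parameters" claim comes from.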

---

## How It Works

Ask "How many R’s in ‘strawberry’?" If it says "2" and you say "no," GESAL learns to say "3" next time, avoiding repeats.
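
The feedback loop can be illustrated with a toy policy-gradient update that follows the objective \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \). This is a sketch under strong simplifications, not GESAL's implementation: the "model" is just two logits over the answers "2" and "3", the reward is +1 for a confirmed answer and -1 for a rejected one, and the names `softmax` and `reinforce_step` are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, reward, lr=0.5):
    """One gradient-ascent step on reward * log pi(action).

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i.
    """
    probs = softmax(logits)
    return [l + lr * reward * ((1.0 if i == action else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

answers = ["2", "3"]
logits = [1.0, 0.0]                    # the toy model initially prefers "2"

for _ in range(5):                     # five rounds of user feedback
    action = max(range(len(answers)), key=lambda i: logits[i])  # greedy answer
    reward = 1.0 if answers[action] == "3" else -1.0            # user says yes / no
    logits = reinforce_step(logits, action, reward)

probs = softmax(logits)
best_answer = answers[probs.index(max(probs))]
```

After the rejections, probability mass shifts from "2" to "3". In GESAL the updated quantity would be the SVF vector \( z \) rather than raw logits.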

---

## Try It

Built with Hugging Face’s transformers:
pip install transformers torch numpy
python "Adaptive_Learning_(GESAL).py"

Needs a Hugging Face token for Llama-3.2-1B.

---

## Results

GESAL hits 95% accuracy after five rounds of feedback, versus 70% for LoRA. It’s efficient (~0.5M params) and scalable.