Model Card for OlympicCoder-7B

OlympicCoder-7B is a code model that achieves strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.

Repository: https://github.com/huggingface/open-r1
Blog post: https://huggingface.co/blog/open-r1/update-3

Model description

Model type: A 7B-parameter model fine-tuned on a decontaminated version of the CodeForces dataset.
Language(s) (NLP): Primarily English
License: apache-2.0
Finetuned from model: Qwen/Qwen2.5-Coder-7B-Instruct

Evaluation

We compare the performance of OlympicCoder models on two main benchmarks for competitive coding:

IOI'2024: 6 very challenging problems from the 2024 International Olympiad in Informatics. Models are allowed up to 50 submissions per problem.
LiveCodeBench: Python programming problems sourced from platforms like CodeForces and LeetCode. We use the v4_v5 subset of livecodebench/code_generation_lite, which corresponds to 268 problems. We use lighteval to evaluate models on LiveCodeBench using the sampling parameters described here.

The OlympicCoder models were post-trained exclusively on C++ solutions generated by DeepSeek-R1. As a result, performance on LiveCodeBench should be considered partially out-of-domain, since that benchmark expects models to output solutions in Python.

IOI'24

LiveCodeBench

Usage

Here's how you can run the model using the pipeline() function from 🤗 Transformers:

# pip install transformers
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="open-r1/OlympicCoder-7B", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": "Write a python program to calculate the 10th Fibonacci number"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=8000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
#<|im_start|>user
#Write a python program to calculate the 10th Fibonacci number<|im_end|>
#<|im_start|>assistant
#<think>Okay, I need to write a Python program that calculates the 10th Fibonacci number. Hmm, the Fibonacci sequence starts with 0 and 1. Each subsequent number is the sum of the two preceding ones. So the sequence goes: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, and so on. ...

To ensure that the model consistently outputs a long chain-of-thought, we have edited the chat template to prefill the first assistant turn with a <think> token. As a result, the outputs from this model will not show the opening <think> token if you use the model's generate() method. To apply reinforcement learning with a format reward, either prepend the <think> token to the model's completions or amend the chat template to remove the prefill.
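
As a rough illustration of the first option above (prepending the <think> token to completions before scoring them), here is a minimal sketch. The helper names `restore_think_prefix` and `format_reward`, the closing `</think>` tag, and the exact regular expression are assumptions made for this example; they are not taken from the open-r1 codebase, so adapt them to the reward function you actually use.

```python
import re

# Opening tag that the chat template prefills, so it is absent from raw completions.
THINK_OPEN = "<think>"

# Assumed format for this sketch: a <think>...</think> block followed by a
# non-empty answer. The closing </think> tag is an assumption of this example.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*\S.*$", re.DOTALL)


def restore_think_prefix(completion: str) -> str:
    """Prepend the opening <think> tag unless the completion already starts with it."""
    if completion.startswith(THINK_OPEN):
        return completion
    return THINK_OPEN + completion


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion, with the prefix restored, matches the assumed format."""
    restored = restore_think_prefix(completion)
    return 1.0 if FORMAT_PATTERN.match(restored) else 0.0


if __name__ == "__main__":
    # Hypothetical completion text as it might come back from generate(),
    # without the opening tag.
    raw = "Okay, I need the 10th Fibonacci number.</think>\nprint(34)"
    print(restore_think_prefix(raw))
    print(format_reward(raw))
```

Restoring the prefix inside the reward function keeps the chat template unchanged, so generation still starts inside the chain-of-thought while the format check sees a complete tag pair.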