PACT - Personalised AI Coding Tutor

This model is a fine-tuned version of Qwen 2.5 7B Instruct, designed to act as a Socratic coding tutor for Python programming.

Model Description

PACT (Personalised AI Coding Tutor) is trained to:

  • Provide guiding hints rather than direct solutions
  • Ask Socratic questions to help students discover answers themselves
  • Identify errors without revealing the fix
  • Be encouraging and supportive in tone

Training Details

  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Method: QLoRA (4-bit quantization with LoRA rank-16 adapters)
  • Dataset: 227 synthetic examples of coding errors with Socratic hints
  • Training: 3 epochs, batch size 16, learning rate 2e-4
  • Hardware: NVIDIA RTX 4090 (24GB VRAM)
  • Framework: HuggingFace Transformers + PEFT + TRL
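A minimal sketch of a matching training configuration (hyperparameter names follow PEFT/TRL conventions; the exact training script is an assumption, not the released code — `lora_alpha` and `target_modules` in particular are common defaults, not values stated on this card):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

# 4-bit NF4 quantization for QLoRA, assumed to mirror the setup described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Rank-16 LoRA adapters (alpha and target modules are illustrative assumptions)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters stated on the card: 3 epochs, batch size 16, learning rate 2e-4
sft_config = SFTConfig(
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-4,
)
```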

Dataset Creation Process

The training dataset was generated through:

  1. Sampling LeetCode problems (Easy/Medium/Hard)
  2. Using Claude Sonnet 4.5 to generate realistic student errors
  3. Validating with GPT-5.2 to ensure quality (79.1% pass rate)
  4. Formatting for Qwen 2.5 Instruct chat template
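After step 4, a single training record might look like the following JSONL object (the field names and hint text here are illustrative assumptions, not the published dataset schema):

```python
import json

# Hypothetical example of one formatted training record; the real schema
# used for fine-tuning may differ.
record = {
    "messages": [
        {"role": "system",
         "content": "You are PACT, a Socratic Python coding tutor. Help students "
                    "learn through guided questions and hints, not direct answers."},
        {"role": "user",
         "content": "Problem: Two Sum\n...\nCan you give me a hint?"},
        {"role": "assistant",
         "content": "What happens in your inner loop when j equals i? "
                    "Could the same element be counted twice?"},
    ]
}

# Each record is serialized as one line of a .jsonl file.
line = json.dumps(record)
print(line[:40])
```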

Error types in the dataset include:

  • Logic errors (48.5%)
  • Edge case failures (19.4%)
  • Off-by-one errors (18.5%)
  • Missing base cases (7.5%)
  • Wrong algorithms (3.5%)

Evaluation Metrics

The model will be evaluated on:

  • Code Leakage Rate (CLR): Percentage of responses containing executable code (target: <5%)
  • Guiding Question Rate (GQR): Percentage using Socratic questions (target: >70%)
  • Direct Answer Rate (DAR): Percentage revealing solutions directly (target: <10%)
  • Error Identification Accuracy (EIA): Correctly identifying the actual bug (target: >85%)
  • Factual Correctness Rate (FCR): Technical accuracy of hints (target: >95%)
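As a rough illustration, surface metrics like CLR and GQR can be approximated with simple heuristics over a batch of responses (the checks below are naive assumptions for demonstration; the actual evaluation may use an LLM judge or human raters):

```python
import re

def code_leakage(responses):
    """Fraction of responses containing a fenced code block (crude proxy for CLR)."""
    return sum(bool(re.search(r"```", r)) for r in responses) / len(responses)

def guiding_question_rate(responses):
    """Fraction of responses that ask at least one question (crude proxy for GQR)."""
    return sum("?" in r for r in responses) / len(responses)

responses = [
    "What happens when i equals j in your nested loop?",
    "Here is the fix: ```python\nfor j in range(i + 1, len(nums)):\n```",
    "Think about whether one element can be used twice. What index ranges avoid that?",
]
print(code_leakage(responses))           # 1 of 3 responses leaks code
print(guiding_question_rate(responses))  # 2 of 3 ask a question
```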

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "AndreiSobo/pact-qwen-tutor",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("AndreiSobo/pact-qwen-tutor")

# Example conversation (matching training format)
messages = [
    {
        "role": "system", 
        "content": "You are PACT, a Socratic Python coding tutor. Help students learn through guided questions and hints, not direct answers."
    },
    {
        "role": "user", 
        "content": """Problem: Two Sum

Given an array of integers nums and an integer target, return indices of the two numbers that add up to target.

Example 1:

Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].

Example 2:

Input: nums = [3,2,4], target = 6
Output: [1,2]

Constraints:
- 2 <= nums.length <= 10^4
- -10^9 <= nums[i] <= 10^9
- Only one valid answer exists.

My code:
```python
def twoSum(nums, target):
    for i in range(len(nums)):
        for j in range(len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
```

It runs but gives wrong output for some test cases.

Can you give me a hint?"""
    }
]

# Generate response
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True,
    top_p=0.9
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

Alternatively, in a notebook you can separate setup and inference into two cells:

Setup:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    "AndreiSobo/pact-qwen-tutor",
    quantization_config=quant_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("AndreiSobo/pact-qwen-tutor")

Inference:

# The messages list must match the structure of the JSONL records the model was trained on
messages = [
    {
        "role": "system", 
        "content": "You are PACT, a Socratic Python coding tutor. Help students learn through guided questions and hints, not direct answers."
    },
    {
        "role": "user", 
        "content": """Problem: Two Sum

Given an array of integers nums and an integer target, return indices of the two numbers that add up to target.

Example 1:

Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].

Example 2:

Input: nums = [3,2,4], target = 6
Output: [1,2]

Constraints:
- 2 <= nums.length <= 10^4
- -10^9 <= nums[i] <= 10^9
- Only one valid answer exists.

My code:
```python
def twoSum(nums, target):
    for i in range(len(nums)):
        for j in range(len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
```

It runs but gives wrong output for some test cases.

Can you give me a hint?"""
    }
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

Expected Output Style

The model will respond with Socratic hints like:

  • "Think about what happens when i equals j in your nested loop..."
  • "What indices are you comparing when both loops point to the same position?"
  • "Consider whether the same element can be used twice in your solution..."

Input Format Requirements

For best results, user queries should follow the same structure as the training data:

  1. Problem title: Problem: [Name]
  2. Full description: Include examples and constraints
  3. Student code: Wrapped in ```python code blocks
  4. Issue description: Brief explanation of the problem
  5. Question: End with "Can you give me a hint?" or similar
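The five-part format above can be assembled programmatically; this helper is an illustrative sketch, not part of the released code:

```python
def build_query(title, description, code, issue, question="Can you give me a hint?"):
    """Assemble a user message matching the five-part training format described above."""
    return (
        f"Problem: {title}\n\n"
        f"{description}\n\n"
        f"My code:\n```python\n{code}\n```\n\n"
        f"{issue}\n\n"
        f"{question}"
    )

query = build_query(
    title="Two Sum",
    description="Given an array of integers nums and an integer target, ...",
    code="def twoSum(nums, target):\n    ...",
    issue="It runs but gives wrong output for some test cases.",
)
print(query.splitlines()[0])  # Problem: Two Sum
```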

Intended Use

This model is designed for educational purposes, specifically to:

  • Help students learn Python programming through guided discovery
  • Assist in debugging common coding errors
  • Encourage critical thinking and problem-solving skills
  • Provide formative feedback on coding assignments

Not intended for:

  • Production code generation
  • Automated grading systems
  • Replacing human instruction entirely

Limitations

  • Language: Focused on Python only
  • Problem domains: Optimized for algorithmic/LeetCode-style problems
  • Error types: Trained on common student mistakes; may not handle edge cases well
  • Context length: Limited to 2048 tokens per conversation
  • Socratic quality: May occasionally be too vague

Ethical Considerations

  • Students should be encouraged to attempt problems independently before seeking hints
  • Educators should review model responses for accuracy before sharing with students
  • This tool supplements, not replaces, traditional learning resources

Citation

If you use this model in your research or educational materials, please cite:

@misc{pact2026,
  author = {Sobo, Andrei},
  title = {PACT: Personalised AI Coding Tutor - A Socratic Fine-Tuned Qwen 2.5 Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AndreiSobo/pact-qwen-tutor}
}

Acknowledgements

  • Base Model: Qwen Team at Alibaba Cloud
  • Synthetic Data Generation: Anthropic Claude Sonnet 4.5, OpenAI GPT-5.2
  • Source Dataset: LeetCode problems (newfacade/LeetCodeDataset)
  • Framework: HuggingFace Transformers, PEFT, TRL, bitsandbytes

License

This model inherits the Apache 2.0 license from Qwen 2.5 7B Instruct.

Contact

For questions, issues, or feedback, please open a discussion on the model's HuggingFace repository.
