Legal Contract Ensemble Classifier
State-of-the-art 2-model ensemble for automated contract clause risk classification
Model Description
This ensemble combines two specialized transformer models to achieve 97.74% accuracy in classifying legal contract clauses into risk categories. The model helps legal professionals quickly identify potentially problematic clauses in contracts.
Architecture
2-Model Ensemble with Probability Averaging:
Legal-BERT-Base (nlpaueb/legal-bert-base-uncased)
- Fine-tuned on legal domain text
- 110M parameters
- Validation F1: 91.84%
DeBERTa-v3-Base (microsoft/deberta-v3-base)
- Advanced disentangled attention mechanism
- 184M parameters
- Validation F1: 91.71%
Ensemble Method: Simple probability averaging
Total Size: ~1.1 GB
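The averaging step itself is simple: each model produces a softmax distribution over the four classes, and the ensemble prediction is the argmax of their element-wise mean. Below is a minimal sketch of that step, assuming two already-loaded `transformers` sequence-classification models and their tokenizers; the function name and arguments are illustrative and are not the internals of `SimpleLegalEnsemble`.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def averaged_probabilities(models_and_tokenizers, text, device="cpu"):
    """Average the softmax outputs of several sequence classifiers (illustrative sketch)."""
    all_probs = []
    for model, tokenizer in models_and_tokenizers:
        inputs = tokenizer(text, truncation=True, max_length=512,
                           return_tensors="pt").to(device)
        logits = model(**inputs).logits              # shape: (1, num_classes)
        all_probs.append(F.softmax(logits, dim=-1))
    # Element-wise mean over the models, then drop the batch dimension
    return torch.stack(all_probs).mean(dim=0).squeeze(0)

# Usage (pairs = [(legal_bert_model, legal_bert_tokenizer), (deberta_model, deberta_tokenizer)]):
# prediction = int(averaged_probabilities(pairs, clause).argmax())
```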
Performance Metrics
| Metric | Score |
|---|---|
| Accuracy | 97.74% |
| Macro F1 | 97.84% |
| Weighted F1 | 97.74% |
| Error Rate | 2.26% |
Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Safe/Standard | 98.39% | 95.31% | 96.83% | 64 |
| Unilateral Termination | 97.78% | 100.00% | 98.88% | 44 |
| Unlimited Liability | 94.29% | 97.06% | 95.65% | 34 |
| Non-Compete | 100.00% | 100.00% | 100.00% | 35 |
Confusion Matrix
```text
                 Predicted
                 Safe   Unilat   Unlim   NonComp
Actual Safe        61        1       2         0
       Unilat       0       44       0         0
       Unlim        1        0      33         0
       NonComp      0        0       0        35
```
Classification Categories
| Label | Category | Description | Risk Level |
|---|---|---|---|
| 0 | Safe/Standard | Standard legal clauses with reasonable, balanced terms | 🟢 Low |
| 1 | Unilateral Termination | Clauses allowing one-sided contract termination without cause | 🟡 Medium |
| 2 | Unlimited Liability | Clauses with uncapped liability exposure | 🔴 High |
| 3 | Non-Compete | Restrictive non-compete agreements limiting future employment | 🟠 Medium-High |
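When working with raw label ids from the sub-models' outputs rather than the ensemble's string labels, a small mapping can translate predictions back into the categories and risk levels above. The dictionary below simply restates the table; the variable and function names are illustrative.

```python
# Label ids and risk levels as documented in the table above (names are illustrative)
ID2CATEGORY = {
    0: ("Safe/Standard", "Low"),
    1: ("Unilateral Termination", "Medium"),
    2: ("Unlimited Liability", "High"),
    3: ("Non-Compete", "Medium-High"),
}

def describe(label_id: int) -> str:
    category, risk = ID2CATEGORY[label_id]
    return f"{category} (risk: {risk})"

print(describe(2))  # -> "Unlimited Liability (risk: High)"
```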
Quick Start
Installation
```bash
pip install transformers torch numpy
```
Usage
```python
import sys
import os

# Add model directory to path
sys.path.insert(0, "path/to/model/directory")

from ensemble_model import SimpleLegalEnsemble

# Load ensemble
ensemble = SimpleLegalEnsemble(
    model_dir=".",    # Current directory
    device='auto'     # Automatically use CUDA if available
)

# Single prediction
clause = "The Company shall be liable for all damages without any limitation whatsoever."
result = ensemble.predict(clause)

print(f"Category: {result['label']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"All Scores: {result['all_scores']}")
```

Output:

```python
{
    'label': 'Unlimited Liability',
    'label_id': 2,
    'confidence': 0.9825,
    'all_scores': {
        'Safe/Standard': 0.0045,
        'Unilateral Termination': 0.0089,
        'Unlimited Liability': 0.9825,
        'Non-Compete': 0.0041
    },
    'individual_models': {
        'legal_bert': {'prediction': 'Unlimited Liability', 'confidence': 0.9756},
        'deberta': {'prediction': 'Unlimited Liability', 'confidence': 0.9894}
    }
}
```
Batch Prediction
```python
clauses = [
    "Liability is limited to $100,000.",
    "Either party may terminate at any time.",
    "Company accepts unlimited liability.",
    "Employee shall not compete for 2 years."
]

results = ensemble.predict_batch(clauses, batch_size=8, show_progress=True)

for clause, result in zip(clauses, results):
    print(f"{clause[:50]}... → {result['label']} ({result['confidence']:.2%})")
```
Repository Structure
```text
.
├── ensemble_model.py        # Main ensemble class
├── model_metadata.json      # Model configuration and metrics
├── README.md                # This file
├── requirements.txt         # Python dependencies
├── example_usage.py         # Usage examples
├── legal_bert_base/         # Legal-BERT model files
│   ├── config.json
│   ├── model.safetensors    # 440 MB
│   └── tokenizer files
└── deberta_v3/              # DeBERTa model files
    ├── config.json
    ├── model.safetensors    # 371 MB
    └── tokenizer files
```
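If you prefer not to go through `ensemble_model.py`, the two sub-directories can also be loaded directly with Hugging Face Transformers and averaged by hand. This is a hedged sketch that assumes the local folder layout above and that each sub-directory contains standard `config.json`, weights, and tokenizer files.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed local paths, matching the repository layout shown above
pairs = []
for subdir in ("legal_bert_base", "deberta_v3"):
    tokenizer = AutoTokenizer.from_pretrained(subdir)
    model = AutoModelForSequenceClassification.from_pretrained(subdir).to(device).eval()
    pairs.append((model, tokenizer))

clause = "Either party may terminate this agreement at any time without notice."
with torch.no_grad():
    probs = torch.stack([
        F.softmax(model(**tokenizer(clause, truncation=True, max_length=512,
                                    return_tensors="pt").to(device)).logits, dim=-1)
        for model, tokenizer in pairs
    ]).mean(dim=0)

print(probs.squeeze(0))  # averaged class probabilities over the four categories
```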
Training Details
Dataset
- Training Samples: 1,398 (with augmentation)
- Validation Samples: 177
- Original Samples: 827
- Augmentation Techniques (a sketch of the simpler perturbations follows this list):
- Synonym replacement
- Contextual word substitution
- Back-translation
- Random deletion
- Random word swapping
- Sentence shuffling
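Several of these techniques need external tooling (for example, translation models for back-translation), but the simpler perturbations are easy to reproduce. Below is a minimal sketch of random deletion and random word swapping in plain Python, as an illustration of the idea rather than the exact augmentation pipeline used for this dataset.

```python
import random

def random_deletion(text: str, p: float = 0.1) -> str:
    """Drop each word with probability p (always keep at least one word)."""
    words = text.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else random.choice(words)

def random_swap(text: str, n_swaps: int = 1) -> str:
    """Swap n_swaps randomly chosen pairs of word positions."""
    words = text.split()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

clause = "The Company shall be liable for all damages without limitation."
print(random_deletion(clause))
print(random_swap(clause, n_swaps=2))
```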
Training Configuration
- Loss Function: Focal Loss + Label Smoothing (0.1); see the sketch after this list
- Optimizer: AdamW
- Learning Rate: 1.18e-5
- Batch Size: 8
- Epochs: 15 (with early stopping)
- Warmup Ratio: 0.109
- Weight Decay: 0.0086
- Dropout: 0.173
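For reference, focal loss on top of a label-smoothed target distribution can be written as below. This uses the documented smoothing of 0.1 and the common focal default gamma = 2.0; it is a generic sketch of the loss named above, not the project's exact implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss_with_smoothing(logits, targets, gamma=2.0, smoothing=0.1):
    """Focal loss over a label-smoothed cross-entropy (illustrative sketch).

    logits:  (batch, num_classes) raw model outputs
    targets: (batch,) integer class labels
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # Smoothed targets: 1 - smoothing on the true class, the rest spread uniformly
    smooth_targets = torch.full_like(log_probs, smoothing / (num_classes - 1))
    smooth_targets.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)

    # Per-example cross-entropy against the smoothed distribution
    ce = -(smooth_targets * log_probs).sum(dim=-1)

    # Focal modulation: down-weight examples the model already classifies confidently
    pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - pt) ** gamma * ce).mean()

# Example usage: loss = focal_loss_with_smoothing(model(**batch).logits, batch_labels)
```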
Hardware
- GPU: NVIDIA Tesla T4/V100
- Training Time: ~2 hours (all models)
- Inference Speed: ~12 samples/second (batch size 8)
Use Cases
Contract Review Automation
- Automatically flag risky clauses in vendor contracts
- Prioritize contracts for legal review
Due Diligence
- Rapid analysis of large contract volumes during M&A
- Risk assessment for contract portfolios
Legal Tech Applications
- Contract management platforms
- Legal research tools
- Compliance monitoring systems
Educational Tools
- Teaching contract law principles
- Training paralegals and legal assistants
Limitations
- Domain Specificity: Trained on English legal contracts; may not generalize to other languages or legal systems
- Edge Cases: Performance may vary on highly specialized or ambiguous clauses
- Context Length: Limited to 512 tokens (~300-400 words per clause); see the token-count sketch after this list
- Not Legal Advice: This model is a tool for analysis, not a replacement for professional legal review
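To check whether a clause fits in the 512-token window before running a prediction, you can count tokens with one of the bundled tokenizers. A small sketch, assuming the `legal_bert_base/` directory from the repository layout is available locally:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("legal_bert_base")  # assumed local path

def fits_in_window(clause: str, max_length: int = 512) -> bool:
    """Return True if the clause fits within the model's 512-token window."""
    n_tokens = len(tokenizer(clause, add_special_tokens=True)["input_ids"])
    return n_tokens <= max_length

print(fits_in_window("Either party may terminate this agreement at any time."))
```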
Citation
```bibtex
@software{legal_contract_ensemble_2025,
  title   = {Legal Contract Ensemble Classifier},
  author  = {Nikhil-AI-Labs},
  year    = {2025},
  version = {1.0.0},
  url     = {https://huggingface.co/Nikhil-AI-Labs/legal-contract-classifier-best},
  note    = {97.74% accuracy ensemble model for contract clause classification}
}
```
License
Apache 2.0 License - See LICENSE file for details
Acknowledgments
- Base Models: nlpaueb/legal-bert-base-uncased, microsoft/deberta-v3-base
- Frameworks: Hugging Face Transformers, PyTorch
Contact
For questions, issues, or collaboration:
- Hugging Face: @Nikhil-AI-Labs
- Repository Issues: Open an issue
Developed with ❤️ for the legal AI community