Faster R-CNN with RoPE-ViT Backbone for Object Detection

This is a Faster R-CNN object detector with a RoPE-ViT (Vision Transformer with Rotary Position Embeddings) backbone, trained on the COCO dataset.
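
RoPE-ViT replaces the ViT's learned absolute position embeddings with rotary position embeddings applied to the attention queries and keys, so attention scores depend on relative patch positions. The following is a rough, self-contained sketch of the rotation idea (shown in 1D for brevity; the actual backbone applies it over 2D patch coordinates, and this is not the exact implementation used here):

import torch

def rotary_embed(x, positions, base=10000.0):
    # x: (num_tokens, dim) queries or keys; dim must be even.
    # positions: (num_tokens,) token positions (e.g., flattened patch indices).
    dim = x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions[:, None].float() * freqs[None, :]   # (num_tokens, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Applied to both queries and keys before attention, the dot product
# of rotated queries and keys depends only on relative positions.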

Model Description

  • Architecture: Faster R-CNN
  • Backbone: RoPE-ViT Tiny
  • Dataset: COCO
  • Task: Object Detection
  • Framework: MMDetection

Training Results

Metric               Value
bbox_mAP             0.0680
bbox_mAP_50          0.1510
bbox_mAP_75          0.0530
bbox_mAP_s (small)   0.0360
bbox_mAP_m (medium)  0.1260
bbox_mAP_l (large)   0.0640

Usage

from mmdet.apis import init_detector, inference_detector

config_file = 'faster_rcnn_rope_vit_tiny_coco.py'
checkpoint_file = 'best_coco_bbox_mAP_epoch_12.pth'

# Initialize the model
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Inference on an image
result = inference_detector(model, 'demo.jpg')
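
In MMDetection 3.x, inference_detector returns a DetDataSample; a minimal sketch of reading the predictions under that assumption (the 0.3 score threshold is an arbitrary example):

# Assuming MMDetection 3.x: predictions live in result.pred_instances.
instances = result.pred_instances
keep = instances.scores > 0.3              # arbitrary confidence threshold
print(instances.bboxes[keep])              # (N, 4) boxes in (x1, y1, x2, y2)
print(instances.labels[keep])              # COCO category indices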

Training Configuration

The model was trained with the following configuration:

  • Input size: 512x512
  • Training epochs: 12
  • Optimizer: SGD with momentum
  • Learning rate scheduler: Step decay
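
A rough sketch of how these settings look as MMDetection 3.x config fragments. Only the optimizer and scheduler types, epoch count, and input size come from the list above; the learning rate, momentum, weight decay, and step milestones are illustrative assumptions:

# Illustrative only; exact hyperparameters may differ from those used in training.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))
param_scheduler = [
    dict(type='MultiStepLR', by_epoch=True, milestones=[8, 11], gamma=0.1)]
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(512, 512), keep_ratio=True),
    dict(type='PackDetInputs'),
]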

Citation

If you use this model, please cite:

@misc{rope-vit-detection,
  author = {VLG IITR},
  title = {Faster R-CNN with RoPE-ViT for Object Detection},
  year = {2026},
  publisher = {Hugging Face},
}

License

This model is released under the Apache 2.0 license.
