arashkermani committed
Commit 0bad3bb · verified · 1 Parent(s): 924a12c

Upload folder using huggingface_hub

README.md CHANGED
@@ -21,17 +21,19 @@ This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](htt
  **⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:
  - Testing model loading and export functionality
  - CI/CD pipeline validation
- - OpenVINO conversion testing
+ - OpenVINO conversion testing
  - Quantization workflow testing

  ## Model Specifications

  - **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
- - **Parameters**: 1,477,376 (~1.48M parameters)
- - **Model Binary Size**: 5.64 MB
- - **Total Repository Size**: ~21 MB
+ - **Parameters**: 17,390,468 (~17.4M parameters)
+ - **Model Binary Size**: 66.45 MB
+ - **Total Repository Size**: ~82 MB
  - **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
- - **Size Reduction**: 853× smaller than the full model
+ - **Size Reduction**: 219× smaller than the full model
+ - **OpenVINO Export**: ✅ Fully supported
+ - **All Components Enabled**: vision, audio, and TTS modules initialized

  ## Architecture Details

@@ -56,7 +58,7 @@ This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](htt
  - `hidden_size`: 8
  - `num_layers`: 1

- All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
+ All architectural components are present and properly initialized, ensuring full compatibility with OpenVINO export and testing workflows.

  ## Usage

@@ -111,7 +113,8 @@ model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"
  # Load model for OpenVINO
  model = OVModelForVisualCausalLM.from_pretrained(
      model_id,
-     trust_remote_code=True
+     trust_remote_code=True,
+     export=True
  )

  processor = AutoProcessor.from_pretrained(
@@ -148,10 +151,11 @@ This model is intended **exclusively** for:
  ## Training Details

  This model was generated by:
- 1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
+ 1. Loading the config from `openbmb/MiniCPM-o-2_6`
  2. Reducing all dimensions to minimal viable values
  3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
- 4. Copying all necessary tokenizer, processor, and custom code files
+ 4. Keeping all components (vision, audio, TTS) enabled for full compatibility
+ 5. Copying all necessary tokenizer, processor, and custom code files

  **No training was performed** - all weights are randomly initialized.

@@ -161,15 +165,27 @@ The model has been validated to ensure:
  - ✅ Loads with `trust_remote_code=True`
  - ✅ Compatible with transformers AutoModel APIs
  - ✅ Supports forward pass with expected input format
- - ✅ Compatible with OpenVINO export via optimum-intel
+ - ✅ **Compatible with OpenVINO export via optimum-intel**
  - ✅ Includes all required custom modules and artifacts
+ - ✅ All multimodal components (vision/audio/TTS) properly initialized
+
+ ## Comparison with Previous Versions
+
+ | Metric | v1 (components disabled) | v2 (this version) |
+ |--------|-------------------------|-------------------|
+ | Parameters | 1.48M | 17.4M |
+ | Total Size | 21 MB | 82 MB |
+ | OpenVINO Export | ❌ Not supported | ✅ Fully supported |
+ | Vision Module | ❌ Disabled | ✅ Enabled |
+ | Audio Module | ❌ Disabled | ✅ Enabled |
+ | TTS Module | ❌ Disabled | ✅ Enabled |

- See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.
+ **Recommendation**: Use this version for full test coverage including OpenVINO export tests.

  ## Files Included

  - `config.json` - Model configuration
- - `pytorch_model.bin` - Model weights (5.64 MB)
+ - `pytorch_model.bin` - Model weights (66.45 MB)
  - `generation_config.json` - Generation parameters
  - `preprocessor_config.json` - Preprocessor configuration
  - `processor_config.json` - Processor configuration
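
The updated "Training Details" steps describe a config-shrink-and-random-init recipe. A minimal sketch of what such a generation script might look like is below; this is a hypothetical reconstruction (the actual script is not part of this commit), with attribute names and values taken from the config.json diff further down:

```python
# Hypothetical sketch of the "Training Details" recipe; the real generation
# script is not included in this commit. Requires transformers with remote code.
from transformers import AutoConfig, AutoModelForCausalLM

# Step 1: load the config from openbmb/MiniCPM-o-2_6.
config = AutoConfig.from_pretrained("openbmb/MiniCPM-o-2_6", trust_remote_code=True)

# Step 2: reduce dimensions to minimal viable values (subset shown,
# values taken from the config.json diff in this commit).
config.num_hidden_layers = 2
config.hidden_size = 256
config.intermediate_size = 512
config.num_attention_heads = 4

# Step 4: keep all components (vision, audio, TTS) enabled.
config.init_vision = config.init_audio = config.init_tts = True

# Step 3: randomly initialize weights from the config; no training is performed.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.save_pretrained("tiny-random-MiniCPM-o-2_6")
```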
README.md.backup ADDED
@@ -0,0 +1,215 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - vision
+ - image-text-to-text
+ - multimodal
+ - test-model
+ - tiny-model
+ - openvino
+ - optimum-intel
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Tiny Random MiniCPM-o-2_6
+
+ ## Model Description
+
+ This is a **tiny random-initialized version** of the [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) multimodal vision-language model, designed specifically for **testing and CI/CD purposes** in the [optimum-intel](https://github.com/huggingface/optimum-intel) library.
+
+ **⚠️ Important**: This model has randomly initialized weights and is NOT intended for actual inference. It is designed solely for:
+ - Testing model loading and export functionality
+ - CI/CD pipeline validation
+ - OpenVINO conversion testing
+ - Quantization workflow testing
+
+ ## Model Specifications
+
+ - **Architecture**: MiniCPM-o-2_6 (multimodal: vision + text + audio + TTS)
+ - **Parameters**: 1,477,376 (~1.48M parameters)
+ - **Model Binary Size**: 5.64 MB
+ - **Total Repository Size**: ~21 MB
+ - **Original Model**: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) (~18 GB)
+ - **Size Reduction**: 853× smaller than the full model
+
+ ## Architecture Details
+
+ ### Language Model (LLM) Component
+ - `num_hidden_layers`: 2 (reduced from 40)
+ - `hidden_size`: 256 (reduced from 2048)
+ - `intermediate_size`: 512 (reduced from 8192)
+ - `num_attention_heads`: 4 (reduced from 32)
+ - `vocab_size`: 320 (reduced from 151,700)
+ - `max_position_embeddings`: 128 (reduced from 8192)
+
+ ### Vision Component (SigLIP-based)
+ - `hidden_size`: 8
+ - `num_hidden_layers`: 1
+
+ ### Audio Component (Whisper-based)
+ - `d_model`: 64
+ - `encoder_layers`: 1
+ - `decoder_layers`: 1
+
+ ### TTS Component
+ - `hidden_size`: 8
+ - `num_layers`: 1
+
+ All architectural components are present but miniaturized to ensure API compatibility while drastically reducing compute requirements.
+
+ ## Usage
+
+ ### Loading with Transformers
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor
+ import torch
+
+ model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"
+
+ # Load model
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     trust_remote_code=True,
+     torch_dtype=torch.float32,
+     device_map="cpu"
+ )
+
+ # Load processor
+ processor = AutoProcessor.from_pretrained(
+     model_id,
+     trust_remote_code=True
+ )
+
+ # Test forward pass
+ input_ids = torch.randint(0, 320, (1, 5))
+ position_ids = torch.arange(5).unsqueeze(0)
+
+ data = {
+     "input_ids": input_ids,
+     "pixel_values": [[]],
+     "tgt_sizes": [[]],
+     "image_bound": [[]],
+     "position_ids": position_ids,
+ }
+
+ with torch.no_grad():
+     outputs = model(data=data)
+
+ print(f"Logits shape: {outputs.logits.shape}") # (1, 5, 320)
+ ```
+
+ ### Using with Optimum-Intel (OpenVINO)
+
+ ```python
+ from optimum.intel.openvino import OVModelForVisualCausalLM
+ from transformers import AutoProcessor
+
+ model_id = "arashkermani/tiny-random-MiniCPM-o-2_6"
+
+ # Load model for OpenVINO
+ model = OVModelForVisualCausalLM.from_pretrained(
+     model_id,
+     trust_remote_code=True
+ )
+
+ processor = AutoProcessor.from_pretrained(
+     model_id,
+     trust_remote_code=True
+ )
+ ```
+
+ ### Export to OpenVINO
+
+ ```bash
+ optimum-cli export openvino \
+     -m arashkermani/tiny-random-MiniCPM-o-2_6 \
+     minicpm-o-openvino \
+     --task=image-text-to-text \
+     --trust-remote-code
+ ```
+
+ ## Intended Use
+
+ This model is intended **exclusively** for:
+ - ✅ Testing optimum-intel OpenVINO export functionality
+ - ✅ CI/CD pipeline validation
+ - ✅ Model loading and compatibility testing
+ - ✅ Quantization workflow testing
+ - ✅ Fast prototyping and debugging
+
+ **Not intended for**:
+ - ❌ Production inference
+ - ❌ Actual image-text-to-text tasks
+ - ❌ Model quality evaluation
+ - ❌ Benchmarking performance metrics
+
+ ## Training Details
+
+ This model was generated by:
+ 1. Loading the config from `optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6`
+ 2. Reducing all dimensions to minimal viable values
+ 3. Initializing weights randomly using `AutoModelForCausalLM.from_config()`
+ 4. Copying all necessary tokenizer, processor, and custom code files
+
+ **No training was performed** - all weights are randomly initialized.
+
+ ## Validation Results
+
+ The model has been validated to ensure:
+ - ✅ Loads with `trust_remote_code=True`
+ - ✅ Compatible with transformers AutoModel APIs
+ - ✅ Supports forward pass with expected input format
+ - ✅ Compatible with OpenVINO export via optimum-intel
+ - ✅ Includes all required custom modules and artifacts
+
+ See the [validation report](https://github.com/arashkermani/tiny-minicpm-o) for detailed technical analysis.
+
+ ## Files Included
+
+ - `config.json` - Model configuration
+ - `pytorch_model.bin` - Model weights (5.64 MB)
+ - `generation_config.json` - Generation parameters
+ - `preprocessor_config.json` - Preprocessor configuration
+ - `processor_config.json` - Processor configuration
+ - `tokenizer_config.json` - Tokenizer configuration
+ - `tokenizer.json` - Fast tokenizer
+ - `vocab.json` - Vocabulary
+ - `merges.txt` - BPE merges
+ - Custom Python modules:
+   - `modeling_minicpmo.py`
+   - `configuration_minicpm.py`
+   - `processing_minicpmo.py`
+   - `image_processing_minicpmv.py`
+   - `tokenization_minicpmo_fast.py`
+   - `modeling_navit_siglip.py`
+   - `resampler.py`
+   - `utils.py`
+
+ ## Related Models
+
+ - Original model: [openbmb/MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6)
+ - Previous test model: [optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6](https://huggingface.co/optimum-intel-internal-testing/tiny-random-MiniCPM-o-2_6)
+
+ ## License
+
+ This model follows the same license as the original MiniCPM-o-2_6 model (Apache 2.0).
+
+ ## Citation
+
+ If you use this test model in your CI/CD or testing infrastructure, please reference:
+
+ ```bibtex
+ @misc{tiny-minicpm-o-2_6,
+     author = {Arash Kermani},
+     title = {Tiny Random MiniCPM-o-2_6 for Testing},
+     year = {2026},
+     publisher = {HuggingFace},
+     howpublished = {\url{https://huggingface.co/arashkermani/tiny-random-MiniCPM-o-2_6}}
+ }
+ ```
+
+ ## Contact
+
+ For issues or questions about this test model, please open an issue in the [optimum-intel repository](https://github.com/huggingface/optimum-intel/issues).
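
Note that the backup above still shows the v1 OpenVINO loading snippet without `export=True`. The updated README (diffed earlier) adds that flag; as a usage note, it makes optimum-intel convert the PyTorch checkpoint to OpenVINO IR at load time:

```python
# Updated load path from the new README: export=True converts the PyTorch
# checkpoint to OpenVINO IR on the fly instead of expecting a saved IR.
from optimum.intel.openvino import OVModelForVisualCausalLM

model = OVModelForVisualCausalLM.from_pretrained(
    "arashkermani/tiny-random-MiniCPM-o-2_6",
    trust_remote_code=True,
    export=True,
)
model.save_pretrained("minicpm-o-openvino")  # optional: persist the converted IR
```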
assets/chattts_tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,389 @@
+ {
+   "additional_special_tokens": [
+     {
+       "content": "[Sasr]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Pasr]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Easr]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Stts]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Ptts]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Etts]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Sbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Pbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[Ebreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[uv_break]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[v_break]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[lbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[llbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[undefine]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[laugh]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[spk_emb]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[empty_spk]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[music]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[pure]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_3]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_4]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_5]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_6]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[break_7]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[laugh_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[laugh_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[laugh_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_3]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_4]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_5]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_6]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_7]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_8]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[oral_9]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_3]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_4]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_5]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_6]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_7]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_8]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "[speed_9]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     }
+   ],
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
assets/chattts_tokenizer/tokenizer.json ADDED
The diff for this file is too large to render; see the raw file in the repository.
 
assets/chattts_tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,516 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21128": {
+       "content": "[Sasr]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21129": {
+       "content": "[Pasr]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21130": {
+       "content": "[Easr]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21131": {
+       "content": "[Stts]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21132": {
+       "content": "[Ptts]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21133": {
+       "content": "[Etts]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21134": {
+       "content": "[Sbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21135": {
+       "content": "[Pbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21136": {
+       "content": "[Ebreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21137": {
+       "content": "[uv_break]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21138": {
+       "content": "[v_break]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21139": {
+       "content": "[lbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21140": {
+       "content": "[llbreak]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21141": {
+       "content": "[undefine]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21142": {
+       "content": "[laugh]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21143": {
+       "content": "[spk_emb]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21144": {
+       "content": "[empty_spk]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21145": {
+       "content": "[music]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21146": {
+       "content": "[pure]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21147": {
+       "content": "[break_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21148": {
+       "content": "[break_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21149": {
+       "content": "[break_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21150": {
+       "content": "[break_3]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21151": {
+       "content": "[break_4]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21152": {
+       "content": "[break_5]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21153": {
+       "content": "[break_6]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21154": {
+       "content": "[break_7]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21155": {
+       "content": "[laugh_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21156": {
+       "content": "[laugh_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21157": {
+       "content": "[laugh_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21158": {
+       "content": "[oral_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21159": {
+       "content": "[oral_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21160": {
+       "content": "[oral_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21161": {
+       "content": "[oral_3]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21162": {
+       "content": "[oral_4]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21163": {
+       "content": "[oral_5]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21164": {
+       "content": "[oral_6]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21165": {
+       "content": "[oral_7]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21166": {
+       "content": "[oral_8]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21167": {
+       "content": "[oral_9]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21168": {
+       "content": "[speed_0]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21169": {
+       "content": "[speed_1]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21170": {
+       "content": "[speed_2]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21171": {
+       "content": "[speed_3]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21172": {
+       "content": "[speed_4]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21173": {
+       "content": "[speed_5]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21174": {
+       "content": "[speed_6]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21175": {
+       "content": "[speed_7]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21176": {
+       "content": "[speed_8]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "21177": {
+       "content": "[speed_9]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "[Sasr]",
+     "[Pasr]",
+     "[Easr]",
+     "[Stts]",
+     "[Ptts]",
+     "[Etts]",
+     "[Sbreak]",
+     "[Pbreak]",
+     "[Ebreak]",
+     "[uv_break]",
+     "[v_break]",
+     "[lbreak]",
+     "[llbreak]",
+     "[undefine]",
+     "[laugh]",
+     "[spk_emb]",
+     "[empty_spk]",
+     "[music]",
+     "[pure]",
+     "[break_0]",
+     "[break_1]",
+     "[break_2]",
+     "[break_3]",
+     "[break_4]",
+     "[break_5]",
+     "[break_6]",
+     "[break_7]",
+     "[laugh_0]",
+     "[laugh_1]",
+     "[laugh_2]",
+     "[oral_0]",
+     "[oral_1]",
+     "[oral_2]",
+     "[oral_3]",
+     "[oral_4]",
+     "[oral_5]",
+     "[oral_6]",
+     "[oral_7]",
+     "[oral_8]",
+     "[oral_9]",
+     "[speed_0]",
+     "[speed_1]",
+     "[speed_2]",
+     "[speed_3]",
+     "[speed_4]",
+     "[speed_5]",
+     "[speed_6]",
+     "[speed_7]",
+     "[speed_8]",
+     "[speed_9]"
+   ],
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 256,
+   "model_max_length": 1000000000000000019884624838656,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
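
The ChatTTS tokenizer assets above describe a plain BERT-style tokenizer extended with 50 TTS control tokens ([Sasr] through [speed_9], ids 21128-21177). A minimal inspection sketch, assuming the repository has been downloaded so the assets folder exists locally:

```python
# Minimal sketch, assuming the repo is cloned and run from its root directory.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("assets/chattts_tokenizer")
print(type(tok).__name__)                   # a BERT tokenizer, per tokenizer_class
print(len(tok.additional_special_tokens))   # 50 control tokens
print(tok.convert_tokens_to_ids("[Sasr]"))  # 21128, per added_tokens_decoder
```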
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "/Users/arashkermanikolankeh/Downloads/Altana/tiny_minicpm_o_2_6",
+ "_name_or_path": "openbmb/MiniCPM-o-2_6",
  "architectures": [
    "MiniCPMO"
  ],
@@ -148,9 +148,9 @@
  "hidden_act": "silu",
  "hidden_size": 256,
  "image_size": 448,
- "init_audio": false,
- "init_tts": false,
- "init_vision": false,
+ "init_audio": true,
+ "init_tts": true,
+ "init_vision": true,
  "initializer_range": 0.02,
  "intermediate_size": 512,
  "listen_speak_type": "asr",
@@ -158,7 +158,6 @@
  "max_window_layers": 28,
  "model_type": "minicpmo",
  "num_attention_heads": 4,
- "num_heads": 1,
  "num_hidden_layers": 2,
  "num_key_value_heads": 4,
  "patch_size": 14,
@@ -183,9 +182,7 @@
  "model_type": "conditional_chattts",
  "num_attention_heads": 1,
  "num_audio_tokens": 10,
- "num_heads": 1,
  "num_hidden_layers": 1,
- "num_layers": 1,
  "num_mel_bins": 10,
  "num_text_tokens": 20
  },
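
The substantive change here is flipping `init_audio`, `init_tts`, and `init_vision` from `false` to `true`, so the audio, TTS, and vision submodules are actually constructed at load time; this is what the README credits for OpenVINO export now working. A quick check of the new flags, assuming network access to the repo:

```python
# Verify the flags this commit flips; all three submodules are now built.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "arashkermani/tiny-random-MiniCPM-o-2_6", trust_remote_code=True
)
print(cfg.init_vision, cfg.init_audio, cfg.init_tts)  # True True True
```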
preprocessor_config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "auto_map": {
- "AutoImageProcessor": "image_processing_minicpmv.MiniCPMVImageProcessor",
- "AutoProcessor": "processing_minicpmo.MiniCPMOProcessor"
+ "AutoImageProcessor": "openbmb/MiniCPM-o-2_6--image_processing_minicpmv.MiniCPMVImageProcessor",
+ "AutoProcessor": "openbmb/MiniCPM-o-2_6--processing_minicpmo.MiniCPMOProcessor"
  },
  "chunk_length": 30,
  "feature_extractor_type": "WhisperFeatureExtractor",
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:979e8692eb0dc05dc0e85077d94d6feaa968f042f906524c0028250d074480eb
- size 5919000
+ oid sha256:5af168eabd5d0954ff83073fe3b36fc7fd6a8c9e8b6591132313241d8165d91c
+ size 69676335
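
The new LFS pointer's byte count is consistent with the sizes quoted in the README diff:

```python
# Sanity check: the LFS pointer sizes match the README's MB figures.
old_bytes, new_bytes = 5_919_000, 69_676_335
print(f"{old_bytes / 2**20:.2f} MB")  # 5.64  (v1 "Model Binary Size")
print(f"{new_bytes / 2**20:.2f} MB")  # 66.45 (v2 "Model Binary Size")
```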
tokenizer_config.json CHANGED
@@ -505,10 +505,9 @@
  "<reserved_53>"
  ],
  "auto_map": {
- "AutoProcessor": "processing_minicpmo.MiniCPMOProcessor",
  "AutoTokenizer": [
- "tokenization_qwen2.Qwen2Tokenizer",
- "tokenization_minicpmo_fast.MiniCPMOTokenizerFast"
+ "openbmb/MiniCPM-o-2_6--tokenization_minicpmo_fast.MiniCPMOTokenizerFast",
+ null
  ]
  },
  "bos_token": "<|im_start|>",