---
license: gpl-3.0
datasets:
- Mxode/BiST
language:
- en
- zh
pipeline_tag: translation
library_name: transformers
---

# **NanoTranslator-M2**

English | [简体中文](README_zh-CN.md)

## Introduction

This is the **medium-2** model of NanoTranslator. It currently supports **English to Chinese** translation only.

The ONNX version of the model is also available in this repository.

All models are collected in the [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2).

|  | P. | Arch. | Act. | V. | H. | I. | L. | A.H. | K.H. | Tie |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) | 100 | LLaMA | SwiGLU | 16000 | 768 | 4096 | 8 | 24 | 8 | True |
| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) | 78 | LLaMA | GeGLU | 16000 | 768 | 4096 | 6 | 24 | 8 | True |
| [L](https://huggingface.co/Mxode/NanoTranslator-L) | 49 | LLaMA | GeGLU | 16000 | 512 | 2816 | 8 | 16 | 8 | True |
| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) | 22 | Qwen2 | GeGLU | 4000 | 432 | 2304 | 6 | 24 | 8 | True |
| [M](https://huggingface.co/Mxode/NanoTranslator-M) | 22 | LLaMA | SwiGLU | 8000 | 256 | 1408 | 16 | 16 | 4 | True |
| [S](https://huggingface.co/Mxode/NanoTranslator-S) | 9 | LLaMA | SwiGLU | 4000 | 168 | 896 | 16 | 12 | 4 | True |
| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) | 2 | LLaMA | SwiGLU | 2000 | 96 | 512 | 12 | 12 | 4 | True |

- **P.** - Parameters (in millions)
- **V.** - vocab size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - num layers
- **A.H.** - num attention heads
- **K.H.** - num KV heads
- **Tie** - tie word embeddings
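
These columns map directly onto fields in each model's `config.json`. As a quick check (a minimal sketch, assuming the standard Qwen2 config field names), you can print them for M2:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Mxode/NanoTranslator-M2")

# Map the table abbreviations to config fields.
print("V.  ", config.vocab_size)           # 4000
print("H.  ", config.hidden_size)          # 432
print("I.  ", config.intermediate_size)    # 2304
print("L.  ", config.num_hidden_layers)    # 6
print("A.H.", config.num_attention_heads)  # 24
print("K.H.", config.num_key_value_heads)  # 8
print("Tie ", config.tie_word_embeddings)  # True
```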

## How to use

The prompt format is as follows:

```
<|im_start|> {English Text} <|endoftext|>
```
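
Both markers are plain string literals, so building a prompt is simple concatenation. As a quick sanity check (a minimal sketch, assuming both markers are registered as special tokens in the released tokenizer, with `<|endoftext|>` doubling as the EOS token that stops generation):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Mxode/NanoTranslator-M2")

# Build a prompt by plain string concatenation.
prompt = "<|im_start|>" + "I love to watch my favorite TV series." + "<|endoftext|>"

# <|endoftext|> is expected to match the tokenizer's EOS token.
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer(prompt).input_ids)
```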

### Directly using transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-M2'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    # Default generation settings; any of them can be overridden per call.
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        eos_token_id = kwargs.pop("eos_token_id", tokenizer.eos_token_id),
        **kwargs
    )

    # Wrap the input in the expected prompt format.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Drop the prompt tokens, keeping only the newly generated translation.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```
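
Because `translate` handles one sentence per call, a handful of inputs can simply be looped over (a short sketch reusing the helper above; the example sentences are illustrative):

```python
sentences = [
    "I love to watch my favorite TV series.",
    "The weather is nice today.",
]

# Greedy decoding keeps the output deterministic.
for s in sentences:
    print(s, "->", translate(s, model, max_new_tokens=64, do_sample=False))
```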

### ONNX

In our measurements, inference with the ONNX model is **2-10 times faster** than inference with the transformers model directly.

You need to switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-M2/tree/onnx) manually and download the files to a local folder, for example as sketched below.
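
One convenient way to do this (a sketch using `huggingface_hub`; the local directory name is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Download the onnx branch of the repository to a local folder.
local_dir = snapshot_download(
    repo_id="Mxode/NanoTranslator-M2",
    revision="onnx",  # the ONNX files live on this branch
    local_dir="NanoTranslator-M2-onnx",
)
print(local_dir)
```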

Reference docs:

- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

**Using ORTModelForCausalLM**

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

# Reuses the translate() helper defined above; only the model object changes.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False, eos_token_id=tokenizer.eos_token_id)
print(response)
```

**Using pipeline**

```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."

# Apply the prompt format before passing the text to the pipeline;
# eos_token_id=2 corresponds to <|endoftext|> in this tokenizer.
prompt = "<|im_start|>" + text + "<|endoftext|>"

response = pipe(prompt, max_new_tokens=64, do_sample=False, eos_token_id=2)
print(response)
```
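
The text-generation pipeline returns a list of dicts, so the translation itself sits under the `generated_text` key (e.g. `response[0]["generated_text"]`); passing `return_full_text=False` should exclude the prompt from that field.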