richarderkhov/shibing624_-_chinese-text-correction-1.5b-gguf overview
中文文本纠错模型chinese-text-correction-1.5b:用于拼写纠错、语法纠错 shibing624/chinese-text-correction-1.5b evaluate test data: The overall performance of CSC test: |inputtext|predicttext| |:--- |:--- | |文本纠错:\n少先队员因该为老人让坐。|少先队员应该为老人让座。| # Models | Name | Base Model | Download | |-----------------|-------------------|-----------------------------------------------------------------------| | chinese-text-correction-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 🤗 Hugging Face | | chinese-text-correction-1.5b-lora | Qwen/Qwen2.5-1.5B-Instruct | 🤗 Hugging Face | | chinese-text-correction-7b | Qwen/Qwen2.5-7B-Instruct | 🤗 Hugging Face | | chinese-text-correction-7b-lora | Qwen/Qwen2.5-7B-Instruct | 🤗 Hugging Face | ### 评估结果 | Model Name | Model Link | Base Model | Avg | SIGHAN-2015 | EC-LAW | MCSC | GPU/CPU | QPS | |:-----------------|:------------------------------------------------------------------------------------------------------------------------|:---------------------------|:-----------|:------------|:-------|:-------|:--------|:--------| | Kenlm-CSC | shibing624/chinese-kenlm-klm | kenlm | 0.3409 | 0.3147 | 0.3763 | 0.3317 | CPU | 9 | | Mengzi-T5-CSC | shibing624/mengzi-t5-base-chinese-correction | mengzi-t5-base | 0.3984 | 0.7758 | 0.3156 | 0.1039 | GPU | 214 | | ERNIE-CSC | PaddleNLP/ernie-csc | PaddlePaddle/ernie-1.0-base-zh | 0.4353 | 0.8383 | 0.3357 | 0.1318 | GPU | 114 | | MacBERT-CSC | shibing624/macbert4csc-base-chinese | hfl/chinese-macbert-base | 0.3993 | 0.8314 | 0.1610 | 0.2055 | GPU | 224 | | ChatGLM3-6B-CSC | shibing624/chatglm3-6b-csc-chinese-lora | THUDM/chatglm3-6b | 0.4538 | 0.6572 | 0.4369 | 0.2672 | GPU | 3 | | Qwen2.5-1.5B-CTC | shibing624/chinese-text-correction-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 0.6802 | 0.3032 | 0.7846 | 0.9529 | GPU | 6 | | Qwen2.5-7B-CTC | shibing624/chinese-text-correction-7b | Qwen/Qwen2.5-7B-Instruct | 0.8225 | 0.4917 | 0.9798 | 0.9959 | GPU | 3 |
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| chinese-text-correction-1.5b.IQ4_NL.gguf | GGUF | IQ4_NL | 897.88 MB | Download |
| chinese-text-correction-1.5b.IQ4_XS.gguf | GGUF | IQ4_XS | 860.39 MB | Download |
| chinese-text-correction-1.5b.Q2_K.gguf | GGUF | Q2_K | 644.97 MB | Download |
| chinese-text-correction-1.5b.Q3_K.gguf | GGUF | Q3_K | 786.00 MB | Download |
| chinese-text-correction-1.5b.Q3_K_L.gguf | GGUF | Q3_K_L | 839.39 MB | Download |
| chinese-text-correction-1.5b.Q3_K_M.gguf | GGUF | Q3_K_M | 786.00 MB | Download |
| chinese-text-correction-1.5b.Q3_K_S.gguf | GGUF | Q3_K_S | 725.69 MB | Download |
| chinese-text-correction-1.5b.Q4_0.gguf | GGUF | — | 891.64 MB | Download |
| chinese-text-correction-1.5b.Q4_1.gguf | GGUF | — | 969.74 MB | Download |
| chinese-text-correction-1.5b.Q4_K.gguf | GGUF | Q4_K | 940.37 MB | Download |
| chinese-text-correction-1.5b.Q4_K_M.gguf | GGUF | Q4_K_M | 940.37 MB | Download |
| chinese-text-correction-1.5b.Q4_K_S.gguf | GGUF | Q4_K_S | 896.75 MB | Download |
| chinese-text-correction-1.5b.Q5_0.gguf | GGUF | — | 1.02 GB | Download |
| chinese-text-correction-1.5b.Q5_1.gguf | GGUF | — | 1.10 GB | Download |
| chinese-text-correction-1.5b.Q5_K.gguf | GGUF | Q5_K | 1.05 GB | Download |
| chinese-text-correction-1.5b.Q5_K_M.gguf | GGUF | Q5_K_M | 1.05 GB | Download |
| chinese-text-correction-1.5b.Q5_K_S.gguf | GGUF | Q5_K_S | 1.02 GB | Download |
| chinese-text-correction-1.5b.Q6_K.gguf | GGUF | Q6_K | 1.19 GB | Download |
| chinese-text-correction-1.5b.Q8_0.gguf | GGUF | — | 1.53 GB | Download |
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"frontmatter": {},
"hero_image_url": "https://huggingface.co/shibing624/chinese-text-correction-1.5b-lora/resolve/main/eval_loss_1.5b.png",
"summary": "中文文本纠错模型chinese-text-correction-1.5b:用于拼写纠错、语法纠错 shibing624/chinese-text-correction-1.5b evaluate test data: The overall performance of CSC **test**: |input_text|predict_text| |:--- |:--- | |文本纠错:\\n少先队员因该为老人让坐。|少先队员应该为老人让座。| # Models | Name | Base Model | Download | |-----------------|-------------------|-----------------------------------------------------------------------| | chinese-text-correction-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 🤗 Hugging Face | | chinese-text-correction-1.5b-lora | Qwen/Qwen2.5-1.5B-Instruct | 🤗 Hugging Face | | chinese-text-correction-7b | Qwen/Qwen2.5-7B-Instruct | 🤗 Hugging Face | | chinese-text-correction-7b-lora | Qwen/Qwen2.5-7B-Instruct | 🤗 Hugging Face | ### 评估结果 | Model Name | Model Link | Base Model | Avg | SIGHAN-2015 | EC-LAW | MCSC | GPU/CPU | QPS | |:-----------------|:------------------------------------------------------------------------------------------------------------------------|:---------------------------|:-----------|:------------|:-------|:-------|:--------|:--------| | Kenlm-CSC | shibing624/chinese-kenlm-klm | kenlm | 0.3409 | 0.3147 | 0.3763 | 0.3317 | CPU | 9 | | Mengzi-T5-CSC | shibing624/mengzi-t5-base-chinese-correction | mengzi-t5-base | 0.3984 | 0.7758 | 0.3156 | 0.1039 | GPU | 214 | | ERNIE-CSC | PaddleNLP/ernie-csc | PaddlePaddle/ernie-1.0-base-zh | 0.4353 | 0.8383 | 0.3357 | 0.1318 | GPU | 114 | | MacBERT-CSC | shibing624/macbert4csc-base-chinese | hfl/chinese-macbert-base | 0.3993 | 0.8314 | 0.1610 | 0.2055 | GPU | **224** | | ChatGLM3-6B-CSC | shibing624/chatglm3-6b-csc-chinese-lora | THUDM/chatglm3-6b | 0.4538 | 0.6572 | 0.4369 | 0.2672 | GPU | 3 | | Qwen2.5-1.5B-CTC | shibing624/chinese-text-correction-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 0.6802 | 0.3032 | 0.7846 | 0.9529 | GPU | 6 | | Qwen2.5-7B-CTC | shibing624/chinese-text-correction-7b | Qwen/Qwen2.5-7B-Instruct | **0.8225** | 0.4917 | 0.9798 | 0.9959 | GPU | 3 |",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nchinese-text-correction-1.5b - GGUF\n- Model creator: https://huggingface.co/shibing624/\n- Original model: https://huggingface.co/shibing624/chinese-text-correction-1.5b/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [chinese-text-correction-1.5b.Q2_K.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q2_K.gguf) | Q2_K | 0.63GB |\n| [chinese-text-correction-1.5b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q3_K_S.gguf) | Q3_K_S | 0.71GB |\n| [chinese-text-correction-1.5b.Q3_K.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q3_K.gguf) | Q3_K | 0.77GB |\n| [chinese-text-correction-1.5b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q3_K_M.gguf) | Q3_K_M | 0.77GB |\n| [chinese-text-correction-1.5b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q3_K_L.gguf) | Q3_K_L | 0.82GB |\n| [chinese-text-correction-1.5b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.IQ4_XS.gguf) | IQ4_XS | 0.84GB |\n| [chinese-text-correction-1.5b.Q4_0.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q4_0.gguf) | Q4_0 | 0.87GB |\n| [chinese-text-correction-1.5b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.IQ4_NL.gguf) | IQ4_NL | 0.88GB |\n| [chinese-text-correction-1.5b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q4_K_S.gguf) | Q4_K_S | 0.88GB |\n| [chinese-text-correction-1.5b.Q4_K.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q4_K.gguf) | Q4_K | 0.92GB |\n| [chinese-text-correction-1.5b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q4_K_M.gguf) | Q4_K_M | 0.92GB |\n| [chinese-text-correction-1.5b.Q4_1.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q4_1.gguf) | Q4_1 | 0.95GB |\n| [chinese-text-correction-1.5b.Q5_0.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q5_0.gguf) | Q5_0 | 1.02GB |\n| [chinese-text-correction-1.5b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q5_K_S.gguf) | Q5_K_S | 1.02GB |\n| [chinese-text-correction-1.5b.Q5_K.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q5_K.gguf) | Q5_K | 1.05GB |\n| [chinese-text-correction-1.5b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q5_K_M.gguf) | Q5_K_M | 1.05GB |\n| [chinese-text-correction-1.5b.Q5_1.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q5_1.gguf) | Q5_1 | 1.1GB |\n| [chinese-text-correction-1.5b.Q6_K.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q6_K.gguf) | Q6_K | 1.19GB |\n| [chinese-text-correction-1.5b.Q8_0.gguf](https://huggingface.co/RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf/blob/main/chinese-text-correction-1.5b.Q8_0.gguf) | Q8_0 | 1.53GB |\n\n\n\n\nOriginal model description:\n---\nlibrary_name: transformers\nbase_model: Qwen/Qwen2.5-1.5B-Instruct\nlicense: apache-2.0\ndatasets:\n- shibing624/chinese_text_correction\nlanguage:\n- zh\nmetrics:\n- f1\ntags:\n- text-generation-inference\nwidget:\n- text: \"文本纠错:\\n少先队员因该为老人让坐。\"\n---\n\n\n\n# Chinese Text Correction Model\n中文文本纠错模型chinese-text-correction-1.5b:用于拼写纠错、语法纠错\n\n`shibing624/chinese-text-correction-1.5b` evaluate test data:\n\nThe overall performance of CSC **test**:\n\n|input_text|predict_text|\n|:--- |:--- |\n|文本纠错:\\n少先队员因该为老人让坐。|少先队员应该为老人让座。|\n\n# Models\n\n| Name | Base Model | Download |\n|-----------------|-------------------|-----------------------------------------------------------------------|\n| chinese-text-correction-1.5b | Qwen/Qwen2.5-1.5B-Instruct | [🤗 Hugging Face](https://huggingface.co/shibing624/chinese-text-correction-1.5b) |\n| chinese-text-correction-1.5b-lora | Qwen/Qwen2.5-1.5B-Instruct | [🤗 Hugging Face](https://huggingface.co/shibing624/chinese-text-correction-1.5b-lora) |\n| chinese-text-correction-7b | Qwen/Qwen2.5-7B-Instruct | [🤗 Hugging Face](https://huggingface.co/shibing624/chinese-text-correction-7b) |\n| chinese-text-correction-7b-lora | Qwen/Qwen2.5-7B-Instruct | [🤗 Hugging Face](https://huggingface.co/shibing624/chinese-text-correction-7b-lora) |\n\n\n### 评估结果\n- 评估指标:F1\n- CSC(Chinese Spelling Correction): 拼写纠错模型,表示模型可以处理音似、形似、语法等长度对齐的错误纠正\n- CTC(CHinese Text Correction): 文本纠错模型,表示模型支持拼写、语法等长度对齐的错误纠正,还可以处理多字、少字等长度不对齐的错误纠正\n- GPU:Tesla V100,显存 32 GB\n\n| Model Name | Model Link | Base Model | Avg | SIGHAN-2015 | EC-LAW | MCSC | GPU/CPU | QPS |\n|:-----------------|:------------------------------------------------------------------------------------------------------------------------|:---------------------------|:-----------|:------------|:-------|:-------|:--------|:--------|\n| Kenlm-CSC | [shibing624/chinese-kenlm-klm](https://huggingface.co/shibing624/chinese-kenlm-klm) | kenlm | 0.3409 | 0.3147 | 0.3763 | 0.3317 | CPU | 9 |\n| Mengzi-T5-CSC | [shibing624/mengzi-t5-base-chinese-correction](https://huggingface.co/shibing624/mengzi-t5-base-chinese-correction) | mengzi-t5-base | 0.3984 | 0.7758 | 0.3156 | 0.1039 | GPU | 214 |\n| ERNIE-CSC | [PaddleNLP/ernie-csc](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/legacy/examples/text_correction/ernie-csc) | PaddlePaddle/ernie-1.0-base-zh | 0.4353 | 0.8383 | 0.3357 | 0.1318 | GPU | 114 |\n| MacBERT-CSC | [shibing624/macbert4csc-base-chinese](https://huggingface.co/shibing624/macbert4csc-base-chinese) | hfl/chinese-macbert-base | 0.3993 | 0.8314 | 0.1610 | 0.2055 | GPU | **224** |\n| ChatGLM3-6B-CSC | [shibing624/chatglm3-6b-csc-chinese-lora](https://huggingface.co/shibing624/chatglm3-6b-csc-chinese-lora) | THUDM/chatglm3-6b | 0.4538 | 0.6572 | 0.4369 | 0.2672 | GPU | 3 |\n| Qwen2.5-1.5B-CTC | [shibing624/chinese-text-correction-1.5b](https://huggingface.co/shibing624/chinese-text-correction-1.5b) | Qwen/Qwen2.5-1.5B-Instruct | 0.6802 | 0.3032 | 0.7846 | 0.9529 | GPU | 6 |\n| Qwen2.5-7B-CTC | [shibing624/chinese-text-correction-7b](https://huggingface.co/shibing624/chinese-text-correction-7b) | Qwen/Qwen2.5-7B-Instruct | **0.8225** | 0.4917 | 0.9798 | 0.9959 | GPU | 3 |\n\n## Usage (pycorrector)\n\n本项目开源在`pycorrector`项目:[pycorrector](https://github.com/shibing624/pycorrector),可支持大模型微调后用于文本纠错,通过如下命令调用:\n\nInstall package:\n```shell\npip install -U pycorrector\n```\n\n```python\nfrom pycorrector.gpt.gpt_corrector import GptCorrector\n\nif __name__ == '__main__':\n error_sentences = [\n '真麻烦你了。希望你们好好的跳无',\n '少先队员因该为老人让坐',\n '机七学习是人工智能领遇最能体现智能的一个分知',\n '一只小鱼船浮在平净的河面上',\n '我的家乡是有明的渔米之乡',\n ]\n m = GptCorrector(\"shibing624/chinese-text-correction-1.5b\")\n\n batch_res = m.correct_batch(error_sentences)\n for i in batch_res:\n print(i)\n print()\n```\n\n## Usage (HuggingFace Transformers)\nWithout [pycorrector](https://github.com/shibing624/pycorrector), you can use the model like this: \n\nFirst, you pass your input through the transformer model, then you get the generated sentence.\n\nInstall package:\n```\npip install transformers \n```\n\n```python\n# pip install transformers\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\ncheckpoint = \"shibing624/chinese-text-correction-1.5b\"\n\ndevice = \"cuda\" # for GPU usage or \"cpu\" for CPU usage\ntokenizer = AutoTokenizer.from_pretrained(checkpoint)\nmodel = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)\n\ninput_content = \"文本纠错:\\n少先队员因该为老人让坐。\"\n\nmessages = [{\"role\": \"user\", \"content\": input_content}]\ninput_text=tokenizer.apply_chat_template(messages, tokenize=False)\n\nprint(input_text)\n\ninputs = tokenizer.encode(input_text, return_tensors=\"pt\").to(device)\noutputs = model.generate(inputs, max_new_tokens=1024, temperature=0, do_sample=False, repetition_penalty=1.08)\n\nprint(tokenizer.decode(outputs[0]))\n```\n\noutput:\n```shell\n少先队员应该为老人让座。\n```\n\n\n模型文件组成:\n```\nshibing624/chinese-text-correction-1.5b\n|-- added_tokens.json\n|-- config.json\n|-- generation_config.json\n|-- merges.txt\n|-- model.safetensors\n|-- model.safetensors.index.json\n|-- README.md\n|-- special_tokens_map.json\n|-- tokenizer_config.json\n|-- tokenizer.json\n`-- vocab.json\n```\n\n#### 训练参数:\n\n- num_epochs: 8\n- batch_size: 4\n- steps: 36000\n- eval_loss: 0.14\n- base model: Qwen/Qwen2.5-1.5B-Instruct\n- train data: [shibing624/chinese_text_correction](https://huggingface.co/datasets/shibing624/chinese_text_correction)\n- train time: 9 days 8 hours\n- eval_loss: \n- train_loss: \n\n### 训练数据集\n#### 中文纠错数据集\n\n- 数据:[shibing624/chinese_text_correction](https://huggingface.co/datasets/shibing624/chinese_text_correction)\n\n\n如果需要训练Qwen的纠错模型,请参考[https://github.com/shibing624/pycorrector](https://github.com/shibing624/pycorrector) 或者 [https://github.com/shibing624/MedicalGPT](https://github.com/shibing624/MedicalGPT)\n\n## Citation\n\n```latex\n@software{pycorrector,\n author = {Xu Ming},\n title = {pycorrector: Implementation of language model finetune},\n year = {2024},\n url = {https://github.com/shibing624/pycorrector},\n}\n```\n\n\n\n",
"related_quantizations": []
},
"tags": [
"gguf",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 0,
"downloads": 286,
"gated": false,
"private": false,
"last_modified": "2024-10-27T19:52:03.000Z",
"created_at": "2024-10-27T18:18:30.000Z",
"pipeline_tag": "",
"library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
"_id": "671e83f65c3da2e7b50c76a2",
"id": "RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf",
"modelId": "RichardErkhov/shibing624_-_chinese-text-correction-1.5b-gguf",
"sha": "28f0d26916b7ac5fe7a1e9101bbe97e215b461f6",
"createdAt": "2024-10-27T18:18:30.000Z",
"lastModified": "2024-10-27T19:52:03.000Z",
"author": "RichardErkhov",
"downloads": 286,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 21
}