GraySoft
Model Intelligence Sheet

richarderkhov/survivi_-_llama-3-syne-gguf overview

Quantization made by Richard Erkhov. [Github](https://github.com/RichardErkhov) | [Discord](https://discord.gg/pvy7H8DZMG) | [Request more models](https://github.com/RichardErkhov/quant_request)

Llama-3-SynE - GGUF

| Name | Quant method | Size |
| ---- | ---- | ---- |
| Llama-3-SynE.Q2_K.gguf | Q2_K | 2.96GB |
| Llama-3-SynE.IQ3_XS.gguf | IQ3_XS | 3.28GB |
| Llama-3-SynE.IQ3_S.gguf | IQ3_S | 3.43GB |
| Llama-3-SynE.Q3_K_S.gguf | Q3_K_S | 3.41GB |
| Llama-3-SynE.IQ3_M.gguf | IQ3_M | 3.52GB |
| Llama-3-SynE.Q3_K.gguf | Q3_K | 3.74GB |
| Llama-3-SynE.Q3_K_M.gguf | Q3_K_M | 3.74GB |
| Llama-3-SynE.Q3_K_L.gguf | Q3_K_L | 4.03GB |
| Llama-3-SynE.IQ4_XS.gguf | IQ4_XS | 4.18GB |
| Llama-3-SynE.Q4_0.gguf | Q4_0 | 4.34GB |
| Llama-3-SynE.IQ4_NL.gguf | IQ4_NL | 4.38GB |
| Llama-3-SynE.Q4_K_S.gguf | Q4_K_S | 4.37GB |
| Llama-3-SynE.Q4_K.gguf | Q4_K | 4.58GB |
| Llama-3-SynE.Q4_K_M.gguf | Q4_K_M | 4.58GB |
| Llama-3-SynE.Q4_1.gguf | Q4_1 | 4.78GB |
| Llama-3-SynE.Q5_0.gguf | Q5_0 | 5.21GB |
| Llama-3-SynE.Q5_K_S.gguf | Q5_K_S | 5.21GB |
| Llama-3-SynE.Q5_K.gguf | Q5_K | 5.34GB |
| Llama-3-SynE.Q5_K_M.gguf | Q5_K_M | 5.34GB |
| Llama-3-SynE.Q5_1.gguf | Q5_1 | 5.65GB |
| Llama-3-SynE.Q6_K.gguf | Q6_K | 6.14GB |
| Llama-3-SynE.Q8_0.gguf | Q8_0 | 7.95GB |

Original model description:

- language: en, zh
- datasets: survivi/Llama-3-SynE-Dataset
- library_name: transformers
- pipeline_tag: text-generation

[Report](https://arxiv.org/abs/2407.18743) | [GitHub Repo](https://github.com/RUC-GSAI/Llama-3-SynE) | [English](https://huggingface.co/survivi/Llama-3-SynE/blob/main/README.md) | [Simplified Chinese](https://huggingface.co/survivi/Llama-3-SynE/blob/main/README_zh.md)

Here is the Llama-3-SynE model. The continual pre-training dataset is also available [here](https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset).
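Choosing among the quantizations listed here usually comes down to how much disk space or memory is available. The following is a minimal sketch, assuming a simple "largest file that fits" rule; the helper name `pick_quant` and the budget rule are illustrative assumptions, not part of the repository, and actual runtime memory use will be higher than the file size because of context buffers.

```python
# File sizes in GB, copied from the quantization table on this page.
QUANT_SIZES_GB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "IQ3_S": 3.43, "Q3_K_S": 3.41,
    "IQ3_M": 3.52, "Q3_K": 3.74, "Q3_K_M": 3.74, "Q3_K_L": 4.03,
    "IQ4_XS": 4.18, "Q4_0": 4.34, "IQ4_NL": 4.38, "Q4_K_S": 4.37,
    "Q4_K": 4.58, "Q4_K_M": 4.58, "Q4_1": 4.78, "Q5_0": 5.21,
    "Q5_K_S": 5.21, "Q5_K": 5.34, "Q5_K_M": 5.34, "Q5_1": 5.65,
    "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb, sizes=QUANT_SIZES_GB):
    """Hypothetical helper: return the largest quantization whose file
    size fits within budget_gb, or None if nothing fits."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    # When two variants share a size, the one listed first wins.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (2.0, 4.2, 6.5, 8.0):
        print(budget, "->", pick_quant(budget))
```

Larger files generally preserve more of the original model's quality, so picking the largest variant that fits is a reasonable default; it is not a substitute for evaluating the variants on your own task.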

Tags: gguf, arxiv:2407.18743, endpoints_compatible, region:us
richarderkhov/survivi_-_llama-3-syne-gguf visual
Downloads
1,384
Likes
0
Pipeline
Not specified
Library
Not specified
Visibility
Public
Access
Open

Repository Files & Downloads

22 files detected
Direct downloads for all repository files
File Type Quantization Size Link
Llama-3-SynE.IQ3_M.gguf GGUF IQ3_M 3.52 GB Download
Llama-3-SynE.IQ3_S.gguf GGUF IQ3_S 3.43 GB Download
Llama-3-SynE.IQ3_XS.gguf GGUF IQ3_XS 3.28 GB Download
Llama-3-SynE.IQ4_NL.gguf GGUF IQ4_NL 4.38 GB Download
Llama-3-SynE.IQ4_XS.gguf GGUF IQ4_XS 4.18 GB Download
Llama-3-SynE.Q2_K.gguf GGUF Q2_K 2.96 GB Download
Llama-3-SynE.Q3_K.gguf GGUF Q3_K 3.74 GB Download
Llama-3-SynE.Q3_K_L.gguf GGUF Q3_K_L 4.03 GB Download
Llama-3-SynE.Q3_K_M.gguf GGUF Q3_K_M 3.74 GB Download
Llama-3-SynE.Q3_K_S.gguf GGUF Q3_K_S 3.41 GB Download
Llama-3-SynE.Q4_0.gguf GGUF Q4_0 4.34 GB Download
Llama-3-SynE.Q4_1.gguf GGUF Q4_1 4.78 GB Download
Llama-3-SynE.Q4_K.gguf GGUF Q4_K 4.58 GB Download
Llama-3-SynE.Q4_K_M.gguf GGUF Q4_K_M 4.58 GB Download
Llama-3-SynE.Q4_K_S.gguf GGUF Q4_K_S 4.37 GB Download
Llama-3-SynE.Q5_0.gguf GGUF Q5_0 5.21 GB Download
Llama-3-SynE.Q5_1.gguf GGUF Q5_1 5.65 GB Download
Llama-3-SynE.Q5_K.gguf GGUF Q5_K 5.34 GB Download
Llama-3-SynE.Q5_K_M.gguf GGUF Q5_K_M 5.34 GB Download
Llama-3-SynE.Q5_K_S.gguf GGUF Q5_K_S 5.21 GB Download
Llama-3-SynE.Q6_K.gguf GGUF Q6_K 6.14 GB Download
Llama-3-SynE.Q8_0.gguf GGUF Q8_0 7.95 GB Download
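The file names in this listing follow one pattern, `Llama-3-SynE.<quant>.gguf`, so a download URL can be built from the quantization name alone. The sketch below assumes the usual Hugging Face `resolve/<revision>/<filename>` path for direct file downloads (the README on this page only shows `blob/main` links); `gguf_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Repository id as it appears in the source payload on this page.
REPO_ID = "RichardErkhov/survivi_-_Llama-3-SynE-gguf"


def gguf_url(quant, repo_id=REPO_ID, revision="main"):
    """Hypothetical helper: build a direct-download URL for one quantization.

    Assumes the 'resolve/<revision>/<filename>' URL layout.
    """
    filename = f"Llama-3-SynE.{quant}.gguf"
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


if __name__ == "__main__":
    print(gguf_url("Q4_K_M"))
```

Pinning `revision` to the commit SHA shown under Model Details, instead of `main`, would keep the download reproducible if the repository is later updated.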

Model Details

Model Slug
richarderkhov/survivi_-_llama-3-syne-gguf
Author
RichardErkhov
Pipeline Task
Not specified
Library
Not specified
Created
2025-05-06
Last Modified
2025-05-06
Gated
No
Private
No
HF SHA
d4a9a43705588a13532c49f118b602a3cb0d63d5
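License
Not set in repository metadata; the original model card requires following the Llama-3 license agreement for the model weights
Language
Not set in repository metadata; the original model card lists en and zh
Base Model
Not set in repository metadata; the README names survivi/Llama-3-SynE as the original model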

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/assets/llama-3-syne-logo.png",
    "summary": "Quantization made by Richard Erkhov. Github Discord Request more models Llama-3-SynE - GGUF | Name | Quant method | Size | | ---- | ---- | ---- | | Llama-3-SynE.Q2_K.gguf | Q2_K | 2.96GB | | Llama-3-SynE.IQ3_XS.gguf | IQ3_XS | 3.28GB | | Llama-3-SynE.IQ3_S.gguf | IQ3_S | 3.43GB | | Llama-3-SynE.Q3_K_S.gguf | Q3_K_S | 3.41GB | | Llama-3-SynE.IQ3_M.gguf | IQ3_M | 3.52GB | | Llama-3-SynE.Q3_K.gguf | Q3_K | 3.74GB | | Llama-3-SynE.Q3_K_M.gguf | Q3_K_M | 3.74GB | | Llama-3-SynE.Q3_K_L.gguf | Q3_K_L | 4.03GB | | Llama-3-SynE.IQ4_XS.gguf | IQ4_XS | 4.18GB | | Llama-3-SynE.Q4_0.gguf | Q4_0 | 4.34GB | | Llama-3-SynE.IQ4_NL.gguf | IQ4_NL | 4.38GB | | Llama-3-SynE.Q4_K_S.gguf | Q4_K_S | 4.37GB | | Llama-3-SynE.Q4_K.gguf | Q4_K | 4.58GB | | Llama-3-SynE.Q4_K_M.gguf | Q4_K_M | 4.58GB | | Llama-3-SynE.Q4_1.gguf | Q4_1 | 4.78GB | | Llama-3-SynE.Q5_0.gguf | Q5_0 | 5.21GB | | Llama-3-SynE.Q5_K_S.gguf | Q5_K_S | 5.21GB | | Llama-3-SynE.Q5_K.gguf | Q5_K | 5.34GB | | Llama-3-SynE.Q5_K_M.gguf | Q5_K_M | 5.34GB | | Llama-3-SynE.Q5_1.gguf | Q5_1 | 5.65GB | | Llama-3-SynE.Q6_K.gguf | Q6_K | 6.14GB | | Llama-3-SynE.Q8_0.gguf | Q8_0 | 7.95GB | Original model description: --- language: datasets: library_name: transformers pipeline_tag: text-generation ---     ๐Ÿ“„  Report &nbsp | &nbsp ๐Ÿค— Model on Hugging Face&nbsp | &nbsp ๐Ÿ“Š CPT Dataset   ๐Ÿ” English&nbsp | &nbsp็ฎ€ไฝ“ไธญๆ–‡  -->     ๐Ÿ“„  Report &nbsp | &nbsp ๐Ÿ’ป GitHub Repo   ๐Ÿ” English&nbsp | &nbsp็ฎ€ไฝ“ไธญๆ–‡  > Here is the Llama-3-SynE model. The continual pre-training dataset is also available here.     ๐Ÿ“„  Report &nbsp | &nbsp ๐Ÿ’ป GitHub Repo   ๐Ÿ” English&nbsp | &nbsp็ฎ€ไฝ“ไธญๆ–‡  > Here is the continual pre-training dataset. The Llama-3-SynE model is available here. --> ---",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nLlama-3-SynE - GGUF\n- Model creator: https://huggingface.co/survivi/\n- Original model: https://huggingface.co/survivi/Llama-3-SynE/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [Llama-3-SynE.Q2_K.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q2_K.gguf) | Q2_K | 2.96GB |\n| [Llama-3-SynE.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.IQ3_XS.gguf) | IQ3_XS | 3.28GB |\n| [Llama-3-SynE.IQ3_S.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.IQ3_S.gguf) | IQ3_S | 3.43GB |\n| [Llama-3-SynE.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q3_K_S.gguf) | Q3_K_S | 3.41GB |\n| [Llama-3-SynE.IQ3_M.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.IQ3_M.gguf) | IQ3_M | 3.52GB |\n| [Llama-3-SynE.Q3_K.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q3_K.gguf) | Q3_K | 3.74GB |\n| [Llama-3-SynE.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q3_K_M.gguf) | Q3_K_M | 3.74GB |\n| [Llama-3-SynE.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q3_K_L.gguf) | Q3_K_L | 4.03GB |\n| [Llama-3-SynE.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.IQ4_XS.gguf) | IQ4_XS | 4.18GB |\n| [Llama-3-SynE.Q4_0.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q4_0.gguf) | Q4_0 | 4.34GB |\n| 
[Llama-3-SynE.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.IQ4_NL.gguf) | IQ4_NL | 4.38GB |\n| [Llama-3-SynE.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q4_K_S.gguf) | Q4_K_S | 4.37GB |\n| [Llama-3-SynE.Q4_K.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q4_K.gguf) | Q4_K | 4.58GB |\n| [Llama-3-SynE.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q4_K_M.gguf) | Q4_K_M | 4.58GB |\n| [Llama-3-SynE.Q4_1.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q4_1.gguf) | Q4_1 | 4.78GB |\n| [Llama-3-SynE.Q5_0.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q5_0.gguf) | Q5_0 | 5.21GB |\n| [Llama-3-SynE.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q5_K_S.gguf) | Q5_K_S | 5.21GB |\n| [Llama-3-SynE.Q5_K.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q5_K.gguf) | Q5_K | 5.34GB |\n| [Llama-3-SynE.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q5_K_M.gguf) | Q5_K_M | 5.34GB |\n| [Llama-3-SynE.Q5_1.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q5_1.gguf) | Q5_1 | 5.65GB |\n| [Llama-3-SynE.Q6_K.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q6_K.gguf) | Q6_K | 6.14GB |\n| [Llama-3-SynE.Q8_0.gguf](https://huggingface.co/RichardErkhov/survivi_-_Llama-3-SynE-gguf/blob/main/Llama-3-SynE.Q8_0.gguf) | Q8_0 | 7.95GB |\n\n\n\n\nOriginal model description:\n---\nlanguage:\n- en\n- zh\ndatasets:\n- survivi/Llama-3-SynE-Dataset\nlibrary_name: transformers\npipeline_tag: text-generation\n---\n\n<!-- <p align=\"center\">\n  <img 
src=\"https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/assets/llama-3-syne-logo.png\" width=\"400\"/>\n</p>\n\n<p align=\"center\">\n 📄 <a href=\"https://arxiv.org/abs/2407.18743\"> Report </a>&nbsp | &nbsp 🤗 <a href=\"https://huggingface.co/survivi/Llama-3-SynE\">Model on Hugging Face</a>&nbsp | &nbsp 📊 <a href=\"https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset\">CPT Dataset</a>\n</p>\n\n<p align=\"center\">\n 🔍 <a href=\"https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/README.md\">English</a>&nbsp | &nbsp<a href=\"https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/README_zh.md\">简体中文</a>\n</p> -->\n\n<p align=\"center\">\n  <img src=\"https://cdn-uploads.huggingface.co/production/uploads/651a29d566e78720a78317ec/I2rqZ19OY2qvW1V6nOakg.png\" width=\"400\"/>\n</p>\n\n<p align=\"center\">\n 📄 <a href=\"https://arxiv.org/abs/2407.18743\"> Report </a>&nbsp | &nbsp 💻 <a href=\"https://github.com/RUC-GSAI/Llama-3-SynE\">GitHub Repo</a>\n</p>\n\n<p align=\"center\">\n 🔍 <a href=\"https://huggingface.co/survivi/Llama-3-SynE/blob/main/README.md\">English</a>&nbsp | &nbsp<a href=\"https://huggingface.co/survivi/Llama-3-SynE/blob/main/README_zh.md\">简体中文</a>\n</p>\n\n> Here is the Llama-3-SynE model. 
The continual pre-training dataset is also available [here](https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset).\n\n<!-- <p align=\"center\">\n  <img src=\"https://cdn-uploads.huggingface.co/production/uploads/651a29d566e78720a78317ec/I2rqZ19OY2qvW1V6nOakg.png\" width=\"400\"/>\n</p>\n\n<p align=\"center\">\n 📄 <a href=\"https://arxiv.org/abs/2407.18743\"> Report </a>&nbsp | &nbsp 💻 <a href=\"https://github.com/RUC-GSAI/Llama-3-SynE\">GitHub Repo</a>\n</p>\n\n<p align=\"center\">\n 🔍 <a href=\"https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset/blob/main/README.md\">English</a>&nbsp | &nbsp<a href=\"https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset/blob/main/README_zh.md\">简体中文</a>\n</p>\n\n> Here is the continual pre-training dataset. The Llama-3-SynE model is available [here](https://huggingface.co/survivi/Llama-3-SynE). -->\n\n---\n\n## News\n\n- 🌟🌟 `2024/12/17`: We released the [code](https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/src) used for continual pre-training and data preparation. The code contains detailed documentation comments.\n- ✨✨ `2024/08/12`: We released the [continual pre-training dataset](https://huggingface.co/datasets/survivi/Llama-3-SynE-Dataset).\n- ✨✨ `2024/08/10`: We released the [Llama-3-SynE model](https://huggingface.co/survivi/Llama-3-SynE).\n- ✨ `2024/07/26`: We released the [technical report](https://arxiv.org/abs/2407.18743), welcome to check it out!\n\n<p align=\"center\">\n  <img src=\"https://cdn-uploads.huggingface.co/production/uploads/651a29d566e78720a78317ec/NyF6C4JJ98E9PXxJ3R7mr.png\" width=\"800\"/>\n</p>\n\n## Model Introduction\n\n**Llama-3-SynE** (<ins>Syn</ins>thetic data <ins>E</ins>nhanced Llama-3) is a significantly enhanced version of [Llama-3 (8B)](https://github.com/meta-llama/llama3), achieved through continual pre-training (CPT) to improve its **Chinese language ability and scientific reasoning capability**. 
By employing a meticulously designed data mixture and curriculum strategy, Llama-3-SynE gains new abilities while maintaining the original model's performance. This enhancement process involves utilizing existing datasets and synthesizing high-quality datasets specifically designed for targeted tasks.\n\nKey features of Llama-3-SynE include:\n\n- **Enhanced Chinese Language Capabilities**: Achieved through topic-based data mixture and perplexity-based data curriculum.\n- **Improved Scientific Reasoning**: Utilizing synthetic datasets to enhance multi-disciplinary scientific knowledge.\n- **Efficient CPT**: Only consuming around 100 billion tokens, making it a cost-effective solution.\n\n## Model List\n\n| Model        | Type | Seq Length | Download                                                      |\n| :----------- | :--- | :--------- | :------------------------------------------------------------ |\n| Llama-3-SynE | Base | 8K         | [🤗 Huggingface](https://huggingface.co/survivi/Llama-3-SynE) |\n\n## Benchmark\n\nWe divide all evaluation benchmarks into two groups. The first group is _major benchmarks_, which aim to evaluate the comprehensive capacities of LLMs. Note that we include commonly used math and code benchmarks in this group because it is standard practice to use these benchmarks for evaluating various general-purpose LLMs.\n\nThe second group is _scientific benchmarks_, which have a broader coverage of multidisciplinary scientific knowledge.\n\nWe report the eight-shot performance on GSM8K, ASDiv, and MAWPS, five-shot for C-Eval, CMMLU, MMLU, MATH, GaoKao, SciQ, SciEval, SAT-Math, and AQUA-RAT, three-shot for MBPP.\nFor HumanEval and ARC, we report the zero-shot evaluation performance. 
The best and second best are in **bold** and <ins>underlined</ins>, respectively.\n\n### Major Benchmarks\n\n| **Models**              | **MMLU**         | **C-Eval**       | **CMMLU**        | **MATH**         | **GSM8K**        | **ASDiv**        | **MAWPS**        | **SAT-Math**     | **HumanEval**    | **MBPP**         |\n| :---------------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- |\n| Llama-3-8B              | **66.60**        | 49.43            | 51.03            | 16.20            | 54.40            | 72.10            | 89.30            | 38.64            | <ins>36.59</ins> | **47.00**        |\n| DCLM-7B                 | 64.01            | 41.24            | 40.89            | 14.10            | 39.20            | 67.10            | 83.40            | <ins>41.36</ins> | 21.95            | 32.60            |\n| Mistral-7B-v0.3         | 63.54            | 42.74            | 43.72            | 12.30            | 40.50            | 67.50            | 87.50            | 40.45            | 25.61            | 36.00            |\n| Llama-3-Chinese-8B      | 64.10            | <ins>50.14</ins> | <ins>51.20</ins> | 3.60             | 0.80             | 1.90             | 0.60             | 36.82            | 9.76             | 14.80            |\n| MAmmoTH2-8B             | 64.89            | 46.56            | 45.90            | **34.10**        | **61.70**        | **82.80**        | <ins>91.50</ins> | <ins>41.36</ins> | 17.68            | 38.80            |\n| Galactica-6.7B          | 37.13            | 26.72            | 25.53            | 5.30             | 9.60             | 40.90            | 51.70            | 23.18            | 7.31             | 2.00             |\n| **Llama-3-SynE (ours)** | <ins>65.19</ins> | **58.24**        | **57.34**        | <ins>28.20</ins> | <ins>60.80</ins> | 
<ins>81.00</ins> | **94.10**        | **43.64**        | **42.07**        | <ins>45.60</ins> |\n\n> On **Chinese evaluation benchmarks** (such as C-Eval and CMMLU), Llama-3-SynE significantly outperforms the base model Llama-3 (8B), indicating that our method is very effective in improving Chinese language capabilities.\n\n> On **English evaluation benchmarks** (such as MMLU, MATH, and code evaluation benchmarks), Llama-3-SynE demonstrates comparable or better performance than the base model, indicating that our method effectively addresses the issue of catastrophic forgetting during the CPT process.\n\n### Scientific Benchmarks\n\n\"PHY\", \"CHE\", and \"BIO\" denote the physics, chemistry, and biology sub-tasks of the corresponding benchmarks.\n\n| **Models**              | **SciEval PHY**  | **SciEval CHE**  | **SciEval BIO**  | **SciEval Avg.** | **SciQ**         | **GaoKao MathQA** | **GaoKao CHE**   | **GaoKao BIO**   | **ARC Easy**     | **ARC Challenge** | **ARC Avg.**     | **AQUA-RAT**     |\n| :---------------------- | :--------------- | :--------------- | :--------------- | :--------------- | :--------------- | :---------------- | :--------------- | :--------------- | :--------------- | :---------------- | :--------------- | :--------------- |\n| Llama-3-8B              | 46.95            | 63.45            | 74.53            | 65.47            | 90.90            | 27.92             | 32.85            | 43.81            | 91.37            | 77.73             | 84.51            | <ins>27.95</ins> |\n| DCLM-7B                 | **56.71**        | 64.39            | 72.03            | 66.25            | **92.50**        | 29.06             | 31.40            | 37.14            | 89.52            | 76.37             | 82.94            | 20.08            |\n| Mistral-7B-v0.3         | 48.17            | 59.41            | 68.89            | 61.51            | 89.40            | 30.48             | 30.92            | 41.43            | 87.33            | 
74.74             | 81.04            | 23.23            |\n| Llama-3-Chinese-8B      | 48.17            | 67.34            | 73.90            | <ins>67.34</ins> | 89.20            | 27.64             | 30.43            | 38.57            | 88.22            | 70.48             | 79.35            | 27.56            |\n| MAmmoTH2-8B             | 49.39            | **69.36**        | <ins>76.83</ins> | **69.60**        | 90.20            | **32.19**         | <ins>36.23</ins> | <ins>49.05</ins> | **92.85**        | **84.30**         | **88.57**        | 27.17            |\n| Galactica-6.7B          | 34.76            | 43.39            | 54.07            | 46.27            | 71.50            | 23.65             | 27.05            | 24.76            | 65.91            | 46.76             | 56.33            | 20.87            |\n| **Llama-3-SynE (ours)** | <ins>53.66</ins> | <ins>67.81</ins> | **77.45**        | **69.60**        | <ins>91.20</ins> | <ins>31.05</ins>  | **51.21**        | **69.52**        | <ins>91.58</ins> | <ins>80.97</ins>  | <ins>86.28</ins> | **28.74**        |\n\n> On **scientific evaluation benchmarks** (such as SciEval, GaoKao, and ARC), Llama-3-SynE significantly outperforms the base model, particularly showing remarkable improvement in Chinese scientific benchmarks (for example, a 25.71-percentage-point improvement on the GaoKao biology subtest, from 43.81 to 69.52).\n\n## Quick Start\n\nUse the transformers backend for inference:\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\nmodel_path = \"survivi/Llama-3-SynE\"\ntokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_path, torch_dtype=torch.bfloat16, trust_remote_code=True\n)\nmodel.to(\"cuda:0\")\nmodel.eval()\nprompt = \"Hello world!\"\ninputs = tokenizer(prompt, return_tensors=\"pt\")\ninputs = inputs.to(\"cuda\")\npred = model.generate(\n    **inputs,\n    max_new_tokens=2048,\n    
repetition_penalty=1.05,\n    temperature=0.5,\n    top_k=5,\n    top_p=0.85,\n    do_sample=True\n)\npred = pred[0][len(inputs.input_ids[0]) :]\noutput = tokenizer.decode(pred, skip_special_tokens=True)\nprint(output)\n```\n\nUse the vLLM backend for inference:\n\n```python\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\nmodel_path = \"survivi/Llama-3-SynE\"\ntokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\nsampling_params = SamplingParams(\n    max_tokens=2048,\n    repetition_penalty=1.05,\n    temperature=0.5,\n    top_k=5,\n    top_p=0.85,\n)\nllm = LLM(\n    model=model_path,\n    tensor_parallel_size=1,\n    trust_remote_code=True,\n)\nprompt = \"Hello world!\"\noutput = llm.generate(prompt, sampling_params)\noutput = output[0].outputs[0].text\nprint(output)\n```\n\n## License\n\nThis project is built upon Meta's Llama-3 model. The use of Llama-3-SynE model weights must follow the Llama-3 [license agreement](https://github.com/meta-llama/llama3/blob/main/LICENSE). The code in this open-source repository follows the [Apache 2.0](LICENSE) license.\n\n## Citation\n\nIf you find our work helpful, please consider citing the following paper:\n\n```\n@article{jie2024llama3syne,\n  title={Towards Effective and Efficient Continual Pre-training of Large Language Models},\n  author={Chen, Jie and Chen, Zhipeng and Wang, Jiapeng and Zhou, Kun and Zhu, Yutao and Jiang, Jinhao and Min, Yingqian and Zhao, Wayne Xin and Dou, Zhicheng and Mao, Jiaxin and others},\n  journal={arXiv preprint arXiv:2407.18743},\n  year={2024}\n}\n```\n\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "arxiv:2407.18743",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 1384,
  "gated": false,
  "private": false,
  "last_modified": "2025-05-06T03:16:31.000Z",
  "created_at": "2025-05-06T00:15:56.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "681954bc3f189295fcad643e",
  "id": "RichardErkhov/survivi_-_Llama-3-SynE-gguf",
  "modelId": "RichardErkhov/survivi_-_Llama-3-SynE-gguf",
  "sha": "d4a9a43705588a13532c49f118b602a3cb0d63d5",
  "createdAt": "2025-05-06T00:15:56.000Z",
  "lastModified": "2025-05-06T03:16:31.000Z",
  "author": "RichardErkhov",
  "downloads": 1384,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 24
}
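The timestamps in this payload are ISO 8601 strings in UTC, which makes it easy to derive figures such as how long the repository upload took. The following is a small sketch that parses an excerpt of the payload shown above; the helper name `parse_ts` and the variable `upload_window` are illustrative, and only fields that appear in the payload are used.

```python
import json
from datetime import datetime, timezone

# Excerpt of the source payload shown above (fields copied verbatim).
payload = json.loads("""
{
  "id": "RichardErkhov/survivi_-_Llama-3-SynE-gguf",
  "createdAt": "2025-05-06T00:15:56.000Z",
  "lastModified": "2025-05-06T03:16:31.000Z",
  "downloads": 1384,
  "siblings_count": 24
}
""")


def parse_ts(value):
    """Parse a timestamp such as '2025-05-06T00:15:56.000Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_ts(payload["createdAt"])
modified = parse_ts(payload["lastModified"])

# Time between repository creation and the last modification.
upload_window = modified - created
print(upload_window)  # 3:00:35
```

Using `strptime` with an explicit format avoids relying on `datetime.fromisoformat` accepting the trailing `Z`, which older Python versions do not support.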