Model Intelligence Sheet

richarderkhov/discoresearch_-_llama3-german-8b-32k-gguf overview

This version of the model is the long-context (32k) extension of Llama3-German-8B-v0.1 described in the original model card. Llama3-German-8B-v0.1 is a large language model based on Meta's Llama3-8B. It is specialized for German through continued pretraining on 65 billion high-quality tokens, similar to the earlier LeoLM and Occiglot models.

Llama3 itself was trained on 15T tokens, of which fewer than 1T were multilingual. This leaves its German suboptimal, with reduced linguistic capability and frequent grammatical errors, which motivated continued pretraining. Benchmark results for this model show minimal degradation in English performance despite the absence of replay during training. Llama3-German-8B-v0.1 also shows strong improvements in German, particularly on the HellaSwag benchmark, which measures linguistic understanding and general reasoning.

DiscoResearch/Llama3-German-8B-v0.1 is the result of a joint effort between DiscoResearch and Occiglot, with support from the DFKI (German Research Center for Artificial Intelligence) and hessian.Ai. Occiglot handled data preprocessing, filtering, and deduplication as part of their latest dataset release, and shared their compute allocation on hessian.Ai's 42 supercomputer.
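
The 65-billion-token figure can be checked roughly against the hyperparameters in the original card's training table, which is reproduced in the `readme_markdown` field below. This is a back-of-the-envelope sketch, not the authors' accounting. It assumes every step processes a full batch and ignores padding.

```python
import math

# Values from the original card's "Model Training and Hyperparameters" table.
SEQ_LEN = 8192            # tokens per sequence
BATCH_SEQUENCES = 512     # batch size of 4194304 (512*8192) tokens
TRAIN_STEPS = 15_500
WARMUP_STEPS = 155

tokens_per_step = BATCH_SEQUENCES * SEQ_LEN
total_tokens = tokens_per_step * TRAIN_STEPS   # assumes every step sees a full batch
warmup_fraction = WARMUP_STEPS / TRAIN_STEPS

# Long-context stage: "an additional 100 million tokens" with a batch of 256*8192 tokens.
LC_TOKENS = 100_000_000
lc_tokens_per_step = 256 * SEQ_LEN
lc_steps = math.ceil(LC_TOKENS / lc_tokens_per_step)

print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.2f}B tokens total")
print(f"warmup = {warmup_fraction:.0%} of steps, long-context stage ~{lc_steps} steps")
```

The product is about 65.0B tokens, which agrees with the stated 65B. The 155 warmup steps are the stated 1%. If the 100M figure is exact, the long-context stage works out to roughly 48 optimizer steps.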

gguf · arxiv:2404.10830 · endpoints_compatible · region:us

richarderkhov/discoresearch_-_llama3-german-8b-32k-gguf visual

Downloads

88

Likes

0

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

22 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Llama3-German-8B-32k.IQ3_M.gguf	GGUF	IQ3_M	3.52 GB	Download
Llama3-German-8B-32k.IQ3_S.gguf	GGUF	IQ3_S	3.43 GB	Download
Llama3-German-8B-32k.IQ3_XS.gguf	GGUF	IQ3_XS	3.28 GB	Download
Llama3-German-8B-32k.IQ4_NL.gguf	GGUF	IQ4_NL	4.38 GB	Download
Llama3-German-8B-32k.IQ4_XS.gguf	GGUF	IQ4_XS	4.18 GB	Download
Llama3-German-8B-32k.Q2_K.gguf	GGUF	Q2_K	2.96 GB	Download
Llama3-German-8B-32k.Q3_K.gguf	GGUF	Q3_K	3.74 GB	Download
Llama3-German-8B-32k.Q3_K_L.gguf	GGUF	Q3_K_L	4.03 GB	Download
Llama3-German-8B-32k.Q3_K_M.gguf	GGUF	Q3_K_M	3.74 GB	Download
Llama3-German-8B-32k.Q3_K_S.gguf	GGUF	Q3_K_S	3.41 GB	Download
Llama3-German-8B-32k.Q4_0.gguf	GGUF	Q4_0	4.34 GB	Download
Llama3-German-8B-32k.Q4_1.gguf	GGUF	Q4_1	4.78 GB	Download
Llama3-German-8B-32k.Q4_K.gguf	GGUF	Q4_K	4.58 GB	Download
Llama3-German-8B-32k.Q4_K_M.gguf	GGUF	Q4_K_M	4.58 GB	Download
Llama3-German-8B-32k.Q4_K_S.gguf	GGUF	Q4_K_S	4.37 GB	Download
Llama3-German-8B-32k.Q5_0.gguf	GGUF	Q5_0	5.21 GB	Download
Llama3-German-8B-32k.Q5_1.gguf	GGUF	Q5_1	5.65 GB	Download
Llama3-German-8B-32k.Q5_K.gguf	GGUF	Q5_K	5.34 GB	Download
Llama3-German-8B-32k.Q5_K_M.gguf	GGUF	Q5_K_M	5.34 GB	Download
Llama3-German-8B-32k.Q5_K_S.gguf	GGUF	Q5_K_S	5.21 GB	Download
Llama3-German-8B-32k.Q6_K.gguf	GGUF	Q6_K	6.14 GB	Download
Llama3-German-8B-32k.Q8_0.gguf	GGUF	Q8_0	7.95 GB	Download
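
Choosing one of these files mostly comes down to memory. The sketch below picks the largest quantization that still fits a memory budget after room is reserved for the KV cache. Several assumptions here do not come from this page:

- The listed sizes are treated as binary gigabytes (GiB).
- The KV-cache estimate uses Llama-3-8B's architecture (32 layers, 8 key/value heads, head dimension 128) with a 16-bit cache.
- `overhead_gib` is an arbitrary, hypothetical allowance for runtime buffers.

`Q3_K`, `Q4_K` and `Q5_K` are left out because they have the same sizes as their `_M` variants. Actual usage depends on the runtime and its settings.

```python
# Sketch: pick the largest GGUF quant that fits a memory budget.
# Assumptions (not from the model sheet): sizes are GiB; the KV-cache shape is
# Llama-3-8B's (32 layers, 8 KV heads, head dim 128) with 2-byte (fp16) entries.
QUANT_SIZES_GIB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "Q3_K_S": 3.41, "IQ3_S": 3.43,
    "IQ3_M": 3.52, "Q3_K_M": 3.74, "Q3_K_L": 4.03, "IQ4_XS": 4.18,
    "Q4_0": 4.34, "Q4_K_S": 4.37, "IQ4_NL": 4.38, "Q4_K_M": 4.58,
    "Q4_1": 4.78, "Q5_0": 5.21, "Q5_K_S": 5.21, "Q5_K_M": 5.34,
    "Q5_1": 5.65, "Q6_K": 6.14, "Q8_0": 7.95,
}  # Q3_K, Q4_K, Q5_K omitted: same sizes as the _M variants in the table


def kv_cache_gib(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate size of the key and value caches for n_ctx tokens, in GiB."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 1024**3


def pick_quant(budget_gib, n_ctx=8192, overhead_gib=0.5):
    """Return the file name of the largest quant that fits the budget, or None."""
    reserved = kv_cache_gib(n_ctx) + overhead_gib
    fitting = [(size, q) for q, size in QUANT_SIZES_GIB.items()
               if size + reserved <= budget_gib]
    if not fitting:
        return None
    _, quant = max(fitting)  # largest file size wins
    return f"Llama3-German-8B-32k.{quant}.gguf"
```

For example, with an 8 GiB budget, `pick_quant(8.0)` selects the Q6_K file at an 8k context. `pick_quant(8.0, n_ctx=32768)` drops to IQ3_S, because under these assumptions the full 32k cache alone takes about 4 GiB.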

Model Details

Model Slug

richarderkhov/discoresearch_-_llama3-german-8b-32k-gguf

Author

RichardErkhov

Pipeline Task

—

Library

—

Created

2024-08-21

Last Modified

2024-08-21

Gated

No

Private

No

HF SHA

743d93af58c277ff93b63a236334914d7b6e17f8
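
Pinning downloads to this commit SHA keeps results reproducible if the repository changes later. The following is a minimal sketch. It assumes the `huggingface_hub` and `llama-cpp-python` packages, neither of which is mentioned on this page; llama-cpp-python is just one of several GGUF runtimes. The repo id is the case-preserved form from the API payload further down, and the Q4_K_M file is an arbitrary example. Because this is a base model, the commented-out call uses plain text completion rather than a chat template.

```python
# Sketch: fetch one GGUF file pinned to a specific commit and load it.
# Assumptions: huggingface_hub and llama-cpp-python are installed; the file
# name below is only an example choice from the file table.
REPO_ID = "RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf"
REVISION = "743d93af58c277ff93b63a236334914d7b6e17f8"  # HF SHA listed on this page
FILENAME = "Llama3-German-8B-32k.Q4_K_M.gguf"


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = REVISION) -> str:
    """Direct-download URL following the Hub's /resolve/<revision>/<file> pattern."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download(filename: str = FILENAME) -> str:
    """Download into the local Hugging Face cache and return the local file path."""
    from huggingface_hub import hf_hub_download  # imported lazily: optional dependency
    return hf_hub_download(repo_id=REPO_ID, filename=filename, revision=REVISION)


def load(path: str, n_ctx: int = 32768):
    """Load the GGUF file with llama-cpp-python using the full 32k context window."""
    from llama_cpp import Llama  # imported lazily: optional dependency
    return Llama(model_path=path, n_ctx=n_ctx)


if __name__ == "__main__":
    print(resolve_url(FILENAME))
    # Uncomment to download (~4.6 GB) and run a plain completion:
    # llm = load(download())
    # out = llm("Die Hauptstadt von Deutschland ist", max_tokens=16)
    # print(out["choices"][0]["text"])
```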

License

Unknown (not declared in this repository's metadata; the original model card lists llama3)

Language

Unknown (not declared in this repository's metadata; the original model card lists de)

Base Model

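Unknown (not declared in this repository's metadata; the files are quantized from DiscoResearch/Llama3-German-8B-32k)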

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "base_model_evals.png",
    "summary": "This version of the model refers to the long-context extension version described below Llama3-German-8B-v0.1 is a large language model based on Meta's Llama3-8B. It is specialized for the German language through continuous pretraining on 65 billion high-quality tokens, similar to previous LeoLM or Occiglot models. Llama3 itself was trained on 15T tokens, of which only <1T were multilingual, resulting in suboptimal performance in German with reduced linguistic capabilities and frequent grammatical errors, motivating the necessity for continued pretraining. Benchmark results on our model show minimal degradation in English performance, despite the absence of replay during training. Importantly, Llama3-German-8B-v0.1 demonstrates strong improvements in German, particularly on the Hellaswag benchmark, which measures linguistic understanding and general reasoning. DiscoResearch/Llama3-German-8B-v0.1 is the result of a joint effort between DiscoResearch and Occiglot with support from the DFKI (German Research Center for Artificial Intelligence) and hessian.Ai. Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest dataset release, as well as sharing their compute allocation at hessian.Ai's 42 Supercomputer.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nLlama3-German-8B-32k - GGUF\n- Model creator: https://huggingface.co/DiscoResearch/\n- Original model: https://huggingface.co/DiscoResearch/Llama3-German-8B-32k/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [Llama3-German-8B-32k.Q2_K.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q2_K.gguf) | Q2_K | 2.96GB |\n| [Llama3-German-8B-32k.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.IQ3_XS.gguf) | IQ3_XS | 3.28GB |\n| [Llama3-German-8B-32k.IQ3_S.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.IQ3_S.gguf) | IQ3_S | 3.43GB |\n| [Llama3-German-8B-32k.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q3_K_S.gguf) | Q3_K_S | 3.41GB |\n| [Llama3-German-8B-32k.IQ3_M.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.IQ3_M.gguf) | IQ3_M | 3.52GB |\n| [Llama3-German-8B-32k.Q3_K.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q3_K.gguf) | Q3_K | 3.74GB |\n| [Llama3-German-8B-32k.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q3_K_M.gguf) | Q3_K_M | 3.74GB |\n| [Llama3-German-8B-32k.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q3_K_L.gguf) | Q3_K_L | 4.03GB |\n| 
[Llama3-German-8B-32k.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.IQ4_XS.gguf) | IQ4_XS | 4.18GB |\n| [Llama3-German-8B-32k.Q4_0.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q4_0.gguf) | Q4_0 | 4.34GB |\n| [Llama3-German-8B-32k.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.IQ4_NL.gguf) | IQ4_NL | 4.38GB |\n| [Llama3-German-8B-32k.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q4_K_S.gguf) | Q4_K_S | 4.37GB |\n| [Llama3-German-8B-32k.Q4_K.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q4_K.gguf) | Q4_K | 4.58GB |\n| [Llama3-German-8B-32k.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q4_K_M.gguf) | Q4_K_M | 4.58GB |\n| [Llama3-German-8B-32k.Q4_1.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q4_1.gguf) | Q4_1 | 4.78GB |\n| [Llama3-German-8B-32k.Q5_0.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q5_0.gguf) | Q5_0 | 5.21GB |\n| [Llama3-German-8B-32k.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q5_K_S.gguf) | Q5_K_S | 5.21GB |\n| [Llama3-German-8B-32k.Q5_K.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q5_K.gguf) | Q5_K | 5.34GB |\n| [Llama3-German-8B-32k.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q5_K_M.gguf) | Q5_K_M | 5.34GB |\n| 
[Llama3-German-8B-32k.Q5_1.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q5_1.gguf) | Q5_1 | 5.65GB |\n| [Llama3-German-8B-32k.Q6_K.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q6_K.gguf) | Q6_K | 6.14GB |\n| [Llama3-German-8B-32k.Q8_0.gguf](https://huggingface.co/RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf/blob/main/Llama3-German-8B-32k.Q8_0.gguf) | Q8_0 | 7.95GB |\n\n\n\n\nOriginal model description:\n---\nlicense: llama3\nlanguage:\n- de\nlibrary_name: transformers\n---\n\n# Llama3-German-8B-32k (version 0.1)\n\nThis version of the model refers to the long-context extension version described [below](https://huggingface.co/DiscoResearch/Llama3-German-8B-32k#long-context-extension)\n\nLlama3-German-8B-v0.1 is a large language model based on [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B). It is specialized for the German language through continuous pretraining on 65 billion high-quality tokens, similar to previous [LeoLM](https://huggingface.co/LeoLM) or [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.\n\nLlama3 itself was trained on 15T tokens, of which only <1T were multilingual, resulting in suboptimal performance in German with reduced linguistic capabilities and frequent grammatical errors, motivating the necessity for continued pretraining. Benchmark results on our model show minimal degradation in English performance, despite the absence of replay during training. Importantly, Llama3-German-8B-v0.1 demonstrates strong improvements in German, particularly on the Hellaswag benchmark, which measures linguistic understanding and general reasoning. 
\n\n[DiscoResearch/Llama3-German-8B-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729) is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot) with support from the [DFKI](https://www.dfki.de/web/) (German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai). Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), as well as sharing their compute allocation at hessian.Ai's 42 Supercomputer.\n\n## How to use\nThis is a base model and should probably be subject to finetuning before use. See our [collection](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729) for various finetuned and long-context versions.\n\n## Model Training and Hyperparameters\nThe model was trained on 128 GPUs on [hessian.Ai 42](hessian.ai) for ~60 hours. See detailed hyperparameters below.\n\n| Parameter         | Value                             |\n|-------------------|-----------------------------------|\n| Sequence Length   | 8192 tokens                       |\n| Learning Rate     | 1.5e-5 to 1.5e-6 (cosine schedule)|\n| Batch Size        | 4194304 (512*8192) tokens         |\n| Micro Batch Size  | 4*8192 tokens                     |\n| Training Steps    | 15500                             |\n| Warmup Steps      | 155 (1%)                          |\n| Weight Decay      | 0.05                              |\n| Optimizer         | AdamW                             |\n\n\n## Data Collection and Preprocessing\n\nFor pre-training, we used 65B German tokens from the [occiglot-fineweb-0.5](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5) dataset. 
\nThe data comprises multiple curated datasets from [LLM-Datasets](https://github.com/malteos/llm-datasets) as well as 12 [Common-Crawl](https://commoncrawl.org) releases that were processed with [OSCAR's Ungoliant pipeline](https://github.com/oscar-project/ungoliant). \n\nAll data was further filtered with a set of language-specific filters based on [Huggingface's fine-web](https://github.com/huggingface/datatrove/blob/main/examples/fineweb.py) and globally deduplicated. \n\nFor more information please refer to the [dataset card](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5) and corresponding [blog-post](https://occiglot.eu/posts/occiglot-fineweb/).\n\n## Evaluation and Results\n\nWe evaluated the model using a suite of common English Benchmarks and their German counterparts with [GermanBench](https://github.com/bjoernpl/GermanBenchmark).\n\nThe following figure shows the benchmark results in comparison to the base model [meta-llama/Meta-Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and two different hyperparameter configurations. \nWe swept different learning rates to identify a well-working setup. 
The final released model is the 1.5e-5 lr version.\n![alt text](base_model_evals.png)\n\nFind the detailed benchmark scores for the base and long-context models in this table.\n\n| Model                                | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU   | MMLU-DE | mean       |\n|--------------------------------------|----------------|---------------|---------------|------------------|-----------|--------------|--------|---------|------------|\n| DiscoResearch/Llama3-German-8B       | **0.49499**    | 0.44838       | 0.55802       | **0.49829**      | 0.79924   | **0.65395**  | 0.62240| **0.54413** | **0.57743** |\n| DiscoResearch/Llama3-German-8B-32k   | 0.48920        | **0.45138**   | 0.54437       | 0.49232          | 0.79078   | 0.64310      | 0.58774| 0.47971  | 0.55982    |\n| meta-llama/Meta-Llama-3-8B-Instruct  | 0.47498        | 0.43923       | **0.59642**   | 0.47952          | **0.82025**| 0.60008      | **0.66658**| 0.53541  | 0.57656    |\n\n## Long-Context Extension\n\nIn addition to the base model, we release a long-context version of Llama3-German-8B ([DiscoResearch/Llama3-German-8B-32k](https://huggingface.co/DiscoResearch/Llama3-German-8B-32k) capable of processing context lengths up to 65k tokens. This variant was trained on an additional 100 million tokens at 32k context length, using a rope_theta value of `1.5e6` and a learning rate of `1.5e-5` with a batch size of `256*8192` tokens and otherwise equal hyperparameters to the base model. 
\n\n## Instruction Tuning\n\nWe also provide an instruction-tuned version: [DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1), utilizing the DiscoLM German dataset for fine-tuning (also available as a long-context model at [DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1)).\nFind more details in the respective model cards. Also check out our experimental merge ([DiscoResearch/Llama3-DiscoLeo-8B-DARE-Experimental](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-8B-DARE-Experimental)) between [meta-llama/Meta-Llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and our finetuned model in an attempt to keep the extraordinary capabilities of Llama3-Instruct and add exceptional German skills.\n\n## Document Packing\n\nWe employed a more intelligent document packing strategy based on the [\"Fewer Truncations Improve Language Modeling\" paper by Ding et al.](https://arxiv.org/abs/2404.10830v2), using the first-fit-decreasing algorithm to pack documents into batches without truncation. \nWe packed our data in chunks of 10000 documents for more efficient processing while maintaining >99% packing efficiency. Documents longer than the sequence length are split into chunks of sequence length.\n\nThis approach results in overall higher benchmark scores when training on the same data with equal hyperparameters. 
The following numbers are from initial experiments with `3e-5 lr` and 12k steps and show improvements comparable to those shown in the original paper.\n\n| Task              | Naive Packing | Fewer Truncations Packing | Percentage Increase |\n|-------------------|---------------|---------------------------|---------------------|\n| truthfulqa_mc     | 0.452648      | 0.467687                  | 3.32%               |\n| arc_challenge     | 0.517918      | 0.528157                  | 1.98%               |\n| truthful_qa_de    | 0.485529      | 0.492979                  | 1.53%               |\n| arc_challenge_de  | 0.480375      | 0.493174                  | 2.66%               |\n| hellaswag         | 0.776041      | 0.773352                  | -0.35%              |\n| hellaswag_de      | 0.655248      | 0.653356                  | -0.29%              |\n| MMLU              | 0.573719      | 0.579802                  | 1.06%               |\n| MMLU-DE           | 0.504509      | 0.503863                  | -0.13%              |\n\nThe following is our simple implementation of the first-fit-decreasing algorithm described in the paper.\n```python\ndef pack_documents(tokenized_documents):\n    # Sort documents by their length in descending order\n    sorted_docs = sorted(tokenized_documents, key=len, reverse=True)\n    \n    # Initialize bins\n    bins = []\n    \n    # Function to find the first bin that can accommodate the document\n    def find_bin(doc):\n        for b in bins:\n            if sum(len(d) for d in b) + len(doc) <= 8192:\n                return b\n        return None\n    \n    # Place each document in the first available bin or create a new bin\n    for doc in sorted_docs:\n        target_bin = find_bin(doc)\n        if target_bin is not None:\n            target_bin.append(doc)\n        else:\n            # Create a new bin with this document if no suitable bin is found\n            bins.append([doc])\n    \n    # Return results\n    return 
bins\n```\n\n## Model Configurations\n\nWe release DiscoLeo-8B in the following configurations:\n1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3-German-8B)\n2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3-German-8B-32k)\n3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1)\n4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1)\n5. [Experimental `DARE-TIES` Merge with Llama3-Instruct](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-8B-DARE-Experimental)\n6. [Collection of Quantized versions](https://huggingface.co/collections/DiscoResearch/discoleo-8b-quants-6651bcf8f72c9a37ce485d42)\n\n## How to use:\nHere's how to use the model with transformers:\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nimport torch\n\ndevice=\"cuda\"\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1\",\n    torch_dtype=\"auto\",\n    device_map=\"auto\"\n)\ntokenizer = AutoTokenizer.from_pretrained(\"DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1\")\n\nprompt = \"Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft\"\nmessages = [\n    {\"role\": \"system\", \"content\": \"Du bist ein hilfreicher Assistent.\"},\n    {\"role\": \"user\", \"content\": prompt}\n]\ntext = tokenizer.apply_chat_template(\n    messages,\n    tokenize=False,\n    add_generation_prompt=True\n)\nmodel_inputs = tokenizer([text], return_tensors=\"pt\").to(device)\n\ngenerated_ids = model.generate(\n    model_inputs.input_ids,\n    max_new_tokens=512\n)\ngenerated_ids = [\n    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)\n]\n\nresponse = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]\n```\n\n## 
Acknowledgements\n\nThe model was trained and evaluated by [Björn Plüster](https://huggingface.co/bjoernp) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)) with data preparation and project supervision by [Manuel Brack](http://manuel-brack.eu) ([DFKI](https://www.dfki.de/web/), [TU-Darmstadt](https://www.tu-darmstadt.de/)). Initial work on dataset collection and curation was performed by [Malte Ostendorff](https://ostendorff.org) and [Pedro Ortiz Suarez](https://portizs.eu). Instruction tuning was done with the DiscoLM German dataset created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)). We extend our gratitude to [LAION](https://laion.ai/) and friends, especially  [Christoph Schuhmann](https://entwickler.de/experten/christoph-schuhmann) and [Jenia Jitsev](https://huggingface.co/JJitsev), for initiating this collaboration.\n\nThe model training was supported by a compute grant at the [42 supercomputer](https://hessian.ai/)  which is a central component in the development of [hessian AI](https://hessian.ai/), the [AI Innovation Lab](https://hessian.ai/infrastructure/ai-innovationlab/) (funded by the [Hessian Ministry of Higher Education, Research and the Art (HMWK)](https://wissenschaft.hessen.de) & the [Hessian Ministry of the Interior, for Security and Homeland Security (HMinD)](https://innen.hessen.de)) and the [AI Service Centers](https://hessian.ai/infrastructure/ai-service-centre/) (funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)).\nThe curation of the training data is partially funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)\nthrough the project [OpenGPT-X](https://opengpt-x.de/en/) (project no. 
68GX21007D).\n\n\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "arxiv:2404.10830",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 88,
  "gated": false,
  "private": false,
  "last_modified": "2024-08-21T10:19:36.000Z",
  "created_at": "2024-08-21T08:19:28.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
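
The original card embedded in `readme_markdown` above describes first-fit-decreasing (FFD) packing from Ding et al. (arXiv:2404.10830) and includes a short implementation. That implementation has three limitations:

- It hardcodes a capacity of 8192.
- It re-sums every bin's length on each check.
- It does not split over-long documents, even though the card's text says such documents are split into chunks of sequence length.

The sketch below is a hedged reworking, not the authors' code. It takes `seq_len` as a parameter, tracks each bin's free space, and splits over-long documents before packing. `packing_efficiency` is a hypothetical helper for computing the kind of packing-efficiency figure the card reports.

```python
from typing import List, Sequence

Bin = List[Sequence[int]]


def pack_documents(tokenized_documents: Sequence[Sequence[int]], seq_len: int = 8192) -> List[Bin]:
    """First-fit-decreasing packing of token sequences into bins of at most seq_len tokens."""
    # Split over-long documents into seq_len-sized chunks (empty documents are dropped).
    pieces = [
        doc[start:start + seq_len]
        for doc in tokenized_documents
        for start in range(0, len(doc), seq_len)
    ]
    # "Decreasing": place the longest pieces first.
    pieces.sort(key=len, reverse=True)

    bins: List[Bin] = []
    free: List[int] = []  # free[i] = tokens still available in bins[i]
    for piece in pieces:
        # "First fit": scan bins in creation order and use the first with room.
        for i, room in enumerate(free):
            if len(piece) <= room:
                bins[i].append(piece)
                free[i] -= len(piece)
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([piece])
            free.append(seq_len - len(piece))
    return bins


def packing_efficiency(bins: List[Bin], seq_len: int = 8192) -> float:
    """Fraction of the allocated bin capacity that is filled with real tokens."""
    used = sum(len(piece) for b in bins for piece in b)
    return used / (len(bins) * seq_len) if bins else 0.0


if __name__ == "__main__":
    docs = [list(range(n)) for n in (5, 3, 4, 2, 12)]
    bins = pack_documents(docs, seq_len=8)
    print([[len(p) for p in b] for b in bins], packing_efficiency(bins, seq_len=8))
```

With `seq_len=8`, the 12-token document is split into pieces of 8 and 4 tokens. The resulting bins hold pieces of lengths `[[8], [5, 3], [4, 4], [2]]`, for an efficiency of 0.8125. The card packs 10,000 documents at a time, which could be mirrored by calling `pack_documents` on successive 10,000-document slices.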

Source payload excerpt (from Hugging Face API)

{
  "_id": "66c5a3107c428fa740fff6bc",
  "id": "RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf",
  "modelId": "RichardErkhov/DiscoResearch_-_Llama3-German-8B-32k-gguf",
  "sha": "743d93af58c277ff93b63a236334914d7b6e17f8",
  "createdAt": "2024-08-21T08:19:28.000Z",
  "lastModified": "2024-08-21T10:19:36.000Z",
  "author": "RichardErkhov",
  "downloads": 88,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 24
}