Model Intelligence Sheet

richarderkhov/cmarkea_-_bloomz-560m-sft-chat-gguf overview

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

bloomz-560m-sft-chat - GGUF
- Model creator: https://huggingface.co/cmarkea/
- Original model: https://huggingface.co/cmarkea/bloomz-560m-sft-chat/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| bloomz-560m-sft-chat.Q2_K.gguf | Q2_K | 0.39GB |
| bloomz-560m-sft-chat.Q3_K_S.gguf | Q3_K_S | 0.43GB |
| bloomz-560m-sft-chat.Q3_K.gguf | Q3_K | 0.46GB |
| bloomz-560m-sft-chat.Q3_K_M.gguf | Q3_K_M | 0.46GB |
| bloomz-560m-sft-chat.Q3_K_L.gguf | Q3_K_L | 0.47GB |
| bloomz-560m-sft-chat.IQ4_XS.gguf | IQ4_XS | 0.49GB |
| bloomz-560m-sft-chat.Q4_0.gguf | Q4_0 | 0.5GB |
| bloomz-560m-sft-chat.IQ4_NL.gguf | IQ4_NL | 0.5GB |
| bloomz-560m-sft-chat.Q4_K_S.gguf | Q4_K_S | 0.5GB |
| bloomz-560m-sft-chat.Q4_K.gguf | Q4_K | 0.52GB |
| bloomz-560m-sft-chat.Q4_K_M.gguf | Q4_K_M | 0.52GB |
| bloomz-560m-sft-chat.Q4_1.gguf | Q4_1 | 0.53GB |
| bloomz-560m-sft-chat.Q5_0.gguf | Q5_0 | 0.57GB |
| bloomz-560m-sft-chat.Q5_K_S.gguf | Q5_K_S | 0.57GB |
| bloomz-560m-sft-chat.Q5_K.gguf | Q5_K | 0.58GB |
| bloomz-560m-sft-chat.Q5_K_M.gguf | Q5_K_M | 0.58GB |
| bloomz-560m-sft-chat.Q5_1.gguf | Q5_1 | 0.6GB |
| bloomz-560m-sft-chat.Q6_K.gguf | Q6_K | 0.64GB |
| bloomz-560m-sft-chat.Q8_0.gguf | Q8_0 | 0.82GB |

Original model description:
---
license: bigscience-bloom-rail-1.0
datasets:
- ehartford/wizard_vicuna_70k_unfiltered
- shahules786/orca-chat
- timdettmers/openassistant-guanaco
- laion/OIG
language:
- fr
- en
library_name: transformers
pipeline_tag: text-generation
inference:
  parameters:
    max_new_tokens: 128
widget:
  - text: </s>Bonjour, qui es-tu ?<s>
  - text: </s>Hello, who are you?<s>
---

bloomz-560m-sft-chat
--------------------
We introduce the bloomz-560m-sft-chat model, a fine-tuned version of the Large Language Model (LLM) [bigscience/bloomz-560m](https://huggingface.co/bigscience/bloomz-560m). The model was pre-trained for a chatbot context and converted from float16 to bfloat16. It therefore serves as a solid starting point for fine-tuning towards other, more specific tasks.

The model was trained equally on French and English data, ensuring maximum efficiency for these two languages (and their interactions). Because of the conversion from float16 to bfloat16, we do not guarantee that the original model's multilingual capabilities are preserved. However, fine-tuning can restore reasonable performance on other languages.

The objective is to pre-train all three models (Bloomz-{560m, 3b, 7b1-mt}-sft-chat) to obtain high-performing, energy-efficient, and fast "foundation" models for inference on "realistic" infrastructures suitable for a business with standard industrial capabilities.

Bloomz, through its license, enables free and flexible industrial use. Its tokenizer was designed with a truly multilingual context in mind and produces significantly fewer tokens per word than the tokenizers of other LLMs. This not only improves performance but also makes inference more efficient, because generating a text requires fewer model calls and shorter contexts. The table below illustrates this point using French as an example, where we tokenized Marcel Proust's longest sentence (823 words):

"Sans honneur que précaire, sans liberté que provisoire, [...], et de façon qu’à eux-mêmes il ne leur paraisse pas un vice."

| model | GPT 3.5 | Boris | Flan-T5 | LLaMA | Dolly | MPT | Falcon | Bloomz |
|:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:|
| tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |

For comparison, a specialized French tokenizer such as [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) yields 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, it has already been [shown that there is a direct relationship](https://arxiv.org/abs/2012.15613) between the number of tokens per word required for modeling and the predictive performance of the model.

Dataset
-------
After analyzing a substantial number of model variants, we observed that the most effective pre-training for zero-shot use cases is pre-training for chatbot contexts. This study was conducted internally and focused specifically on the French context. As a result, we trained the model on a dataset comprising 0.9 billion tokens. This dataset consists of interactions between an individual and a third party. To balance the French and English data, we used the Google Translate API.
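To make the tokens-per-word comparison from the tokenizer discussion above more concrete, the short sketch below converts the published ratios into approximate token counts for the 823-word sentence. The helper names `estimated_tokens` and `relative_saving` are hypothetical, and because the ratios in the table are rounded, the results are estimates rather than the authors' exact counts.

```python
# Tokens-per-word ratios copied from the comparison table above.
TOKENS_PER_WORD = {
    "GPT 3.5": 2.3,
    "Boris": 2.3,
    "Flan-T5": 2.0,
    "LLaMA": 1.9,
    "Dolly": 1.9,
    "MPT": 1.9,
    "Falcon": 1.8,
    "Bloomz": 1.4,
}

# Length of the Proust sentence used in the comparison.
SENTENCE_WORDS = 823


def estimated_tokens(words: int, ratio: float) -> int:
    """Approximate token count for a text of `words` words (hypothetical helper)."""
    return round(words * ratio)


def relative_saving(baseline: str, candidate: str) -> float:
    """Fraction of tokens saved by `candidate` relative to `baseline`."""
    return 1 - TOKENS_PER_WORD[candidate] / TOKENS_PER_WORD[baseline]


if __name__ == "__main__":
    for name, ratio in TOKENS_PER_WORD.items():
        print(f"{name:8s} ~{estimated_tokens(SENTENCE_WORDS, ratio)} tokens")
    saving = relative_saving("GPT 3.5", "Bloomz")
    print(f"Bloomz vs GPT 3.5: about {saving:.0%} fewer tokens")
```

Since autoregressive generation makes one model call per generated token, a lower ratio reduces both the number of calls and the context length needed for the same text.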
Training
--------
The table below summarizes the architecture used for training, along with the training time and the average inference speed on the target architecture, in tokens per second:

| model | Architecture | Training time (h) | Inference speed (tokens per second) |
|:----------------------:|:-------------:|:-----------------:|:-----------------------------------:|
| bloomz-560m-sft-chat | 1 x A100 40GB | 41 | 29 |
| bloomz-3b-sft-chat | 1 x A100 40GB | 140 | 13 |
| bloomz-7b1-mt-sft-chat | 4 x A100 40GB | 268 | 8 |

| Hyperparameter | Value |
|:---------------------:|:----------:|
| label smoothing | 0.05 |
| optimizer | AdamW |
| betas | 0.9, 0.999 |
| AMSGrad | True |
| learning rate | 5e-4 |
| anneal strategy | cos |
| div factor | 100 |
| final div factor | 0.1 |
| batch size | 16 |
| gradient accumulation | 25 |
| max length | 1500 |

Experimentations
----------------
Since the model is trained only on English and French corpora, its performance cannot be guaranteed in other languages. The degradation in other languages is also due to the change in the model's data type from float16 to bfloat16. The conversation example below illustrates this point:

| French | English | Spanish |
|:------:|:-------:|:-------:|
| **human:** C'est quoi la première guerre mondiale ?<br>**bot:** La première guerre mondiale est une guerre entre les États-Unis et la Grande-Bretagne.<br>**human:** Tu es certain de ça ?<br>**bot:** Oui, je suis sûr.<br>**human:** Elle a eu lieu quand ?<br>**bot:** La première guerre mondiale a eu lieu en 1914.<br>**human:** Et elle a fini quand ?<br>**bot:** La première guerre mondiale a eu lieu en 1918.<br>**human:** Elle a opposé qui a qui ?<br>**bot:** La première guerre mondiale a eu lieu entre les États-Unis et la Grande-Bretagne. | **human:** What is the first world war?<br>**bot:** The first world war was a conflict between the United States and the Soviet Union. The conflict began in World War II and lasted until the end of World War III.<br>**human:** Are you sure that this war was between these two belligerents and that it started during the Second World War?<br>**bot:** Yes, the conflict was between the United States and the Soviet Union.<br>**human:** What are the start and end dates of the first war?<br>**bot:** The first war began in World War II and lasted until the end of World War III. | **human:** ¿Qué es la Primera Guerra Mundial?<br>**bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses.<br>**human:** ¿Solo estos dos países y por tan poco tiempo?<br>**bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses. |

Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. We therefore conducted a qualitative performance study, which assesses the relevance of model responses to a pool of French questions using blind evaluation. The metric is the average of two criteria, response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. Please note that this measure lacks rigorous scientific validity, because the test dataset contains a limited number of questions and the evaluation involved only a few individuals with similar socio-demographic characteristics. The prompts take the same format for all models: `[Instruction] Question : [Question] Réponse :`

To keep the comparison fair, the prompts do not exploit the chat structure of chatbot models, so the evaluation quantifies performance in a purely instruction-based approach. The figure below illustrates the results.
The closer a model is to the top-left corner and the smaller its circle radius, the better the model; conversely, a model towards the bottom-right with a large circle performs less favorably.

![constellation](https://i.postimg.cc/kggYhKg9/constellation.png)

We observe that, across all models, the performance gain is logarithmic in relation to the increase in model parameters. However, among models that undergo multiple pre-trainings (vanilla, instruction, and chat), those pre-trained on instruction and chat perform significantly better in zero-shot contexts, with a notable improvement for chat-based approaches. The models we have trained demonstrate promising efficiency in this test relative to their number of parameters, indicating cost-effectiveness in a production context.

How to use bloomz-560m-sft-chat
-------------------------------
There are no specific instructions for using these models in a normal causal inference context. However, to leverage the chatbot capability of the model, an individual's prompt should be preceded by the EOS token (`</s>`), and the generated part should be preceded by the BOS token (`<s>`). The structure takes the following form: `</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>`

For example, the original model can be loaded through the HuggingFace pipeline interface with `pipeline("text-generation", "cmarkea/bloomz-560m-sft-chat")` and then called on a prompt such as `"C'est quoi le deep learning ?"` with `max_new_tokens=512`.

Citation
--------
Cyrile Delestre, 2023. https://huggingface.co/cmarkea/bloomz-560m-sft-chat. Keywords: NLP; Transformers; LLM; Bloomz.
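As a complement to the "How to use" section above, here is a minimal sketch of how a multi-turn prompt could be assembled in the described format, with each human turn preceded by the EOS token and each bot turn preceded by the BOS token. The function name `build_chat_prompt` and the representation of a conversation as a list of pairs are assumptions made for illustration; they are not part of the original model card.

```python
from typing import List, Optional, Tuple

EOS = "</s>"  # precedes every human prompt, per the model card
BOS = "<s>"   # precedes every bot answer, per the model card


def build_chat_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Assemble a chat prompt from (human, bot) pairs (hypothetical helper).

    The bot answer of the last pair is normally None, so the prompt ends
    with the BOS token and the model generates the next answer.
    """
    parts = []
    for human, bot in turns:
        parts.append(EOS + human + BOS)
        if bot is not None:
            parts.append(bot)
    return "".join(parts)


if __name__ == "__main__":
    # Single-turn prompt, matching the widget example in the card metadata.
    print(build_chat_prompt([("Bonjour, qui es-tu ?", None)]))
    # Two-turn prompt with one completed exchange.
    print(build_chat_prompt([
        ("Hello, who are you?", "I am a chatbot."),
        ("What can you do?", None),
    ]))
```

The resulting string can be passed as the prompt to whichever text-generation runtime is used. Whether a given GGUF runtime inserts BOS or EOS tokens on its own depends on that runtime and should be checked before relying on this format.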

Tags: gguf, arxiv:2012.15613, arxiv:2001.09977, endpoints_compatible, region:us

richarderkhov/cmarkea_-_bloomz-560m-sft-chat-gguf visual

Downloads

937

Likes

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

19 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
bloomz-560m-sft-chat.IQ4_NL.gguf	GGUF	IQ4_NL	515.56 MB	Download
bloomz-560m-sft-chat.IQ4_XS.gguf	GGUF	IQ4_XS	499.28 MB	Download
bloomz-560m-sft-chat.Q2_K.gguf	GGUF	Q2_K	401.33 MB	Download
bloomz-560m-sft-chat.Q3_K.gguf	GGUF	Q3_K	469.27 MB	Download
bloomz-560m-sft-chat.Q3_K_L.gguf	GGUF	Q3_K_L	483.77 MB	Download
bloomz-560m-sft-chat.Q3_K_M.gguf	GGUF	Q3_K_M	469.27 MB	Download
bloomz-560m-sft-chat.Q3_K_S.gguf	GGUF	Q3_K_S	443.27 MB	Download
bloomz-560m-sft-chat.Q4_0.gguf	GGUF	Q4_0	514.06 MB	Download
bloomz-560m-sft-chat.Q4_1.gguf	GGUF	Q4_1	547.37 MB	Download
bloomz-560m-sft-chat.Q4_K.gguf	GGUF	Q4_K	535.44 MB	Download
bloomz-560m-sft-chat.Q4_K_M.gguf	GGUF	Q4_K_M	535.44 MB	Download
bloomz-560m-sft-chat.Q4_K_S.gguf	GGUF	Q4_K_S	515.56 MB	Download
bloomz-560m-sft-chat.Q5_0.gguf	GGUF	Q5_0	580.69 MB	Download
bloomz-560m-sft-chat.Q5_1.gguf	GGUF	Q5_1	614.00 MB	Download
bloomz-560m-sft-chat.Q5_K.gguf	GGUF	Q5_K	596.62 MB	Download
bloomz-560m-sft-chat.Q5_K_M.gguf	GGUF	Q5_K_M	596.62 MB	Download
bloomz-560m-sft-chat.Q5_K_S.gguf	GGUF	Q5_K_S	580.69 MB	Download
bloomz-560m-sft-chat.Q6_K.gguf	GGUF	Q6_K	651.48 MB	Download
bloomz-560m-sft-chat.Q8_0.gguf	GGUF	Q8_0	839.90 MB	Download
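
When choosing among the files listed above, one simple approach is to take the largest quantization that fits a given size budget and build its direct download link. The sketch below does this with the sizes from the table. The helper names are hypothetical, the "largest file that fits" rule is only a heuristic, and the `/resolve/main/` URL pattern is assumed to be the direct-download counterpart of the `/blob/main/` links used in the repository README.

```python
from typing import Optional

REPO_ID = "RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf"

# Quantization name -> file size in MB, copied from the file table above.
FILE_SIZES_MB = {
    "IQ4_NL": 515.56, "IQ4_XS": 499.28, "Q2_K": 401.33, "Q3_K": 469.27,
    "Q3_K_L": 483.77, "Q3_K_M": 469.27, "Q3_K_S": 443.27, "Q4_0": 514.06,
    "Q4_1": 547.37, "Q4_K": 535.44, "Q4_K_M": 535.44, "Q4_K_S": 515.56,
    "Q5_0": 580.69, "Q5_1": 614.00, "Q5_K": 596.62, "Q5_K_M": 596.62,
    "Q5_K_S": 580.69, "Q6_K": 651.48, "Q8_0": 839.90,
}


def filename_for(quant: str) -> str:
    """File name of a quantization, following the naming used in the table."""
    return f"bloomz-560m-sft-chat.{quant}.gguf"


def largest_quant_within(budget_mb: float) -> Optional[str]:
    """Largest quantization whose file fits the budget (hypothetical helper).

    Ties on size are broken by quantization name, so the result is stable.
    Returns None when no file fits.
    """
    fitting = [(size, quant) for quant, size in FILE_SIZES_MB.items()
               if size <= budget_mb]
    return max(fitting)[1] if fitting else None


def download_url(quant: str) -> str:
    """Direct download URL, assuming the Hub's /resolve/main/ pattern."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{filename_for(quant)}"


if __name__ == "__main__":
    choice = largest_quant_within(600)
    if choice is not None:
        print(choice, download_url(choice))
```

File size is used here only as a proxy for how much precision a quantization retains; output quality should be checked on the target task before settling on a file.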

Model Details Live

Model Slug

richarderkhov/cmarkea_-_bloomz-560m-sft-chat-gguf

Author

RichardErkhov

Pipeline Task

—

Library

—

Created

2024-11-07

Last Modified

2024-11-07

Gated

Private

HF SHA

f7389f2167d9208153def3f90a29a186f4ebc1ac

License

Unknown

Language

Unknown

Base Model

Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "https://i.postimg.cc/kggYhKg9/constellation.png",
    "summary": "Quantization made by Richard Erkhov. Github Discord Request more models bloomz-560m-sft-chat - GGUF | Name | Quant method | Size | | ---- | ---- | ---- | | bloomz-560m-sft-chat.Q2_K.gguf | Q2_K | 0.39GB | | bloomz-560m-sft-chat.Q3_K_S.gguf | Q3_K_S | 0.43GB | | bloomz-560m-sft-chat.Q3_K.gguf | Q3_K | 0.46GB | | bloomz-560m-sft-chat.Q3_K_M.gguf | Q3_K_M | 0.46GB | | bloomz-560m-sft-chat.Q3_K_L.gguf | Q3_K_L | 0.47GB | | bloomz-560m-sft-chat.IQ4_XS.gguf | IQ4_XS | 0.49GB | | bloomz-560m-sft-chat.Q4_0.gguf | Q4_0 | 0.5GB | | bloomz-560m-sft-chat.IQ4_NL.gguf | IQ4_NL | 0.5GB | | bloomz-560m-sft-chat.Q4_K_S.gguf | Q4_K_S | 0.5GB | | bloomz-560m-sft-chat.Q4_K.gguf | Q4_K | 0.52GB | | bloomz-560m-sft-chat.Q4_K_M.gguf | Q4_K_M | 0.52GB | | bloomz-560m-sft-chat.Q4_1.gguf | Q4_1 | 0.53GB | | bloomz-560m-sft-chat.Q5_0.gguf | Q5_0 | 0.57GB | | bloomz-560m-sft-chat.Q5_K_S.gguf | Q5_K_S | 0.57GB | | bloomz-560m-sft-chat.Q5_K.gguf | Q5_K | 0.58GB | | bloomz-560m-sft-chat.Q5_K_M.gguf | Q5_K_M | 0.58GB | | bloomz-560m-sft-chat.Q5_1.gguf | Q5_1 | 0.6GB | | bloomz-560m-sft-chat.Q6_K.gguf | Q6_K | 0.64GB | | bloomz-560m-sft-chat.Q8_0.gguf | Q8_0 | 0.82GB | Original model description: --- license: bigscience-bloom-rail-1.0 datasets: language: library_name: transformers pipeline_tag: text-generation inference: parameters: max_new_tokens: 128 widget: --- bloomz-560m-sft-chat -------------------- We introduce the bloomz-560m-sft-chat model, which is a fine-tuning of a Large Language Model (LLM)  bigscience/bloomz-560m. This model is notable for being pre-trained for a chatbot context and undergoing a transposition from float16 to bfloat16. Therefore, this model serves as a solid starting point for fine-tuning towards other more specific tasks. The model was trained equally on both French and English data, ensuring maximum efficiency for these two languages (and their interactions). 
Due to the transition from float16 to bfloat16, we do not guarantee the preservation of the original model's multilingual capabilities. However, fine-tuning can restore reasonable performance on other languages. The objective is to pre-train all three models (Bloomz-{560m, 3b, 7b1-mt}-sft-chat) to ensure high-performing, energy-efficient, and fast \"foundation\" models for inference on \"realistic\" infrastructures suitable for a business with standard industrial capabilities. Bloomz, through its license, enables free and flexible industrial use. Its tokenizer has been designed with true multi-lingual context in mind, with a significantly lower token generation per word compared to other LLM models. This capability not only leads to improved performance but also enhanced efficiency during inference by making fewer model calls when generating text with shorter contexts. Here is a table illustrating our points using French as an example, where we tokenized Marcel Proust's longest sentence (823 words): `` Sans honneur que précaire, sans liberté que provisoire, [...], et de façon qu’à eux-mêmes il ne leur paraisse pas un vice. ` |      model     | GPT 3.5 | Boris | Flan-T5 | LLaMA | Dolly | MPT | Falcon | Bloomz | |:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:| | tokens per word |   2.3   |  2.3  |    2    |  1.9  |  1.9  | 1.9 |   1.8  |   1.4  | For comparison, with a specialized French tokenizer like CamemBERT or DistilCamemBERT, we have 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, there has already been shown that there is a direct relationship between the number of tokens per word required for modeling and the predictive performance of the model. Dataset ------- After analyzing a substantial set of modelings, we have observed that the most effective pre-training for zero-shot use cases is pre-training for chatbot contexts. 
This study was conducted internally, focusing specifically on the French context. As a result, we trained the model on a dataset comprising 0.9 billion tokens. This dataset consists of interactions between an individual and a third party. To balance the French and English data, we utilized the Google Translate API. Training -------- Here is the table summarizing the architecture used for training, along with the training time and average inference speed per token on the target architecture in tokens per second: |          model         |  Architecture | Training time (h) | Inference speed (tokens per second) | |:----------------------:|:-------------:|:-----------------:|:-----------------------------------:| |  bloomz-560m-sft-chat  | 1 x A100 40GB |         41        |                 29                  | |   bloomz-3b-sft-chat   | 1 x A100 40GB |        140        |                 13                  | | bloomz-7b1-mt-sft-chat | 4 x A100 40GB |        268        |                  8                  | |     Hyperparameter    |    Value   | |:---------------------:|:----------:| |       label smoothing | 0.05       | |             optimizer | AdamW      | |                 betas | 0.9, 0.999 | |               AMSGrad | True       | |         learning rate | 5e-4       | |       anneal strategy | cos        | |            div factor | 100        | |      final div factor | 0.1        | |            batch size | 16         | | gradient accumulation | 25         | |            max length | 1500       | Experimentations ---------------- Since the model is trained only on English and French corpora, the performance of the model cannot be guaranteed in other languages. This degradation in performance in other languages is also due to the change in the model's data type from float16 to bfloat16. 
The conversation example below illustrates this point: |                                                                                                                                                                                                                                                                                      French                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                  English                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                            Spanish                                                                                                                                                                                            | 
|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| | **human:** C'est quoi la première guerre mondiale ? **bot:** La première guerre mondiale est une guerre entre les États-Unis et la Grande-Bretagne. **human:** Tu es certain de ça ? **bot:** Oui, je suis sûr. **human:** Elle a eu lieu quand ? **bot:** La première guerre mondiale a eu lieu en 1914. **human:** Et elle a fini quand ? **bot:** La première guerre mondiale a eu lieu en 1918. **human:** Elle a opposé qui a qui ? 
**bot:** La première guerre mondiale a eu lieu entre les États-Unis et la Grande-Bretagne. | **human:** What is the first world war? **bot:** The first world war was a conflict between the United States and the Soviet Union. The conflict began in World War II and lasted until the end of World War III. **human:** Are you sure that this war was between these two belligerents and that it started during the Second World War? **bot:** Yes, the conflict was between the United States and the Soviet Union. **human:** What are the start and end dates of the first war? **bot:** The first war began in World War II and lasted until the end of World War III. | **human:** ¿Qué es la Primera Guerra Mundial? **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses. **human:** ¿Solo estos dos países y por tan poco tiempo? **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses. | Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the SSA metric aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. 
The prompts take the same format for all models: ` [Instruction] Question : [Question] Réponse : ` As a result, the prompts do not exploit the structures of chatbot models to ensure fairness, and the evaluation quantifies performance in a purely instruction-based approach. The figure below illustrates the results. The higher a model is positioned in the top-left corner with a small circle radius, the better the model; conversely, if a model is towards the bottom-right with a large circle, it performs less favorably. !constellation We observe that across all models, the performance gain is logarithmic in relation to the increase in model parameters. However, for models that undergo multiple pre-trainings (vanilla, instruction, and chat), models pre-trained on instruction and chat perform significantly better in zero-shot contexts, with a notable improvement for chat-based approaches. The models we have trained demonstrate promising efficiency in this test compared to the number of parameters, indicating cost-effectiveness in a production context. How to use bloomz-560m-sft-chat ------------------------------- There are no specific instructions for using these models in a normal causal inference context. However, to leverage the chatbot capability of the model, an individual's prompt should be preceded by the EOS token (&lt;/s>), and the generated part should be preceded by the BOS token (&lt;s>). The structure takes the following form: ` [human prompt 1][bot answer 1][human prompt 2] ` For example, to load the model using the HuggingFace pipeline interface: `python from transformers import pipeline model = pipeline(\"text-generation\", \"cmarkea/bloomz-560m-sft-chat\") result = model(\"C'est quoi le deep learning ?\", max_new_tokens=512) result [{'generated_text': \"C'est quoi le deep learning ?Le deep learning est un sous-ensemble de l'intelligence artificielle qui utilise des réseaux de neurones pour apprendre à partir de données. 
Il est utilisé dans de nombreux domaines, notamment la reconnaissance d'images, la reconnaissance vocale et la reconnaissance de la parole.\"}] ` Citation -------- `bibtex @online{DeBloomzChat, AUTHOR = {Cyrile Delestre}, URL = {https://huggingface.co/cmarkea/bloomz-560m-sft-chat}, YEAR = {2023}, KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz}, } ``",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nbloomz-560m-sft-chat - GGUF\n- Model creator: https://huggingface.co/cmarkea/\n- Original model: https://huggingface.co/cmarkea/bloomz-560m-sft-chat/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [bloomz-560m-sft-chat.Q2_K.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q2_K.gguf) | Q2_K | 0.39GB |\n| [bloomz-560m-sft-chat.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q3_K_S.gguf) | Q3_K_S | 0.43GB |\n| [bloomz-560m-sft-chat.Q3_K.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q3_K.gguf) | Q3_K | 0.46GB |\n| [bloomz-560m-sft-chat.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q3_K_M.gguf) | Q3_K_M | 0.46GB |\n| [bloomz-560m-sft-chat.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q3_K_L.gguf) | Q3_K_L | 0.47GB |\n| [bloomz-560m-sft-chat.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.IQ4_XS.gguf) | IQ4_XS | 0.49GB |\n| [bloomz-560m-sft-chat.Q4_0.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q4_0.gguf) | Q4_0 | 0.5GB |\n| [bloomz-560m-sft-chat.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.IQ4_NL.gguf) | IQ4_NL | 0.5GB |\n| [bloomz-560m-sft-chat.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q4_K_S.gguf) | Q4_K_S | 0.5GB |\n| 
[bloomz-560m-sft-chat.Q4_K.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q4_K.gguf) | Q4_K | 0.52GB |\n| [bloomz-560m-sft-chat.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q4_K_M.gguf) | Q4_K_M | 0.52GB |\n| [bloomz-560m-sft-chat.Q4_1.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q4_1.gguf) | Q4_1 | 0.53GB |\n| [bloomz-560m-sft-chat.Q5_0.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q5_0.gguf) | Q5_0 | 0.57GB |\n| [bloomz-560m-sft-chat.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q5_K_S.gguf) | Q5_K_S | 0.57GB |\n| [bloomz-560m-sft-chat.Q5_K.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q5_K.gguf) | Q5_K | 0.58GB |\n| [bloomz-560m-sft-chat.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q5_K_M.gguf) | Q5_K_M | 0.58GB |\n| [bloomz-560m-sft-chat.Q5_1.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q5_1.gguf) | Q5_1 | 0.6GB |\n| [bloomz-560m-sft-chat.Q6_K.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q6_K.gguf) | Q6_K | 0.64GB |\n| [bloomz-560m-sft-chat.Q8_0.gguf](https://huggingface.co/RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf/blob/main/bloomz-560m-sft-chat.Q8_0.gguf) | Q8_0 | 0.82GB |\n\n\n\n\nOriginal model description:\n---\nlicense: bigscience-bloom-rail-1.0\ndatasets:\n- ehartford/wizard_vicuna_70k_unfiltered\n- shahules786/orca-chat\n- timdettmers/openassistant-guanaco\n- laion/OIG\nlanguage:\n- fr\n- en\nlibrary_name: transformers\npipeline_tag: 
text-generation\ninference:\n  parameters:\n    max_new_tokens: 128\nwidget:\n  - text: </s>Bonjour, qui es-tu ?<s>\n  - text: </s>Hello, who are you?<s>\n---\n\nbloomz-560m-sft-chat\n--------------------\nWe introduce the bloomz-560m-sft-chat model, which is a fine-tuning of a Large Language Model (LLM)  [bigscience/bloomz-560m](https://huggingface.co/bigscience/bloomz-560m). This model is notable for being pre-trained for a chatbot context and undergoing a transposition from float16 to bfloat16. Therefore, this model serves as a solid starting point for fine-tuning towards other more specific tasks.\n\nThe model was trained equally on both French and English data, ensuring maximum efficiency for these two languages (and their interactions). Due to the transition from float16 to bfloat16, we do not guarantee the preservation of the original model's multilingual capabilities. However, fine-tuning can restore reasonable performance on other languages.\n\nThe objective is to pre-train all three models (Bloomz-{560m, 3b, 7b1-mt}-sft-chat) to ensure high-performing, energy-efficient, and fast \"foundation\" models for inference on \"realistic\" infrastructures suitable for a business with standard industrial capabilities.\n\n\nBloomz, through its license, enables free and flexible industrial use. Its tokenizer has been designed with true multi-lingual context in mind, with a significantly lower token generation per word compared to other LLM models. This capability not only leads to improved performance but also enhanced efficiency during inference by making fewer model calls when generating text with shorter contexts. Here is a table illustrating our points using French as an example, where we tokenized Marcel Proust's longest sentence (823 words):\n```\nSans honneur que précaire, sans liberté que provisoire, [...], et de façon qu’à eux-mêmes il ne leur paraisse pas un vice. 
\n```\n\n|      model     | GPT 3.5 | Boris | Flan-T5 | LLaMA | Dolly | MPT | Falcon | Bloomz |\n|:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:|\n| tokens per word |   2.3   |  2.3  |    2    |  1.9  |  1.9  | 1.9 |   1.8  |   1.4  |\n\n\nFor comparison, with a specialized French tokenizer like [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](cmarkea/distilcamembert-base), we have 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, there has already been [shown that there is a direct relationship](https://arxiv.org/abs/2012.15613) between the number of tokens per word required for modeling and the predictive performance of the model.\n\nDataset\n-------\nAfter analyzing a substantial set of modelings, we have observed that the most effective pre-training for zero-shot use cases is pre-training for chatbot contexts. This study was conducted internally, focusing specifically on the French context. As a result, we trained the model on a dataset comprising 0.9 billion tokens. This dataset consists of interactions between an individual and a third party. 
To balance the French and English data, we utilized the Google Translate API.\n\nTraining\n--------\nHere is the table summarizing the architecture used for training, along with the training time and average inference speed per token on the target architecture in tokens per second:\n\n|          model         |  Architecture | Training time (h) | Inference speed (tokens per second) |\n|:----------------------:|:-------------:|:-----------------:|:-----------------------------------:|\n|  [bloomz-560m-sft-chat](https://huggingface.co/cmarkea/bloomz-560m-sft-chat)  | 1 x A100 40GB |         41        |                 29                  |\n|   [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)   | 1 x A100 40GB |        140        |                 13                  |\n| [bloomz-7b1-mt-sft-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-sft-chat) | 4 x A100 40GB |        268        |                  8                  |\n\n|     Hyperparameter    |    Value   |\n|:---------------------:|:----------:|\n|       label smoothing | 0.05       |\n|             optimizer | AdamW      |\n|                 betas | 0.9, 0.999 |\n|               AMSGrad | True       |\n|         learning rate | 5e-4       |\n|       anneal strategy | cos        |\n|            div factor | 100        |\n|      final div factor | 0.1        |\n|            batch size | 16         |\n| gradient accumulation | 25         |\n|            max length | 1500       |\n\nExperimentations\n----------------\nSince the model is trained only on English and French corpora, the performance of the model cannot be guaranteed in other languages. This degradation in performance in other languages is also due to the change in the model's data type from float16 to bfloat16. 
The conversation example below illustrates this point:\n\n|                                                                                                                                                                                                                                                                                      French                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                  English                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                            Spanish                                                                                                                                                                                            
|\n|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| **human:** C'est quoi la première guerre mondiale ?<br/> **bot:** La première guerre mondiale est une guerre entre les États-Unis et la Grande-Bretagne.<br/> **human:** Tu es certain de ça ?<br/> **bot:** Oui, je suis sûr.<br/> **human:** Elle a eu lieu quand ?<br/> **bot:** La première guerre mondiale a eu lieu en 1914.<br/> **human:** Et elle a fini quand ?<br/> **bot:** La première guerre mondiale a eu lieu en 1918.<br/> **human:** Elle a opposé qui 
a qui ?<br/> **bot:** La première guerre mondiale a eu lieu entre les États-Unis et la Grande-Bretagne. | **human:** What is the first world war?<br/> **bot:** The first world war was a conflict between the United States and the Soviet Union. The conflict began in World War II and lasted until the end of World War III.<br/> **human:** Are you sure that this war was between these two belligerents and that it started during the Second World War?<br/> **bot:** Yes, the conflict was between the United States and the Soviet Union.<br/> **human:** What are the start and end dates of the first war?<br/> **bot:** The first war began in World War II and lasted until the end of World War III. | **human:** ¿Qué es la Primera Guerra Mundial?<br/> **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses.<br/> **human:** ¿Solo estos dos países y por tan poco tiempo?<br/> **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses. |\n\nEvaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. 
The prompts take the same format for all models:\n```\n[Instruction]\nQuestion : [Question]\nRéponse : \n```\nTo ensure fairness, the prompts do not exploit the chat structure of chatbot models, so the evaluation quantifies performance in a purely instruction-based approach. The figure below illustrates the results. The closer a model sits to the top-left corner and the smaller its circle, the better it performs; conversely, a model towards the bottom-right with a large circle performs less favorably.\n![constellation](https://i.postimg.cc/kggYhKg9/constellation.png)\nWe observe that across all models, the performance gain is logarithmic in relation to the increase in model parameters. However, for models that undergo multiple pre-trainings (vanilla, instruction, and chat), models pre-trained on instruction and chat perform significantly better in zero-shot contexts, with a notable improvement for chat-based approaches. The models we have trained demonstrate promising efficiency in this test compared to the number of parameters, indicating cost-effectiveness in a production context.\n\nHow to use bloomz-560m-sft-chat\n-------------------------------\n\nNo specific instructions are needed to use these models for standard causal language modeling (plain text completion). However, to leverage the chatbot capability of the model, an individual's prompt should be preceded by the EOS token (`</s>`), and the generated part should be preceded by the BOS token (`<s>`). 
The structure takes the following form:\n```\n</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>\n```\nFor example, to load the model using the HuggingFace pipeline interface:\n```python\nfrom transformers import pipeline\n\nmodel = pipeline(\"text-generation\", \"cmarkea/bloomz-560m-sft-chat\")\nresult = model(\"</s>C'est quoi le deep learning ?<s>\", max_new_tokens=512)\n\nresult\n[{'generated_text': \"</s>C'est quoi le deep learning ?<s>Le deep learning\n   est un sous-ensemble de l'intelligence artificielle qui utilise des\n   réseaux de neurones pour apprendre à partir de données. Il est utilisé\n   dans de nombreux domaines, notamment la reconnaissance d'images, la\n   reconnaissance vocale et la reconnaissance de la parole.\"}]\n```\n\nCitation\n--------\n\n```bibtex\n@online{DeBloomzChat,\n  AUTHOR = {Cyrile Delestre},\n  URL = {https://huggingface.co/cmarkea/bloomz-560m-sft-chat},\n  YEAR = {2023},\n  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},\n}\n```\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "arxiv:2012.15613",
    "arxiv:2001.09977",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 937,
  "gated": false,
  "private": false,
  "last_modified": "2024-11-07T05:35:26.000Z",
  "created_at": "2024-11-07T04:55:16.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
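
The hyperparameter table in the model card above can be read as a few derived quantities. The sketch below is a rough reading, not the authors' training code: it assumes the "div factor" and "final div factor" rows follow the conventions of a one-cycle learning-rate schedule (initial rate = peak rate / div factor, final rate = initial rate / final div factor), that "learning rate" is the peak rate, and that "batch size" is the micro-batch accumulated over 25 steps. All variable names are illustrative.

```python
# Values copied from the hyperparameter table in the model card.
batch_size = 16          # assumed to be the micro-batch size
grad_accumulation = 25   # micro-batches accumulated per optimizer step
max_length = 1500        # maximum sequence length in tokens
peak_lr = 5e-4           # "learning rate", assumed to be the schedule's peak
div_factor = 100
final_div_factor = 0.1

# Sequences contributing to one optimizer update.
sequences_per_step = batch_size * grad_accumulation

# Upper bound on tokens per update (every sequence at max length).
max_tokens_per_step = sequences_per_step * max_length

# One-cycle convention (an assumption, see lead-in).
initial_lr = peak_lr / div_factor
final_lr = initial_lr / final_div_factor

print(sequences_per_step, max_tokens_per_step, initial_lr, final_lr)
```

Under these assumptions an optimizer step covers 400 sequences (at most 600,000 tokens), and because the final div factor is below 1, the schedule would end at a rate higher than the one it started from.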

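The chat structure described in the model card above (`</s>` before each human turn, `<s>` before each bot turn) can be assembled with a small helper. This is a minimal sketch: `build_chat_prompt` and its arguments are hypothetical names introduced here for illustration, not part of the model's or any library's API.

```python
EOS = "</s>"  # per the model card, precedes each human prompt
BOS = "<s>"   # per the model card, precedes each bot answer


def build_chat_prompt(history, next_prompt):
    """Build a prompt string in the model card's chat layout.

    history: list of (human_prompt, bot_answer) pairs from earlier turns.
    next_prompt: the new human message the model should answer.
    """
    parts = []
    for human, bot in history:
        parts.append(f"{EOS}{human}{BOS}{bot}")
    # End with BOS so that generation continues as the bot's answer.
    parts.append(f"{EOS}{next_prompt}{BOS}")
    return "".join(parts)


single_turn = build_chat_prompt([], "Hello, who are you?")
multi_turn = build_chat_prompt(
    [("human prompt 1", "bot answer 1")], "human prompt 2"
)
print(single_turn)
print(multi_turn)
```

With an empty history, the helper reproduces the widget example from the card (`</s>Hello, who are you?<s>`); the resulting string can then be passed to whichever runtime is used to load the model.
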
Source payload excerpt (from Hugging Face API)

{
  "_id": "672c4834a3cac338ea365bff",
  "id": "RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf",
  "modelId": "RichardErkhov/cmarkea_-_bloomz-560m-sft-chat-gguf",
  "sha": "f7389f2167d9208153def3f90a29a186f4ebc1ac",
  "createdAt": "2024-11-07T04:55:16.000Z",
  "lastModified": "2024-11-07T05:35:26.000Z",
  "author": "RichardErkhov",
  "downloads": 937,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 21
}