Model Intelligence Sheet

richarderkhov/openai-community_-_gpt2-xl-gguf overview

Comprehensive model page for richarderkhov/openai-community_-_gpt2-xl-gguf

gguf, arxiv:1910.09700, endpoints_compatible, region:us

Downloads

2,758

Likes

0

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

21 GGUF files detected (the API payload reports a siblings_count of 23 for the repository)

Direct downloads for all repository files

File	Type	Quantization	Size	Link
gpt2-xl.IQ3_M.gguf	GGUF	IQ3_M	928.41 MB	Download
gpt2-xl.IQ3_S.gguf	GGUF	IQ3_S	862.04 MB	Download
gpt2-xl.IQ3_XS.gguf	GGUF	IQ3_XS	862.04 MB	Download
gpt2-xl.IQ4_NL.gguf	GGUF	IQ4_NL	931.62 MB	Download
gpt2-xl.IQ4_XS.gguf	GGUF	IQ4_XS	918.80 MB	Download
gpt2-xl.Q2_K.gguf	GGUF	Q2_K	862.04 MB	Download
gpt2-xl.Q3_K.gguf	GGUF	Q3_K	986.55 MB	Download
gpt2-xl.Q3_K_L.gguf	GGUF	Q3_K_L	1.02 GB	Download
gpt2-xl.Q3_K_M.gguf	GGUF	Q3_K_M	986.55 MB	Download
gpt2-xl.Q3_K_S.gguf	GGUF	Q3_K_S	862.04 MB	Download
gpt2-xl.Q4_0.gguf	GGUF	Q4_0	924.29 MB	Download
gpt2-xl.Q4_1.gguf	GGUF	Q4_1	1016.98 MB	Download
gpt2-xl.Q4_K.gguf	GGUF	Q4_K	1.11 GB	Download
gpt2-xl.Q4_K_M.gguf	GGUF	Q4_K_M	1.11 GB	Download
gpt2-xl.Q4_K_S.gguf	GGUF	Q4_K_S	1.03 GB	Download
gpt2-xl.Q5_0.gguf	GGUF	Q5_0	1.08 GB	Download
gpt2-xl.Q5_1.gguf	GGUF	Q5_1	1.17 GB	Download
gpt2-xl.Q5_K.gguf	GGUF	Q5_K	1.28 GB	Download
gpt2-xl.Q5_K_M.gguf	GGUF	Q5_K_M	1.28 GB	Download
gpt2-xl.Q5_K_S.gguf	GGUF	Q5_K_S	1.15 GB	Download
gpt2-xl.Q6_K.gguf	GGUF	Q6_K	1.52 GB	Download
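
The table reports sizes in MB while the README reports them in GB; the two agree if both use binary units (for example, 862.04 / 1024 ≈ 0.84). The following is a minimal sketch, not the site's own logic, of converting sizes, picking the largest listed file that fits a size budget, and building a direct link. The helper names are hypothetical, only a few rows of the table are copied in, and the `resolve/main` URL pattern is an assumption based on how the Hugging Face Hub usually serves raw files (the README itself links to `blob/main` pages).

```python
from typing import Optional

REPO_ID = "RichardErkhov/openai-community_-_gpt2-xl-gguf"

# Sizes in MB for a few files, copied from the table above.
SIZES_MB = {
    "Q2_K": 862.04,
    "Q4_0": 924.29,
    "Q3_K_M": 986.55,
    "Q4_1": 1016.98,
}


def mb_to_gb(size_mb: float) -> float:
    """Convert MB to GB assuming binary units (1 GB = 1024 MB)."""
    return round(size_mb / 1024, 2)


def pick_quant(budget_mb: float) -> Optional[str]:
    """Return the largest listed quantization whose file fits the budget.

    Hypothetical helper: file size is only a rough proxy for quality.
    """
    fitting = {name: size for name, size in SIZES_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_url(quant: str) -> str:
    """Build a direct-download URL (assumed `resolve/main` pattern)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/gpt2-xl.{quant}.gguf"


if __name__ == "__main__":
    choice = pick_quant(1000)
    if choice is not None:
        print(choice, mb_to_gb(SIZES_MB[choice]), download_url(choice))
```

Choosing by size alone is a simplification; in practice the choice between quantization types also depends on the runtime and the quality loss that is acceptable.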

Model Details

Model Slug

richarderkhov/openai-community_-_gpt2-xl-gguf

Author

RichardErkhov

Pipeline Task

—

Library

—

Created

2024-04-17

Last Modified

2024-05-02

Gated

No

Private

No

HF SHA

d38efe9b77237ca63165da3d8137a73ace02e256

License

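Not declared in repository metadata (original model card: Modified MIT License)

Language

Not declared in repository metadata (original model card: English)

Base Model

openai-community/gpt2-xl (per the README)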

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\ngpt2-xl - GGUF\n- Model creator: https://huggingface.co/openai-community/\n- Original model: https://huggingface.co/openai-community/gpt2-xl/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [gpt2-xl.Q2_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q2_K.gguf) | Q2_K | 0.84GB |\n| [gpt2-xl.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.IQ3_XS.gguf) | IQ3_XS | 0.84GB |\n| [gpt2-xl.IQ3_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.IQ3_S.gguf) | IQ3_S | 0.84GB |\n| [gpt2-xl.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q3_K_S.gguf) | Q3_K_S | 0.84GB |\n| [gpt2-xl.IQ3_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.IQ3_M.gguf) | IQ3_M | 0.91GB |\n| [gpt2-xl.Q3_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q3_K.gguf) | Q3_K | 0.96GB |\n| [gpt2-xl.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q3_K_M.gguf) | Q3_K_M | 0.96GB |\n| [gpt2-xl.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q3_K_L.gguf) | Q3_K_L | 1.02GB |\n| [gpt2-xl.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.IQ4_XS.gguf) | IQ4_XS | 0.9GB |\n| [gpt2-xl.Q4_0.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q4_0.gguf) | Q4_0 | 0.9GB |\n| [gpt2-xl.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.IQ4_NL.gguf) | IQ4_NL | 0.91GB 
|\n| [gpt2-xl.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q4_K_S.gguf) | Q4_K_S | 1.03GB |\n| [gpt2-xl.Q4_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q4_K.gguf) | Q4_K | 1.11GB |\n| [gpt2-xl.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q4_K_M.gguf) | Q4_K_M | 1.11GB |\n| [gpt2-xl.Q4_1.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q4_1.gguf) | Q4_1 | 0.99GB |\n| [gpt2-xl.Q5_0.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q5_0.gguf) | Q5_0 | 1.08GB |\n| [gpt2-xl.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q5_K_S.gguf) | Q5_K_S | 1.15GB |\n| [gpt2-xl.Q5_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q5_K.gguf) | Q5_K | 1.28GB |\n| [gpt2-xl.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q5_K_M.gguf) | Q5_K_M | 1.28GB |\n| [gpt2-xl.Q5_1.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q5_1.gguf) | Q5_1 | 1.17GB |\n| [gpt2-xl.Q6_K.gguf](https://huggingface.co/RichardErkhov/openai-community_-_gpt2-xl-gguf/blob/main/gpt2-xl.Q6_K.gguf) | Q6_K | 1.52GB |\n\n\n\n\nOriginal model description:\n---\nlanguage: en\nlicense: mit\n---\n\n# GPT-2 XL\n\n## Table of Contents\n- [Model Details](#model-details)\n- [How To Get Started With the Model](#how-to-get-started-with-the-model)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Environmental Impact](#environmental-impact)\n- [Technical Specifications](#technical-specifications)\n- [Citation Information](#citation-information)\n- [Model Card Authors](#model-card-authors)\n\n## Model 
Details\n\n**Model Description:** GPT-2 XL is the **1.5B parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. The model was pretrained on English text using a causal language modeling (CLM) objective. \n\n- **Developed by:** OpenAI, see [associated research paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and [GitHub repo](https://github.com/openai/gpt-2) for model developers.\n- **Model Type:** Transformer-based language model\n- **Language(s):** English\n- **License:** [Modified MIT License](https://github.com/openai/gpt-2/blob/master/LICENSE)\n- **Related Models:** [GPT-2](https://huggingface.co/gpt2), [GPT-2 Medium](https://huggingface.co/gpt2-medium) and [GPT-2 Large](https://huggingface.co/gpt2-large)\n- **Resources for more information:**\n  - [Research Paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)\n  - [OpenAI Blog Post](https://openai.com/blog/better-language-models/)\n  - [GitHub Repo](https://github.com/openai/gpt-2)\n  - [OpenAI Model Card for GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md)\n  - [OpenAI GPT-2 1.5B Release Blog Post](https://openai.com/blog/gpt-2-1-5b-release/)\n  - Test the full generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large\n\n## How to Get Started with the Model \n\nUse the code below to get started with the model. You can use this model directly with a pipeline for text generation. 
Since the generation relies on some randomness, we set a seed for reproducibility:\n\n```python\nfrom transformers import pipeline, set_seed\ngenerator = pipeline('text-generation', model='gpt2-xl')\nset_seed(42)\ngenerator(\"Hello, I'm a language model,\", max_length=30, num_return_sequences=5)\n```\n\nHere is how to use this model to get the features of a given text in PyTorch:\n\n```python\nfrom transformers import GPT2Tokenizer, GPT2Model\ntokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')\nmodel = GPT2Model.from_pretrained('gpt2-xl')\ntext = \"Replace me by any text you'd like.\"\nencoded_input = tokenizer(text, return_tensors='pt')\noutput = model(**encoded_input)\n```\n\nand in TensorFlow:\n\n```python\nfrom transformers import GPT2Tokenizer, TFGPT2Model\ntokenizer = GPT2Tokenizer.from_pretrained('gpt2-xl')\nmodel = TFGPT2Model.from_pretrained('gpt2-xl')\ntext = \"Replace me by any text you'd like.\"\nencoded_input = tokenizer(text, return_tensors='tf')\noutput = model(encoded_input)\n```\n\n## Uses\n\n#### Direct Use\n\nIn their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote: \n\n> The primary intended users of these models are AI researchers and practitioners.\n> \n> We primarily imagine these language models will be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models.\n\n#### Downstream Use\n\nIn their [model card about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote: \n\n> Here are some secondary use cases we believe are likely:\n> \n> - Writing assistance: Grammar assistance, autocompletion (for normal prose or code)\n> - Creative writing and art: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.\n> - Entertainment: Creation of games, chat bots, and amusing generations.\n\n#### Misuse and Out-of-scope Use\n\nIn their [model card 
about GPT-2](https://github.com/openai/gpt-2/blob/master/model_card.md), OpenAI wrote: \n\n> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true.\n> \n> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.\n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**\n\n#### Biases\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). \n\nThe training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. 
For example:\n\n```python\nfrom transformers import pipeline, set_seed\ngenerator = pipeline('text-generation', model='gpt2-xl')\nset_seed(42)\ngenerator(\"The man worked as a\", max_length=10, num_return_sequences=5)\n\nset_seed(42)\ngenerator(\"The woman worked as a\", max_length=10, num_return_sequences=5)\n```\n\nThis bias will also affect all fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.\n\n#### Risks and Limitations\n\nWhen they released the 1.5B parameter model, OpenAI wrote in a [blog post](https://openai.com/blog/gpt-2-1-5b-release/):\n\n > GPT-2 can be fine-tuned for misuse. Our partners at the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism (CTEC) found that extremist groups can use GPT-2 for misuse, specifically by fine-tuning GPT-2 models on four ideological positions: white supremacy, Marxism, jihadist Islamism, and anarchism. CTEC demonstrated that it’s possible to create models that can generate synthetic propaganda for these ideologies. They also show that, despite having low detection accuracy on synthetic outputs, ML-based detection methods can give experts reasonable suspicion that an actor is generating synthetic text. \n \nThe blog post further discusses the risks, limitations, and biases of the model. \n\n## Training\n\n#### Training Data\n\nThe OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web\npages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from\nthis dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weighs\n40GB of text but has not been publicly released. 
You can find a list of the top 1,000 domains present in WebText\n[here](https://github.com/openai/gpt-2/blob/master/domains.txt).\n\n#### Training Procedure\n\nThe model was pretrained on a very large corpus of English data in a self-supervised fashion. This\nmeans it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots\nof publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely,\nit was trained to guess the next word in sentences.\n\nConcretely, inputs are sequences of continuous text of a certain length and the targets are the same sequence,\nshifted one token (word or piece of word) to the right. The model internally uses a masking mechanism to make sure the\npredictions for token `i` only use the inputs from `1` to `i` but not the future tokens.\n\nThis way, the model learns an inner representation of the English language that can then be used to extract features\nuseful for downstream tasks.\n\nThe texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) and a\nvocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.\n\n## Evaluation\n\nThe following evaluation information is extracted from the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).\n\n#### Testing Data, Factors and Metrics\n\nThe model authors write in the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) that:\n\n> Since our model operates on a byte level and does not require lossy pre-processing or tokenization, we can evaluate it on any language model benchmark. 
Results on language modeling datasets are commonly reported in a quantity which is a scaled or exponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word. We evaluate the same quantity by computing the log-probability of a dataset according to a WebText LM and dividing by the number of canonical units. For many of these datasets, WebText LMs would be tested significantly out-of-distribution, having to predict aggressively standardized text, tokenization artifacts such as disconnected punctuation and contractions, shuffled sentences, and even the string <UNK> which is extremely rare in WebText - occurring only 26 times in 40 billion bytes. We report our main results...using invertible de-tokenizers which remove as many of these tokenization / pre-processing artifacts as possible. Since these de-tokenizers are invertible, we can still calculate the log probability of a dataset and they can be thought of as a simple form of domain adaptation. \n\n#### Results\n\nThe model achieves the following results without any fine-tuning (zero-shot):\n\n| Dataset  | LAMBADA | LAMBADA | CBT-CN | CBT-NE | WikiText2 | PTB    | enwiki8 | text8  | WikiText103 | 1BW   |\n|:--------:|:-------:|:-------:|:------:|:------:|:---------:|:------:|:-------:|:------:|:-----------:|:-----:|\n| (metric) | (PPL)   | (ACC)   | (ACC)  | (ACC)  | (PPL)     | (PPL)  | (BPB)   | (BPC)  | (PPL)       | (PPL) |\n|          | 8.63    | 63.24   | 93.30  | 89.05  | 18.34     | 35.76  | 0.93    | 0.98   | 17.48       | 42.16 |\n\n## Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
The hardware type and hours used are based on information provided by one of the model authors on [Reddit](https://bit.ly/2Tw1x4L).\n\n- **Hardware Type:** 32 TPUv3 chips\n- **Hours used:** 168\n- **Cloud Provider:** Unknown\n- **Compute Region:** Unknown\n- **Carbon Emitted:** Unknown\n\n## Technical Specifications\n\nSee the [associated paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) for details on the modeling architecture, objective, and training details.\n\n## Citation Information\n\n```bibtex\n@article{radford2019language,\n  title={Language models are unsupervised multitask learners},\n  author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},\n  journal={OpenAI blog},\n  volume={1},\n  number={8},\n  pages={9},\n  year={2019}\n}\n```\n\n## Model Card Authors\n\nThis model card was written by the Hugging Face team.\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "arxiv:1910.09700",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 2758,
  "gated": false,
  "private": false,
  "last_modified": "2024-05-02T02:03:49.000Z",
  "created_at": "2024-04-17T09:16:12.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
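
The Training Procedure section of the README embedded above describes two ideas: the targets are the input sequence shifted by one token, and a mask keeps each position from seeing later tokens. The following is a minimal pure-Python sketch of both, not GPT-2's actual implementation. The token IDs are made up, the helper names are hypothetical, and positions are 0-based here whereas the README counts from 1.

```python
def shift_targets(tokens):
    """Split a token sequence into (inputs, targets).

    targets[i] is the token that follows inputs[i] in the original sequence.
    """
    return tokens[:-1], tokens[1:]


def causal_mask(n):
    """Build an n x n mask where mask[i][j] is True if position i may use position j.

    Position i may use itself and earlier positions (j <= i), never later ones.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


tokens = [10, 11, 12, 13]  # made-up token IDs, for illustration only
inputs, targets = shift_targets(tokens)
mask = causal_mask(len(inputs))

if __name__ == "__main__":
    print(inputs, targets)
    for row in mask:
        print(row)
```

Each row of the mask corresponds to one prediction position, so the lower-triangular shape is what lets a single pass over the sequence train every next-token prediction at once.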

Source payload excerpt (from Hugging Face API)

{
  "_id": "661f935c42c9bf38760499bf",
  "id": "RichardErkhov/openai-community_-_gpt2-xl-gguf",
  "modelId": "RichardErkhov/openai-community_-_gpt2-xl-gguf",
  "sha": "d38efe9b77237ca63165da3d8137a73ace02e256",
  "createdAt": "2024-04-17T09:16:12.000Z",
  "lastModified": "2024-05-02T02:03:49.000Z",
  "author": "RichardErkhov",
  "downloads": 2758,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 23
}
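
The fields in the payload excerpt above map onto the page's Created, Last Modified, Visibility, and Access entries. The following is a minimal sketch, using only the Python standard library, of deriving those values from the payload. The `summarize` and `parse_ts` helpers are hypothetical and only a subset of the payload's fields is copied in.

```python
import json
from datetime import datetime, timezone

# A subset of the payload excerpt shown above.
PAYLOAD = """
{
  "id": "RichardErkhov/openai-community_-_gpt2-xl-gguf",
  "createdAt": "2024-04-17T09:16:12.000Z",
  "lastModified": "2024-05-02T02:03:49.000Z",
  "downloads": 2758,
  "likes": 0,
  "gated": false,
  "private": false
}
"""


def parse_ts(value):
    """Parse a timestamp such as 2024-04-17T09:16:12.000Z as a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize(payload):
    """Derive page-style fields from the payload (hypothetical helper)."""
    created = parse_ts(payload["createdAt"])
    modified = parse_ts(payload["lastModified"])
    return {
        "id": payload["id"],
        "created": created.date().isoformat(),
        "last_modified": modified.date().isoformat(),
        "days_between": (modified - created).days,
        "visibility": "Private" if payload["private"] else "Public",
        "access": "Gated" if payload["gated"] else "Open",
    }


summary = summarize(json.loads(PAYLOAD))

if __name__ == "__main__":
    print(summary)
```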