Model Intelligence Sheet

richarderkhov/raibp_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf overview

This model was trained from scratch on the RaiBP/openwebtext2-first-30-chunks-ablation-bilingual dataset.

Tags: gguf, endpoints_compatible, region:us

Downloads

797

Likes

0

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

19 GGUF files detected (the Hugging Face API reports 21 files in the repository in total)

File	Type	Quantization	Size
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.IQ4_NL.gguf	GGUF	IQ4_NL	101.90 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.IQ4_XS.gguf	GGUF	IQ4_XS	98.29 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q2_K.gguf	GGUF	Q2_K	77.44 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K.gguf	GGUF	Q3_K	93.14 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_L.gguf	GGUF	Q3_K_L	97.36 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_M.gguf	GGUF	Q3_K_M	93.14 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_S.gguf	GGUF	Q3_K_S	85.97 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_0.gguf	GGUF	Q4_0	101.62 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_1.gguf	GGUF	Q4_1	108.98 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K.gguf	GGUF	Q4_K	107.63 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K_M.gguf	GGUF	Q4_K_M	107.63 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K_S.gguf	GGUF	Q4_K_S	101.90 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_0.gguf	GGUF	Q5_0	116.35 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_1.gguf	GGUF	Q5_1	123.71 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K.gguf	GGUF	Q5_K	120.83 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K_M.gguf	GGUF	Q5_K_M	120.83 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K_S.gguf	GGUF	Q5_K_S	116.35 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q6_K.gguf	GGUF	Q6_K	131.99 MB
gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q8_0.gguf	GGUF	Q8_0	169.44 MB
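A minimal sketch for choosing and fetching one of the files above, assuming the `huggingface_hub` package is installed. The helper names (`pick_quant`, `gguf_filename`, `download`) and the "largest file that fits the budget" rule are hypothetical; file size is only a rough proxy for how much precision a quantization keeps, not a measured quality ranking.

```python
# Hypothetical helpers for picking a GGUF file from this repository.
# Sizes (MB) are copied from the file table; treating "largest file that fits"
# as "best quality" is a simplifying assumption, not a benchmark result.

REPO_ID = "RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf"
REVISION = "24903bb5f645adaf3c3c772786b4ab7c959720fc"  # HF SHA listed on this sheet
FILE_PREFIX = "gpt2-openwebtext2-first-30-chunks-ablation-bilingual"

# Q3_K, Q4_K and Q5_K are left out: they match the sizes of the _M variants
# (llama.cpp treats them as aliases of Q3_K_M, Q4_K_M and Q5_K_M).
QUANT_SIZES_MB = {
    "Q2_K": 77.44,
    "Q3_K_S": 85.97,
    "Q3_K_M": 93.14,
    "Q3_K_L": 97.36,
    "IQ4_XS": 98.29,
    "Q4_0": 101.62,
    "IQ4_NL": 101.90,
    "Q4_K_S": 101.90,
    "Q4_K_M": 107.63,
    "Q4_1": 108.98,
    "Q5_0": 116.35,
    "Q5_K_S": 116.35,
    "Q5_K_M": 120.83,
    "Q5_1": 123.71,
    "Q6_K": 131.99,
    "Q8_0": 169.44,
}


def pick_quant(budget_mb: float) -> str:
    """Return the largest quantization whose file fits within budget_mb."""
    fitting = {q: size for q, size in QUANT_SIZES_MB.items() if size <= budget_mb}
    if not fitting:
        raise ValueError(f"no GGUF file fits in {budget_mb} MB")
    # On equal sizes, max() keeps the first entry in dict order.
    return max(fitting, key=fitting.get)


def gguf_filename(quant: str) -> str:
    """Build the repository file name for a quantization label."""
    return f"{FILE_PREFIX}.{quant}.gguf"


def download(quant: str) -> str:
    """Download one file pinned to the listed revision; returns the local path."""
    from huggingface_hub import hf_hub_download  # imported lazily; needs network access

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant), revision=REVISION)


if __name__ == "__main__":
    choice = pick_quant(110)
    print(choice, gguf_filename(choice))
```

The returned path can be passed to a GGUF runtime such as llama-cpp-python (`Llama(model_path=path)`). The card does not document a prompt format or context length, although the original model was trained on 1024-token blocks.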

Model Details

Model Slug

richarderkhov/raibp_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf

Author

RichardErkhov

Pipeline Task

—

Library

—

Created

2024-11-03

Last Modified

2024-11-03

Gated

No

HF SHA

24903bb5f645adaf3c3c772786b4ab7c959720fc

License

Unknown

Language

Unknown

Base Model

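Unknown (not declared in the repository metadata; the README names RaiBP/gpt2-openwebtext2-first-30-chunks-ablation-bilingual as the original model)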

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "",
    "summary": "This model was trained from scratch on the RaiBP/openwebtext2-first-30-chunks-ablation-bilingual dataset.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\ngpt2-openwebtext2-first-30-chunks-ablation-bilingual - GGUF\n- Model creator: https://huggingface.co/RaiBP/\n- Original model: https://huggingface.co/RaiBP/gpt2-openwebtext2-first-30-chunks-ablation-bilingual/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q2_K.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q2_K.gguf) | Q2_K | 0.08GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_S.gguf) | Q3_K_S | 0.08GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K.gguf) | Q3_K | 0.09GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_M.gguf) | Q3_K_M | 0.09GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q3_K_L.gguf) | Q3_K_L | 0.1GB |\n| 
[gpt2-openwebtext2-first-30-chunks-ablation-bilingual.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.IQ4_XS.gguf) | IQ4_XS | 0.1GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_0.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_0.gguf) | Q4_0 | 0.1GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.IQ4_NL.gguf) | IQ4_NL | 0.1GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K_S.gguf) | Q4_K_S | 0.1GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K.gguf) | Q4_K | 0.11GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_K_M.gguf) | Q4_K_M | 0.11GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_1.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q4_1.gguf) | Q4_1 | 0.11GB |\n| 
[gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_0.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_0.gguf) | Q5_0 | 0.11GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K_S.gguf) | Q5_K_S | 0.11GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K.gguf) | Q5_K | 0.12GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_K_M.gguf) | Q5_K_M | 0.12GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_1.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q5_1.gguf) | Q5_1 | 0.12GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q6_K.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q6_K.gguf) | Q6_K | 0.13GB |\n| [gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q8_0.gguf](https://huggingface.co/RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf/blob/main/gpt2-openwebtext2-first-30-chunks-ablation-bilingual.Q8_0.gguf) | Q8_0 | 0.17GB |\n\n\n\n\nOriginal model description:\n---\ntags:\n- generated_from_trainer\ndatasets:\n- 
RaiBP/openwebtext2-first-30-chunks-ablation-bilingual\nmodel-index:\n- name: training_bilingual\n  results: []\n---\n\n<!-- This model card has been generated automatically according to the information the Trainer had access to. You\nshould probably proofread and complete it, then remove this comment. -->\n\n# training_bilingual\n\nThis model was trained from scratch on the RaiBP/openwebtext2-first-30-chunks-ablation-bilingual dataset.\n\n## Model description\n\nMore information needed\n\n## Intended uses & limitations\n\nMore information needed\n\n## Training and evaluation data\n\nMore information needed\n\n## Training procedure\nThe [`run_clm.py` script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) from the transformers library was used. Training was distributed on two NVIDIA Quadro RTX 6000 GPUs:\n```bash\nTORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO CUDA_VISIBLE_DEVICES=0,1 nohup python -m torch.distributed.launch \\\n--nproc_per_node=2 run_clm.py --output_dir=\"./training_bilingual\" \\\n--model_type=\"gpt2\" \\\n--config_name=\"./training\" \\\n--tokenizer_name=\"./training\" \\\n--dataset_name=\"RaiBP/openwebtext2-first-30-chunks-ablation-bilingual\" \\\n--do_train \\\n--per_device_train_batch_size 8 \\\n--block_size=\"1024\" \\\n--learning_rate=\"5e-3\" --warmup_steps=\"1000\" \\\n--adam_beta1=\"0.9\" --adam_beta2=\"0.98\" --weight_decay=\"0.01\" \\\n--overwrite_output_dir \\\n--num_train_epochs=\"1\" \\\n--logging_steps=\"500\" \\\n--save_steps=\"5000\" --preprocessing_num_workers=\"16\" \\\n--gradient_accumulation_steps=\"4\" --report_to=\"tensorboard\" \\\n--logging_dir=\"./log_bilingual\"  > command_bilingual_log.log 2>&1 &\n```\n### Training hyperparameters\n\nThe following hyperparameters were used during training:\n- learning_rate: 0.005\n- train_batch_size: 8\n- eval_batch_size: 8\n- seed: 42\n- distributed_type: multi-GPU\n- num_devices: 2\n- gradient_accumulation_steps: 4\n- 
total_train_batch_size: 64\n- total_eval_batch_size: 16\n- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08\n- lr_scheduler_type: linear\n- lr_scheduler_warmup_steps: 1000\n- num_epochs: 1.0\n\n### Training results\n\n### Evaluation results\nPerplexity on random 2000 examples of the target language's [Wikipedia dataset](https://huggingface.co/datasets/wikimedia/wikipedia), using the code provided in the [perplexity docs](https://huggingface.co/docs/transformers/perplexity), with 512 tokes of stride.\nBaseline is the result from evaluating [OpenAI's GPT-2](https://huggingface.co/gpt2) on the same examples.\n| Target language | PPL               | Baseline PPL      |\n|-----------------|-------------------|-------------------|\n| en              |40.30453872680664 |26.562532424926758 |\n| de              |24.30541229248047  |56.907039642333984 |\n| es              |22.53978729248047  |55.592445373535156 |\n| fr              |26.614990234375  |49.69472885131836  |\n|it               |28.24549674987793 |75.95120239257812  |\n|pt               |19.720951080322266   ||\n|nl               |33.292930603027344  ||\n\nThe following script was used for evaluation\n\n\n```python\nimport numpy as np\nfrom datasets import load_dataset\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\nfrom tqdm import tqdm\nimport random\n\n# Set the seed for reproducibility\nrandom.seed(42)\n\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\n# Load the model\nmodel_name = \"RaiBP/gpt2-openwebtext2-first-30-chunks-ablation-bilingual\"\nmodel = AutoModelForCausalLM.from_pretrained(model_name).to(device)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\ntarget_language_dataset = \"20231101.de\" # change here for other languages\n\ndataset = load_dataset(\"wikimedia/wikipedia\", target_language_dataset, split=\"train\")\nnum_examples = 2000\nrandom_numbers = list(np.random.randint(0, len(dataset), num_examples))\nexamples = 
[]\nfor i in tqdm(random_numbers):\n    examples.append(dataset[int(i)][\"text\"])\nencodings = tokenizer(\"\\n\\n\".join(examples), return_tensors=\"pt\")\n\nmax_length = model.config.n_positions\nstride = 512\nseq_len = encodings.input_ids.size(1)\n\nnlls = []\nprev_end_loc = 0\nfor begin_loc in tqdm(range(0, seq_len, stride)):\n    end_loc = min(begin_loc + max_length, seq_len)\n    trg_len = end_loc - prev_end_loc  # may be different from stride on last loop\n    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(device)\n    target_ids = input_ids.clone()\n    target_ids[:, :-trg_len] = -100\n\n    with torch.no_grad():\n        outputs = model(input_ids, labels=target_ids)\n\n        # loss is calculated using CrossEntropyLoss which averages over valid labels\n        # N.B. the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels\n        # to the left by 1.\n        neg_log_likelihood = outputs.loss\n\n    nlls.append(neg_log_likelihood)\n\n    prev_end_loc = end_loc\n    if end_loc == seq_len:\n        break\n\nppl = torch.exp(torch.stack(nlls).mean())\n\nprint(\"Perplexity: \", ppl.item())\n```\n\n### Framework versions\n\n- Transformers 4.37.0.dev0\n- Pytorch 1.13.0\n- Datasets 2.16.0\n- Tokenizers 0.15.0\n\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 797,
  "gated": false,
  "private": false,
  "last_modified": "2024-11-03T11:42:38.000Z",
  "created_at": "2024-11-03T11:40:37.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
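The training command and hyperparameter list in the README above imply the arithmetic below. This is a sketch under stated assumptions: the per-step token count assumes every training block is a full 1024 tokens (run_clm.py concatenates the corpus and chunks it into `block_size` pieces), and `HYPOTHETICAL_TOTAL_STEPS` is an invented value, because the card does not report how many optimizer steps the single epoch took. The schedule function mirrors the shape of Hugging Face's linear schedule with warmup.

```python
# Arithmetic implied by the README's run_clm.py command (values copied from it).
PER_DEVICE_TRAIN_BATCH = 8   # --per_device_train_batch_size 8
PER_DEVICE_EVAL_BATCH = 8    # eval_batch_size: 8
NUM_DEVICES = 2              # --nproc_per_node=2 (two Quadro RTX 6000 GPUs)
GRAD_ACCUM = 4               # --gradient_accumulation_steps 4
BLOCK_SIZE = 1024            # --block_size 1024
PEAK_LR = 5e-3               # --learning_rate 5e-3
WARMUP_STEPS = 1000          # --warmup_steps 1000

HYPOTHETICAL_TOTAL_STEPS = 10_000  # NOT from the card; illustration only


def effective_train_batch_size() -> int:
    """Sequences per optimizer step (the README's total_train_batch_size)."""
    return PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM


def effective_eval_batch_size() -> int:
    """Sequences per evaluation step; no gradient accumulation at eval time."""
    return PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def tokens_per_step() -> int:
    """Tokens per optimizer step, assuming every block is full length."""
    return effective_train_batch_size() * BLOCK_SIZE


def linear_lr(step: int, total_steps: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak, then linear decay to 0 at total_steps."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


if __name__ == "__main__":
    print(effective_train_batch_size(), effective_eval_batch_size(), tokens_per_step())  # 64 16 65536
    for s in (0, 500, 1000, 5500, HYPOTHETICAL_TOTAL_STEPS):
        print(s, linear_lr(s, HYPOTHETICAL_TOTAL_STEPS))
```

The products reproduce the README's `total_train_batch_size: 64` and `total_eval_batch_size: 16`. With a peak rate of 5e-3, the first 1000 steps ramp the learning rate up before it decays linearly for the rest of the epoch.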

Source payload excerpt (from Hugging Face API)

{
  "_id": "6727613555787d6c8e155979",
  "id": "RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf",
  "modelId": "RichardErkhov/RaiBP_-_gpt2-openwebtext2-first-30-chunks-ablation-bilingual-gguf",
  "sha": "24903bb5f645adaf3c3c772786b4ab7c959720fc",
  "createdAt": "2024-11-03T11:40:37.000Z",
  "lastModified": "2024-11-03T11:42:38.000Z",
  "author": "RichardErkhov",
  "downloads": 797,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 21
}
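The evaluation section of the README embedded in the metadata computes perplexity on 2000 Wikipedia articles with a sliding window: the context is the model's 1024 positions, the window advances by a stride of 512 tokens, and each window scores only the tokens that earlier windows have not already scored. The sketch below reproduces that bookkeeping without a model. `sliding_windows` and `perplexity` follow the README loop; `weighted_perplexity` is a hypothetical refinement that is not the README's method.

```python
import math


def sliding_windows(seq_len: int, max_length: int = 1024, stride: int = 512):
    """Return (begin, end, trg_len) per window, following the README's loop."""
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end  # tokens not yet scored by an earlier window
        windows.append((begin, end, trg_len))
        prev_end = end
        if end == seq_len:
            break
    return windows


def perplexity(window_losses):
    """exp of the unweighted mean of per-window mean losses (README method)."""
    return math.exp(sum(window_losses) / len(window_losses))


def weighted_perplexity(window_losses, trg_lens):
    """Alternative: weight each window's mean loss by how many tokens it scored."""
    total = sum(loss * n for loss, n in zip(window_losses, trg_lens))
    return math.exp(total / sum(trg_lens))


if __name__ == "__main__":
    for window in sliding_windows(2000):
        print(window)
```

For a 2000-token sequence this yields the windows (0, 1024, 1024), (512, 1536, 512) and (1024, 2000, 464). The README averages window means with equal weight, so a short final window counts as much as a full one; the weighted variant avoids that, although the model's internal label shift means the first window actually scores one token fewer than `trg_len`. Note also that the README script calls `random.seed(42)` but draws article indices with `np.random.randint`, which that seed does not affect, so `np.random.seed(42)` would be needed for a reproducible sample.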