Model Intelligence Sheet
qwp4w3hyb/meta-llama-3.1-70b-instruct-imat-gguf overview
Downloads: 247
Likes: 8
Pipeline: text-generation
Library: not specified
Visibility: Public
Access: Open
Repository Files & Downloads
24 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| meta-llama-3.1-70b-instruct-bf16.split-00001-of-00004.gguf | GGUF | BF16 | 35.57 GB | Download |
| meta-llama-3.1-70b-instruct-bf16.split-00002-of-00004.gguf | GGUF | BF16 | 34.64 GB | Download |
| meta-llama-3.1-70b-instruct-bf16.split-00003-of-00004.gguf | GGUF | BF16 | 33.63 GB | Download |
| meta-llama-3.1-70b-instruct-bf16.split-00004-of-00004.gguf | GGUF | BF16 | 27.58 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ1_S.gguf | GGUF | IQ1_S | 14.29 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ2_M.gguf | GGUF | IQ2_M | 22.46 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ2_XXS.gguf | GGUF | IQ2_XXS | 17.79 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ3_M.gguf | GGUF | IQ3_M | 29.74 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ3_XXS.gguf | GGUF | IQ3_XXS | 25.58 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ4_NL.gguf | GGUF | IQ4_NL | 37.30 GB | Download |
| meta-llama-3.1-70b-instruct-imat-IQ4_XS.gguf | GGUF | IQ4_XS | 35.30 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q4_0.gguf | GGUF | Q4_0 | 37.36 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q4_K_L.gguf | GGUF | Q4_K_L | 42.16 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q4_K_M.gguf | GGUF | Q4_K_M | 39.60 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q5_K_L.split-00001-of-00002.gguf | GGUF | Q5_K_L | 24.63 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q5_K_L.split-00002-of-00002.gguf | GGUF | Q5_K_L | 24.33 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q5_K_M.gguf | GGUF | Q5_K_M | 46.52 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q6_K.split-00001-of-00002.gguf | GGUF | Q6_K | 27.14 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q6_K.split-00002-of-00002.gguf | GGUF | Q6_K | 26.77 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q6_K_L.split-00001-of-00002.gguf | GGUF | Q6_K_L | 28.29 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q6_K_L.split-00002-of-00002.gguf | GGUF | Q6_K_L | 27.93 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q8_0.split-00001-of-00002.gguf | GGUF | Q8_0 | 35.15 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q8_0.split-00002-of-00002.gguf | GGUF | Q8_0 | 34.68 GB | Download |
| meta-llama-3.1-70b-instruct-imat-Q8_0_L.split-00001-of-00002.gguf | GGUF | Q8_0_L | 36.07 GB | Download |
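The quantization label and the split-part numbering are both encoded in the filenames above. The following is a minimal sketch, assuming the naming pattern visible in the table (`<model>-imat-<QUANT>` or `<model>-bf16`, optionally followed by `.split-NNNNN-of-NNNNN`, then `.gguf`). It recovers the label for each file and sums split parts into one size per quant. The helper names `quant_label` and `total_sizes` are illustrative and not part of any tool.

```python
import re
from collections import defaultdict

# Pattern inferred from the filenames in the table (an assumption, not a
# documented naming scheme).
_NAME_RE = re.compile(
    r"-(?:imat-(?P<quant>[A-Za-z0-9_]+)|(?P<bf16>bf16))"
    r"(?:\.split-(?P<part>\d{5})-of-(?P<total>\d{5}))?\.gguf$"
)


def quant_label(filename):
    """Return the quantization label encoded in a GGUF filename."""
    match = _NAME_RE.search(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename}")
    # Files without the "imat-" marker are the unquantized bf16 conversion.
    return match.group("quant") or match.group("bf16").upper()


def total_sizes(rows):
    """Sum sizes (in GB) per quantization label, merging split parts."""
    totals = defaultdict(float)
    for filename, size_gb in rows:
        totals[quant_label(filename)] += size_gb
    return {label: round(size, 2) for label, size in totals.items()}


if __name__ == "__main__":
    rows = [
        ("meta-llama-3.1-70b-instruct-imat-Q6_K.split-00001-of-00002.gguf", 27.14),
        ("meta-llama-3.1-70b-instruct-imat-Q6_K.split-00002-of-00002.gguf", 26.77),
        ("meta-llama-3.1-70b-instruct-imat-Q4_0.gguf", 37.36),
    ]
    for label, size in total_sizes(rows).items():
        print(f"{label}: {size:.2f} GB")
```

Only the first part of the Q8_0_L split appears in the listing, so a total computed for that quant from this table alone would be incomplete.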
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "language": [
      "en",
      "de",
      "fr",
      "it",
      "pt",
      "hi",
      "es",
      "th"
    ],
    "license": "llama3.1",
    "pipeline_tag": "text-generation",
    "tags": [
      "facebook",
      "meta",
      "pytorch",
      "llama",
      "llama-3",
      "gguf",
      "imatrix"
    ],
    "base_model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "frontmatter": {
      "language": [
        "en",
        "de",
        "fr",
        "it",
        "pt",
        "hi",
        "es",
        "th"
      ],
      "license": "llama3.1",
      "pipeline_tag": "text-generation",
      "tags": [
        "facebook",
        "meta",
        "pytorch",
        "llama",
        "llama-3",
        "gguf",
        "imatrix"
      ],
      "base_model": "meta-llama/Meta-Llama-3.1-70B-Instruct"
    },
    "hero_image_url": "",
    "summary": "`` ./imatrix -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix `` # Original Model Card: TODO",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlanguage:\n - en\n - de\n - fr\n - it\n - pt\n - hi\n - es\n - th\nlicense: llama3.1\npipeline_tag: text-generation\ntags:\n - facebook\n - meta\n - pytorch\n - llama\n - llama-3\n - gguf\n - imatrix\nbase_model: meta-llama/Meta-Llama-3.1-70B-Instruct\n---\n\n# Quant Infos\n\n- ~Requires latest master + [Rope Scaling PR](https://github.com/ggerganov/llama.cpp/pull/8676).~ Rope scaling is merged, so just a recent master is required now.\n- quants done with an importance matrix for improved quantization loss\n- Quantized ggufs & imatrix from hf bf16, through bf16. `safetensors bf16 -> gguf bf16 -> quant` for *optimal* quant loss.\n- Wide coverage of different gguf quant types from Q\\_8\\_0 down to IQ1\\_S\n - experimental custom quant types\n - `_L` with `--output-tensor-type f16 --token-embedding-type f16`, which supposedly leads to better accuracy.\n- Imatrix generated with [this](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) multi-purpose dataset by [bartowski](https://huggingface.co/bartowski).\n ```\n ./imatrix -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix\n ```\n\n# Original Model Card:\n\nTODO",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "facebook",
    "meta",
    "pytorch",
    "llama",
    "llama-3",
    "imatrix",
    "text-generation",
    "en",
    "de",
    "fr",
    "it",
    "pt",
    "hi",
    "es",
    "th",
    "base_model:meta-llama/Llama-3.1-70B-Instruct",
    "base_model:quantized:meta-llama/Llama-3.1-70B-Instruct",
    "license:llama3.1",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 8,
  "downloads": 247,
  "gated": false,
  "private": false,
  "last_modified": "2024-08-06T09:10:09.000Z",
  "created_at": "2024-07-25T12:18:38.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
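The README embedded in the metadata describes the flow as `safetensors bf16 -> gguf bf16 -> quant`, with the importance matrix computed on the bf16 GGUF and the `_L` variants produced with `--output-tensor-type f16 --token-embedding-type f16`. The sketch below only assembles the corresponding command lines as Python lists and runs nothing. The `./imatrix` invocation and the two `_L` flags come from the README. Everything else is an assumption about the llama.cpp checkout that was used: the conversion script name (`convert_hf_to_gguf.py`), the `./quantize` binary name, its `--imatrix` flag and argument order, the output filename pattern, and which base type an `_L` variant starts from.

```python
import shlex


def build_commands(model_name, hf_dir, base_type, out_label=None,
                   f16_embed_and_output=False):
    """Assemble (but do not run) the three pipeline steps as argv lists."""
    out_label = out_label or base_type
    bf16_gguf = f"{model_name}-bf16.gguf"
    imatrix_file = f"{model_name}.imatrix"
    # Output name follows the pattern seen in the file table (assumption).
    out_gguf = f"{model_name}-imat-{out_label}.gguf"

    # Step 1: safetensors bf16 -> gguf bf16 (script name is an assumption).
    convert = ["python", "convert_hf_to_gguf.py", hf_dir,
               "--outtype", "bf16", "--outfile", bf16_gguf]

    # Step 2: importance matrix, as given in the README.
    imatrix = ["./imatrix", "-m", bf16_gguf,
               "-f", "calibration_datav3.txt", "-o", imatrix_file]

    # Step 3: quantize with the imatrix (binary name and flag are assumptions).
    quantize = ["./quantize", "--imatrix", imatrix_file]
    if f16_embed_and_output:
        # Flags quoted in the README for the experimental `_L` types.
        quantize += ["--output-tensor-type", "f16",
                     "--token-embedding-type", "f16"]
    quantize += [bf16_gguf, out_gguf, base_type]

    return [convert, imatrix, quantize]


if __name__ == "__main__":
    # Hypothetical example: an `_L` variant built on top of Q6_K.
    for argv in build_commands("meta-llama-3.1-70b-instruct", "./hf-model",
                               "Q6_K", out_label="Q6_K_L",
                               f16_embed_and_output=True):
        print(shlex.join(argv))
```

Building argv lists instead of shell strings keeps quoting explicit. The lists can be passed to `subprocess.run` unchanged once the binary names are confirmed against the local build.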
Source payload excerpt (from Hugging Face API)
{
  "_id": "66a2429e614dd62973aa9f39",
  "id": "qwp4w3hyb/Meta-Llama-3.1-70B-Instruct-iMat-GGUF",
  "modelId": "qwp4w3hyb/Meta-Llama-3.1-70B-Instruct-iMat-GGUF",
  "sha": "6a77687765d18011e4778a544cd94a18381955b7",
  "createdAt": "2024-07-25T12:18:38.000Z",
  "lastModified": "2024-08-06T09:10:09.000Z",
  "author": "qwp4w3hyb",
  "downloads": 247,
  "likes": 8,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 27
}
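The payload gives the repository `id` and the commit `sha`, which together with a filename are enough to form a direct download URL pinned to that revision. This is a minimal sketch, assuming the Hub's usual `/resolve/<revision>/<filename>` URL layout and that the filenames in the file table match the repository exactly. The helper name `resolve_url` is illustrative.

```python
from urllib.parse import quote

HUB = "https://huggingface.co"


def resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL for one file in a Hub model repository."""
    # Percent-encode the revision and filename; "/" in the filename is kept
    # so that files in subfolders would still resolve.
    return (f"{HUB}/{repo_id}/resolve/"
            f"{quote(revision, safe='')}/{quote(filename)}")


if __name__ == "__main__":
    print(resolve_url(
        "qwp4w3hyb/Meta-Llama-3.1-70B-Instruct-iMat-GGUF",
        "meta-llama-3.1-70b-instruct-imat-IQ1_S.gguf",
        revision="6a77687765d18011e4778a544cd94a18381955b7",
    ))
```

Pinning `revision` to the `sha` keeps the link stable if the repository is updated later; the default of `main` follows the branch head instead.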