Model Intelligence Sheet

nexesquants/wintergoddess-1.4x-limarpv3-70b-l2-32k-requant.gguf overview

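Quants for Sao10K's model WinterGoddess 1.4 70b: https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2

With a twist: the model I used comes from a third party, and has been tweaked with limarpv3 and a Linear Rope 8 training to reach 32k context (with even better results at rope 4 and rope 2, and maybe at other lower rope values as well).

I don't know who did the job, only that I found this Q4_K_S quant of it hanging around without an FP16: https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF

So I made a Q8_0 out of it (the best starting point for requantizing afterwards), and requantized it into:

Full offload possible on 48GB VRAM with a huge context size:
- Q3_K_L

Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example):
- Q3_K_M, Q3_K_S, Q3_K_XS
- IQ3_XXS SOTA, which is equivalent to a Q3_K_S with more context (the filename is partly wrong, ch2500 is the real value)
- Lower quality: Q2_K, Q2_K_S

Full offload possible on 24GB VRAM with a decent context size:
- IQ2_XS SOTA (the filename is partly wrong, b2035 and ch2500 are the real values)

The higher the ch number, the better the quality.

And as a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933

-----

Edit: Because my CPU (i7-6700k) is weak for AI purposes and I only have 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 800 tokens). Good news, it lowers the perplexity by:

More than 3% with linear rope 8 (Pos Compress Embeddings) on Q2_K
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512

More than 2% with linear rope 4 on Q2_K
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512

More than 1.5% with linear rope 2 on Q2_K
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512

More than 1% with linear rope 8 on Q3_K_S
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512

-----

Edit: A Q3_K_XS, a new quant offered in LlamaCPP, is on the way, with an iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens).

-----

Interestingly, linear rope 2.5 (and linear rope 1.6 as well, after further testing) is almost without loss compared to linear rope 2, while 3 and 3.2 are quite good. Here are the values, including those for the normal Q2_K:
- Linear rope 2.5 (max context 10240): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,4.0509,512
- Linear rope 2.5 (max context 10240): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,wikitext,4.2327
- Linear rope 2.5 (max context 10240): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512
- Linear rope 2.5 (max context 10240): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K_S.gguf,-,wikitext,4.6789,512
- Linear rope 3 (max context 12288): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512
- Linear rope 3.2 (max context 13107): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512

And for the adventurous, linear rope 10 (max context 40960):
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512
- Minus 3% with my Q2_K with the c32_ch25 iMatrix: WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512

So the linear rope, at least on this model, is flexible, and you can lower it to get the best perplexity for your max context.

All these results are reproducible, with lower deltas between them, for Q3_K_S, and I suppose for other quants as well.

Then, I wonder about applying an NTK rope on top of it to extend it further, even if it hurts the integrity of numbers in chat. Multiply a linear rope (2, 4, 8, whatever) by 5888 (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7) or even 7424 (Alpha 2.2, or RBF 22277) to get a further boost in max context size. Example with Linear 8 and Alpha 2.2 / RBF 22277: 8 * 7424 = 59392. It's only theoretical of course, but worth testing.

-----

Original 70b 4k model perplexity:
- WinterGoddess-1.4x-70B-L2.Q3_K_M.gguf,-,wikitext,3.7428,512,PEC1

Benchmarks of the original Q4_K_S quant I found:

Linear rope 8 10000
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400

Linear rope 4 10000
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400

Linear rope 2 10000
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400

Linear rope 1 10000
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400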
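The arithmetic behind the figures in this overview can be checked with a short sketch. It assumes that the maximum context is the base model's 4096-token context ("70b 4k") multiplied by the linear rope factor, which matches the quoted 40960 for rope 10, and that an iMatrix sample size is simply ctx times chunks. The function names are illustrative and not part of any tool; the perplexity pair is the Q2_K linear rope 8 result from the README.

```python
BASE_CONTEXT = 4096  # assumption: native context of the original 70b 4k model


def max_context(linear_rope: float, base_context: int = BASE_CONTEXT) -> int:
    """Context reachable at a given linear rope factor (truncated to an integer)."""
    return int(base_context * linear_rope)


def imatrix_tokens(ctx: int, chunks: int) -> int:
    """Tokens covered by an iMatrix built from `chunks` chunks of `ctx` tokens."""
    return ctx * chunks


def perplexity_drop_pct(before: float, after: float) -> float:
    """Relative perplexity reduction, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for scale in (2.5, 3, 3.2, 10):
        print(f"linear rope {scale}: max context {max_context(scale)}")
    print("iMatrix tokens:", imatrix_tokens(32, 25), imatrix_tokens(32, 2500))
    # Q2_K at linear rope 8, without and with the c32_ch25 iMatrix
    print(f"perplexity drop: {perplexity_drop_pct(6.2489, 6.0482):.2f}%")
```

Truncating rather than rounding reproduces the 13107 quoted for linear rope 3.2; whether the author truncated or rounded is not stated.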

Tags: gguf, license:llama2, endpoints_compatible, region:us

nexesquants/wintergoddess-1.4x-limarpv3-70b-l2-32k-requant.gguf visual

Downloads

304

Likes

2

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

12 files detected

Direct downloads for all repository files

File	Type	Quantization	Size
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf	GGUF	Q2_K	23.71 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf	GGUF	Q3_K_S	27.87 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf	GGUF	Q2_K	23.71 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K_S.gguf	GGUF	Q2_K_S	21.94 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_M.gguf	GGUF	Q3_K_M	30.99 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf	GGUF	Q3_K_S	27.87 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q2_K.gguf	GGUF	Q2_K	23.71 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q2_K_S.gguf	GGUF	Q2_K_S	21.94 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf	GGUF	Q3_K_XS	26.32 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-b2007-iMat-c32_ch400-IQ2_XS.gguf	GGUF	IQ2_XS	18.94 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-b2007-iMat-c32_ch400-IQ3_XXS.gguf	GGUF	IQ3_XXS	25.17 GB
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-b2131-iMat-c32_ch400-IQ1_S.gguf	GGUF	IQ1_S	13.54 GB
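The file names above appear to encode a build tag (b1924, b2007, ...), optional iMatrix settings (c32_ch25 meaning ctx 32 with 25 chunks) and the quant type. The sketch below pulls those fields out with a regular expression. The pattern is inferred from the listed names only and is an assumption, not a naming rule documented by the author; `QuantFile` and `parse_name` are hypothetical helper names. Since the overview notes that some file names are partly wrong, the parsed values describe the name, not necessarily the real settings.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the listed file names (an assumption, not a documented scheme).
NAME_RE = re.compile(
    r"-b(?P<build>\d+)"                            # build tag, e.g. b1924
    r"(?:-iMat-c(?P<ctx>\d+)_ch(?P<chunks>\d+))?"  # optional iMatrix ctx and chunk count
    r"-(?P<quant>I?Q\d\w*)\.gguf$"                 # quant type, e.g. Q2_K or IQ2_XS
)


class QuantFile(NamedTuple):
    build: int
    imatrix_ctx: Optional[int]
    imatrix_chunks: Optional[int]
    quant: str


def parse_name(filename: str) -> Optional[QuantFile]:
    """Return the fields encoded in a file name, or None if it does not match."""
    m = NAME_RE.search(filename)
    if m is None:
        return None
    ctx, chunks = m.group("ctx"), m.group("chunks")
    return QuantFile(
        build=int(m.group("build")),
        imatrix_ctx=int(ctx) if ctx else None,
        imatrix_chunks=int(chunks) if chunks else None,
        quant=m.group("quant"),
    )


if __name__ == "__main__":
    name = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf"
    print(parse_name(name))
```

Names without an iMatrix segment yield `None` for both iMatrix fields, which makes it easy to separate the plain requants from the iMatrix ones.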

Model Details

Model Slug

nexesquants/wintergoddess-1.4x-limarpv3-70b-l2-32k-requant.gguf

Author

NexesQuants

Pipeline Task

—

Library

—

Created

2024-01-20

Last Modified

2024-02-13

Gated

No

Private

No

HF SHA

e40d11fd2a6d3201efa8f7721e5e842e12e07717

License

llama2

Language

Unknown

Base Model

Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "llama2",
    "frontmatter": {
      "license": "llama2"
    },
    "hero_image_url": "",
    "summary": "Quants for Sao10K's model WinterGoddess 1.4 70b : https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2 With a twist : the model I used come from a third party, and has been tweaked with limarvp3 and a Linear Rope 8 training to go to 32k context (with even better results in rope 4 and rope 2, maybe other lesser ropes as well) I don't know who did the job, only that I found this Q4_K_S quant of it hanging around without FP16 : https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF So I made a Q8_0 out of it (best way to requantize after), and requantized it in : Full offload possible on 48GB VRAM with a huge context size : Q3_K_L Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example) Q3_K_M, Q3_K_S, Q3_K_XS, IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context! (filename is partly wrong, ch2500 is the real values)) Lower quality : Q2_K, Q2_K_S Full offload possible on 24GB VRAM with a decent context size. IQ2_XS SOTA (filename is partly wrong, b2035 and ch2500 are the real values) The higher ch number, the better the quality. And a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from the 21/01/2024 : https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933 ----- Edit : Due to a poor CPU (i7-6700k) for AI purpose, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with an small iMatrix of ctx 32 with 25 chunks (so, 800 tokens). 
And good news, it lowers the perplexity by : More than 3% with linear rope 8 (Pos Compress Embeddings) on Q2_K More than 2% with linear ropee 4 on Q2_K More than 1.5% with linear rope 2 on Q2_K More than 1% with linear rope 8 on Q3_K_S ----- Edit : A Q3_K_XS, new quant offered in LlamaCPP, is otw, with a iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens) ----- Interestingly, linear rope 2.5 (and linear rope 1.6 as well after further testing) is almost without loss compared to linear rope 2, while 3 and 3.2 are quite good. Here are the values with the normal Q2_K : And for the adventurous, linear rope 10 : (max context 40960) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512 So the linear rope, at least on this model, is flexible, and you can lower it to have the best peplexity for your max context. All these results are reproducible with lowers deltas between them for Q3_K_S, and I suppose for other quants as well. Then, I wonder about applying a NTK rope on the top of it to expend it further, even if it screws with the integrity of numbers in chat). Multiply a linear rope (2, 4, 8, whatever) by 5888 (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7) and even 7424 (Alpha 2.2, or RBF 22277). This to get a further boost in max context size. Ex with Linear 8 with Alpha 1.8/RBF22277 : 8*7424 = 59392. It's only theorical of course, but worth testing. ----- Original 70b 4k model perplexity : Benchs of the original Q4_K_S quant I found : Linear rope 8 10000 Linear rope 4 10000 Linear rope 2 10000 Linear rope 1 10000",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: llama2\n---\nQuants for Sao10K's model WinterGoddess 1.4 70b : https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2\n\nWith a twist : the model I used come from a third party, and has been tweaked with limarvp3 and a Linear Rope 8 training to go to 32k context (with even better results in rope 4 and rope 2, maybe other lesser ropes as well)\n\nI don't know who did the job, only that I found this Q4_K_S quant of it hanging around without FP16 : https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF\n\nSo I made a Q8_0 out of it (best way to requantize after), and requantized it in :\n\nFull offload possible on 48GB VRAM with a huge context size :\n\n    Q3_K_L\n\nFull offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example)\n\n    Q3_K_M, Q3_K_S, Q3_K_XS,\n    IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context! (filename is partly wrong, ch2500 is the real values))\n    Lower quality : Q2_K, Q2_K_S\n\nFull offload possible on 24GB VRAM with a decent context size.\n\n    IQ2_XS SOTA (filename is partly wrong, b2035 and ch2500 are the real values)\n\nThe higher ch number, the better the quality.\n\nAnd a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from the 21/01/2024 : https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933 \n\n-----\n\nEdit : Due to a poor CPU (i7-6700k) for AI purpose, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with an small iMatrix of ctx 32 with 25 chunks (so, 800 tokens).\nAnd good news, it lowers the perplexity by :\n\nMore than 3% with linear rope 8 (Pos Compress Embeddings) on Q2_K\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512 \n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512\n\nMore than 2% with linear ropee 4 on Q2_K\n- 
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512 \n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512\n\nMore than 1.5% with linear rope 2 on Q2_K\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512 \n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512\n\nMore than 1% with linear rope 8 on Q3_K_S\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512 \n- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512\n\n-----\n\nEdit : A Q3_K_XS, new quant offered in LlamaCPP, is otw, with a iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens)\n\n-----\n\nInterestingly, linear rope 2.5 (and linear rope 1.6 as well after further testing) is almost without loss compared to linear rope 2, while 3 and 3.2 are quite good. Here are the values with the normal Q2_K :\n- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,4.0509,512\n- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,wikitext,4.2327\n- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512\n- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K_S.gguf,-,wikitext,4.6789,512\n\n- Linear rope 3 (max context 12288) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512 \n- Linear rope 3.2 (max context 13107) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512\n\nAnd for the adventurous, linear rope 10 : (max context 40960) : 
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512 \n- Minus 3% With my Q2_K with c32ch25 iMatrix : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512\n\nSo the linear rope, at least on this model, is flexible, and you can lower it to have the best peplexity for your max context.\n\nAll these results are reproducible with lowers deltas between them for Q3_K_S, and I suppose for other quants as well.\n\nThen, I wonder about applying a NTK rope on the top of it to expend it further, even if it screws with the integrity of numbers in chat).\nMultiply a linear rope (2, 4, 8, whatever) by 5888 (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7) and even 7424 (Alpha 2.2, or RBF 22277).\nThis to get a further boost in max context size. Ex with Linear 8 with Alpha 1.8/RBF22277 : 8*7424 = 59392.\nIt's only theorical of course, but worth testing.\n\n-----\n\nOriginal 70b 4k model perplexity :\n- WinterGoddess-1.4x-70B-L2.Q3_K_M.gguf,-,wikitext,3.7428,512,PEC1\n\nBenchs of the original Q4_K_S quant I found :\n\nLinear rope 8 10000\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400\n\nLinear rope 4 10000\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400\n\nLinear rope 2 10000\n- 
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400\n\nLinear rope 1 10000\n- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "license:llama2",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 2,
  "downloads": 304,
  "gated": false,
  "private": false,
  "last_modified": "2024-02-13T14:10:04.000Z",
  "created_at": "2024-01-20T23:56:26.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
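The benchmark entries embedded in `readme_markdown` are comma-separated lines such as `name,-,wikitext,6.2489,512`. A minimal parsing sketch follows. The field meanings (model file, a placeholder, dataset, score, then extra fields such as the run size) are read off the examples and are an assumption, as are the names `BenchLine` and `parse_bench_line`. Trailing fields are kept as raw strings rather than interpreted, and a score that is not a clean number (one line reads `3.3394 *327`) is stored as `None` instead of being guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchLine:
    model: str
    dataset: str
    score: Optional[float]  # None when the score field is not a clean number
    extra: List[str]        # remaining fields, kept as raw strings


def parse_bench_line(line: str) -> BenchLine:
    """Parse one 'name,-,dataset,score,...' line, with or without a leading '- '."""
    fields = [f.strip() for f in line.lstrip("- ").split(",")]
    if len(fields) < 4:
        raise ValueError(f"not a benchmark line: {line!r}")
    model, _, dataset, raw_score = fields[:4]
    try:
        score = float(raw_score)
    except ValueError:
        score = None
    return BenchLine(model, dataset, score, fields[4:])


if __name__ == "__main__":
    sample = "- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400"
    print(parse_bench_line(sample))
```

Lines that carry a prose prefix before the file name (for example the "Linear rope 2.5 (max context 10240) :" entries) would need that prefix stripped first; the sketch does not attempt it.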

Source payload excerpt (from Hugging Face API)

{
  "_id": "65ac5daa819fbfaf49eec9c9",
  "id": "NexesQuants/WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.GGUF",
  "modelId": "NexesQuants/WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.GGUF",
  "sha": "e40d11fd2a6d3201efa8f7721e5e842e12e07717",
  "createdAt": "2024-01-20T23:56:26.000Z",
  "lastModified": "2024-02-13T14:10:04.000Z",
  "author": "NexesQuants",
  "downloads": 304,
  "likes": 2,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 16
}