Model Intelligence Sheet

unsloth/glm-z1-32b-0414-gguf overview

Comprehensive model page for unsloth/glm-z1-32b-0414-gguf

transformersggufunslothtext-generationzhenarxiv:2406.12793base_model:zai-org/GLM-Z1-32B-0414base_model:quantized:zai-org/GLM-Z1-32B-0414license:mitendpoints_compatibleregion:usimatrixconversational

Downloads

645

Likes

Pipeline

text-generation

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

27 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
GLM-Z1-32B-0414-BF16-00001-of-00002.gguf	GGUF	BF16	46.07 GB	Download
GLM-Z1-32B-0414-BF16-00002-of-00002.gguf	GGUF	BF16	14.60 GB	Download
GLM-Z1-32B-0414-IQ4_NL.gguf	GGUF	IQ4_NL	17.31 GB	Download
GLM-Z1-32B-0414-IQ4_XS.gguf	GGUF	IQ4_XS	16.42 GB	Download
GLM-Z1-32B-0414-Q2_K.gguf	GGUF	Q2_K	11.45 GB	Download
GLM-Z1-32B-0414-Q2_K_L.gguf	GGUF	Q2_K_L	11.65 GB	Download
GLM-Z1-32B-0414-Q3_K_M.gguf	GGUF	Q3_K_M	14.80 GB	Download
GLM-Z1-32B-0414-Q3_K_S.gguf	GGUF	Q3_K_S	13.38 GB	Download
GLM-Z1-32B-0414-Q4_0.gguf	GGUF	—	17.36 GB	Download
GLM-Z1-32B-0414-Q4_1.gguf	GGUF	—	19.14 GB	Download
GLM-Z1-32B-0414-Q4_K_M.gguf	GGUF	Q4_K_M	18.33 GB	Download
GLM-Z1-32B-0414-Q4_K_S.gguf	GGUF	Q4_K_S	17.41 GB	Download
GLM-Z1-32B-0414-Q5_K_M.gguf	GGUF	Q5_K_M	21.51 GB	Download
GLM-Z1-32B-0414-Q5_K_S.gguf	GGUF	Q5_K_S	20.98 GB	Download
GLM-Z1-32B-0414-Q6_K.gguf	GGUF	Q6_K	24.89 GB	Download
GLM-Z1-32B-0414-Q8_0.gguf	GGUF	—	32.24 GB	Download
GLM-Z1-32B-0414-UD-IQ1_M.gguf	GGUF	IQ1_M	7.71 GB	Download
GLM-Z1-32B-0414-UD-IQ1_S.gguf	GGUF	IQ1_S	7.17 GB	Download
GLM-Z1-32B-0414-UD-IQ2_M.gguf	GGUF	IQ2_M	10.67 GB	Download
GLM-Z1-32B-0414-UD-IQ2_XXS.gguf	GGUF	IQ2_XXS	8.63 GB	Download
GLM-Z1-32B-0414-UD-IQ3_XXS.gguf	GGUF	IQ3_XXS	12.06 GB	Download
GLM-Z1-32B-0414-UD-Q2_K_XL.gguf	GGUF	Q2_K_XL	11.93 GB	Download
GLM-Z1-32B-0414-UD-Q3_K_XL.gguf	GGUF	Q3_K_XL	15.20 GB	Download
GLM-Z1-32B-0414-UD-Q4_K_XL.gguf	GGUF	Q4_K_XL	18.58 GB	Download
GLM-Z1-32B-0414-UD-Q5_K_XL.gguf	GGUF	Q5_K_XL	21.50 GB	Download
GLM-Z1-32B-0414-UD-Q6_K_XL.gguf	GGUF	Q6_K_XL	26.77 GB	Download
GLM-Z1-32B-0414-UD-Q8_K_XL.gguf	GGUF	Q8_K_XL	36.78 GB	Download

Model Details Live

Model Slug

unsloth/glm-z1-32b-0414-gguf

Author

unsloth

Pipeline Task

text-generation

Library

transformers

Created

2025-07-03

Last Modified

2025-07-03

Gated

Private

HF SHA

1a27b7404359fca3d63cfbef2f95cd7c43c2e945

License

mit

Language

zh, en

Base Model

THUDM/GLM-Z1-32B-0414

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "mit",
    "language": [
      "zh",
      "en"
    ],
    "pipeline_tag": "text-generation",
    "library_name": "transformers",
    "tags": [
      "unsloth"
    ],
    "base_model": [
      "THUDM/GLM-Z1-32B-0414"
    ],
    "frontmatter": {
      "license": "mit",
      "language": [
        "zh",
        "en"
      ],
      "pipeline_tag": "text-generation",
      "library_name": "transformers",
      "tags": [
        "unsloth"
      ],
      "base_model": [
        "THUDM/GLM-Z1-32B-0414"
      ]
    },
    "hero_image_url": "https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: mit\nlanguage:\n- zh\n- en\npipeline_tag: text-generation\nlibrary_name: transformers\ntags:\n- unsloth\nbase_model:\n- THUDM/GLM-Z1-32B-0414\n---\n\n> [!NOTE]  \n>  If you are using `llama.cpp`, use `--jinja` to enable the system prompt.\n>\n\n<div>\n<p style=\"margin-top: 0;margin-bottom: 0;\">\n    <em><a href=\"https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf\">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>\n  </p>\n  <div style=\"display: flex; gap: 5px; align-items: center; \">\n    <a href=\"https://github.com/unslothai/unsloth/\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"133\">\n    </a>\n    <a href=\"https://discord.gg/unsloth\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png\" width=\"173\">\n    </a>\n    <a href=\"https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune\">\n      <img src=\"https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png\" width=\"143\">\n    </a>\n  </div>\n</div>\n\n# GLM-4-Z1-32B-0414\n\n## Introduction\n\nThe GLM family welcomes a new generation of open-source models, the **GLM-4-32B-0414** series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment features. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance in instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in areas such as engineering code, Artifact generation, function calling, search-based Q&A, and report generation. Some benchmarks even rival larger models like GPT-4o and DeepSeek-V3-0324 (671B).\n\n**GLM-Z1-32B-0414** is a reasoning model with **deep thinking capabilities**. This was developed based on GLM-4-32B-0414 through cold start and extended reinforcement learning, as well as further training of the model on tasks involving mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During the training process, we also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities.\n\n**GLM-Z1-Rumination-32B-0414** is a deep reasoning model with **rumination capabilities** (benchmarked against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). The rumination model integrates search tools during its deep thinking process to handle complex tasks and is trained by utilizing multiple rule-based rewards to guide and extend end-to-end reinforcement learning. Z1-Rumination shows significant improvements in research-style writing and complex retrieval tasks.\n\nFinally, **GLM-Z1-9B-0414** is a surprise. We employed the aforementioned series of techniques to train a 9B small-sized model that maintains the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.\n\n## Performance\n\n<p align=\"center\">\n  <img width=\"100%\" src=\"https://raw.githubusercontent.com/THUDM/GLM-4/refs/heads/main/resources/Bench-Z1-32B.png\">\n</p>\n\n<p align=\"center\">\n  <img width=\"100%\" src=\"https://raw.githubusercontent.com/THUDM/GLM-4/refs/heads/main/resources/Bench-Z1-9B.png\">\n</p>\n\n## Model Usage Guidelines\n\n### I. Sampling Parameters\n\n| Parameter    | Recommended Value | Description                                  |\n| ------------ | ----------------- | -------------------------------------------- |\n| temperature  | **0.6**           | Balances creativity and stability            |\n| top_p        | **0.95**          | Cumulative probability threshold for sampling|\n| top_k        | **40**         | Filters out rare tokens while maintaining diversity |\n| max_new_tokens        | **30000**         | Leaves enough tokens for thinking |\n\n### II. Enforced Thinking\n\n- Add \\<think\\>\\n to the **first line**: Ensures the model thinks before responding  \n- When using `chat_template.jinja`, the prompt is automatically injected to enforce this behavior\n\n\n### III. Dialogue History Trimming\n\n- Retain only the **final user-visible reply**.  \n  Hidden thinking content should **not** be saved to history to reduce interference—this is already implemented in `chat_template.jinja`\n\n\n### IV. Handling Long Contexts (YaRN)\n\n- When input length exceeds **8,192 tokens**, consider enabling YaRN (Rope Scaling)\n\n- In supported frameworks, add the following snippet to `config.json`:\n\n  ```json\n  \"rope_scaling\": {\n    \"type\": \"yarn\",\n    \"factor\": 4.0,\n    \"original_max_position_embeddings\": 32768\n  }\n  ```\n\n- **Static YaRN** applies uniformly to all text. It may slightly degrade performance on short texts, so enable as needed.\n\n\n## Inference Code\n\nMake Sure Using `transforemrs>=4.51.3`.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nMODEL_PATH = \"THUDM/GLM-4-Z1-32B-0414\"\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)\nmodel = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map=\"auto\")\n\nmessage = [{\"role\": \"user\", \"content\": \"Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b.\"}]\n\ninputs = tokenizer.apply_chat_template(\n    message,\n    return_tensors=\"pt\",\n    add_generation_prompt=True,\n    return_dict=True,\n).to(model.device)\n\ngenerate_kwargs = {\n    \"input_ids\": inputs[\"input_ids\"],\n    \"attention_mask\": inputs[\"attention_mask\"],\n    \"max_new_tokens\": 4096,\n    \"do_sample\": False,\n}\nout = model.generate(**generate_kwargs)\nprint(tokenizer.decode(out[0][inputs[\"input_ids\"].shape[1]:], skip_special_tokens=True))\n```\n\n## Citations\n\nIf you find our work useful, please consider citing the following paper.\n\n```\n@misc{glm2024chatglm,\n      title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools}, \n      author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},\n      year={2024},\n      eprint={2406.12793},\n      archivePrefix={arXiv},\n      primaryClass={id='cs.CL' full_name='Computation and Language' is_active=True alt_name='cmp-lg' in_archive='cs' is_general=False description='Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.'}\n}\n```",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "unsloth",
    "text-generation",
    "zh",
    "en",
    "arxiv:2406.12793",
    "base_model:zai-org/GLM-Z1-32B-0414",
    "base_model:quantized:zai-org/GLM-Z1-32B-0414",
    "license:mit",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 3,
  "downloads": 645,
  "gated": false,
  "private": false,
  "last_modified": "2025-07-03T11:45:05.000Z",
  "created_at": "2025-07-03T07:50:56.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "68663660cdfe8499d93e728f",
  "id": "unsloth/GLM-Z1-32B-0414-GGUF",
  "modelId": "unsloth/GLM-Z1-32B-0414-GGUF",
  "sha": "1a27b7404359fca3d63cfbef2f95cd7c43c2e945",
  "createdAt": "2025-07-03T07:50:56.000Z",
  "lastModified": "2025-07-03T11:45:05.000Z",
  "author": "unsloth",
  "downloads": 645,
  "likes": 3,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 32
}