This page lists the repository links, GGUF quant files, and Hugging Face metadata that GraySoft has indexed for duyntnet/cogito-v1-preview-llama-8b-imatrix-gguf, including the free IQ1_M GGUF download. Use it to pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

duyntnet/cogito-v1-preview-llama-8b-imatrix-gguf overview

The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use. This repository contains GGUF quantizations of deepcogito/cogito-v1-preview-llama-8B.

transformers · gguf · imatrix · cogito-v1-preview-llama-8B · text-generation · en · license:other · region:us · conversational

Downloads

259

Likes

0

Pipeline

text-generation

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

27 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
cogito-v1-preview-llama-8B-IQ1_M.gguf	GGUF	IQ1_M	2.01 GB	Download
cogito-v1-preview-llama-8B-IQ1_S.gguf	GGUF	IQ1_S	1.88 GB	Download
cogito-v1-preview-llama-8B-IQ2_M.gguf	GGUF	IQ2_M	2.75 GB	Download
cogito-v1-preview-llama-8B-IQ2_S.gguf	GGUF	IQ2_S	2.57 GB	Download
cogito-v1-preview-llama-8B-IQ2_XS.gguf	GGUF	IQ2_XS	2.43 GB	Download
cogito-v1-preview-llama-8B-IQ2_XXS.gguf	GGUF	IQ2_XXS	2.23 GB	Download
cogito-v1-preview-llama-8B-IQ3_M.gguf	GGUF	IQ3_M	3.52 GB	Download
cogito-v1-preview-llama-8B-IQ3_S.gguf	GGUF	IQ3_S	3.43 GB	Download
cogito-v1-preview-llama-8B-IQ3_XS.gguf	GGUF	IQ3_XS	3.28 GB	Download
cogito-v1-preview-llama-8B-IQ3_XXS.gguf	GGUF	IQ3_XXS	3.05 GB	Download
cogito-v1-preview-llama-8B-IQ4_NL.gguf	GGUF	IQ4_NL	4.36 GB	Download
cogito-v1-preview-llama-8B-IQ4_XS.gguf	GGUF	IQ4_XS	4.14 GB	Download
cogito-v1-preview-llama-8B-Q2_K.gguf	GGUF	Q2_K	2.96 GB	Download
cogito-v1-preview-llama-8B-Q2_K_S.gguf	GGUF	Q2_K_S	2.78 GB	Download
cogito-v1-preview-llama-8B-Q3_K_L.gguf	GGUF	Q3_K_L	4.03 GB	Download
cogito-v1-preview-llama-8B-Q3_K_M.gguf	GGUF	Q3_K_M	3.74 GB	Download
cogito-v1-preview-llama-8B-Q3_K_S.gguf	GGUF	Q3_K_S	3.41 GB	Download
cogito-v1-preview-llama-8B-Q4_0.gguf	GGUF	Q4_0	4.35 GB	Download
cogito-v1-preview-llama-8B-Q4_1.gguf	GGUF	Q4_1	4.78 GB	Download
cogito-v1-preview-llama-8B-Q4_K_M.gguf	GGUF	Q4_K_M	4.58 GB	Download
cogito-v1-preview-llama-8B-Q4_K_S.gguf	GGUF	Q4_K_S	4.37 GB	Download
cogito-v1-preview-llama-8B-Q5_0.gguf	GGUF	Q5_0	5.23 GB	Download
cogito-v1-preview-llama-8B-Q5_1.gguf	GGUF	Q5_1	5.65 GB	Download
cogito-v1-preview-llama-8B-Q5_K_M.gguf	GGUF	Q5_K_M	5.34 GB	Download
cogito-v1-preview-llama-8B-Q5_K_S.gguf	GGUF	Q5_K_S	5.21 GB	Download
cogito-v1-preview-llama-8B-Q6_K.gguf	GGUF	Q6_K	6.14 GB	Download
cogito-v1-preview-llama-8B-Q8_0.gguf	GGUF	Q8_0	7.95 GB	Download
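
As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose file fits a given size budget and can optionally download it. The sizes are copied from the table; `pick_quant`, `filename_for`, and `download_quant` are hypothetical helper names, the repository ID is the one shown in the Hugging Face payload on this page, and `hf_hub_download` comes from the `huggingface_hub` package, which is assumed to be installed. File size is only a lower bound on memory use, since the runtime also needs room for context and buffers, and a larger file is only a proxy for output quality.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "IQ1_M": 2.01, "IQ1_S": 1.88, "IQ2_M": 2.75, "IQ2_S": 2.57,
    "IQ2_XS": 2.43, "IQ2_XXS": 2.23, "IQ3_M": 3.52, "IQ3_S": 3.43,
    "IQ3_XS": 3.28, "IQ3_XXS": 3.05, "IQ4_NL": 4.36, "IQ4_XS": 4.14,
    "Q2_K": 2.96, "Q2_K_S": 2.78, "Q3_K_L": 4.03, "Q3_K_M": 3.74,
    "Q3_K_S": 3.41, "Q4_0": 4.35, "Q4_1": 4.78, "Q4_K_M": 4.58,
    "Q4_K_S": 4.37, "Q5_0": 5.23, "Q5_1": 5.65, "Q5_K_M": 5.34,
    "Q5_K_S": 5.21, "Q6_K": 6.14, "Q8_0": 7.95,
}

# Repository ID as shown in the Hugging Face payload on this page.
REPO_ID = "duyntnet/cogito-v1-preview-llama-8B-imatrix-GGUF"


def filename_for(quant: str) -> str:
    """Build the file name used in the table for a quantization label."""
    return f"cogito-v1-preview-llama-8B-{quant}.gguf"


def pick_quant(budget_gb: float):
    """Return the quant with the largest file that fits the budget, or None."""
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_quant(quant: str) -> str:
    """Download one GGUF file and return its local path (needs network access)."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename_for(quant))
```

With a 4.5 GB budget, `pick_quant(4.5)` selects `Q4_K_S` (4.37 GB); calling `download_quant` on the result would then fetch that file into the local Hugging Face cache.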

Model Details Live

Model Slug

duyntnet/cogito-v1-preview-llama-8b-imatrix-gguf

Author

duyntnet

Pipeline Task

text-generation

Library

transformers

Created

2025-04-10

Last Modified

2025-04-10

Gated

No

Private

No

HF SHA

5725a490c158644f2c1e24e69b76d1ab2e610bf3

License

other

Language

en

Base Model

deepcogito/cogito-v1-preview-llama-8B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "other",
    "language": [
      "en"
    ],
    "pipeline_tag": "text-generation",
    "inference": false,
    "tags": [
      "transformers",
      "gguf",
      "imatrix",
      "cogito-v1-preview-llama-8B"
    ],
    "frontmatter": {
      "license": "other",
      "language": [
        "en"
      ],
      "pipeline_tag": "text-generation",
      "inference": "false",
      "tags": [
        "transformers",
        "gguf",
        "imatrix",
        "cogito-v1-preview-llama-8B"
      ]
    },
    "hero_image_url": "",
    "summary": "The Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use. # Usage Here is a snippet below for usage with Transformers: ``python import transformers import torch model_id = \"deepcogito/cogito-v1-preview-llama-8B\" pipeline = transformers.pipeline( \"text-generation\", model=model_id, model_kwargs={\"torch_dtype\": torch.bfloat16}, device_map=\"auto\", ) messages = [ {\"role\": \"system\", \"content\": \"You are a pirate chatbot who always responds in pirate speak!\"}, {\"role\": \"user\", \"content\": \"Give me a short introduction to LLMs.\"}, ] outputs = pipeline( messages, max_new_tokens=512, ) print(outputs[0][\"generated_text\"][-1]) ``",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: other\nlanguage:\n- en\npipeline_tag: text-generation\ninference: false\ntags:\n- transformers\n- gguf\n- imatrix\n- cogito-v1-preview-llama-8B\n---\nQuantizations of https://huggingface.co/deepcogito/cogito-v1-preview-llama-8B\n\n\n### Open source inference clients/UIs\n* [llama.cpp](https://github.com/ggerganov/llama.cpp)\n* [KoboldCPP](https://github.com/LostRuins/koboldcpp)\n* [ollama](https://github.com/ollama/ollama)\n* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)\n* [jan](https://github.com/janhq/jan)\n* [GPT4All](https://github.com/nomic-ai/gpt4all)\n\n### Closed source inference clients/UIs\n* [LM Studio](https://lmstudio.ai/)\n* [Backyard AI](https://backyard.ai/)\n* More will be added...\n---\n\n# From original readme\n\nThe Cogito LLMs are instruction tuned generative models (text in/text out). All models are released under an open license for commercial use.\n\n- Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).\n- The LLMs are trained using **Iterated Distillation and Amplification (IDA)** - an scalable and efficient alignment strategy for superintelligence using iterative self-improvement.\n- The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than size equivalent counterparts.\n  - In both standard and reasoning modes, Cogito v1-preview models outperform their size equivalent counterparts on common industry benchmarks. 
\n- Each model is trained in over 30 languages and supports a context length of 128k.\n\n\n# Usage\nHere is a snippet below for usage with Transformers:\n\n```python\nimport transformers\nimport torch\n\nmodel_id = \"deepcogito/cogito-v1-preview-llama-8B\"\n\npipeline = transformers.pipeline(\n    \"text-generation\",\n    model=model_id,\n    model_kwargs={\"torch_dtype\": torch.bfloat16},\n    device_map=\"auto\",\n)\n\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a pirate chatbot who always responds in pirate speak!\"},\n    {\"role\": \"user\", \"content\": \"Give me a short introduction to LLMs.\"},\n]\n\noutputs = pipeline(\n    messages,\n    max_new_tokens=512,\n)\n\nprint(outputs[0][\"generated_text\"][-1])\n```\n\n\n\n## Implementing extended thinking\n- By default, the model will answer in the standard mode. \n- To enable thinking, you can do any one of the two methods:\n  - Add a specific system prompt, or \n  - Set `enable_thinking=True` while applying the chat template.\n\n\n### Method 1 - Add a specific system prompt.\nTo enable thinking, simply use this in the system prompt `system_instruction = 'Enable deep thinking subroutine.'`\n\nIf you already have a system_instruction, then use `system_instruction = 'Enable deep thinking subroutine.' 
+ '\\n\\n' + system_instruction`.\n\nHere is an example - \n\n```python\nimport transformers\nimport torch\n\nmodel_id = \"deepcogito/cogito-v1-preview-llama-8B\"\n\npipeline = transformers.pipeline(\n    \"text-generation\",\n    model=model_id,\n    model_kwargs={\"torch_dtype\": torch.bfloat16},\n    device_map=\"auto\",\n)\n\nDEEP_THINKING_INSTRUCTION = \"Enable deep thinking subroutine.\"\n\nmessages = [\n    {\"role\": \"system\", \"content\": DEEP_THINKING_INSTRUCTION},\n    {\"role\": \"user\", \"content\": \"Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.\"},\n]\n\noutputs = pipeline(\n    messages,\n    max_new_tokens=512,\n)\n\nprint(outputs[0][\"generated_text\"][-1])\n```\n\n\nSimilarly, if you have a system prompt, you can prepend the `DEEP_THINKING_INSTRUCTION` to it in this way - \n\n```python\nDEEP_THINKING_INSTRUCTION = \"Enable deep thinking subroutine.\"\n\nsystem_prompt = \"Reply to each prompt with only the actual code - no explanations.\"\nprompt = \"Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.\"\n\nmessages = [\n    {\"role\": \"system\", \"content\": DEEP_THINKING_INSTRUCTION + '\\n\\n' + system_prompt},\n    {\"role\": \"user\", \"content\": prompt}\n]\n```\n\n### Method 2 - Set enable_thinking=True in the tokenizer\nIf you are using Hugging Face tokenizers, then you can simply add the argument `enable_thinking=True` to the tokenization (this option is added to the chat template).\n\nHere is an example - \n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"deepcogito/cogito-v1-preview-llama-8B\"\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    torch_dtype=\"auto\",\n    device_map=\"auto\"\n)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\nprompt = \"Give me a short 
introduction to LLMs.\"\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a pirate chatbot who always responds in pirate speak!\"},\n    {\"role\": \"user\", \"content\": prompt}\n]\n\ntext = tokenizer.apply_chat_template(\n    messages,\n    tokenize=False,\n    add_generation_prompt=True,\n    enable_thinking=True\n)\nmodel_inputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\n\ngenerated_ids = model.generate(\n    **model_inputs,\n    max_new_tokens=512\n)\ngenerated_ids = [\n    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)\n]\n\nresponse = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]\nprint(response)\n```\n\n# Tool Calling\nCogito models support tool calling (single, parallel, multiple and parallel_multiple) both in standard and extended thinking mode.\n\nHere is a snippet -\n\n```python\n# First, define a tool\ndef get_current_temperature(location: str) -> float:\n    \"\"\"\n    Get the current temperature at a location.\n    \n    Args:\n        location: The location to get the temperature for, in the format \"City, Country\"\n    Returns:\n        The current temperature at the specified location in the specified units, as a float.\n    \"\"\"\n    return 22.  
# A real function should probably actually get the temperature!\n\n# Next, create a chat and apply the chat template\nmessages = [\n  {\"role\": \"user\", \"content\": \"Hey, what's the temperature in Paris right now?\"}\n]\n\nmodel_inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)\n\ntext = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)\ninputs = tokenizer(text, return_tensors=\"pt\", add_special_tokens=False).to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=512)\noutput_text = tokenizer.batch_decode(outputs)[0][len(text):]\nprint(output_text)\n```\n\nThis will result in the output - \n```\n<tool_call>\n{\"name\": \"get_current_temperature\", \"arguments\": {\"location\": \"Paris, France\"}}\n</tool_call><|eot_id|>\n```\n\nYou can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so:\n\n```python\ntool_call = {\"name\": \"get_current_temperature\", \"arguments\": {\"location\": \"Paris, France\"}}\nmessages.append({\"role\": \"assistant\", \"tool_calls\": [{\"type\": \"function\", \"function\": tool_call}]})\n```\n\nand then call the tool and append the result, with the `tool` role, like so:\n\n```python\nmessages.append({\"role\": \"tool\", \"name\": \"get_current_temperature\", \"content\": \"22.0\"})\n```\n\nAfter that, you can `generate()` again to let the model use the tool result in the chat:\n\n```python\ntext = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)\ninputs = tokenizer(text, return_tensors=\"pt\", add_special_tokens=False).to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=512)\noutput_text = tokenizer.batch_decode(outputs)[0][len(text):]\n```\n\nThis should result in the string -\n```\n'The current temperature in Paris is 22.0 
degrees.<|eot_id|>'\n```",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "imatrix",
    "cogito-v1-preview-llama-8B",
    "text-generation",
    "en",
    "license:other",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 259,
  "gated": false,
  "private": false,
  "last_modified": "2025-04-10T21:47:25.000Z",
  "created_at": "2025-04-10T20:49:23.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}
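
The README captured in `readme_markdown` enables extended thinking by placing `Enable deep thinking subroutine.` at the start of the system prompt. A minimal sketch of that rule for a GGUF runtime follows. `build_messages` and `chat` are hypothetical helpers, the model path is whatever local file you downloaded, and the use of `llama-cpp-python` (`Llama`, `create_chat_completion`) is an assumption about the runtime, not something this page prescribes. Whether a particular GGUF build responds to the instruction the same way as the Transformers examples should be checked empirically.

```python
# Instruction string taken from the README's "Method 1".
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."


def build_messages(prompt, system_prompt=None, thinking=False):
    """Build a chat message list, prepending the thinking instruction if asked."""
    parts = []
    if thinking:
        parts.append(DEEP_THINKING_INSTRUCTION)
    if system_prompt:
        parts.append(system_prompt)

    messages = []
    if parts:
        # The README joins the instruction and any existing system prompt
        # with a blank line.
        messages.append({"role": "system", "content": "\n\n".join(parts)})
    messages.append({"role": "user", "content": prompt})
    return messages


def chat(model_path, prompt, system_prompt=None, thinking=False, max_tokens=512):
    """Run one chat turn against a local GGUF file (assumes llama-cpp-python)."""
    # Imported lazily so build_messages can be used without the runtime installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=4096)  # n_ctx chosen arbitrarily here
    result = llm.create_chat_completion(
        messages=build_messages(prompt, system_prompt, thinking),
        max_tokens=max_tokens,
    )
    return result["choices"][0]["message"]["content"]
```

Keeping message construction separate from the runtime call means the same `build_messages` output can be passed to any client that accepts OpenAI-style chat messages.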

Source payload excerpt (from Hugging Face API)

{
  "_id": "67f82ed3969d96be6d5a08cc",
  "id": "duyntnet/cogito-v1-preview-llama-8B-imatrix-GGUF",
  "modelId": "duyntnet/cogito-v1-preview-llama-8B-imatrix-GGUF",
  "sha": "5725a490c158644f2c1e24e69b76d1ab2e610bf3",
  "createdAt": "2025-04-10T20:49:23.000Z",
  "lastModified": "2025-04-10T21:47:25.000Z",
  "author": "duyntnet",
  "downloads": 259,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 29
}
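
The tool-calling section of the README quoted in the metadata shows the model emitting a `<tool_call>` block that the caller must turn into chat messages before generating again. The sketch below parses such a block and builds the two messages the README appends. `parse_tool_calls` and `tool_messages` are hypothetical helper names, and the regular expression assumes the output matches the README's example format; other runtimes or templates may format tool calls differently.

```python
import json
import re

# Matches the JSON object between <tool_call> and </tool_call>, as in the
# README's example output. Assumes one JSON object per tag pair.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(output_text):
    """Return the tool calls found in generated text as a list of dicts."""
    return [json.loads(raw) for raw in TOOL_CALL_RE.findall(output_text)]


def tool_messages(tool_call, result):
    """Build the assistant and tool messages to append after running a tool."""
    return [
        {
            "role": "assistant",
            "tool_calls": [{"type": "function", "function": tool_call}],
        },
        {"role": "tool", "name": tool_call["name"], "content": str(result)},
    ]
```

After appending these messages to the conversation, the chat template is applied again and generation continues, as in the README's final step.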

More models in this shard