lucianosb/llama-2-7b-langchain-chat-gguf is indexed on GraySoft with repository links, GGUF quant files (q4_0 through q8_0), and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes.

Model Intelligence Sheet

lucianosb/llama-2-7b-langchain-chat-gguf overview

Comprehensive model page for lucianosb/llama-2-7b-langchain-chat-gguf

gguf, text-generation, pt, en, es, ru, de, pl, th, vi, sv, bn, da, he, it, fa, sk, id, nb, el, hu, eu, zh, eo, ja, ca, cs, bg, fi, tr

Downloads

216

Likes

12

Pipeline

text-generation

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

5 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
llama-2-7b-langchain-chat-q4_0.gguf	GGUF	q4_0	3.56 GB	Download
llama-2-7b-langchain-chat-q4_1.gguf	GGUF	q4_1	3.95 GB	Download
llama-2-7b-langchain-chat-q5_0.gguf	GGUF	q5_0	4.33 GB	Download
llama-2-7b-langchain-chat-q5_1.gguf	GGUF	q5_1	4.72 GB	Download
llama-2-7b-langchain-chat-q8_0.gguf	GGUF	q8_0	6.67 GB	Download
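
As a rough aid for choosing among the files above, the sketch below picks the largest quant that fits a disk budget and builds a direct-download URL for it. The sizes are copied from the table; the helper names and the 4.5 GB budget are illustrative assumptions, and the `resolve/main` URL form is the usual Hugging Face direct-file pattern rather than something this page states. File size is only a proxy: memory use at runtime will be higher than the size on disk.

```python
# Sizes in GB, copied from the file table on this page.
FILES = {
    "llama-2-7b-langchain-chat-q4_0.gguf": 3.56,
    "llama-2-7b-langchain-chat-q4_1.gguf": 3.95,
    "llama-2-7b-langchain-chat-q5_0.gguf": 4.33,
    "llama-2-7b-langchain-chat-q5_1.gguf": 4.72,
    "llama-2-7b-langchain-chat-q8_0.gguf": 6.67,
}

# Repository id as it appears in the Hugging Face payload and README links.
REPO_ID = "lucianosb/llama-2-7b-langchain-chat-GGUF"


def pick_largest_fitting(files, budget_gb):
    """Return the name of the largest file no bigger than budget_gb, or None."""
    fitting = {name: size for name, size in files.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL (assumed 'resolve' pattern, not from this page)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


choice = pick_largest_fitting(FILES, budget_gb=4.5)  # hypothetical budget
if choice is not None:
    print(choice)
    print(resolve_url(REPO_ID, choice))
```

Higher-bit quants trade disk and memory for accuracy, so a budget-driven pick like this is only a starting point; the README's per-file descriptions give the qualitative trade-offs.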

Model Details

Model Slug

lucianosb/llama-2-7b-langchain-chat-gguf

Author

lucianosb

Pipeline Task

text-generation

Library

—

Created

2023-08-28

Last Modified

2023-08-29

Gated

No

Private

No

HF SHA

8a59e8283c2d506e311888e7f305c571a78569e2

License

llama2

Language

pt, en, es, ru, de, pl, th, vi, sv, bn, da, he, it, fa, sk, id, nb, el, hu, eu, zh, eo, ja, ca, cs, bg, fi, tr, ro, ar, uk, ko, gl, fr, nl

Base Model

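Photolens/llama-2-7b-langchain-chat (taken from the model card's model_link field; the card sets no base_model field)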

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "inference": false,
    "language": [
      "pt",
      "en",
      "es",
      "ru",
      "de",
      "pl",
      "th",
      "vi",
      "sv",
      "bn",
      "da",
      "he",
      "it",
      "fa",
      "sk",
      "id",
      "nb",
      "el",
      "hu",
      "eu",
      "zh",
      "eo",
      "ja",
      "ca",
      "cs",
      "bg",
      "fi",
      "tr",
      "ro",
      "ar",
      "uk",
      "ko",
      "gl",
      "fr",
      "nl"
    ],
    "license": "llama2",
    "model_creator": "Photolens",
    "model_link": "https://huggingface.co/Photolens/llama-2-7b-langchain-chat",
    "model_name": "lama-2-7b-langchain-chat",
    "model_type": "llama",
    "quantized_by": "lucianosb",
    "pipeline_tag": "text-generation",
    "datasets": [
      "Photolens/oasst1-langchain-llama-2-formatted"
    ],
    "frontmatter": {
      "inference": "false",
      "language": [
        "pt",
        "en",
        "es",
        "ru",
        "de",
        "pl",
        "th",
        "vi",
        "sv",
        "bn",
        "da",
        "he",
        "it",
        "fa",
        "sk",
        "id",
        "nb",
        "el",
        "hu",
        "eu",
        "zh",
        "eo",
        "ja",
        "ca",
        "cs",
        "bg",
        "fi",
        "tr",
        "ro",
        "ar",
        "uk",
        "ko",
        "gl",
        "fr",
        "nl"
      ],
      "license": "llama2",
      "model_creator": "Photolens",
      "model_link": "https://huggingface.co/Photolens/llama-2-7b-langchain-chat",
      "model_name": "lama-2-7b-langchain-chat",
      "model_type": "llama",
      "quantized_by": "lucianosb",
      "pipeline_tag": "text-generation",
      "datasets": [
        "Photolens/oasst1-langchain-llama-2-formatted"
      ]
    },
    "hero_image_url": "",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\ninference: false\nlanguage:\n- pt\n- en\n- es\n- ru\n- de\n- pl\n- th\n- vi\n- sv\n- bn\n- da\n- he\n- it\n- fa\n- sk\n- id\n- nb\n- el\n- hu\n- eu\n- zh\n- eo\n- ja\n- ca\n- cs\n- bg\n- fi\n- tr\n- ro\n- ar\n- uk\n- ko\n- gl\n- fr\n- nl\nlicense: llama2\nmodel_creator: Photolens\nmodel_link: https://huggingface.co/Photolens/llama-2-7b-langchain-chat\nmodel_name: lama-2-7b-langchain-chat\nmodel_type: llama\nquantized_by: lucianosb\npipeline_tag: text-generation\ndatasets:\n- Photolens/oasst1-langchain-llama-2-formatted\n---\n\n# lama-2-7b-langchain-chat - GGUF\n- Model creator: [Photolens](https://huggingface.co/Photolens)\n- Original model: [llama-2-7b-langchain-chat](https://huggingface.co/Photolens/llama-2-7b-langchain-chat)\n\n## Included Files\n\n| Name | Quant Method | Bits | Size | Desc |\n| ---- | ---- | ---- | ---- | ----- |\n| [llama-2-7b-langchain-chat-q4_0.gguf](https://huggingface.co/lucianosb/llama-2-7b-langchain-chat-GGUF/blob/main/llama-2-7b-langchain-chat-q4_0.gguf) | q4_0 | 4 | 3.56 GB | 4-bit quantization. |\n| [llama-2-7b-langchain-chat-q4_1.gguf](https://huggingface.co/lucianosb/llama-2-7b-langchain-chat-GGUF/blob/main/llama-2-7b-langchain-chat-q4_1.gguf) | q4_1 | 4 | 3.95 GB | 4-bit quantization. Higher accuracy than q4_0 but not as good as q5_0. Faster inference than the q5 models. |\n| [llama-2-7b-langchain-chat-q5_0.gguf](https://huggingface.co/lucianosb/llama-2-7b-langchain-chat-GGUF/blob/main/llama-2-7b-langchain-chat-q5_0.gguf) | q5_0 | 5 | 4.33 GB | 5-bit quantization. Higher accuracy, higher resource usage, slower inference. |\n| [llama-2-7b-langchain-chat-q5_1.gguf](https://huggingface.co/lucianosb/llama-2-7b-langchain-chat-GGUF/blob/main/llama-2-7b-langchain-chat-q5_1.gguf) | q5_1 | 5 | 4.72 GB | 5-bit quantization. Even higher accuracy, higher resource usage, slower inference. |\n| [llama-2-7b-langchain-chat-q8_0.gguf](https://huggingface.co/lucianosb/llama-2-7b-langchain-chat-GGUF/blob/main/llama-2-7b-langchain-chat-q8_0.gguf) | q8_0 | 8 | 6.67 GB | 8-bit quantization. Almost indistinguishable from float16. Uses a lot of resources and is slower. |\n\n**Note**: the RAM figures above assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.\n\n## How to run with `llama.cpp`\n\nI used the following command. Adjust it to your needs:\n\n```\n./main -m ./models/llama-2-7b-langchain-chat/llama-2-7b-langchain-chat-q5_1.gguf --color --temp 0.5 -n 256 -p \"<s>[INST] Há muito tempo atrás, numa galáxia distante [/INST] Assistant Message </s>\"\n```\n\nTo understand the parameters, see [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)\n\n\n## About the GGUF format\n\nGGUF is a new format introduced by the llama.cpp team on August 21, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.\n\nThe key benefit of GGUF is that it is an extensible, future-proof format that stores more information about the model as metadata. It also includes significantly improved tokenization code, including for the first time full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.\n\nHere is a list of clients and libraries that are known to support GGUF:\n\n- [llama.cpp](https://github.com/ggerganov/llama.cpp).\n- [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI. Supports GGUF with GPU acceleration via the ctransformers backend; the llama-cpp-python backend should work soon too.\n- [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of version 1.41! A powerful GGML web UI with full GPU acceleration. Especially good for storytelling.\n- [LM Studio](https://lmstudio.ai), version 0.2.2 and later support GGUF. A fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD) and macOS.\n- [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work; choose the c_transformers backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.\n- [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server.\n- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), supports GGUF as of version 0.1.79. A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.\n- [candle](https://github.com/huggingface/candle), added GGUF support on August 22. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.\n- [LocalAI](https://github.com/go-skynet/LocalAI), added GGUF support on August 23. LocalAI provides a REST API for LLMs and image generation models.\n\n## Template\n\n````\n<s>[INST] Prompter Message [/INST] Assistant Message </s>\n````",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "text-generation",
    "pt",
    "en",
    "es",
    "ru",
    "de",
    "pl",
    "th",
    "vi",
    "sv",
    "bn",
    "da",
    "he",
    "it",
    "fa",
    "sk",
    "id",
    "nb",
    "el",
    "hu",
    "eu",
    "zh",
    "eo",
    "ja",
    "ca",
    "cs",
    "bg",
    "fi",
    "tr",
    "ro",
    "ar",
    "uk",
    "ko",
    "gl",
    "fr",
    "nl",
    "dataset:Photolens/oasst1-langchain-llama-2-formatted",
    "license:llama2",
    "region:us"
  ],
  "likes": 12,
  "downloads": 216,
  "gated": false,
  "private": false,
  "last_modified": "2023-08-29T12:44:22.000Z",
  "created_at": "2023-08-28T16:21:39.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
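
The README stored in the metadata above gives a prompt template and a `llama.cpp` command line. The sketch below shows one way to assemble both from Python. The flags (`-m`, `--color`, `--temp`, `-n`, `-p`) and the model path come from the README's command; the function names and the sample message are hypothetical. It also assumes the prompt should stop at `[/INST]` so that the model writes the assistant turn, whereas the README's own command appends `Assistant Message </s>` literally. Whether the `<s>` marker should be passed as text may depend on how the runtime handles the beginning-of-sequence token.

```python
import shlex


def format_prompt(user_message):
    """Wrap a user message in the [INST] template shown in the model card."""
    return f"<s>[INST] {user_message.strip()} [/INST]"


def build_llama_cpp_args(model_path, prompt, temp=0.5, n_predict=256):
    """Build an argument list mirroring the README's ./main invocation."""
    return [
        "./main",
        "-m", model_path,
        "--color",
        "--temp", str(temp),
        "-n", str(n_predict),
        "-p", prompt,
    ]


args = build_llama_cpp_args(
    "./models/llama-2-7b-langchain-chat/llama-2-7b-langchain-chat-q5_1.gguf",
    format_prompt("Write a haiku about local models."),  # hypothetical message
)

# shlex.join quotes the prompt so the printed line can be pasted into a shell.
print(shlex.join(args))
```

Passing the list to `subprocess.run(args)` instead of printing it would avoid shell quoting altogether, provided a `./main` binary built from llama.cpp is present in the working directory.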

Source payload excerpt (from Hugging Face API)

{
  "_id": "64ecc993130e756a7cda2c15",
  "id": "lucianosb/llama-2-7b-langchain-chat-GGUF",
  "modelId": "lucianosb/llama-2-7b-langchain-chat-GGUF",
  "sha": "8a59e8283c2d506e311888e7f305c571a78569e2",
  "createdAt": "2023-08-28T16:21:39.000Z",
  "lastModified": "2023-08-29T12:44:22.000Z",
  "author": "lucianosb",
  "downloads": 216,
  "likes": 12,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 7
}
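
For readers consuming this payload programmatically, the following minimal sketch parses the excerpt above and converts its timestamps to timezone-aware values. The payload text is copied from this page; the helper name and the `strptime` format string are assumptions based on the timestamp shape shown here, not a documented Hugging Face contract.

```python
import json
from datetime import datetime, timezone

# Payload copied from the excerpt on this page.
PAYLOAD = """
{
  "_id": "64ecc993130e756a7cda2c15",
  "id": "lucianosb/llama-2-7b-langchain-chat-GGUF",
  "modelId": "lucianosb/llama-2-7b-langchain-chat-GGUF",
  "sha": "8a59e8283c2d506e311888e7f305c571a78569e2",
  "createdAt": "2023-08-28T16:21:39.000Z",
  "lastModified": "2023-08-29T12:44:22.000Z",
  "author": "lucianosb",
  "downloads": 216,
  "likes": 12,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 7
}
"""


def parse_hf_timestamp(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SS.mmmZ' string into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


record = json.loads(PAYLOAD)
created = parse_hf_timestamp(record["createdAt"])
modified = parse_hf_timestamp(record["lastModified"])

print(record["id"], "by", record["author"])
print("created:", created.date(), "last modified:", modified.date())
print("open access:", not record["gated"] and not record["private"])
```

An empty `library_name` string is what the page renders as a dash in its Library field, so callers may want to treat `""` as "not set" rather than as a value.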
