What is cstr/lfm2-colbert-GGUF?

--- license: other license_name: lfm-1.0 license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE base_model: LiquidAI/LFM2.5-ColBERT-350M tags: - colbert - retrieval - multi-vector - late-interaction - gguf - crispembed - ggml language: - en - es - de - fr - it - pt - ar - sv - "no" - ja - ko --- # LFM2.5-ColBERT-350M — CrispEmbed GGUF CrispEmbed-native GGUF quantizations of [LiquidAI/LFM2.5-ColBERT-350M](https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M). Multi-vector (ColBERT-style) retrieval: per-token embeddings projected to 128 dimensions, L2-normalized. Uses late interaction (MaxSim) scoring for fine-grained token-level matching. **Format note:** These GGUFs use CrispEmbed's internal tensor naming (`lfm.*` prefix, arch=`lfm2`). They include the `colbert.projection.weight` tensor from the `1_Dense` module. **Not** compatible with llama.cpp. ## Model variants |…

What license applies to cstr/lfm2-colbert-GGUF?

License: other. Verify terms on Hugging Face before commercial use.

How do I run cstr/lfm2-colbert-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: text-generation.

Model Intelligence Sheet

cstr/lfm2-colbert-GGUF overview

LFM2.5 ColBERT 350M — CrispEmbed GGUF CrispEmbed native GGUF quantizations of LiquidAI/LFM2.5 ColBERT 350M https://huggingface.co/LiquidAI/LFM2.5 ColBERT 350M …

ggufcolbertretrievalmulti-vectorlate-interactioncrispembedggmlenesdefritptarsvnojakobase_model:LiquidAI/LFM2.5-ColBERT-350Mbase_model:quantized:LiquidAI/LFM2.5-ColBERT-350Mlicense:otherregion:us

Runs locally from ~223.1 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads

Likes

Pipeline

—

Author

cstr

Repository Files & Downloads

4 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
lfm2-colbert-f32.gguf	GGUF	F32	676.2 MB	Download
lfm2-colbert-q4_k.gguf	GGUF	Q4_K	223.1 MB	Download
lfm2-colbert-q5_k.gguf	GGUF	Q5_K	257.4 MB	Download
lfm2-colbert-q8_0.gguf	GGUF	Q8_0	360.2 MB	Download

Model Details

Model ID	cstr/lfm2-colbert-GGUF
Author	cstr
Pipeline	—
License	other
Base model	LiquidAI/LFM2.5-ColBERT-350M
Last modified	2026-06-19T07:18:01.000Z

Model README

---

license: other

license_name: lfm-1.0

license_link: https://huggingface.co/LiquidAI/LFM2.5-350M/blob/main/LICENSE

base_model: LiquidAI/LFM2.5-ColBERT-350M

tags:

- colbert

- retrieval

- multi-vector

- late-interaction

- gguf

- crispembed

- ggml

language:

- en

- es

- de

- fr

- it

- pt

- ar

- sv

- "no"

- ja

- ko

---

LFM2.5-ColBERT-350M — CrispEmbed GGUF

CrispEmbed-native GGUF quantizations of LiquidAI/LFM2.5-ColBERT-350M.

Multi-vector (ColBERT-style) retrieval: per-token embeddings projected to 128 dimensions, L2-normalized. Uses late interaction (MaxSim) scoring for fine-grained token-level matching.

Format note: These GGUFs use CrispEmbed's internal tensor naming (lfm.* prefix, arch=lfm2). They include the colbert.projection.weight tensor from the 1_Dense module. Not compatible with llama.cpp.

Model variants

|------|-------|------|--------------------|

| lfm2-colbert-f32.gguf | F32 | 677 MB | 0.999995 |

| lfm2-colbert-q8_0.gguf | Q8_0 | 361 MB | 0.998 |

| lfm2-colbert-q5_k.gguf | Q5_K | 258 MB | 0.977 |

| lfm2-colbert-q4_k.gguf | Q4_K | 224 MB | 0.959 |

Architecture

Backbone: LFM2.5-350M bidirectional hybrid (16 layers: 10 ShortConv + 6 GQA attention, 1024-dim hidden, SwiGLU FFN)
ColBERT head: Linear(1024, 128) + L2 normalize per token
Scoring: MaxSim — max over doc tokens of cosine similarity per query token, summed
Parameters: 350M + 128K projection head
Languages: EN, ES, DE, FR, IT, PT, AR, SV, NO, JA, KO (11 languages)
Task prefixes: "query: " for queries, "document: " for passages

Usage

# ColBERT multi-vector encode
./crispembed -m lfm2-colbert-q8_0.gguf --colbert "query: what is deep learning?"

# JSON output (per-token vectors)
./crispembed -m lfm2-colbert-q8_0.gguf --colbert --json "query: machine learning"

# Server
./crispembed-server --embed lfm2-colbert-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/colbert/score \
  -d '{"query": "what is deep learning?", "documents": ["Deep learning is a subset of ML", "The weather is nice"]}'

from crispembed import CrispVit

model = CrispVit("lfm2-colbert-q8_0.gguf")
assert model.has_colbert

# Encode multi-vector representations
query_vecs = model.encode_multivec("query: what is deep learning?")   # (n_tokens, 128)
doc_vecs = model.encode_multivec("document: Deep learning uses neural networks")

# MaxSim scoring
score = model.maxsim(query_vecs, doc_vecs)
print(f"Score: {score:.4f}")

use crispembed::CrispEmbed;

let mut model = CrispEmbed::new("lfm2-colbert-q8_0.gguf", 4)?;
assert!(model.has_colbert());

let query = model.encode_multivec("query: what is deep learning?");
let doc = model.encode_multivec("document: Neural networks learn representations");

Conversion

Convert from the source model yourself:

git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed

# Convert (loads 1_Dense/model.safetensors for ColBERT projection)
python models/convert-lfm2-embed-to-gguf.py \
    --model LiquidAI/LFM2.5-ColBERT-350M \
    --output lfm2-colbert-f32.gguf --dtype f32

# Quantize
./build/crispembed-quantize lfm2-colbert-f32.gguf lfm2-colbert-q8_0.gguf q8_0
./build/crispembed-quantize lfm2-colbert-f32.gguf lfm2-colbert-q5_k.gguf q5_k
./build/crispembed-quantize lfm2-colbert-f32.gguf lfm2-colbert-q4_k.gguf q4_k

License

LFM Open License v1.0 — same as the base model.

Credits

Original model by LiquidAI. GGUF conversion and inference engine by CrispEmbed.

Run cstr/lfm2-colbert-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models