GraySoft
Model Intelligence Sheet

cstr/lfm2-embed-GGUF overview


gguf · embedding · retrieval · text-embeddings-inference · crispembed · en · de · fr · es · it · pt · nl · pl · ru · ja · zh · base_model:LiquidAI/LFM2.5-Embedding-350M · base_model:quantized:LiquidAI/LFM2.5-Embedding-350M · license:other · region:us

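Runs locally from ~224.3 MB of disk (the Q4_K file), small enough for 4 GB VRAM class GPUs. Note that these GGUFs target the CrispEmbed runtime, not llama.cpp; see the format note in the README below.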

Downloads: 0
Likes: 0

Repository Files & Downloads

3 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|------|------|--------------|------|
| lfm2-embed-f16.gguf | GGUF | F16 | 678.1 MB |
| lfm2-embed-q4_k.gguf | GGUF | Q4_K | 224.3 MB |
| lfm2-embed-q8_0.gguf | GGUF | Q8_0 | 361.3 MB |

Model Details

Model ID: cstr/lfm2-embed-GGUF
Author: cstr
License: other
Base model: LiquidAI/LFM2.5-Embedding-350M
Last modified: 2026-06-18T17:04:42.000Z

Model README

---
base_model: LiquidAI/LFM2.5-Embedding-350M
language:
- en
- de
- fr
- es
- it
- pt
- nl
- pl
- ru
- ja
- zh
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M/blob/main/LICENSE
tags:
- gguf
- embedding
- retrieval
- text-embeddings-inference
- crispembed
---

LFM2.5-Embedding-350M — CrispEmbed GGUF

CrispEmbed-native GGUF quantizations of LiquidAI/LFM2.5-Embedding-350M.

Format note: These GGUFs use CrispEmbed's internal tensor naming (lfm. prefix, arch=lfm2). They are not interchangeable with the official LiquidAI GGUFs which target llama.cpp (lfm2-bidir arch, blk. tensor naming). Use the LiquidAI GGUFs if you want llama.cpp/llama-server.
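To tell the two kinds of file apart before loading one, the architecture tag can be read straight from the GGUF header. The following is a minimal sketch, not part of CrispEmbed or llama.cpp: it assumes a GGUF version 2 or 3 file (64-bit counts, little-endian) and assumes the tag is stored under the standard `general.architecture` metadata key. The function name `read_gguf_arch` is hypothetical.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"

# Fixed-size GGUF metadata value types mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value from the stream."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _read_value(f, vtype):
    """Read one metadata value of the given type, recursing into arrays."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_arch(f):
    """Return general.architecture from an open binary stream, or None."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    _version = _read(f, "<I")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None


# Example (the path is illustrative):
# with open("lfm2-embed-q8_0.gguf", "rb") as f:
#     print(read_gguf_arch(f))
```

Going by the format note above, a file from this repo should report `lfm2` and an official LiquidAI file should report `lfm2-bidir`, provided both store the tag under that key.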

---

Files

| File | Size | Description |
|------|------|-------------|
| lfm2-embed-q8_0.gguf | 359 MB | 8-bit quantization, best accuracy, recommended |
| lfm2-embed-q4_k.gguf | 222 MB | 4-bit K-quant, 3× compression, minimal quality loss |
| lfm2-embed-f16.gguf | 678 MB | Full fp16, reference precision |
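The compression figures can be checked from the listed sizes alone. This is a rough sketch using only the numbers in the table above; the bits-per-weight estimate assumes the f16 file consists almost entirely of 16-bit weights and ignores metadata overhead, so treat it as an approximation rather than a statement about the quantization formats.

```python
# Sizes in MB, copied from the Files table above.
SIZES_MB = {"f16": 678, "q8_0": 359, "q4_k": 222}


def compression_ratio(quant, reference="f16"):
    """How many times smaller `quant` is than the reference file."""
    return SIZES_MB[reference] / SIZES_MB[quant]


def approx_bits_per_weight(quant, reference_bits=16.0):
    """Rough estimate: scales 16 bits by the size ratio to the f16 file."""
    return reference_bits * SIZES_MB[quant] / SIZES_MB["f16"]


for name in ("q8_0", "q4_k"):
    print(f"{name}: {compression_ratio(name):.2f}x smaller than f16, "
          f"~{approx_bits_per_weight(name):.1f} bits/weight")
```

This prints about 1.89x and ~8.5 bits/weight for q8_0, and about 3.05x and ~5.2 bits/weight for q4_k, which is consistent with the "3× compression" stated for the q4_k file.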

Parity (CrispEmbed q8_0 vs HF float32 Lfm2BidirectionalModel)

| Stage | Cosine | Notes |
|-------|--------|-------|
| per-layer (all 20) | ≥ 0.9999 | measured on 3-token input via test-lfm2-diff |
| CLS embedding q8_0 | 0.9999 | 5 diverse test sentences |
| CLS embedding q4_k | 0.982 | expected q4_k quantization floor |
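The cosine values in this table compare an embedding from the quantized model with the reference embedding for the same input. The sketch below shows how such a check could be computed; it is not the `test-lfm2-diff` tool, the helper names are hypothetical, and the vectors are toy data rather than real model outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def parity_report(reference, candidate, threshold):
    """Return (worst cosine, passed) over paired embeddings."""
    scores = [cosine(r, c) for r, c in zip(reference, candidate)]
    worst = min(scores)
    return worst, worst >= threshold


# Toy data: two reference vectors and slightly perturbed candidates.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidate = [[1.0, 0.001, 0.0], [0.0, 1.0, 0.002]]
worst, ok = parity_report(reference, candidate, threshold=0.9999)
print(f"worst cosine: {worst:.6f}, passed: {ok}")
```

Reporting the minimum rather than the mean keeps a single badly degraded sentence from being hidden by the others.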

Model

  • Architecture: 16-layer hybrid (10 ShortConv + 6 GQA attention), hidden=1024
  • Pooling: CLS token (position 0) of last hidden state, L2-normalized
  • Dimension: 1024
  • Languages: 11 (en, de, fr, es, it, pt, nl, pl, ru, ja, zh)
  • Parameters: 350M
  • Task prefixes: "query: " for queries, "document: " for passages
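The pooling and prefix conventions listed above can be illustrated in a few lines. This is a sketch of the described behavior, not CrispEmbed's implementation: `cls_pool` and `with_prefix` are hypothetical helpers, and the hidden states are a toy 2-dimensional example instead of the real 1024-dimensional output.

```python
import math

# Task prefixes as listed in the model description above.
PREFIXES = {"query": "query: ", "document": "document: "}


def with_prefix(text, kind):
    """Prepend the task prefix for a query or a document."""
    return PREFIXES[kind] + text


def cls_pool(last_hidden_state):
    """Take the vector at position 0 and L2-normalize it.

    last_hidden_state is a list of per-token vectors from the final layer.
    """
    cls = last_hidden_state[0]
    norm = math.sqrt(sum(x * x for x in cls))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in cls]


# Toy last hidden state for a 2-token input with hidden size 2.
hidden = [[3.0, 4.0], [1.0, 1.0]]
embedding = cls_pool(hidden)
print(with_prefix("What is the capital of France?", "query"))
print(embedding)
```

Only the first token's vector contributes to the embedding; the remaining positions are ignored by this pooling mode.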

Usage with CrispEmbed

CLI

# Download
./crispembed --download lfm2-embed

# Embed a query (prefix auto-applied)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf "What is the capital of France?"

# Embed a document (disable auto-prefix and supply explicitly, or use --prefix)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf \
  --prefix "document: " "Paris is the capital of France."

# JSON output for downstream use
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf --json "query: machine learning"

Python (via crispembed Python bindings)

import numpy as np
import crispembed

model = crispembed.load("~/.cache/crispembed/lfm2-embed-q8_0.gguf")

# Supply the task prefix explicitly for each input type
query_emb = model.encode("query: What is the capital of France?")
doc_emb   = model.encode("document: Paris is the capital of France.")

# Both embeddings are already L2-normalized, so the dot product is the cosine similarity
score = np.dot(query_emb, doc_emb)
print(f"Similarity: {score:.4f}")
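For retrieval over several passages, the same dot product can be used to rank documents against one query. The sketch below uses toy unit vectors in place of real embeddings so that it runs without the model; `rank` is a hypothetical helper, and it assumes every vector is already L2-normalized as described above.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(query_emb, doc_embs):
    """Return (index, score) pairs sorted by descending similarity."""
    scored = [(i, dot(query_emb, d)) for i, d in enumerate(doc_embs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy unit vectors standing in for a "query: ..." embedding and
# three "document: ..." embeddings.
query = [1.0, 0.0]
docs = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]

for index, score in rank(query, docs):
    print(f"doc {index}: {score:.2f}")
```

With these toy vectors the order is doc 1, then doc 0, then doc 2. Document embeddings can be computed once and stored, so only the query needs to be embedded at search time.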

Rust

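use crispembed::CrispEmbed;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the quantized model, then embed a prefixed query
    let model = CrispEmbed::load("lfm2-embed-q8_0.gguf")?;
    let emb = model.encode("query: hello world")?;
    Ok(())
}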

Comparison with official LiquidAI GGUFs

| | This repo | LiquidAI/LFM2.5-Embedding-350M-GGUF |
|---|---|---|
| Runtime | CrispEmbed | llama.cpp / llama-server |
| GGUF arch tag | lfm2 | lfm2-bidir |
| Tensor naming | lfm. prefix | blk. / llama.cpp convention |
| Quantizations | f16, q8_0, q4_k | BF16, F16, Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0 |
| q8_0 size | 359 MB | 379 MB |
| Metal GPU | Yes (Apple Silicon) | Yes |

Conversion

Convert from the source model yourself:

git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed

# Download the source model and convert it to an f16 GGUF
python models/convert-lfm2-embed-to-gguf.py \
    --model LiquidAI/LFM2.5-Embedding-350M \
    --output lfm2-embed-f16.gguf --dtype f16

# Quantize the f16 GGUF (assumes the CrispEmbed tools are already built under ./build)
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q8_0.gguf q8_0
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q4_k.gguf q4_k

License

LFM1.0 — same as the base model.


Source: Hugging Face