cstr/lfm2-embed-GGUF overview
---
base_model: LiquidAI/LFM2.5-Embedding-350M
language:
- en
- de
- fr
- es
- it
- pt
- nl
- pl
- ru
- ja
- zh
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M/blob/main/LICENSE
tags:
- gguf
- embedding
- retrieval
- text-embeddings-inference
- crispembed
---
LFM2.5-Embedding-350M — CrispEmbed GGUF
CrispEmbed-native GGUF quantizations of LiquidAI/LFM2.5-Embedding-350M.
Format note: These GGUFs use CrispEmbed's internal tensor naming (lfm. prefix, arch=lfm2). They are not interchangeable with the official LiquidAI GGUFs, which target llama.cpp (lfm2-bidir arch, blk. tensor naming). Use the LiquidAI GGUFs if you want to run the model with llama.cpp or llama-server.
---
Files
| File | Size | Description |
|------|------|-------------|
| lfm2-embed-q8_0.gguf | 359 MB | 8-bit quantization; best accuracy among the quantized files, recommended |
| lfm2-embed-q4_k.gguf | 222 MB | 4-bit K-quant; about 3× smaller than f16, minimal quality loss |
| lfm2-embed-f16.gguf | 678 MB | Full fp16; reference precision |
Parity (CrispEmbed q8_0 vs HF float32 Lfm2BidirectionalModel)
| Stage | Cosine | Notes |
|-------|--------|-------|
| per-layer (all 20) | ≥ 0.9999 | measured on 3-token input via test-lfm2-diff |
| CLS embedding q8_0 | 0.9999 | 5 diverse test sentences |
| CLS embedding q4_k | 0.982 | expected q4_k quantization floor |
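The cosine values in the table compare an embedding produced by CrispEmbed with the reference embedding for the same input. A minimal sketch of that comparison follows, assuming both vectors are available as NumPy arrays. The function name and the random stand-in vectors are illustrative only and are not taken from test-lfm2-diff.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in data: a random 1024-d "reference" vector and a slightly perturbed copy
rng = np.random.default_rng(0)
reference = rng.standard_normal(1024)
quantized = reference + 0.01 * rng.standard_normal(1024)

parity = cosine_similarity(reference, quantized)
print(f"cosine: {parity:.4f}")
```

With real data, `reference` would come from the float32 Hugging Face model and `quantized` from the GGUF under test.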
Model
- Architecture: 16-layer hybrid (10 ShortConv + 6 GQA attention), hidden=1024
- Pooling: CLS token (position 0) of last hidden state, L2-normalized
- Dimension: 1024
- Languages: 11 (en, de, fr, es, it, pt, nl, pl, ru, ja, zh)
- Parameters: 350M
- Task prefixes: "query: " for queries, "document: " for passages
Usage with CrispEmbed
CLI
# Download
./crispembed --download lfm2-embed
# Embed a query (prefix auto-applied)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf "What is the capital of France?"
# Embed a document: either disable the auto-prefix and include "document: " in the text, or pass it with --prefix (shown here)
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf \
--prefix "document: " "Paris is the capital of France."
# JSON output for downstream use
./crispembed -m ~/.cache/crispembed/lfm2-embed-q8_0.gguf --json "query: machine learning"
Python (via crispembed Python bindings)
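import os
import numpy as np
import crispembed

# Expand "~" explicitly; Python does not expand it in plain path strings
model = crispembed.load(os.path.expanduser("~/.cache/crispembed/lfm2-embed-q8_0.gguf"))

# Supply the task prefixes explicitly
query_emb = model.encode("query: What is the capital of France?")
doc_emb = model.encode("document: Paris is the capital of France.")

# Both embeddings are already L2-normalized, so the dot product is the cosine similarity
score = np.dot(query_emb, doc_emb)
print(f"Similarity: {score:.4f}")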
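The single dot product above extends to ranking several passages against one query. The sketch below works on plain NumPy arrays, so it makes no assumption about the crispembed bindings beyond the embeddings being L2-normalized vectors; `rank_documents` is a hypothetical helper, and the 3-d vectors are toy stand-ins for real 1024-d embeddings.

```python
import numpy as np

def rank_documents(query_emb, doc_embs, top_k=3):
    """Return (index, score) pairs for the top_k documents, best first.

    Assumes every vector is already L2-normalized, so the dot product
    equals cosine similarity.
    """
    doc_matrix = np.asarray(doc_embs, dtype=np.float64)   # shape (n_docs, dim)
    scores = doc_matrix @ np.asarray(query_emb, dtype=np.float64)
    order = np.argsort(-scores)[:top_k]                   # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy unit vectors standing in for real embeddings
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.6, 0.8, 0.0],   # partially aligned
    [1.0, 0.0, 0.0],   # identical direction
])
print(rank_documents(query, docs, top_k=2))
```

In practice `docs` would be the stacked outputs of encoding each passage with the "document: " prefix.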
Rust
use crispembed::CrispEmbed;
let model = CrispEmbed::load("lfm2-embed-q8_0.gguf")?;
let emb = model.encode("query: hello world")?;
Comparison with official LiquidAI GGUFs
| | This repo | LiquidAI/LFM2.5-Embedding-350M-GGUF |
|---|---|---|
| Runtime | CrispEmbed | llama.cpp / llama-server |
| GGUF arch tag | lfm2 | lfm2-bidir |
| Tensor naming | lfm. prefix | blk. / llama.cpp convention |
| Quantizations | f16, q8_0, q4_k | BF16, F16, Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0 |
| q8_0 size | 359 MB | 379 MB |
| Metal GPU | Yes (Apple Silicon) | Yes |
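Because the two GGUF families differ in their architecture tag, reading that tag is a quick way to tell which runtime a file targets. The sketch below parses the GGUF header using only the standard library. It assumes the tag is stored under the conventional `general.architecture` metadata key (the table gives the values `lfm2` and `lfm2-bidir` but not the key), and it handles GGUF version 2 and later only. `read_architecture` and its helpers are illustrative names, not part of CrispEmbed or llama.cpp.

```python
import io
import struct

GGUF_STRING = 8
GGUF_ARRAY = 9
# Byte widths of the fixed-size GGUF metadata value types
FIXED_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def _read_u32(f):
    return struct.unpack("<I", f.read(4))[0]

def _read_u64(f):
    return struct.unpack("<Q", f.read(8))[0]

def _read_string(f):
    # GGUF strings are a uint64 length followed by UTF-8 bytes
    return f.read(_read_u64(f)).decode("utf-8")

def _skip_value(f, vtype):
    """Advance the stream past one metadata value without decoding it."""
    if vtype == GGUF_STRING:
        f.seek(_read_u64(f), io.SEEK_CUR)
    elif vtype == GGUF_ARRAY:
        elem_type, count = _read_u32(f), _read_u64(f)
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        f.seek(FIXED_SIZES[vtype], io.SEEK_CUR)

def read_architecture(f):
    """Return the general.architecture string from an open GGUF stream, or None."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read_u32(f)
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
    _tensor_count, kv_count = _read_u64(f), _read_u64(f)
    for _ in range(kv_count):
        key, vtype = _read_string(f), _read_u32(f)
        if key == "general.architecture" and vtype == GGUF_STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None

# Synthetic header for demonstration (version 3, 0 tensors, 1 metadata entry)
key, value = b"general.architecture", b"lfm2"
demo = (b"GGUF" + struct.pack("<IQQ", 3, 0, 1)
        + struct.pack("<Q", len(key)) + key
        + struct.pack("<IQ", GGUF_STRING, len(value)) + value)
print(read_architecture(io.BytesIO(demo)))
```

For a real file, open it in binary mode and pass the file object, for example `with open("lfm2-embed-q8_0.gguf", "rb") as f: print(read_architecture(f))`.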
Conversion
Convert from the source model yourself:
git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed
# Download the source model and convert it to an f16 GGUF
python models/convert-lfm2-embed-to-gguf.py \
--model LiquidAI/LFM2.5-Embedding-350M \
--output lfm2-embed-f16.gguf --dtype f16
# Quantize
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q8_0.gguf q8_0
./build/crispembed-quantize lfm2-embed-f16.gguf lfm2-embed-q4_k.gguf q4_k
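To give a feel for what the q8_0 step does, here is a NumPy sketch of block-wise 8-bit quantization in the style of ggml's Q8_0 layout: 32 values per block, stored as one fp16 scale plus 32 int8 values. Whether crispembed-quantize follows this layout exactly is an assumption, and the function names are illustrative.

```python
import numpy as np

BLOCK = 32  # values per Q8_0 block in ggml

def quantize_q8_0(x: np.ndarray):
    """Quantize a 1-D float array (length divisible by 32) to per-block scale + int8."""
    blocks = x.astype(np.float32).reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = (amax / 127.0).astype(np.float16)            # one fp16 scale per block
    # Avoid dividing by zero for all-zero blocks
    safe = np.where(scale == 0, 1.0, scale).astype(np.float32)
    q = np.clip(np.rint(blocks / safe), -127, 127).astype(np.int8)
    return scale, q

def dequantize_q8_0(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 values from scales and int8 codes."""
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

# Stand-in weights, not taken from the model
rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32) * 0.02
scale, q = quantize_q8_0(weights)
restored = dequantize_q8_0(scale, q)
print("max abs error:", float(np.abs(weights - restored).max()))

bits_per_weight = (2 + BLOCK) * 8 / BLOCK   # 2-byte fp16 scale + 32 int8 values
print("bits per weight:", bits_per_weight)
print("estimated q8_0 size from 678 MB f16:", round(678 * bits_per_weight / 16), "MB")
```

At 8.5 bits per weight, a 678 MB f16 file would shrink to roughly 360 MB, which is in line with the 359 MB listed for lfm2-embed-q8_0.gguf. The exact figure depends on which tensors are quantized and on metadata size.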
License
LFM1.0 — same as the base model.