# pplx-embed-v1: Diffusion-LM for Dense and Contextual Retrieval
pplx-embed-v1 and pplx-embed-context-v1 are state-of-the-art text embedding models optimized for real-world, web-scale retrieval tasks.
- Use `pplx-embed-v1` for independent text embedding (queries, documents, semantic search)
- Use `pplx-embed-context-v1` for document chunks in RAG systems where surrounding context matters

`pplx-embed-v1` and `pplx-embed-context-v1` natively produce unnormalized int8-quantized embeddings. Ensure that you compare them via cosine similarity.
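For example, a minimal sketch of cosine similarity over the raw int8 vectors in NumPy (the vectors below are random placeholders, not real model output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cast the unnormalized int8 embeddings to float32 before normalizing.
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two placeholder 1024-dim int8 vectors standing in for model output.
rng = np.random.default_rng(0)
emb_a = rng.integers(-128, 128, size=1024, dtype=np.int8)
emb_b = rng.integers(-128, 128, size=1024, dtype=np.int8)
print(cosine_similarity(emb_a, emb_b))
```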
## Models

| Model | Dimensions | Context | MRL | Quantization | Instruction | Pooling |
|---|---|---|---|---|---|---|
| `pplx-embed-v1-0.6B` | 1024 | 32K | Yes | INT8/BINARY | No | Mean |
| `pplx-embed-v1-4B` | 2560 | 32K | Yes | INT8/BINARY | No | Mean |
| `pplx-embed-context-v1-0.6B` | 1024 | 32K | Yes | INT8/BINARY | No | Mean |
| `pplx-embed-context-v1-4B` | 2560 | 32K | Yes | INT8/BINARY | No | Mean |
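The MRL column indicates Matryoshka Representation Learning support. Assuming the usual MRL convention of keeping a leading prefix of the dimensions (the exact mechanism this model family exposes may differ), you can truncate an embedding to a smaller size to cut storage and still compare it with cosine similarity, roughly as sketched here:

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dims: int) -> np.ndarray:
    # MRL-trained embeddings are designed so that a leading slice of the
    # dimensions remains a usable (lower-fidelity) embedding on its own.
    return embedding[..., :dims]

# e.g. shrink 1024-dim pplx-embed-v1-0.6B vectors to 256 dims before indexing,
# then compare the truncated vectors with cosine similarity as usual.
small = truncate_mrl(np.zeros((5, 1024), dtype=np.int8), 256)  # shape: (5, 256)
```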
All models are built at Perplexity AI on Qwen3 backbones that have undergone diffusion continued pre-training.
Many modern embedding models rely on instruction tuning, where users prepend an instruction string to the text being embedded. This can yield a 2-3% lift on benchmarks, but it also introduces prompt-selection overhead and can make indexing pipelines brittle (small instruction changes can shift the embedding space). We deliberately avoid this requirement: you can embed the text you want to index directly, without having to choose or maintain an instruction prefix.
## Usage

### Via API
```bash
curl -X POST https://api.perplexity.ai/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Scientists explore the universe driven by curiosity.",
      "Children learn through curious exploration.",
      "Historical discoveries began with curious questions.",
      "Animals use curiosity to adapt and survive.",
      "Philosophy examines the nature of curiosity."
    ],
    "model": "pplx-embed-v1-0.6b"
  }'
```
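The same request from Python with `requests`, assuming the response follows the familiar embeddings-API layout with a `data` list whose items each carry an `embedding` array (consult the API reference for the exact schema; the environment variable name below is illustrative):

```python
import os
import requests

resp = requests.post(
    "https://api.perplexity.ai/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},  # illustrative env var name
    json={
        "input": ["Scientists explore the universe driven by curiosity."],
        "model": "pplx-embed-v1-0.6b",
    },
    timeout=30,
)
resp.raise_for_status()

# Assumed OpenAI-style response shape: {"data": [{"embedding": [...]}, ...]}
embeddings = [item["embedding"] for item in resp.json()["data"]]
print(len(embeddings), len(embeddings[0]))
```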
### Using SentenceTransformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "perplexity-ai/pplx-embed-v1-0.6B",
    trust_remote_code=True,
)

texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

embeddings = model.encode(texts)  # Shape: (5, 1024), quantized to int8
embeddings = model.encode(texts, quantization="binary")  # Shape: (5, 1024), quantized to binary
```
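Reusing `model` and `texts` from the snippet above, a short retrieval sketch that ranks the documents against an ad-hoc query (the query string is illustrative) via cosine similarity:

```python
import numpy as np

doc_embeddings = model.encode(texts).astype(np.float32)  # int8 -> float32
query_embedding = model.encode(["Why do animals explore?"])[0].astype(np.float32)

# Cosine similarity between the query and every document embedding.
scores = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {texts[idx]}")
```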
### Using ONNX models
```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("perplexity-ai/pplx-embed-v1-0.6b", trust_remote_code=True)
session = ort.InferenceSession("onnx/model.onnx")

texts = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
    "Animals use curiosity to adapt and survive.",
    "Philosophy examines the nature of curiosity.",
]

tokenized = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_tensors="np",
)
onnx_inputs = {
    "input_ids": tokenized["input_ids"].astype(np.int64),
    "attention_mask": tokenized["attention_mask"].astype(np.int64),
}

# Run inference
onnx_embeddings = session.run([out.name for out in session.get_outputs()], onnx_inputs)

# ONNX produces both int8 and binary precision embeddings:
int8_embeddings = onnx_embeddings[2]
binary_embeddings = onnx_embeddings[3]
packed_embeddings = np.packbits(binary_embeddings != -1, axis=-1)
```
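The packed binary vectors are compact (128 bytes for a 1024-dim embedding) and, by the usual convention for binary embeddings (an assumption here, not something stated above), can be compared with Hamming distance. Continuing from the ONNX snippet:

```python
def hamming_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # XOR the packed bit vectors, then count how many bits differ.
    return np.unpackbits(np.bitwise_xor(a, b), axis=-1).sum(axis=-1)

# Distance from the first packed embedding to all five (lower = more similar).
print(hamming_distance(packed_embeddings[:1], packed_embeddings))
```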
## Technical Details
For comprehensive technical details and evaluation results, see our paper on arXiv: https://arxiv.org/abs/2602.11151.