Documentation — vibe-hnindex

Configuration

Configure vibe-hnindex through environment variables set in your MCP config file. All variables are optional with sensible defaults.

Environment Variables

Core Configuration

Variable	Default	Description
`OLLAMA_URL`	`http://localhost:11434`	Ollama server URL
`OLLAMA_MODEL`	`bge-m3:567m`	Embedding model name
`EMBEDDING_DIMENSIONS`	`1024`	Vector size from Ollama model. Must match model output.
`STORAGE_PATH`	`~/.vibe-hnindex`	SQLite database directory
`QDRANT_URL`	`http://localhost:6333`	Qdrant REST URL
`QDRANT_API_KEY`	(unset)	Required for Qdrant Cloud
`QDRANT_COLLECTION_PREFIX`	`mcp_ck_`	Prefix for collection names

Chunking Configuration

Variable	Default	Description
`CHUNK_SIZE`	`60`	Target lines per chunk
`CHUNK_OVERLAP`	`5`	Overlap lines between chunks
`MAX_FILE_SIZE`	`1048576`	Max file size in bytes (1 MB)
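
To make the interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal Python sketch of line-based chunking with overlap. It is an illustration only: the function name `chunk_line_ranges` is hypothetical, and the real chunker may also respect syntax boundaries or treat the final chunk differently.

```python
def chunk_line_ranges(total_lines: int, chunk_size: int = 60, overlap: int = 5):
    """Return 1-based inclusive (start, end) line ranges that cover a file.

    Defaults mirror CHUNK_SIZE=60 and CHUNK_OVERLAP=5 from the table above.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    ranges = []
    step = chunk_size - overlap  # distance between consecutive chunk starts
    start = 1
    while start <= total_lines:
        end = min(start + chunk_size - 1, total_lines)
        ranges.append((start, end))
        if end == total_lines:
            break  # the file is fully covered
        start += step
    return ranges


print(chunk_line_ranges(130))  # [(1, 60), (56, 115), (111, 130)]
```

With the defaults, consecutive chunks share 5 lines, so a definition that straddles a chunk boundary still appears whole in at least one chunk more often than it would without overlap.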

Indexing Configuration

Variable	Default	Description
`INDEX_WORKERS`	`auto`	Worker threads for parallel indexing. `auto` (or `0`) = CPU count − 1. Set to `1` for single-threaded indexing.
`INDEX_PARALLEL_BATCH`	`8`	Files per worker batch. Higher = more throughput, more memory.

Search Configuration

Variable	Default	Description
`SEARCH_AUTO_ROUTE`	`false`	When `true`, omitting `mode` uses the `auto` heuristic.
`SEARCH_KEYWORD_FALLBACK_SEMANTIC`	`true`	If keyword returns nothing, run semantic search.
`SEARCH_RERANK`	(enabled)	Set to `false` to disable post-retrieval reorder.
`SEARCH_RERANK_POOL`	`50`	Max candidates in rerank pool before trimming.
`SEARCH_CACHE_SIZE`	`100`	Max cached search results (LRU).
`SEARCH_CACHE_TTL_MS`	`300000`	Cache TTL in milliseconds (5 min).
`SEARCH_FUZZY_ENABLED`	`false`	Enable fuzzy search re-ranking by default.
`SEARCH_STREAM_ENABLED`	`false`	Enable streaming search by default.

Rerank Configuration

Variable	Default	Description
`RERANK_URL`	(empty)	Custom HTTP rerank endpoint. POST JSON `{query, documents}` → `{scores}`.
`RERANK_TIMEOUT_MS`	`15000`	Timeout for rerank HTTP request.

Timeout Configuration

Variable	Default	Description
`OLLAMA_TIMEOUT_MS`	`30000` (30s)	Max wait for Ollama API calls.
`QDRANT_TIMEOUT_MS`	`15000` (15s)	Max wait for Qdrant API calls.
`SEARCH_TIMEOUT_MS`	`60000` (60s)	Overall search operation timeout.
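
The tables above amount to "read an environment variable, fall back to a default". The following Python sketch shows that pattern for a handful of the variables. The `Settings` class, `read_flag`, and `load_settings` are hypothetical names for illustration; vibe-hnindex's own parsing (for example, how it treats unrecognized boolean values) is not documented here and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    ollama_model: str
    embedding_dimensions: int
    chunk_size: int
    chunk_overlap: int
    search_rerank: bool
    search_cache_ttl_ms: int
    search_timeout_ms: int


def read_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse "true"/"false"; anything else keeps the default (an assumption)."""
    value = env.get(name, "").strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    return default


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from env vars, using the documented defaults."""
    return Settings(
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "bge-m3:567m"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1024")),
        chunk_size=int(env.get("CHUNK_SIZE", "60")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "5")),
        search_rerank=read_flag(env, "SEARCH_RERANK", True),
        search_cache_ttl_ms=int(env.get("SEARCH_CACHE_TTL_MS", "300000")),
        search_timeout_ms=int(env.get("SEARCH_TIMEOUT_MS", "60000")),
    )


# Only the overridden values change; everything else keeps its default.
print(load_settings({"CHUNK_SIZE": "80", "SEARCH_RERANK": "false"}))
```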

.hnindexignore

Create a .hnindexignore file at the project root to exclude files and directories from indexing. It supports gitignore-style glob patterns via minimatch (*, **, /).

# .hnindexignore
node_modules
dist
build
*.min.js
*.lock
.git
__pycache__

Re-index your project after changing this file. Negation (!) is not supported.
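
As a rough illustration of how such an ignore list can be applied, the Python sketch below approximates the matching with the standard-library `fnmatch` module. This is an assumption-heavy stand-in, not minimatch: `fnmatch` lets `*` cross `/` and has no special `**` handling, and the rule that a bare pattern matches any path segment is assumed gitignore-like behavior. The helper names `load_patterns` and `is_ignored` are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(text: str) -> list:
    """Parse ignore-file text, skipping blank lines and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(rel_path: str, patterns: list) -> bool:
    """Return True if a project-relative POSIX path matches any pattern."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if "/" in pattern:
            # Patterns containing a slash are compared with the whole path.
            if fnmatchcase(rel_path, pattern):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Bare patterns are compared with every path segment (assumed).
            return True
    return False


patterns = load_patterns("# .hnindexignore\nnode_modules\ndist\n*.min.js\n")
print(is_ignored("node_modules/lodash/index.js", patterns))  # True
print(is_ignored("src/app.js", patterns))                    # False
```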

Embedding Model Selection

vibe-hnindex supports any Ollama embedding model. Change OLLAMA_MODEL and EMBEDDING_DIMENSIONS to switch.

Model Comparison

Model	Size	Dims	Context	MTEB Score	Best For
`bge-m3:567m` (default)	1.2 GB	1024	8192	~63	Multilingual (100+ languages), multi-vector retrieval
`nomic-embed-text`	274 MB	768	8192	62.39	Lightweight, CPU-friendly, Matryoshka dim reduction
`qwen3-embedding:4b`	2.5 GB (Q4)	32-4096	8192	~67	Best quality with GPU, instruction support
`mxbai-embed-large`	670 MB	1024	512 ⚠️	64.68	⚠️ Short context — not recommended for code
`snowflake-arctic-embed2`	1.1 GB	1024	8192	~58	Multilingual, Matryoshka, smaller than bge-m3
`all-minilm`	46 MB	384	256	~56	Prototyping, resource-constrained

How to Switch Models

# 1. Pull the new model
ollama pull nomic-embed-text

# 2. Update MCP config env
OLLAMA_MODEL=nomic-embed-text
EMBEDDING_DIMENSIONS=768

# 3. Delete old Qdrant collection and re-index
delete_project(project_name: "my-app")
index_codebase(path: "/path/to/project", project_name: "my-app")

After switching models, you must delete and re-index every affected project, because a Qdrant collection's vector size is fixed at creation time and cannot hold vectors of a different dimension.
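
Before re-indexing, it can help to confirm that `EMBEDDING_DIMENSIONS` really matches what the model returns. The Python sketch below is one way to check this, assuming Ollama's `/api/embed` endpoint (request `{model, input}`, response containing `embeddings`); whether vibe-hnindex itself calls this endpoint is not stated in this document. `embed_with_ollama` and `check_dimensions` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Callable, Sequence


def embed_with_ollama(text: str,
                      base_url: str = "http://localhost:11434",
                      model: str = "nomic-embed-text",
                      timeout_s: float = 30.0) -> list:
    """Request one embedding from a running Ollama server (needs network)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    return payload["embeddings"][0]


def check_dimensions(embed: Callable[[str], Sequence[float]], expected: int) -> int:
    """Embed a probe string and fail loudly if the vector size is unexpected."""
    actual = len(embed("dimension probe"))
    if actual != expected:
        raise ValueError(
            f"model returns {actual}-dimensional vectors, "
            f"but EMBEDDING_DIMENSIONS={expected}"
        )
    return actual


# Offline demonstration with a stand-in embedder that returns 768 floats.
print(check_dimensions(lambda _text: [0.0] * 768, expected=768))  # 768
```

Against a live server, the same check would be `check_dimensions(embed_with_ollama, expected=768)`.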

Recommendations

Scenario	Model	Reason
CPU only / low RAM	`nomic-embed-text`	274 MB, runs on CPU
Multilingual codebase	`bge-m3:567m`	100+ languages, best multilingual
GPU ≥ 8 GB VRAM	`qwen3-embedding:4b`	Highest quality with instruction support
Minimal resources	`all-minilm`	46 MB, instant embedding

Parallel Indexing (v0.8.0+)

INDEX_WORKERS controls parallel file indexing using worker threads. Set to auto (or 0) to use all available CPU cores minus one.

# Auto: use all available cores minus one
export INDEX_WORKERS=auto

# Manual: use exactly 4 workers
export INDEX_WORKERS=4

# Single-threaded
export INDEX_WORKERS=1

# Larger batches
export INDEX_PARALLEL_BATCH=16
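
The way an `INDEX_WORKERS` value could map to an actual worker count can be sketched in Python as follows. The function name `resolve_index_workers` is hypothetical, and flooring the result at one worker on a single-core machine is an assumption rather than documented behavior.

```python
import os
from typing import Optional


def resolve_index_workers(raw: Optional[str], cpu_count: Optional[int] = None) -> int:
    """Translate an INDEX_WORKERS value into a concrete worker count."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    value = (raw or "auto").strip().lower()
    if value in ("auto", "0"):
        # Leave one core free, but never drop below one worker (assumed floor).
        return max(1, cpus - 1)
    workers = int(value)  # raises ValueError for non-numeric input
    if workers < 1:
        raise ValueError("INDEX_WORKERS must be 'auto', 0, or a positive integer")
    return workers


print(resolve_index_workers("auto", cpu_count=8))  # 7
print(resolve_index_workers("4", cpu_count=8))     # 4
```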

Search Cache (v0.8.0+)

Search results are cached in-memory with LRU eviction and TTL. The cache key includes project name, query, mode, limit, and filters. Cache is automatically invalidated on re-index and is not used for regex mode.
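
The behavior described above (LRU eviction, TTL expiry, a composite key, invalidation on re-index, and a bypass for regex mode) can be sketched in Python like this. `SearchCache` and its methods are hypothetical names, and the real cache may differ in details such as how filters are normalized.

```python
import json
import time
from collections import OrderedDict


class SearchCache:
    """In-memory LRU cache with per-entry TTL, keyed by the search parameters."""

    def __init__(self, max_size=100, ttl_ms=300_000, clock=time.monotonic):
        self._max_size = max_size
        self._ttl_s = ttl_ms / 1000
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (stored_at, value), oldest first

    @staticmethod
    def make_key(project, query, mode, limit, filters=None):
        # Serialize filters with sorted keys so equal filters give equal keys.
        return (project, query, mode, limit, json.dumps(filters or {}, sort_keys=True))

    def get(self, key):
        if key[2] == "regex":
            return None  # regex mode never reads from the cache
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl_s:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key[2] == "regex":
            return  # regex mode never writes to the cache
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate_project(self, project):
        """Drop every entry for a project, for example after a re-index."""
        for key in [k for k in self._entries if k[0] == project]:
            del self._entries[key]
```

The defaults correspond to `SEARCH_CACHE_SIZE=100` and `SEARCH_CACHE_TTL_MS=300000`.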

Fuzzy Search (v0.8.1+)

Enable Levenshtein distance-based fuzzy re-ranking to find results even with typos:

export SEARCH_FUZZY_ENABLED=true

Or enable it per query by passing fuzzy: true in the search arguments.
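
For readers unfamiliar with Levenshtein distance, the Python sketch below computes it and uses it to reorder a few snippets so that near-matches for a misspelled query float to the top. The scoring scheme (best normalized distance to any token) and the names `best_token_distance` and `fuzzy_rerank` are assumptions for illustration; the document does not describe how vibe-hnindex combines fuzzy scores with other signals.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def best_token_distance(query: str, text: str) -> float:
    """Smallest length-normalized distance between the query and any token."""
    tokens = re.findall(r"[A-Za-z0-9_]+", text.lower())
    if not tokens:
        return 1.0
    q = query.lower()
    return min(levenshtein(q, t) / max(len(q), len(t)) for t in tokens)


def fuzzy_rerank(query: str, snippets: list) -> list:
    """Sort snippets so those containing a near-match for the query come first."""
    return sorted(snippets, key=lambda s: best_token_distance(query, s))


print(levenshtein("kitten", "sitting"))  # 3
print(fuzzy_rerank("autenticate", ["def render_page():", "def authenticate(user):"]))
```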

Streaming Search (v0.9.0+)

Streaming search runs keyword + semantic in parallel for faster results:

export SEARCH_STREAM_ENABLED=true

It provides four-phase progress notifications and an early preview of results.
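
The general idea of running two searches concurrently and surfacing results as soon as either finishes can be sketched with Python's `asyncio`. This is a simplified illustration: it does not model the four notification phases, and `stream_search`, the search callables, and the `on_partial` callback are hypothetical names.

```python
import asyncio


async def stream_search(query, keyword_search, semantic_search, on_partial):
    """Run both searches concurrently and report merged results as each finishes."""
    tasks = [
        asyncio.create_task(keyword_search(query)),
        asyncio.create_task(semantic_search(query)),
    ]
    merged, seen = [], set()
    for finished in asyncio.as_completed(tasks):
        for hit in await finished:
            if hit not in seen:  # drop duplicates found by both searches
                seen.add(hit)
                merged.append(hit)
        on_partial(list(merged))  # early preview after each search completes
    return merged


async def fake_keyword(query):
    return ["a.py:1", "b.py:2"]


async def fake_semantic(query):
    await asyncio.sleep(0.01)  # stand-in for a slower embedding lookup
    return ["b.py:2", "c.py:3"]


previews = []
final = asyncio.run(stream_search("auth", fake_keyword, fake_semantic, previews.append))
print(final)
```

The caller sees the keyword hits in the first preview without waiting for the slower semantic search.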

Optional Rerank

After retrieval, vibe-hnindex can reorder results. Without RERANK_URL, it uses Qdrant semantic scores. With RERANK_URL, it sends results to your custom HTTP endpoint for finer ranking (e.g., with a cross-encoder).

Ollama does not provide a rerank endpoint. If you use an Ollama-hosted reranker model, you need a proxy that translates the {query, documents} → {scores} contract.
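
A minimal endpoint that honors the {query, documents} → {scores} contract might look like the Python sketch below. It uses a toy term-overlap scorer in place of a real cross-encoder, and it assumes that `documents` is a list of strings, that one score is returned per document in the same order, and that higher scores mean more relevant. The port, handler, and function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: the fraction of query terms found in the document."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(document.lower().split())) / len(query_terms)


def rerank(payload: dict) -> dict:
    """Map {query, documents} to {scores}, one score per document, in order."""
    query = payload["query"]
    return {"scores": [overlap_score(query, doc) for doc in payload["documents"]]}


class RerankHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", "0"))
        try:
            result = rerank(json.loads(self.rfile.read(length) or b"{}"))
            status = 200
        except (KeyError, TypeError, AttributeError, ValueError):
            result = {"error": "expected JSON body with query and documents"}
            status = 400
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8787) -> None:
    """Start the endpoint; RERANK_URL would then point at http://host:port/."""
    HTTPServer((host, port), RerankHandler).serve_forever()


print(rerank({"query": "parse config",
              "documents": ["def parse config file", "render html"]}))
```

A proxy in front of an Ollama-hosted reranker would keep the same request and response shape and replace `overlap_score` with a call to the model.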
