Configuration

Configure vibe-hnindex through environment variables set in your MCP config file. All variables are optional with sensible defaults.
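
As a rough illustration, the sketch below builds an MCP config entry that passes a few of these variables. The "mcpServers" layout is an assumption based on what several MCP clients accept, and the command and args are placeholders, not confirmed by this page; substitute the launch command from the project's install instructions.

```python
import json

# Assumption: the MCP client reads an "mcpServers" map from its config file.
# The command and args below are placeholders, not the documented launcher.
server_entry = {
    "command": "npx",                # hypothetical launcher
    "args": ["-y", "vibe-hnindex"],  # hypothetical package invocation
    "env": {
        # Environment variables are always strings, so numbers are quoted.
        "OLLAMA_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "bge-m3:567m",
        "EMBEDDING_DIMENSIONS": "1024",
        "QDRANT_URL": "http://localhost:6333",
        "INDEX_WORKERS": "auto",
    },
}

config = {"mcpServers": {"vibe-hnindex": server_entry}}
config_json = json.dumps(config, indent=2)
print(config_json)
```

Any variable left out of the env block falls back to its default from the tables on this page.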

Environment Variables

Core Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | bge-m3:567m | Embedding model name |
| EMBEDDING_DIMENSIONS | 1024 | Vector size from Ollama model. Must match model output. |
| STORAGE_PATH | ~/.vibe-hnindex | SQLite database directory |
| QDRANT_URL | http://localhost:6333 | Qdrant REST URL |
| QDRANT_API_KEY | (unset) | Required for Qdrant Cloud |
| QDRANT_COLLECTION_PREFIX | mcp_ck_ | Prefix for collection names |

Chunking Configuration

| Variable | Default | Description |
| --- | --- | --- |
| CHUNK_SIZE | 60 | Target lines per chunk |
| CHUNK_OVERLAP | 5 | Overlap lines between chunks |
| MAX_FILE_SIZE | 1048576 | Max file size in bytes (1 MB) |
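
To make the interaction between CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a minimal sliding-window sketch. It assumes a purely line-based splitter; the real chunker may also respect syntax boundaries, and the function name chunk_lines is hypothetical.

```python
def chunk_lines(lines, chunk_size=60, overlap=5):
    """Split lines into windows of up to chunk_size lines.

    Each window repeats the last `overlap` lines of the previous one.
    Returns (first_line_number, last_line_number, lines) tuples, 1-based.
    This is an illustrative sketch, not the tool's actual chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + chunk_size, len(lines))
        chunks.append((start + 1, end, lines[start:end]))
        if end == len(lines):
            break  # the final window reached the end of the file
        start += step
    return chunks


source = [f"line {n}" for n in range(1, 131)]  # a 130-line file
for first, last, _ in chunk_lines(source):
    print(first, last)
```

With the defaults, each new chunk starts 55 lines after the previous one, so a larger overlap means more chunks and more embedding calls per file.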

Indexing Configuration

| Variable | Default | Description |
| --- | --- | --- |
| INDEX_WORKERS | auto | Worker threads for parallel indexing. auto = CPU count − 1. Set to 1 for single-thread. |
| INDEX_PARALLEL_BATCH | 8 | Files per worker batch. Higher = more throughput, more memory. |

Search Configuration

| Variable | Default | Description |
| --- | --- | --- |
| SEARCH_AUTO_ROUTE | false | When true, a search that omits mode uses the auto heuristic. |
| SEARCH_KEYWORD_FALLBACK_SEMANTIC | true | If keyword search returns nothing, run semantic search. |
| SEARCH_RERANK | (enabled) | Set to false to disable post-retrieval reordering. |
| SEARCH_RERANK_POOL | 50 | Max candidates in rerank pool before trimming. |
| SEARCH_CACHE_SIZE | 100 | Max cached search results (LRU). |
| SEARCH_CACHE_TTL_MS | 300000 | Cache TTL in milliseconds (5 min). |
| SEARCH_FUZZY_ENABLED | false | Enable fuzzy search re-ranking by default. |
| SEARCH_STREAM_ENABLED | false | Enable streaming search by default. |

Rerank Configuration

| Variable | Default | Description |
| --- | --- | --- |
| RERANK_URL | (empty) | Custom HTTP rerank endpoint. POST JSON {query, documents} → {scores}. |
| RERANK_TIMEOUT_MS | 15000 | Timeout for rerank HTTP request. |

Timeout Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_TIMEOUT_MS | 30000 (30s) | Max wait for Ollama API calls. |
| QDRANT_TIMEOUT_MS | 15000 (15s) | Max wait for Qdrant API calls. |
| SEARCH_TIMEOUT_MS | 60000 (60s) | Overall search operation timeout. |

.hnindexignore

Create a .hnindexignore file at the project root to exclude files and directories from indexing. Supports gitignore-style glob patterns via minimatch (*, **, /).

# .hnindexignore
node_modules
dist
build
*.min.js
*.lock
.git
__pycache__

Re-index your project after changing this file. Negation (!) is not supported.

Embedding Model Selection

vibe-hnindex supports any Ollama embedding model. Change OLLAMA_MODEL and EMBEDDING_DIMENSIONS to switch.

Model Comparison

| Model | Size | Dims | Context | MTEB Score | Best For |
| --- | --- | --- | --- | --- | --- |
| bge-m3:567m (default) | 1.2 GB | 1024 | 8192 | ~63 | Multilingual (100+ languages), multi-vector retrieval |
| nomic-embed-text | 274 MB | 768 | 8192 | 62.39 | Lightweight, CPU-friendly, Matryoshka dim reduction |
| qwen3-embedding:4b | 2.5 GB (Q4) | 32-4096 | 8192 | ~67 | Best quality with GPU, instruction support |
| mxbai-embed-large | 670 MB | 1024 | 512 ⚠️ | 64.68 | ⚠️ Short context, not recommended for code |
| snowflake-arctic-embed2 | 1.1 GB | 1024 | 8192 | ~58 | Multilingual, Matryoshka, smaller than bge-m3 |
| all-minilm | 46 MB | 384 | 256 | ~56 | Prototyping, resource-constrained |

How to Switch Models

# 1. Pull the new model
ollama pull nomic-embed-text

# 2. Update MCP config env
OLLAMA_MODEL=nomic-embed-text
EMBEDDING_DIMENSIONS=768

# 3. Delete the old Qdrant collection and re-index (MCP tool calls)
delete_project(project_name: "my-app")
index_codebase(path: "/path/to/project", project_name: "my-app")

After switching models, you must delete and re-index your projects, because the vector size of a Qdrant collection is fixed at creation time.
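
Before re-indexing, it can help to confirm that EMBEDDING_DIMENSIONS matches what the model actually returns. The sketch below probes Ollama's /api/embed endpoint and counts the vector length. The helper names are hypothetical, and the probe assumes an Ollama version that exposes /api/embed with an "embeddings" array in the response.

```python
import json
import urllib.request


def dimension_from_response(payload):
    """Return the vector length from an /api/embed style response body."""
    embeddings = payload.get("embeddings") or []
    if not embeddings:
        raise ValueError("response contained no embeddings")
    return len(embeddings[0])


def probe_dimension(ollama_url, model, timeout_s=30):
    """Embed a short string and report the model's output dimension.

    Requires a running Ollama server with the model already pulled.
    """
    request = urllib.request.Request(
        f"{ollama_url.rstrip('/')}/api/embed",
        data=json.dumps({"model": model, "input": "dimension probe"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return dimension_from_response(json.load(response))


def check_dimensions(configured, actual):
    """Raise if the configured EMBEDDING_DIMENSIONS disagrees with the model."""
    if configured != actual:
        raise ValueError(
            f"EMBEDDING_DIMENSIONS={configured} but the model returns {actual}"
        )
```

For example, check_dimensions(768, probe_dimension("http://localhost:11434", "nomic-embed-text")) should pass silently if the table above is accurate for your local model build.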

Recommendations

| Scenario | Model | Reason |
| --- | --- | --- |
| CPU only / low RAM | nomic-embed-text | 274 MB, runs on CPU |
| Multilingual codebase | bge-m3:567m | 100+ languages, best multilingual |
| GPU ≥ 8 GB VRAM | qwen3-embedding:4b | Highest quality with instruction support |
| Minimal resources | all-minilm | 46 MB, instant embedding |

Parallel Indexing (v0.8.0+)

INDEX_WORKERS controls parallel file indexing using worker threads. Set to auto (or 0) to use all available CPU cores minus one.

# Auto: use CPU count minus one
export INDEX_WORKERS=auto

# Manual: use exactly 4 workers
export INDEX_WORKERS=4

# Single-threaded
export INDEX_WORKERS=1

# Larger batches
export INDEX_PARALLEL_BATCH=16
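
The rule for INDEX_WORKERS can be sketched as a small resolver. This is an illustration only: the function name is hypothetical, and clamping to at least one worker on a single-core machine is an assumption, not documented behavior.

```python
import os


def resolve_index_workers(raw=None, cpu_count=None):
    """Translate an INDEX_WORKERS value into a concrete worker count.

    "auto" and "0" both mean CPU count minus one. The result is clamped
    to a minimum of 1 (an assumption for single-core machines).
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if raw is None:
        raw = os.environ.get("INDEX_WORKERS", "auto")

    value = raw.strip().lower()
    if value in ("", "auto", "0"):
        return max(1, cpu_count - 1)

    workers = int(value)  # raises ValueError for non-numeric input
    if workers < 1:
        raise ValueError("INDEX_WORKERS must be 'auto', 0, or a positive integer")
    return workers


print(resolve_index_workers("auto", cpu_count=8))
print(resolve_index_workers("4", cpu_count=8))
```

Leaving one core free keeps the machine responsive while Ollama and Qdrant handle embedding and upsert requests during indexing.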

Search Cache (v0.8.0+)

Search results are cached in-memory with LRU eviction and TTL. The cache key includes project name, query, mode, limit, and filters. Cache is automatically invalidated on re-index and is not used for regex mode.
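
The caching behavior described above can be approximated with a small LRU map whose entries expire. The following is a sketch under stated assumptions, not the project's implementation: the class and function names are hypothetical, and the key layout simply mirrors the fields listed in the paragraph.

```python
import json
import time
from collections import OrderedDict


class SearchCache:
    """LRU cache with per-entry TTL (illustrative sketch)."""

    def __init__(self, max_size=100, ttl_ms=300_000, clock=time.monotonic):
        self._max_size = max_size
        self._ttl_s = ttl_ms / 1000
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    @staticmethod
    def make_key(project, query, mode, limit, filters=None):
        # Filters are serialized so that the whole key is hashable.
        return (project, query, mode, limit, json.dumps(filters or {}, sort_keys=True))

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl_s, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate_project(self, project):
        # Called after a re-index so stale results are never served.
        for key in [k for k in self._entries if k[0] == project]:
            del self._entries[key]


def cached_search(cache, run_search, project, query, mode, limit, filters=None):
    """Serve from cache when possible; regex mode always bypasses the cache."""
    if mode == "regex":
        return run_search(project, query, mode, limit, filters)
    key = SearchCache.make_key(project, query, mode, limit, filters)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_search(project, query, mode, limit, filters)
    cache.put(key, result)
    return result
```

SEARCH_CACHE_SIZE and SEARCH_CACHE_TTL_MS would map to max_size and ttl_ms here.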

Enable Levenshtein distance-based fuzzy re-ranking to find results even with typos:

export SEARCH_FUZZY_ENABLED=true

Or enable it per query by passing fuzzy: true in the search arguments.
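
For readers unfamiliar with Levenshtein distance, the sketch below computes it and uses it to reorder results so that lines containing a near-match for a misspelled query come first. How vibe-hnindex combines this distance with its other scores is not documented here, so the fuzzy_rerank helper is a hypothetical simplification.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_token_distance(query, text):
    """Smallest distance between the query and any word-like token in text."""
    tokens = re.findall(r"\w+", text.lower()) or [""]
    return min(levenshtein(query.lower(), token) for token in tokens)


def fuzzy_rerank(query, results):
    """Order results by how closely one of their tokens matches the query."""
    return sorted(results, key=lambda text: closest_token_distance(query, text))


hits = ["def render_page():", "def parse_config():"]
print(fuzzy_rerank("parse_confg", hits))
```

Here the misspelled query parse_confg is one edit away from parse_config, so that result moves to the front.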

Streaming search runs keyword and semantic search in parallel for faster results:

export SEARCH_STREAM_ENABLED=true

Streaming search provides 4-phase progress notifications and an early result preview.
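
The general idea of running two searches concurrently and previewing whichever finishes first can be sketched with asyncio. This is not the project's implementation: the function and callback names are hypothetical, the two backends are passed in as stand-ins, and the four progress phases are not modeled because this page does not define them.

```python
import asyncio


async def parallel_search(query, keyword_search, semantic_search, on_partial=None):
    """Run both searches concurrently and merge hits as each one completes.

    on_partial(source, hits_so_far) is invoked after each backend finishes,
    which allows an early preview before the slower backend returns.
    """
    sources = {
        asyncio.create_task(keyword_search(query)): "keyword",
        asyncio.create_task(semantic_search(query)): "semantic",
    }
    merged, seen = [], set()
    pending = set(sources)
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            for hit in task.result():
                if hit not in seen:  # drop duplicates across backends
                    seen.add(hit)
                    merged.append(hit)
            if on_partial is not None:
                on_partial(sources[task], list(merged))
    return merged


async def fake_keyword(query):
    await asyncio.sleep(0.01)  # stand-in for a fast keyword lookup
    return ["a.py:1", "b.py:2"]


async def fake_semantic(query):
    await asyncio.sleep(0.05)  # stand-in for a slower vector search
    return ["b.py:2", "c.py:3"]


events = []
result = asyncio.run(
    parallel_search("config", fake_keyword, fake_semantic,
                    lambda source, hits: events.append(source))
)
print(result, events)
```

Total latency is bounded by the slower backend rather than the sum of both, which is where the speedup comes from.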

Optional Rerank

After retrieval, vibe-hnindex can reorder results. Without RERANK_URL, it uses Qdrant semantic scores. With RERANK_URL, it sends results to your custom HTTP endpoint for finer ranking (e.g., cross-encoder).

Ollama does not provide a rerank endpoint. If you use an Ollama-hosted reranker model, you need a proxy that translates the {query, documents} → {scores} contract.
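
A rerank endpoint honoring that contract could look roughly like the sketch below. Several details are assumptions: that documents is a list of strings, that scores is a list of numbers in the same order, and the host and port. The token-overlap scorer is only a placeholder for a real cross-encoder or an Ollama-backed model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_documents(query, documents):
    """Placeholder scorer: fraction of query tokens present in each document."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return [0.0 for _ in documents]
    return [
        len(query_tokens & set(doc.lower().split())) / len(query_tokens)
        for doc in documents
    ]


def handle_rerank(payload):
    """Map a {query, documents} request body to a {scores} response body."""
    return {"scores": score_documents(payload["query"], payload["documents"])}


class RerankHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body, status = json.dumps(handle_rerank(payload)), 200
        except (ValueError, KeyError, TypeError, AttributeError):
            body, status = json.dumps({"error": "expected JSON {query, documents}"}), 400
        encoded = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)


def make_server(host="127.0.0.1", port=8787):
    """Build the server; the port is an arbitrary choice for this sketch."""
    return HTTPServer((host, port), RerankHandler)
```

Calling make_server().serve_forever() starts the endpoint; RERANK_URL would then point at http://127.0.0.1:8787, and slow scorers should stay within RERANK_TIMEOUT_MS.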