Configuration
Configure vibe-hnindex through environment variables set in your MCP config file. All variables are optional with sensible defaults.
Environment Variables
Core Configuration
| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | bge-m3:567m | Embedding model name |
| EMBEDDING_DIMENSIONS | 1024 | Vector size from Ollama model. Must match model output. |
| STORAGE_PATH | ~/.vibe-hnindex | SQLite database directory |
| QDRANT_URL | http://localhost:6333 | Qdrant REST URL |
| QDRANT_API_KEY | (unset) | Required for Qdrant Cloud |
| QDRANT_COLLECTION_PREFIX | mcp_ck_ | Prefix for collection names |
Chunking Configuration
| Variable | Default | Description |
|---|---|---|
| CHUNK_SIZE | 60 | Target lines per chunk |
| CHUNK_OVERLAP | 5 | Overlap lines between chunks |
| MAX_FILE_SIZE | 1048576 | Max file size in bytes (1 MB) |
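To make the interaction between CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a minimal sketch of line-based chunking with overlap. It is an illustration only: the function name chunk_lines is hypothetical, and the real chunker in vibe-hnindex may split on other boundaries or treat the final chunk differently.

```python
def chunk_lines(lines, chunk_size=60, overlap=5):
    """Split lines into chunks of up to chunk_size lines.

    Consecutive chunks share `overlap` lines. Returns a list of
    (first_line, last_line, chunk) tuples with 1-based inclusive line numbers.
    Hypothetical helper, not the project's actual chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + chunk_size, len(lines))
        chunks.append((start + 1, end, lines[start:end]))
        if end == len(lines):
            break  # the last chunk reached the end of the file
        start += step
    return chunks
```

With the defaults, each chunk starts 55 lines after the previous one, so a larger overlap means more chunks (and more embedding calls) for the same file.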
Indexing Configuration
| Variable | Default | Description |
|---|---|---|
| INDEX_WORKERS | auto | Worker threads for parallel indexing. auto = CPU count − 1. Set to 1 for single-thread. |
| INDEX_PARALLEL_BATCH | 8 | Files per worker batch. Higher = more throughput, more memory. |
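The following sketch shows one way the two indexing settings could be resolved into a worker count and per-worker batches. The helper names resolve_workers and make_batches are hypothetical. Treating an empty value or 0 as auto, and never going below one worker, are assumptions of this sketch; whether the tool clamps values above the CPU count is not documented here.

```python
import os


def resolve_workers(value, cpu_count=None):
    """Turn an INDEX_WORKERS value into a concrete worker count.

    "auto" (and, by assumption, "" or "0") means CPU count minus one,
    with a floor of one worker.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    text = str(value).strip().lower()
    if text in ("", "auto", "0"):
        return max(1, cpus - 1)
    count = int(text)  # raises ValueError for non-numeric input
    if count < 1:
        raise ValueError("INDEX_WORKERS must be 'auto' or a positive integer")
    return count


def make_batches(files, batch_size=8):
    """Group files into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
```

For example, resolve_workers(os.environ.get("INDEX_WORKERS", "auto")) gives the count for the current machine, and make_batches(paths, 8) yields the groups that would be handed to workers.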
Search Configuration
| Variable | Default | Description |
|---|---|---|
| SEARCH_AUTO_ROUTE | false | When true, omitting mode uses auto heuristic. |
| SEARCH_KEYWORD_FALLBACK_SEMANTIC | true | If keyword returns nothing, run semantic search. |
| SEARCH_RERANK | (enabled) | Set to false to disable post-retrieval reorder. |
| SEARCH_RERANK_POOL | 50 | Max candidates in rerank pool before trimming. |
| SEARCH_CACHE_SIZE | 100 | Max cached search results (LRU). |
| SEARCH_CACHE_TTL_MS | 300000 | Cache TTL in milliseconds (5 min). |
| SEARCH_FUZZY_ENABLED | false | Enable fuzzy search re-ranking by default. |
| SEARCH_STREAM_ENABLED | false | Enable streaming search by default. |
Rerank Configuration
| Variable | Default | Description |
|---|---|---|
| RERANK_URL | (empty) | Custom HTTP rerank endpoint. POST JSON {query, documents} → {scores}. |
| RERANK_TIMEOUT_MS | 15000 | Timeout for rerank HTTP request. |
Timeout Configuration
| Variable | Default | Description |
|---|---|---|
| OLLAMA_TIMEOUT_MS | 30000 (30s) | Max wait for Ollama API calls. |
| QDRANT_TIMEOUT_MS | 15000 (15s) | Max wait for Qdrant API calls. |
| SEARCH_TIMEOUT_MS | 60000 (60s) | Overall search operation timeout. |
.hnindexignore
Create a .hnindexignore file at the project root to exclude files and directories from indexing. It supports gitignore-style glob patterns via minimatch (*, **, /).
# .hnindexignore
node_modules
dist
build
*.min.js
*.lock
.git
__pycache__
Re-index your project after changing this file. Negation (!) is not supported.
Embedding Model Selection
vibe-hnindex supports any Ollama embedding model. Change OLLAMA_MODEL and EMBEDDING_DIMENSIONS to switch.
Model Comparison
| Model | Size | Dims | Context | MTEB Score | Best For |
|---|---|---|---|---|---|
| bge-m3:567m (default) | 1.2 GB | 1024 | 8192 | ~63 | Multilingual (100+ languages), multi-vector retrieval |
| nomic-embed-text | 274 MB | 768 | 8192 | 62.39 | Lightweight, CPU-friendly, Matryoshka dim reduction |
| qwen3-embedding:4b | 2.5 GB (Q4) | 32-4096 | 8192 | ~67 | Best quality with GPU, instruction support |
| mxbai-embed-large | 670 MB | 1024 | 512 ⚠️ | 64.68 | ⚠️ Short context; not recommended for code |
| snowflake-arctic-embed2 | 1.1 GB | 1024 | 8192 | ~58 | Multilingual, Matryoshka, smaller than bge-m3 |
| all-minilm | 46 MB | 384 | 256 | ~56 | Prototyping, resource-constrained |
How to Switch Models
# 1. Pull the new model
ollama pull nomic-embed-text
# 2. Update MCP config env
OLLAMA_MODEL=nomic-embed-text
EMBEDDING_DIMENSIONS=768
# 3. Delete old Qdrant collection and re-index
delete_project(project_name: "my-app")
index_codebase(path: "/path/to/project", project_name: "my-app")
After switching models, you must delete and re-index your projects, because the vector size of a Qdrant collection is fixed at creation time.
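Before re-indexing, it can help to confirm that EMBEDDING_DIMENSIONS matches what the model actually returns. The sketch below is one way to do that, assuming a local Ollama server that exposes the /api/embed endpoint. The helper names fetch_embedding and check_dimensions are hypothetical and are not part of vibe-hnindex.

```python
import json
import urllib.request


def fetch_embedding(text, model, base_url="http://localhost:11434", timeout=30.0):
    """Request one embedding from Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # /api/embed returns one vector per input under "embeddings".
    return body["embeddings"][0]


def check_dimensions(embedding, expected):
    """Raise ValueError if the vector length differs from the configured size."""
    actual = len(embedding)
    if actual != expected:
        raise ValueError(
            f"model returns {actual} dimensions, "
            f"but EMBEDDING_DIMENSIONS is {expected}"
        )
    return actual
```

Calling check_dimensions(fetch_embedding("hello", "nomic-embed-text"), 768) should pass silently if the settings in step 2 are consistent with the model pulled in step 1.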
Recommendations
| Scenario | Model | Reason |
|---|---|---|
| CPU only / low RAM | nomic-embed-text | 274 MB, runs on CPU |
| Multilingual codebase | bge-m3:567m | 100+ languages, best multilingual |
| GPU ≥ 8 GB VRAM | qwen3-embedding:4b | Highest quality with instruction support |
| Minimal resources | all-minilm | 46 MB, instant embedding |
Parallel Indexing (v0.8.0+)
INDEX_WORKERS controls parallel file indexing using worker threads. Set to auto (or 0) to use all available CPU cores minus one.
# Auto: use all available cores minus one
export INDEX_WORKERS=auto
# Manual: use exactly 4 workers
export INDEX_WORKERS=4
# Single-threaded
export INDEX_WORKERS=1
# Larger batches
export INDEX_PARALLEL_BATCH=16
Search Cache (v0.8.0+)
Search results are cached in-memory with LRU eviction and TTL. The cache key includes project name, query, mode, limit, and filters. Cache is automatically invalidated on re-index and is not used for regex mode.
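As a rough illustration of the behaviour described above, here is a minimal LRU cache with a TTL, keyed on project name, query, mode, limit, and filters. The class name SearchCache and its methods are hypothetical. The actual cache may differ in details such as how filters are normalised, and this sketch assumes filter values are hashable.

```python
import time
from collections import OrderedDict


class SearchCache:
    """In-memory LRU cache whose entries also expire after a TTL."""

    def __init__(self, max_size=100, ttl_ms=300_000, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_ms / 1000
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, results)

    @staticmethod
    def make_key(project, query, mode, limit, filters=None):
        # Sort filter items so equal filters always produce the same key.
        frozen = tuple(sorted((filters or {}).items()))
        return (project, query, mode, limit, frozen)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, results = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # entry outlived its TTL
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return results

    def put(self, key, results):
        self._entries[key] = (self._clock() + self.ttl_seconds, results)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate_project(self, project):
        """Drop every entry for a project, for example after a re-index."""
        for key in [k for k in self._entries if k[0] == project]:
            del self._entries[key]
```

A caller would skip both get and put when the mode is regex, and call invalidate_project after re-indexing.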
Fuzzy Search (v0.8.1+)
Enable Levenshtein distance-based fuzzy re-ranking to find results even with typos:
export SEARCH_FUZZY_ENABLED=true
Or enable it per query by passing fuzzy: true in the search arguments.
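For readers unfamiliar with Levenshtein distance, the sketch below computes it and uses it to reorder candidates so that near-matches for a misspelled query rank first. Both function names are hypothetical. Comparing the query against whitespace-separated tokens is an assumption of this sketch, not a description of how vibe-hnindex scores results.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_rerank(query, candidates):
    """Sort candidates by the smallest distance between query and any token."""
    needle = query.lower()

    def best_distance(text):
        tokens = text.lower().split() or [""]
        return min(levenshtein(needle, token) for token in tokens)

    return sorted(candidates, key=best_distance)
```

With this scoring, a query such as "conifg" still ranks a candidate containing "config" ahead of unrelated text, because the two words are only two edits apart.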
Streaming Search (v0.9.0+)
Streaming search runs keyword + semantic in parallel for faster results:
export SEARCH_STREAM_ENABLED=true
It provides 4-phase progress notifications and an early result preview.
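The general idea of running both searches concurrently and surfacing whichever finishes first can be sketched as follows. This is not the tool's implementation: stream_search, keyword_search, semantic_search, and on_partial are hypothetical names, hits are assumed to be hashable values such as file paths, and the four progress phases are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stream_search(query, keyword_search, semantic_search, on_partial=None):
    """Run two search callables in parallel and merge their hits.

    on_partial(source, hits_so_far) is called as each search finishes,
    which is what allows an early preview of results.
    """
    merged = []
    seen = set()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            pool.submit(keyword_search, query): "keyword",
            pool.submit(semantic_search, query): "semantic",
        }
        for future in as_completed(futures):
            source = futures[future]
            for hit in future.result():
                if hit not in seen:  # skip hits already returned by the other search
                    seen.add(hit)
                    merged.append(hit)
            if on_partial is not None:
                on_partial(source, list(merged))
    return merged
```

The total latency is roughly that of the slower search rather than the sum of both, and the first callback fires as soon as the faster one returns.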
Optional Rerank
After retrieval, vibe-hnindex can reorder results. Without RERANK_URL, it uses Qdrant semantic scores. With RERANK_URL, it sends results to your custom HTTP endpoint for finer ranking (e.g., cross-encoder).
Ollama does not provide a rerank endpoint. If you use an Ollama-hosted reranker model, you need a proxy that translates the {query, documents} → {scores} contract.
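A custom endpoint for RERANK_URL only needs to accept the POST body and return scores. The sketch below uses Python's standard library and a placeholder term-overlap scorer where a cross-encoder or a proxied reranker model would normally go. It assumes that scores is a list with one number per document, in the same order, with higher meaning more relevant; the source only specifies the {query, documents} → {scores} shape. The port and all names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_documents(query, documents):
    """Placeholder scorer: fraction of query terms present in each document."""
    terms = set(query.lower().split())
    scores = []
    for document in documents:
        words = set(document.lower().split())
        scores.append(len(terms & words) / len(terms) if terms else 0.0)
    return scores


class RerankHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            scores = score_documents(payload["query"], payload["documents"])
            status, body = 200, {"scores": scores}
        except (ValueError, KeyError, TypeError, AttributeError) as error:
            # Malformed JSON or a payload that does not match the contract.
            status, body = 400, {"error": str(error)}
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8089):
    """Start the endpoint on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RerankHandler).serve_forever()
```

After calling serve(), RERANK_URL would point at http://127.0.0.1:8089/. Keep the scorer fast enough to answer within RERANK_TIMEOUT_MS.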