🦀 RustyPageIndex

RustyPageIndex is a high-performance Rust implementation of the PageIndex pattern. It transforms complex documents into hierarchical "Table-of-Contents" (TOC) trees for vectorless, reasoning-based retrieval-augmented generation (RAG).

This project is inspired by VectifyAI/PageIndex but has diverged significantly: it adds multi-repo support, parallel processing, and a unified tree architecture.

🚀 Key Features

Performance

Parallel Indexing: Uses Rayon for parallel file parsing (238 files in ~0.04s)
Rust-Native Parsing: pdf-extract and pulldown-cmark for fast document processing
Incremental Updates: Hash-based caching skips unchanged files
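As an illustration of the incremental-update idea, here is a minimal sketch of hash-based change detection using only the standard library. The `HashCache` type, the `content_hash` and `needs_reindex` names, and the choice of `DefaultHasher` are assumptions made for this example, not the project's actual implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache: file path -> content hash seen on the previous run.
#[derive(Default)]
pub struct HashCache {
    seen: HashMap<String, u64>,
}

/// Hash file contents with the standard library's default hasher.
/// (Chosen for illustration only; the project may use a different hash.)
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl HashCache {
    /// Returns true when the file is new or its contents changed, and records
    /// the new hash. Returns false when the stored hash matches, so the caller
    /// can skip parsing that file.
    pub fn needs_reindex(&mut self, path: &str, bytes: &[u8]) -> bool {
        let hash = content_hash(bytes);
        match self.seen.insert(path.to_string(), hash) {
            Some(previous) => previous != hash,
            None => true,
        }
    }
}
```

A caller would check `needs_reindex` for each file before parsing it. Note that `DefaultHasher` output is not guaranteed to stay the same across Rust releases, so a cache persisted to disk would want a hash with a fixed specification.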

Multi-Repository Support

Index multiple repos: Each indexed folder is tracked separately
Query across all: Search spans all indexed repositories by default
Manage indices: List, filter, and clean up indices easily
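The behaviour described above (separate tracking per repo, search across all by default, optional filtering) can be sketched with a small in-memory registry. Everything here is hypothetical: the `Registry` type and its methods are invented for illustration, and plain substring matching stands in for the tool's LLM-based search.

```rust
use std::collections::BTreeMap;

/// Hypothetical registry: repo name -> titles of its indexed sections.
#[derive(Default)]
pub struct Registry {
    repos: BTreeMap<String, Vec<String>>,
}

impl Registry {
    /// Track a repo under its own key, replacing any earlier index for it.
    pub fn add(&mut self, repo: &str, sections: &[&str]) {
        let owned = sections.iter().map(|s| s.to_string()).collect();
        self.repos.insert(repo.to_string(), owned);
    }

    /// Names of all tracked repos, in sorted order.
    pub fn list(&self) -> Vec<String> {
        self.repos.keys().cloned().collect()
    }

    /// Search every repo, or only `only` when a filter is given.
    /// Returns (repo, section title) pairs. Case-insensitive substring
    /// matching is a stand-in for the real search step.
    pub fn search(&self, term: &str, only: Option<&str>) -> Vec<(String, String)> {
        let needle = term.to_lowercase();
        let mut hits = Vec::new();
        for (name, sections) in &self.repos {
            if let Some(wanted) = only {
                if wanted != name.as_str() {
                    continue;
                }
            }
            for section in sections {
                if section.to_lowercase().contains(&needle) {
                    hits.push((name.clone(), section.clone()));
                }
            }
        }
        hits
    }

    /// Drop one repo's index; returns false if it was not tracked.
    pub fn clean(&mut self, repo: &str) -> bool {
        self.repos.remove(repo).is_some()
    }
}
```

Keying each index by repo keeps removal of one index independent of the others, while a default of `None` for the filter gives the "search everything" behaviour.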

Unified Tree Architecture

Folder → File → Section hierarchy preserves document structure
Single tree per repo: Efficient storage and navigation
Smart search: Auto-unwraps folder roots for better LLM context
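To make the hierarchy concrete, the following sketch models a Folder → File → Section tree and one possible reading of "auto-unwraps folder roots": when the root is a folder, the search step receives its children rather than the wrapper node. The `Kind` and `Node` types and the `unwrap_folder_root` function are assumptions for illustration and may not match the project's internal types.

```rust
/// Hypothetical node kinds for the unified tree.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Folder,
    File,
    Section,
}

/// Hypothetical tree node: a folder holds files, a file holds sections,
/// and a section may hold nested sections.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub kind: Kind,
    pub title: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(kind: Kind, title: &str, children: Vec<Node>) -> Self {
        Node { kind, title: title.to_string(), children }
    }

    /// Render the tree as an indented outline, two spaces per level.
    pub fn outline(&self) -> String {
        let mut out = String::new();
        self.write_outline(0, &mut out);
        out
    }

    fn write_outline(&self, depth: usize, out: &mut String) {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&self.title);
        out.push('\n');
        for child in &self.children {
            child.write_outline(depth + 1, out);
        }
    }
}

/// If the root is a folder, return its children so the wrapper itself does
/// not take up space in the context; otherwise return the node unchanged.
pub fn unwrap_folder_root(root: &Node) -> Vec<&Node> {
    match root.kind {
        Kind::Folder => root.children.iter().collect(),
        _ => vec![root],
    }
}
```

For a `docs` folder containing `guide.md` and `api.md`, `unwrap_folder_root` yields the two file nodes directly, while `outline` still shows the full nesting.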

🔄 Divergence from Original PageIndex

| Feature | Original PageIndex | RustyPageIndex |
| --- | --- | --- |
| Language | Python | Rust |
| Indexing | Per-file indices | Unified folder tree |
| Multi-repo | Not supported | Full support with `list`/`clean` |
| Parallelism | Sequential | Rayon parallel processing |
| Storage | Cloud-based (MCP) | Local filesystem |
| Tree Structure | Flat sections | Folder → File → Section hierarchy |
| Headerless Markdown | Empty tree | Auto-creates "Document" node |
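The headerless-Markdown row can be illustrated with a small sketch: split a file into sections at headings and, if no heading is found, wrap the whole text in a single "Document" section instead of returning nothing. The `Section` type and `split_sections` function are hypothetical, and the hand-written heading scan is a simplified stand-in for a real Markdown parser such as pulldown-cmark.

```rust
/// Hypothetical section record produced from one Markdown file.
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub level: usize,
    pub title: String,
    pub body: String,
}

/// Split Markdown into sections at ATX headings ("# Title").
/// Simplifications in this sketch: Setext headings are ignored, fenced code
/// blocks are not skipped, and text before the first heading is dropped when
/// the file does contain headings.
pub fn split_sections(markdown: &str) -> Vec<Section> {
    let mut sections: Vec<Section> = Vec::new();
    let mut preamble = String::new();

    for line in markdown.lines() {
        // '#' is a single byte, so the char count is a valid byte offset.
        let hashes = line.chars().take_while(|c| *c == '#').count();
        let rest = &line[hashes..];
        let is_heading = (1..=6).contains(&hashes) && rest.starts_with(' ');

        if is_heading {
            sections.push(Section {
                level: hashes,
                title: rest.trim().to_string(),
                body: String::new(),
            });
        } else if let Some(current) = sections.last_mut() {
            current.body.push_str(line);
            current.body.push('\n');
        } else {
            preamble.push_str(line);
            preamble.push('\n');
        }
    }

    // No headings found: keep the content under one "Document" node rather
    // than producing an empty tree. (Empty input stays empty by assumption.)
    if sections.is_empty() && !markdown.trim().is_empty() {
        sections.push(Section {
            level: 1,
            title: "Document".to_string(),
            body: preamble,
        });
    }
    sections
}
```

With this fallback, a plain notes file without any headings still yields one searchable node that carries its full text.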

🛠️ Getting Started

Installation

One-liner Install (Unix/macOS):

curl -fsSL https://raw.githubusercontent.com/Algiras/rusty-pageindex/main/install.sh | bash

One-liner Install (Windows PowerShell):

irm https://raw.githubusercontent.com/Algiras/rusty-pageindex/main/install.ps1 | iex

Via Cargo:

cargo install rusty-page-indexer

🧙 Use as an Agent Skill

npx skills add https://github.com/Algiras/rusty-pageindex --skill rusty-page-indexer

🔑 Authentication

# For OpenAI
rusty-page-indexer auth --api-key "your-key-here"

# For Ollama (local LLM)
rusty-page-indexer auth --api-key "ollama" --api-base "http://localhost:11434/v1" --model "llama3.2"

🌲 Usage

Indexing Documents

# Index a repository
rusty-page-indexer index ./my-project

# Index with LLM-generated summaries
rusty-page-indexer index ./my-project --enrich

# Force re-index (ignores cache)
rusty-page-indexer index ./my-project --force

# Preview what would be indexed
rusty-page-indexer index ./my-project --dry-run

Managing Multiple Repositories

# Index multiple repos
rusty-page-indexer index ./repo-a
rusty-page-indexer index ./repo-b

# List all indexed repositories
rusty-page-indexer list

# Example output:
# 📋 Indexed Repositories
# ────────────────────────────────────────────────────────────
#   📁 repo-a (125.3 KB)
#      /Users/you/projects/repo-a
#   📁 repo-b (89.7 KB)
#      /Users/you/projects/repo-b
# ────────────────────────────────────────────────────────────
# Total: 2 indices

Querying

# Search across ALL indexed repositories
rusty-page-indexer query "how does authentication work"

# Search within a specific repository
rusty-page-indexer query "kafka messaging" --path repo-a

Cleanup

# Remove a specific index
rusty-page-indexer clean repo-a

# Remove all indices
rusty-page-indexer clean --all

Status Information

rusty-page-indexer info

🤖 Model Compatibility

OpenAI Models (Remote)

| Model | Cost | Speed | Notes |
| --- | --- | --- | --- |
| `gpt-4o` | $$$ | Fast | Best accuracy, recommended for complex queries |
| `gpt-4o-mini` | $ | Very Fast | Great balance of cost and quality ⭐ |
| `gpt-4.1-mini` | $ | Very Fast | Latest mini variant |
| `gpt-4-turbo` | $$ | Fast | Good for detailed reasoning |
| `gpt-3.5-turbo` | ¢ | Very Fast | Budget option, decent accuracy |

# Configure for OpenAI
rusty-page-indexer auth --api-key "sk-..." --model "gpt-4o-mini"

# Override model per query
rusty-page-indexer query "question" --model gpt-4o

Local Models (Ollama)

| Model | Size | Works | Notes |
| --- | --- | --- | --- |
| `gemma3:1b` | 1B | ✅ | Minimum recommended for local use |
| `llama3.2:latest` | 3B | ✅ | Good balance of speed and accuracy ⭐ |
| `qwen2.5:7b` | 7B | ✅ | Reliable, slightly conservative |
| `llama3.1:latest` | 8B | ✅ | Excellent accuracy |
| `mistral:7b` | 7B | ✅ | Fast and capable |
| `phi3:mini` | 3.8B | ✅ | Microsoft's compact model |
| `qwen2.5:0.5b` | 0.5B | ❌ | Too small, unreliable responses |
| `tinyllama:1.1b` | 1.1B | ❌ | Doesn't follow output format |

...
