scripts/indexers

scripts.indexers.__init__

🧠 Docstring Summary

Description: The indexers module provides classes and utilities for building, managing, and searching vector indexes over log and summary data.
Core features include:
- Construction of FAISS indexes for both raw log entries and summarized corrections.
- Support for semantic search using SentenceTransformer embeddings.
- Management of index and metadata persistence for efficient retrieval.
- Utilities for rebuilding, updating, and searching indexes across different data granularities.
This module enables fast, flexible semantic search over structured and unstructured idea logs, supporting downstream applications such as idea retrieval, analytics, and intelligent querying.

scripts.indexers.base_indexer

🧠 Docstring Summary

Description: base_indexer.py defines the BaseIndexer class, which provides core functionality for building, saving, loading, and searching FAISS vector indexes over log and summary data.
Core features include:
- Initializing index and metadata paths based on project configuration and index type (summary or raw).
- Building a FAISS index from text data using SentenceTransformer embeddings.
- Saving and loading both the FAISS index and its associated metadata.
- Performing semantic search over indexed data, returning the most relevant results with similarity scores.
- Supporting flexible configuration and robust error handling for index operations.
BaseIndexer is intended as a base class for specialized indexers in the Zephyrus project, enabling fast, flexible semantic search over structured logs and summaries.

📦 Classes

BaseIndexer

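Base class providing core functionality for building, saving, loading, and searching FAISS vector indexes over log and summary data. Parameters: ['self: Any', 'paths: ZephyrusPaths', 'index_name: str'] Returns: None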

🛠️ Functions

__init__

Initializes the BaseIndexer. Sets the paths to the source data file, the FAISS index file, and the metadata file from the provided index_name and ZephyrusPaths object. If index_name is "summary", these are the correction summaries file, the FAISS index file, and its metadata file. If index_name is "raw", they are the JSON log file, the raw log index file, and the raw log metadata file. Any other index_name raises a ValueError. Also loads the SentenceTransformer model named by the "embedding_model" configuration key, defaulting to "all-MiniLM-L6-v2" if the key is missing. Parameters: ['self: Any', 'paths: ZephyrusPaths', 'index_name: str'] Returns: None
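The index_name dispatch described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the project's implementation: the ZephyrusPaths dataclass below is a hypothetical stand-in, its field names are invented for the example, and the configuration is assumed to be a plain dict.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Tuple

DEFAULT_EMBEDDING_MODEL = "all-MiniLM-L6-v2"


@dataclass
class ZephyrusPaths:
    # Hypothetical stand-in for the project's ZephyrusPaths; field names are assumptions.
    correction_summaries_file: Path
    faiss_index_file: Path
    faiss_metadata_file: Path
    json_log_file: Path
    raw_log_index_file: Path
    raw_log_metadata_file: Path


def resolve_index_paths(paths: ZephyrusPaths, index_name: str) -> Tuple[Path, Path, Path]:
    """Return (source_file, index_file, metadata_file) for the given index_name."""
    if index_name == "summary":
        return (paths.correction_summaries_file, paths.faiss_index_file, paths.faiss_metadata_file)
    if index_name == "raw":
        return (paths.json_log_file, paths.raw_log_index_file, paths.raw_log_metadata_file)
    # Any other name is rejected, mirroring the documented ValueError.
    raise ValueError(f"Unknown index_name: {index_name!r} (expected 'summary' or 'raw')")


def embedding_model_name(config: Dict[str, Any]) -> str:
    # Fall back to the documented default when "embedding_model" is missing.
    return config.get("embedding_model", DEFAULT_EMBEDDING_MODEL)
```

In the actual class, the resolved name would then be passed to sentence_transformers.SentenceTransformer, e.g. SentenceTransformer(embedding_model_name(config)); that call is left out here so the sketch runs without the library installed.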

_load_model

No description available. Parameters: ['self: Any'] Returns: Any

load_index

Loads the FAISS index and associated metadata from their respective files. This method reads the index from the file specified by self.index_path and loads the metadata from the file specified by self.metadata_path. If either file does not exist, a FileNotFoundError is raised. Raises: FileNotFoundError: If the index file or metadata file is not found. Parameters: ['self: Any'] Returns: None
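A sketch of the missing-file checks described for load_index, assuming the metadata is stored as JSON (the source does not state the format). The index reader is passed in as a parameter so the example runs without FAISS; with FAISS installed, faiss.read_index would fill that role. load_index_and_metadata is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


def load_index_and_metadata(
    index_path: Path,
    metadata_path: Path,
    read_index: Callable[[str], Any],
) -> Tuple[Any, List[Dict[str, Any]]]:
    """Load an index and its JSON metadata, failing fast if either file is missing."""
    # Check both files up front so a half-present pair is never partially loaded.
    for path in (index_path, metadata_path):
        if not path.exists():
            raise FileNotFoundError(f"Required file not found: {path}")
    index = read_index(str(index_path))
    with metadata_path.open("r", encoding="utf-8") as fh:
        metadata = json.load(fh)
    return index, metadata
```

Usage with FAISS would look like index, meta = load_index_and_metadata(index_path, metadata_path, faiss.read_index).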

Searches the FAISS index for the given query and returns the top-k most relevant results. Parameters: ['self: Any', 'query: str', 'top_k: int'] Returns: List[Dict[str, Any]]
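The top-k lookup can be illustrated with an exhaustive search, which is what a FAISS IndexFlatL2 computes (squared L2 distances, smallest first). Whether the project uses IndexFlatL2, an inner-product index, or something else is not stated in the source, and the "score" key added to each result is a hypothetical name; treat this as a pure-Python sketch of the ranking step.

```python
from typing import Any, Dict, List, Sequence


def search_top_k(
    query_vec: Sequence[float],
    vectors: List[Sequence[float]],
    meta: List[Dict[str, Any]],
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Exhaustive L2 search: the same ranking a FAISS IndexFlatL2 would produce."""
    if top_k <= 0 or not vectors:
        return []
    # Squared L2 distance from the query to every stored vector.
    distances = [
        sum((q - v) ** 2 for q, v in zip(query_vec, vec)) for vec in vectors
    ]
    # Indices of the top_k closest vectors, nearest first.
    order = sorted(range(len(vectors)), key=distances.__getitem__)[:top_k]
    results = []
    for i in order:
        # Copy the metadata so callers cannot mutate the stored entries.
        hit = dict(meta[i])
        hit["score"] = distances[i]  # lower means more similar for L2
        results.append(hit)
    return results
```

With the real libraries, the query vector would come from model.encode([query]) and the ranking from index.search(query_vectors, top_k), which returns a (distances, indices) pair of arrays.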

build_index

Builds a FAISS index from provided texts and metadata. Parameters: ['self: Any', 'texts: List[str]', 'meta: List[Dict[str, Any]]', 'fail_on_empty: bool'] Returns: bool
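The source does not document what fail_on_empty does, so the sketch below assumes one plausible reading: raise when it is set and there is nothing to index, otherwise return False. FlatIndexSketch is a hypothetical class, and the embed callable stands in for a SentenceTransformer model so the example runs without dependencies.

```python
from typing import Any, Callable, Dict, List, Sequence


class FlatIndexSketch:
    """Hypothetical stand-in illustrating build_index; not the project's code."""

    def __init__(self, embed: Callable[[List[str]], List[Sequence[float]]]) -> None:
        self.embed = embed
        self.vectors: List[Sequence[float]] = []
        self.metadata: List[Dict[str, Any]] = []

    def build_index(
        self, texts: List[str], meta: List[Dict[str, Any]], fail_on_empty: bool
    ) -> bool:
        # Each text needs exactly one metadata record so search hits map back cleanly.
        if len(texts) != len(meta):
            raise ValueError("texts and meta must have the same length")
        if not texts:
            # Assumed semantics: raise if fail_on_empty is set, otherwise report failure.
            if fail_on_empty:
                raise ValueError("No texts to index")
            return False
        self.vectors = list(self.embed(texts))
        self.metadata = list(meta)
        return True
```

With FAISS, the embedding step would typically be followed by faiss.IndexFlatL2(dim) and index.add(embeddings), where dim is the embedding width.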

save_index

Saves the FAISS index and its associated metadata to their respective files. This method must be called only after build_index or load_index. Parameters: ['self: Any'] Returns: None
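The requirement that save_index follow build_index or load_index can be enforced with a guard like the one below. This is a sketch: save_index_and_metadata is a hypothetical name, RuntimeError as the error type and JSON as the metadata format are assumptions, and write_index is injected so the example runs without FAISS (faiss.write_index(index, path) would be the natural choice in practice).

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


def save_index_and_metadata(
    index: Optional[Any],
    metadata: List[Dict[str, Any]],
    index_path: Path,
    metadata_path: Path,
    write_index: Callable[[Any, str], Any],
) -> None:
    """Persist an index and its metadata; refuse to run before build/load."""
    if index is None:
        raise RuntimeError("No index in memory; call build_index or load_index first")
    # Create parent directories so saving works on a fresh checkout.
    index_path.parent.mkdir(parents=True, exist_ok=True)
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    write_index(index, str(index_path))
    with metadata_path.open("w", encoding="utf-8") as fh:
        json.dump(metadata, fh, ensure_ascii=False, indent=2)
```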

scripts.indexers.raw_log_indexer

🧠 Docstring Summary

Description: This module defines the RawLogIndexer class for building and managing a FAISS vector index over raw log entries from zephyrus_log.json.
Core features:
- Loading and parsing raw log entries by date, main category, and subcategory.
- Extracting entry content and metadata for semantic indexing.
- Building, saving, loading, and rebuilding a FAISS index for full-text vector search.
- Robust error handling and logging for file I/O and data processing.
- Designed for use in the Zephyrus project to enable fast, flexible semantic search.

📦 Classes

RawLogIndexer

Builds a FAISS index from raw entries in zephyrus_log.json. Used for full-text vector search across all logged ideas (not just summaries). Attributes: log_path (str): The path to the JSON log file. Parameters: ['self: Any', 'paths: ZephyrusPaths', 'autoload: bool'] Returns: None

🛠️ Functions

__init__

Initializes the RawLogIndexer with the specified paths and optionally loads the index. Parameters: ['self: Any', 'paths: ZephyrusPaths', 'autoload: bool'] Returns: None

load_entries

Loads raw entries from the zephyrus_log.json file. Parameters: ['self: Any'] Returns: Tuple[List[str], List[Dict[str, Any]]]

_process_categories

Processes categories for a given date, updating the texts and metadata. Parameters: ['self: Any', 'date: str', 'categories: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]'] Returns: Tuple[List[str], List[Dict[str, Any]]]

_process_subcategories

Processes subcategories within a main category for a given date. Parameters: ['self: Any', 'date: str', 'main_cat: str', 'subcats: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]'] Returns: Tuple[List[str], List[Dict[str, Any]]]

_process_entries

Processes a list of entries for a given date, main category, and subcategory. Parameters: ['self: Any', 'date: str', 'main_cat: str', 'subcat: str', 'entries: List[Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]'] Returns: Tuple[List[str], List[Dict[str, Any]]]
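The date, main-category, and subcategory walk performed by load_entries and the _process_* helpers can be condensed into one function for illustration. The log layout ({date: {main_category: {subcategory: [entries]}}}), the "content" field, and the metadata keys are assumptions inferred from the parameter names above, not confirmed by the source; the real code spreads this walk across three helper methods, and flatten_log is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


def flatten_log(log: Dict[str, Any]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Walk date -> main category -> subcategory -> entries, collecting texts and metadata."""
    texts: List[str] = []
    meta: List[Dict[str, Any]] = []
    for date, categories in log.items():
        for main_cat, subcats in categories.items():
            for subcat, entries in subcats.items():
                for entry in entries:
                    # Assumed entry shape: a dict with a string "content" field.
                    content = entry.get("content") if isinstance(entry, dict) else None
                    if not isinstance(content, str) or not content.strip():
                        continue  # skip empty or malformed entries
                    texts.append(content)
                    meta.append({
                        "date": date,
                        "main_category": main_cat,
                        "subcategory": subcat,
                        "content": content,
                    })
    return texts, meta
```

Keeping texts and meta as parallel lists means position i in the index maps directly to meta[i] when search results come back.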

build_index_from_logs

Loads entries from the log file and rebuilds the FAISS index. Parameters: ['self: Any'] Returns: bool

rebuild

Rebuilds the raw log index from scratch. This method loads entries from the log file, rebuilds the FAISS index, and saves the new index. Raises: Exception: If an error occurs while rebuilding the index. Parameters: ['self: Any'] Returns: None
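The rebuild sequence (load entries, rebuild the index, save it, and surface any error) can be sketched with the three steps passed in as callables. rebuild_pipeline is a hypothetical name, and raising when the build step returns False is an assumption about how the bool from build_index_from_logs might be used.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)


def rebuild_pipeline(
    load_entries: Callable[[], Tuple[List[str], List[Dict[str, Any]]]],
    build_index: Callable[[List[str], List[Dict[str, Any]]], bool],
    save_index: Callable[[], None],
) -> None:
    """Load entries, rebuild the index, and persist it; log and re-raise on failure."""
    try:
        texts, meta = load_entries()
        # Assumed policy: a failed build must never overwrite the saved index.
        if not build_index(texts, meta):
            raise RuntimeError("Index build returned False; nothing was saved")
        save_index()
        logger.info("Rebuilt index with %d entries", len(texts))
    except Exception:
        logger.exception("Failed to rebuild index")
        raise
```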

scripts.indexers.summary_indexer

🧠 Docstring Summary

Description: summary_indexer.py defines the SummaryIndexer class for building and managing a FAISS vector index over summarized entries from correction_summaries.json.
Core features include:
- Loading and parsing summarized entries organized by date, main category, and subcategory.
- Extracting summary texts and associated metadata for semantic indexing.
- Building, saving, loading, and rebuilding a FAISS index for semantic search across all summarized corrections.
- Robust error handling and logging for file I/O and data processing.
- Designed for use in the Zephyrus project to enable fast, flexible semantic search over all summarized log data.

📦 Classes

SummaryIndexer

Builds a FAISS index from summarized entries in correction_summaries.json. Core features include loading and parsing summarized entries, extracting summary texts, and managing the FAISS index for semantic search. Parameters: ['self: Any', 'paths: ZephyrusPaths', 'autoload: bool'] Returns: None

🛠️ Functions

__init__

Initializes a SummaryIndexer object. Parameters: ['self: Any', 'paths: ZephyrusPaths', 'autoload: bool'] Returns: None

load_entries

Loads summarized entries from the correction_summaries.json file. Parameters: ['self: Any'] Returns: Tuple[List[str], List[Dict[str, Any]]]

_process_categories

No description available. Parameters: ['self: Any', 'date: str', 'categories: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]'] Returns: Tuple[List[str], List[Dict[str, Any]]]

_process_subcategories

No description available. Parameters: ['self: Any', 'date: str', 'main_cat: str', 'subcats: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]'] Returns: Tuple[List[str], List[Dict[str, Any]]]

_process_batches

No description available. Parameters: ['self: Any', 'date: str', 'main_cat: str', 'subcat: str', 'batches: List[Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]'] Returns: Tuple[List[str], List[Dict[str, Any]]]
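SummaryIndexer walks the same date, main-category, and subcategory hierarchy but ends in batches rather than individual entries. Below is a sketch of that walk as a generator, assuming each batch is a dict whose summary text sits under a hypothetical "batch_summary" key and recording each batch's position as metadata; neither detail is confirmed by the source, and both function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

SUMMARY_KEY = "batch_summary"  # hypothetical field name for a batch's summary text


def iter_summary_batches(summaries: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (summary_text, metadata) for every batch under date -> main category -> subcategory."""
    for date, categories in summaries.items():
        for main_cat, subcats in categories.items():
            for subcat, batches in subcats.items():
                for batch_index, batch in enumerate(batches):
                    text = batch.get(SUMMARY_KEY) if isinstance(batch, dict) else None
                    if not isinstance(text, str) or not text.strip():
                        continue  # skip batches without a usable summary
                    yield text, {
                        "date": date,
                        "main_category": main_cat,
                        "subcategory": subcat,
                        "batch_index": batch_index,
                    }


def load_summary_entries(summaries: Dict[str, Any]) -> Tuple[List[str], List[Dict[str, Any]]]:
    # Split the (text, metadata) pairs into the parallel lists build_index expects.
    pairs = list(iter_summary_batches(summaries))
    return [text for text, _ in pairs], [meta for _, meta in pairs]
```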

load_index

Load the FAISS index and associated metadata from their respective files. Raises: FileNotFoundError: If the index file or metadata file is not found. Parameters: ['self: Any'] Returns: None

save_index

Save the FAISS index and associated metadata to their respective files. This method delegates to the BaseIndexer implementation. Parameters: ['self: Any'] Returns: None

rebuild_index

Rebuild the FAISS index from the summarized entries. This method builds the index from the logs and then saves it. Parameters: ['self: Any'] Returns: None

build_index_from_logs

Loads entries from the summaries file and rebuilds the FAISS index. Parameters: ['self: Any'] Returns: bool

rebuild

Rebuild the summary index from scratch. Parameters: ['self: Any'] Returns: None