scripts/indexers
scripts.indexers.__init__
🧠 Docstring Summary
Section | Content |
---|---|
Description | The indexers module provides classes and utilities for building, managing, and searching vector indexes over log and summary data. Core features include: construction of FAISS indexes for both raw log entries and summarized corrections; support for semantic search using SentenceTransformer embeddings; management of index and metadata persistence for efficient retrieval; utilities for rebuilding, updating, and searching indexes across different data granularities. This module enables fast and flexible semantic search over structured and unstructured idea logs, supporting downstream applications such as idea retrieval, analytics, and intelligent querying. |
Args | — |
Returns | — |
scripts.indexers.base_indexer
🧠 Docstring Summary
Section | Content |
---|---|
Description | base_indexer.py. This module defines the BaseIndexer class, which provides core functionality for building, saving, loading, and searching FAISS vector indexes over log and summary data. Core features include: initializing index and metadata paths based on project configuration and index type (summary or raw); building a FAISS index from text data using SentenceTransformer embeddings; saving and loading both the FAISS index and associated metadata; performing semantic search over indexed data, returning the most relevant results with similarity scores; supporting flexible configuration and robust error handling for index operations. Intended for use as a base class for specialized indexers in the Zephyrus project, enabling fast and flexible semantic search over structured logs and summaries. |
Args | — |
Returns | — |
📦 Classes
BaseIndexer
No description available.
Parameters:
['self: Any', 'paths: ZephyrusPaths', 'index_name: str']
Returns:
None
🛠️ Functions
__init__
Initializes the BaseIndexer object.
Sets the paths to the source data file, FAISS index file, and metadata file based on the provided index_name and the ZephyrusPaths object. If index_name is "summary", the paths are set to the correction summaries file, FAISS index file, and metadata file. If index_name is "raw", the paths are set to the JSON log file, raw log index file, and raw log metadata file. In all other cases, a ValueError is raised.
It also loads the SentenceTransformer model specified by the "embedding_model" configuration key, defaulting to "all-MiniLM-L6-v2" if the key is missing.
Parameters:
['self: Any', 'paths: ZephyrusPaths', 'index_name: str']
Returns:
None
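The path selection and model-name fallback described above can be sketched as follows. This is a minimal illustration, not the project's code: PathsStub and its attribute names are hypothetical stand-ins for ZephyrusPaths, and resolve_index_paths and embedding_model_name are helper names invented for the example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Tuple

DEFAULT_EMBEDDING_MODEL = "all-MiniLM-L6-v2"


@dataclass
class PathsStub:
    """Hypothetical stand-in for ZephyrusPaths; attribute names are assumed."""
    summaries_file: Path
    summary_index_file: Path
    summary_metadata_file: Path
    json_log_file: Path
    raw_log_index_file: Path
    raw_log_metadata_file: Path


def resolve_index_paths(paths: PathsStub, index_name: str) -> Tuple[Path, Path, Path]:
    """Return (source file, index file, metadata file) for the index type."""
    if index_name == "summary":
        return (paths.summaries_file, paths.summary_index_file, paths.summary_metadata_file)
    if index_name == "raw":
        return (paths.json_log_file, paths.raw_log_index_file, paths.raw_log_metadata_file)
    # Any other value is rejected, as the docstring describes.
    raise ValueError(f"Unsupported index_name: {index_name!r}")


def embedding_model_name(config: Dict[str, Any]) -> str:
    """Use the configured model, falling back to the documented default."""
    return config.get("embedding_model", DEFAULT_EMBEDDING_MODEL)
```

Resolving all three paths in one place keeps the "summary" and "raw" cases symmetric and makes the ValueError branch easy to test in isolation.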
_load_model
No description available.
Parameters:
['self: Any']
Returns:
Any
load_index
Loads the FAISS index and associated metadata from their respective files.
This method reads the index from the file specified by self.index_path and loads the metadata from the file specified by self.metadata_path. If either file does not exist, a FileNotFoundError is raised.
Raises:
FileNotFoundError: If the index file or metadata file is not found.
Parameters:
['self: Any']
Returns:
None
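The existence check that precedes loading might look like the sketch below. It covers only the FileNotFoundError behavior described above; require_index_files is a hypothetical helper name, and how the index and metadata are actually deserialized is not shown because the source does not specify it.

```python
from pathlib import Path


def require_index_files(index_path: Path, metadata_path: Path) -> None:
    """Raise FileNotFoundError if either file is missing (hypothetical helper)."""
    missing = [str(p) for p in (index_path, metadata_path) if not p.exists()]
    if missing:
        raise FileNotFoundError("Missing index file(s): " + ", ".join(missing))
```

Checking both paths before reading either one means a caller learns about every missing file in a single error message.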
search
Searches the FAISS index for the given query and returns the top-k most relevant results.
Parameters:
['self: Any', 'query: str', 'top_k: int']
Returns:
List[Dict[str, Any]]
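To illustrate the shape of a top-k search result, the sketch below ranks toy vectors by cosine similarity in plain Python. It is a brute-force stand-in, not FAISS, and the vectors are hand-written rather than SentenceTransformer embeddings. The function name search_top_k and the "similarity" result key are assumptions made for the example.

```python
import math
from typing import Any, Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_top_k(
    query_vec: Sequence[float],
    vectors: List[Sequence[float]],
    meta: List[Dict[str, Any]],
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Return the top_k metadata records, each annotated with its score."""
    scored = [(_cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Copy each metadata dict so the stored records are not mutated.
    return [{**meta[i], "similarity": score} for score, i in scored[:top_k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
meta = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
results = search_top_k([1.0, 0.0], vectors, meta, top_k=2)
```

Here results holds the records for "a" and then "c", since those two vectors point closest to the query direction.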
build_index
Builds a FAISS index from provided texts and metadata.
Parameters:
['self: Any', 'texts: List[str]', 'meta: List[Dict[str, Any]]', 'fail_on_empty: bool']
Returns:
bool
save_index
Saves the FAISS index and its associated metadata to their respective files.
This method must be called after build_index or load_index has been called.
Parameters:
['self: Any']
Returns:
None
scripts.indexers.raw_log_indexer
🧠 Docstring Summary
Section | Content |
---|---|
Description | This module defines the RawLogIndexer class for building and managing a FAISS vector index over raw log entries from zephyrus_log.json. Core features: loading and parsing raw log entries by date, main category, and subcategory; extracting entry content and metadata for semantic indexing; building, saving, loading, and rebuilding a FAISS index for full-text vector search; robust error handling and logging for file I/O and data processing. Designed for use in the Zephyrus project to enable fast, flexible semantic search. |
Args | — |
Returns | — |
📦 Classes
RawLogIndexer
Builds a FAISS index from raw entries in zephyrus_log.json. Used for full-text vector search across all logged ideas (not just summaries).
Attributes:
log_path (str): The path to the JSON log file.
Parameters:
['self: Any', 'paths: ZephyrusPaths', 'autoload: bool']
Returns:
None
🛠️ Functions
__init__
Initializes the RawLogIndexer with the specified paths and optionally loads the index.
Parameters:
['self: Any', 'paths: ZephyrusPaths', 'autoload: bool']
Returns:
None
load_entries
Loads raw entries from the zephyrus_log.json file.
Parameters:
['self: Any']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
_process_categories
Processes categories for a given date, updating the texts and metadata.
Parameters:
['self: Any', 'date: str', 'categories: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
_process_subcategories
Processes subcategories within a main category for a given date.
Parameters:
['self: Any', 'date: str', 'main_cat: str', 'subcats: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
_process_entries
Processes a list of entries for a given date, main category, and subcategory.
Parameters:
['self: Any', 'date: str', 'main_cat: str', 'subcat: str', 'entries: List[Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
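The date, main category, subcategory, and entry traversal performed by load_entries and its helpers can be sketched as a single function. This is an illustration under stated assumptions: the log is taken to be a nested dict keyed by date, then main category, then subcategory; each entry is assumed to be either a plain string or a dict with a "content" key; and the metadata key names are invented for the example.

```python
from typing import Any, Dict, List, Tuple


def flatten_log(log: Dict[str, Any]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Walk date -> main category -> subcategory -> entries."""
    texts: List[str] = []
    meta: List[Dict[str, Any]] = []
    for date, categories in log.items():
        for main_cat, subcats in categories.items():
            for subcat, entries in subcats.items():
                for entry in entries:
                    # Assumption: an entry is a dict with "content" or a string.
                    if isinstance(entry, dict):
                        content = str(entry.get("content", ""))
                    else:
                        content = str(entry)
                    if not content.strip():
                        continue  # skip blank entries
                    texts.append(content)
                    meta.append(
                        {"date": date, "main_category": main_cat, "subcategory": subcat}
                    )
    return texts, meta


sample_log = {
    "2024-01-01": {
        "Ideas": {
            "AI": [{"content": "Index raw logs"}, "Plain string entry", {"content": "  "}],
        }
    }
}
texts, meta = flatten_log(sample_log)
```

Keeping texts and meta as parallel lists means position i in the index maps directly to meta[i] at search time.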
build_index_from_logs
Loads entries from file and rebuilds FAISS index.
Parameters:
['self: Any']
Returns:
bool
rebuild
Rebuilds the raw log index from scratch. This method loads entries from the log file, rebuilds the FAISS index, and saves the new index.
Raises:
Exception: If an error occurs while rebuilding the index.
Parameters:
['self: Any']
Returns:
None
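The load, build, and save sequence of rebuild can be sketched with a stub that records calls instead of touching FAISS or the filesystem. RebuildSketch is a hypothetical class, and skipping the save step when nothing was built is an assumption, not documented behavior.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


class RebuildSketch:
    """Hypothetical stand-in that records calls instead of touching FAISS."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts
        self.calls: List[str] = []

    def build_index_from_logs(self) -> bool:
        # Placeholder: the real method loads entries and builds the index.
        self.calls.append("build")
        return bool(self.texts)

    def save_index(self) -> None:
        # Placeholder: the real method writes the index and metadata to disk.
        self.calls.append("save")

    def rebuild(self) -> None:
        try:
            # Assumption: saving is skipped when nothing was built.
            if self.build_index_from_logs():
                self.save_index()
        except Exception:
            # Log with traceback, then re-raise so callers see the failure.
            logger.exception("Failed to rebuild index")
            raise
```

Logging before re-raising keeps the failure visible in the logs while still letting the caller decide how to recover.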
scripts.indexers.summary_indexer
🧠 Docstring Summary
Section | Content |
---|---|
Description | summary_indexer.py. This module defines the SummaryIndexer class for building and managing a FAISS vector index over summarized entries from correction_summaries.json. Core features include: loading and parsing summarized entries organized by date, main category, and subcategory; extracting summary texts and associated metadata for semantic indexing; building, saving, loading, and rebuilding a FAISS index for semantic search across all summarized corrections; robust error handling and logging for file I/O and data processing. Designed for use in the Zephyrus project to enable fast, flexible semantic search over all summarized log data. |
Args | — |
Returns | — |
📦 Classes
SummaryIndexer
Builds a FAISS index from summarized entries in correction_summaries.json. Core features include loading and parsing summarized entries, extracting summary texts, and managing the FAISS index for semantic search.
Parameters:
['self: Any', 'paths: ZephyrusPaths', 'autoload: bool']
Returns:
None
🛠️ Functions
__init__
Initializes a SummaryIndexer object.
Parameters:
['self: Any', 'paths: ZephyrusPaths', 'autoload: bool']
Returns:
None
load_entries
Loads summarized entries from the correction_summaries.json file.
Parameters:
['self: Any']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
_process_categories
No description available.
Parameters:
['self: Any', 'date: str', 'categories: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
_process_subcategories
No description available.
Parameters:
['self: Any', 'date: str', 'main_cat: str', 'subcats: Dict[str, Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
_process_batches
No description available.
Parameters:
['self: Any', 'date: str', 'main_cat: str', 'subcat: str', 'batches: List[Any]', 'texts: List[str]', 'meta: List[Dict[str, Any]]']
Returns:
Tuple[List[str], List[Dict[str, Any]]]
load_index
Loads the FAISS index and associated metadata from their respective files.
Raises:
FileNotFoundError: If the index file or metadata file is not found.
Parameters:
['self: Any']
Returns:
None
save_index
Saves the FAISS index and associated metadata to their respective files. This method delegates to the BaseIndexer implementation.
Parameters:
['self: Any']
Returns:
None
rebuild_index
Rebuilds the FAISS index from the summarized entries. This method builds the index from the logs and then saves it.
Parameters:
['self: Any']
Returns:
None
build_index_from_logs
Loads entries from file and rebuilds FAISS index.
Parameters:
['self: Any']
Returns:
bool
rebuild
Rebuilds the summary index from scratch.
Parameters:
['self: Any']
Returns:
None