Application Storage (DB)

The application storage (referred to as DB in the code) is the database that stores the Dataherald AI application logic. MongoDB is currently the only supported Application Storage implementation included with the engine.

Stored Information

The application logic is stored in the following MongoDB collections:

  1. database_connection - Stores the information needed to connect to a structured database to be queried.

  2. nl_question - Stores the natural language questions asked so far through the /question endpoint.

  3. nl_query_response - Stores generated responses from the Dataherald AI engine, including

  4. table_schema_detail - Stores metadata about the tables and columns from the connected structured data stores. These can be added automatically from a scan or manually by the admin.

  5. golden_records - Stores the golden records verified or manually inserted by the admin. These are used to augment the prompts and improve the performance of the engine.

Abstract Storage Class

All implementations of the Application Storage module must inherit from the abstract DB class and implement its methods. While MongoDB is the only supported implementation at this time, the abstract class is designed to be easily extended to support other storage solutions.
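As a rough illustration of this contract, the abstract class could be sketched with Python's standard `abc` module as shown below. The method names and signatures follow the reference in this section. The `System` class here is only a placeholder for the engine's system object, and the actual base class in the code may differ in detail (for example, it may inherit from additional engine classes).

```python
from abc import ABC, abstractmethod


class System:
    """Placeholder for the engine's system object (an assumption for this sketch)."""


class DB(ABC):
    """Sketch of the abstract storage interface described in this section."""

    @abstractmethod
    def __init__(self, system: System):
        # Concrete implementations would typically open their connection here.
        self.system = system

    @abstractmethod
    def insert_one(self, collection: str, obj: dict) -> int: ...

    @abstractmethod
    def update_or_create(self, collection: str, query: dict, obj: dict) -> int: ...

    @abstractmethod
    def find_one(self, collection: str, query: dict) -> dict: ...

    @abstractmethod
    def find_by_id(self, collection: str, id: str) -> dict: ...

    @abstractmethod
    def find(self, collection: str, query: dict) -> list: ...

    @abstractmethod
    def find_all(self, collection: str) -> list: ...

    @abstractmethod
    def delete_by_id(self, collection: str, id: str) -> int: ...
```

Because every method is marked abstract, Python refuses to instantiate `DB` directly, and also refuses to instantiate any subclass that leaves one of these methods unimplemented.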

DB

This abstract class provides a consistent interface for working with different storage solutions.

__init__(self, system: System)

Initializes the database storage instance.

Parameters:

system (System) – The system object.

insert_one(self, collection: str, obj: dict) -> int

Inserts a single document into the specified collection.

Parameters:
  • collection (str) – The name of the collection.

  • obj (dict) – The document to insert.

Returns:

The number of documents inserted (usually 1).

Return type:

int

update_or_create(self, collection: str, query: dict, obj: dict) -> int

Updates or creates a document in the specified collection based on a query.

Parameters:
  • collection (str) – The name of the collection.

  • query (dict) – The query to find the document to update.

  • obj (dict) – The document to update or create.

Returns:

The number of documents updated or created.

Return type:

int
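The update-or-create ("upsert") behavior described above can be illustrated on a plain Python list standing in for a collection. This is a simplified sketch, not the engine's implementation: it assumes the query is a set of exact field matches, and the field names `alias` and `uri` are made up for the example.

```python
def update_or_create(store: list, query: dict, obj: dict) -> int:
    """Update the first document matching `query`, or append `obj` if none match.

    Returns the number of documents updated or created.
    """
    for doc in store:
        # A document matches when every queried field is present and equal.
        if all(key in doc and doc[key] == value for key, value in query.items()):
            doc.update(obj)
            return 1
    # No match: create a new document from a copy of `obj`.
    store.append(dict(obj))
    return 1


# Hypothetical usage: the first call creates a document, the second updates it.
store = []
update_or_create(store, {"alias": "sales"}, {"alias": "sales", "uri": "first"})
update_or_create(store, {"alias": "sales"}, {"uri": "second"})
```

After both calls the list still holds a single document, because the second call matched the existing one instead of inserting a new one.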

find_one(self, collection: str, query: dict) -> dict

Retrieves a single document from the specified collection based on a query.

Parameters:
  • collection (str) – The name of the collection.

  • query (dict) – The query to find the document.

Returns:

The retrieved document.

Return type:

dict

find_by_id(self, collection: str, id: str) -> dict

Retrieves a document from the specified collection based on its ID.

Parameters:
  • collection (str) – The name of the collection.

  • id (str) – The ID of the document to retrieve.

Returns:

The retrieved document.

Return type:

dict

find(self, collection: str, query: dict) -> list

Retrieves a list of documents from the specified collection based on a query.

Parameters:
  • collection (str) – The name of the collection.

  • query (dict) – The query to find the documents.

Returns:

A list of retrieved documents.

Return type:

list

find_all(self, collection: str) -> list

Retrieves all documents from the specified collection.

Parameters:

collection (str) – The name of the collection.

Returns:

A list of retrieved documents.

Return type:

list

delete_by_id(self, collection: str, id: str) -> int

Deletes a document from the specified collection based on its ID.

Parameters:
  • collection (str) – The name of the collection.

  • id (str) – The ID of the document to delete.

Returns:

The number of documents deleted (usually 1).

Return type:

int
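To show how the interface might be extended to another storage solution, here is a hypothetical in-memory implementation of the methods documented above, which could serve as a test double. It is a sketch under several assumptions: `InMemoryDB` is not part of the engine, queries are treated as exact field matches, IDs are stored under an `_id` key as plain strings, and lookups that find nothing return `None`. In the engine, such a class would inherit from the abstract DB class; that is omitted here to keep the example self-contained.

```python
import uuid


class InMemoryDB:
    """Hypothetical in-memory stand-in exposing the documented DB methods."""

    def __init__(self, system=None):
        self.system = system
        # Maps collection name -> {document id -> document}.
        self._collections = {}

    def _docs(self, collection: str) -> dict:
        return self._collections.setdefault(collection, {})

    @staticmethod
    def _matches(doc: dict, query: dict) -> bool:
        # Exact-match semantics only; real query operators are not supported.
        return all(k in doc and doc[k] == v for k, v in query.items())

    def insert_one(self, collection: str, obj: dict) -> int:
        doc = dict(obj)
        doc.setdefault("_id", uuid.uuid4().hex)  # generate an ID if none given
        self._docs(collection)[doc["_id"]] = doc
        return 1

    def update_or_create(self, collection: str, query: dict, obj: dict) -> int:
        existing = self.find_one(collection, query)
        if existing is None:
            return self.insert_one(collection, obj)
        # Keep the stored ID stable when updating in place.
        existing.update({k: v for k, v in obj.items() if k != "_id"})
        return 1

    def find_one(self, collection: str, query: dict):
        for doc in self._docs(collection).values():
            if self._matches(doc, query):
                return doc
        return None

    def find_by_id(self, collection: str, id: str):
        return self._docs(collection).get(id)

    def find(self, collection: str, query: dict) -> list:
        return [d for d in self._docs(collection).values() if self._matches(d, query)]

    def find_all(self, collection: str) -> list:
        return list(self._docs(collection).values())

    def delete_by_id(self, collection: str, id: str) -> int:
        return 1 if self._docs(collection).pop(id, None) is not None else 0
```

A caller would use it like any other storage implementation, for example `db.insert_one("golden_records", {...})` followed by `db.find_all("golden_records")`. Note that in this sketch `delete_by_id` returns 0 when no document has the given ID.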