API#
The Dataherald Engine exposes RESTful APIs that can be used to:
🔌 Connect to and manage connections to databases
🔑 Add context to the engine by scanning the databases, adding database-level instructions, adding descriptions to tables and columns, and adding golden records
🙋‍♀️ Ask natural language questions of the relational data
Our APIs have resource-oriented URLs and are built around standard HTTP response codes and verbs. The core resources are described below.
Database Connections#
The database-connections object allows you to define connections to your relational data stores.
Related endpoints are:
Create database connection – POST api/v1/database-connections
List database connections – GET api/v1/database-connections
Update a database connection – PUT api/v1/database-connections/{db_connection_id}
Database connection resource example:
{
  "alias": "string",
  "use_ssh": false,
  "connection_uri": "string",
  "path_to_credentials_file": "string",
  "llm_api_key": "string",
  "ssh_settings": {
    "db_name": "string",
    "host": "string",
    "username": "string",
    "password": "string",
    "remote_host": "string",
    "remote_db_name": "string",
    "remote_db_password": "string",
    "private_key_password": "string",
    "db_driver": "string"
  }
}
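The fields above map directly onto the create-connection request body. A minimal sketch in Python, assuming the engine is reachable at http://localhost (the host, alias, and connection URI below are placeholders, not real values):

```python
import json
import urllib.request

# Placeholder host -- replace with the address of your Dataherald Engine.
ENGINE_URL = "http://localhost/api/v1/database-connections"

payload = {
    "alias": "my_db",        # a friendly name for the connection
    "use_ssh": False,        # set True and fill ssh_settings to tunnel over SSH
    # Placeholder URI -- use your database's real connection string.
    "connection_uri": "postgresql+psycopg2://user:pass@host:5432/mydb",
}

req = urllib.request.Request(
    ENGINE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request and return the created
# resource; it is not called here so the sketch runs without a live server.
print(req.method, req.full_url)
```

The response presumably echoes the resource with its generated id, which the other endpoints reference as db_connection_id.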
Table Descriptions#
The table-descriptions object is used to add context about the tables and columns in the relational database.
These are then used to help the LLM build valid SQL to answer natural language questions.
Related endpoints are:
Scan table descriptions – POST api/v1/table-descriptions/sync-schemas
Update a table description – PATCH api/v1/table-descriptions/{table_description_id}
List table descriptions – GET api/v1/table-descriptions
Get a table description – GET api/v1/table-descriptions/{table_description_id}
Refresh table descriptions – GET api/v1/table-descriptions/refresh
Table description resource example:
{
  "columns": [{}],
  "db_connection_id": "string",
  "description": "string",
  "examples": [{}],
  "table_name": "string",
  "table_schema": "string"
}
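Descriptions are attached per table via the PATCH endpoint. A hedged sketch, assuming the update body accepts the description and columns fields shown in the resource example; the column object shape (shown only as `{}` above), the id, and the texts are placeholders:

```python
import json
import urllib.request

table_description_id = "replace-with-real-id"  # placeholder id
url = f"http://localhost/api/v1/table-descriptions/{table_description_id}"

payload = {
    "description": "Daily sales fact table, one row per order line.",
    # Assumed column-object shape -- the resource example shows only [{}].
    "columns": [
        {"name": "order_id", "description": "Primary key of the order."},
    ],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PATCH",
)
# urllib.request.urlopen(req) would apply the update; not called here.
```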
Database Instructions#
The database-instructions object is used to set constraints on the SQL that is generated by the LLM.
These are then used to help the LLM build valid SQL to answer natural language questions based on your business rules.
Related endpoints are:
Add database instructions – POST api/v1/{db_connection_id}/instructions
List database instructions – GET api/v1/{db_connection_id}/instructions
Update database instructions – PUT api/v1/{db_connection_id}/instructions/{instruction_id}
Delete database instructions – DELETE api/v1/{db_connection_id}/instructions/{instruction_id}
Instruction resource example:
{
  "db_connection_id": "string",
  "instruction": "string"
}
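Note that the instruction endpoints embed the connection id in the URL path rather than taking it only in the body. A small sketch of building the create request, with a placeholder id and instruction text:

```python
import json
import urllib.request

db_connection_id = "0123456789abcdef"  # placeholder connection id

# The db_connection_id is part of the path for all instruction endpoints.
url = f"http://localhost/api/v1/{db_connection_id}/instructions"
payload = {
    "instruction": "Always filter out soft-deleted rows (deleted_at IS NULL).",
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would create the instruction; not called here.
```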
Prompts#
The prompt object is used to ask questions or pass any natural language text to the LLM.
Related endpoints are:
Create a prompt – POST api/v1/prompts
List prompts – GET api/v1/prompts
Get a prompt – GET api/v1/prompts/{prompt_id}
Prompt resource example:
{
  "id": "str",
  "db_connection_id": "str",
  "text": "str",
  "created_at": "datetime",
  "metadata": "dict | None"
}
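A create-prompt sketch, assuming the request body accepts the writable fields from the resource example (text, db_connection_id, metadata) while id and created_at are set by the engine; all values below are placeholders:

```python
import json
import urllib.request

payload = {
    "text": "What was the average order value last month?",
    "db_connection_id": "0123456789abcdef",       # placeholder id
    "metadata": {"requested_by": "docs-example"},  # optional free-form dict
}

req = urllib.request.Request(
    "http://localhost/api/v1/prompts",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# id and created_at are presumably generated server-side and returned
# in the response body.
```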
SQL Generations#
Given a prompt, Dataherald AI agents can generate SQL queries to efficiently answer the question or provide the necessary information.
Related endpoints are:
Create a SQL generation for a given prompt – POST api/v1/prompts/{prompt_id}/sql-generations
Create a prompt and a SQL generation – POST api/v1/prompts/sql-generations
List SQL generations – GET api/v1/sql-generations
Get a SQL generation – GET api/v1/sql-generations/{sql_generation_id}
Get a CSV file of SQL query results – GET api/v1/sql-generations/{sql_generation_id}/csv-file
Execute a SQL generation – GET api/v1/sql-generations/{sql_generation_id}/execute
SQL generation resource example:
{
  "id": "str",
  "prompt_id": "str",
  "finetuning_id": "str",
  "evaluate": "bool",
  "llm_config": {
    "llm_name": "str",
    "api_base": "str"
  },
  "sql": "str",
  "status": "str",
  "completed_at": "datetime",
  "tokens_used": "int",
  "confidence_score": "float",
  "error": "str",
  "created_at": "datetime",
  "metadata": "dict | None"
}
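A sketch of requesting a SQL generation for an existing prompt, assuming the create body accepts the writable fields from the resource example (evaluate, llm_config, finetuning_id, metadata); the prompt id and model name are placeholders:

```python
import json
import urllib.request

prompt_id = "replace-with-prompt-id"  # placeholder
url = f"http://localhost/api/v1/prompts/{prompt_id}/sql-generations"

payload = {
    "evaluate": True,                     # ask the engine to score the SQL
    "llm_config": {"llm_name": "gpt-4"},  # assumed optional; placeholder model
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# The returned resource carries sql, status, confidence_score, and
# tokens_used, per the resource example above.
```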
NL Generations#
For each SQL generation, the LLM can generate a natural language response based on the SQL query results.
Related endpoints are:
Create an NL generation for a given SQL generation – POST api/v1/sql-generations/{sql_generation_id}/nl-generations
Create a SQL generation and an NL generation for a given prompt – POST api/v1/prompts/{prompt_id}/sql-generations/nl-generations
Create a prompt, a SQL generation, and an NL generation – POST api/v1/prompts/sql-generations/nl-generations
List NL generations – GET api/v1/nl-generations
Get an NL generation – GET api/v1/nl-generations/{nl_generation_id}
NL generation resource example:
{
  "id": "str",
  "sql_generation_id": "str",
  "llm_config": {
    "llm_name": "str",
    "api_base": "str"
  },
  "text": "str",
  "created_at": "datetime",
  "metadata": "dict | None"
}
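A sketch for the first endpoint (NL generation from an existing SQL generation), assuming the create body accepts the llm_config and metadata fields shown in the resource example; the id and model name are placeholders:

```python
import json
import urllib.request

sql_generation_id = "replace-with-sql-generation-id"  # placeholder
url = f"http://localhost/api/v1/sql-generations/{sql_generation_id}/nl-generations"

payload = {
    "llm_config": {"llm_name": "gpt-4"},  # assumed optional; placeholder model
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# The response's text field holds the natural language answer derived
# from the SQL query results.
```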
Finetuning Jobs#
The finetuning object is used to finetune the LLM on your data. This is an asynchronous process that uploads your golden records to the model provider's servers and creates a finetuning job.
The finetuned model is then used inside an agent for generating SQL queries.
Related endpoints are:
Create a finetuning job – POST api/v1/finetunings
Get a finetuning job – GET api/v1/finetunings/{finetuning_id}
Cancel a finetuning job – POST api/v1/finetunings/{finetuning_id}/cancel
Delete a finetuning job – DELETE api/v1/finetunings/{finetuning_id}
List finetuning jobs – GET api/v1/finetunings
Finetuning resource example:
{
  "id": "finetuning-job-id",
  "db_connection_id": "database_connection_id",
  "alias": "model name",
  "status": "finetuning_job_status", // Possible values: queued, running, succeeded, validating_files, failed, or cancelled
  "error": "The error message if the job failed", // Optional, default is None
  "base_llm": {
    "model_provider": "model_provider_name", // Currently, only 'openai'
    "model_name": "model_name", // Supported: gpt-3.5-turbo, gpt-4
    "model_parameters": {
      "n_epochs": "int or string", // Optional, default 3
      "batch_size": "int or string", // Optional, default 1
      "learning_rate_multiplier": "int or string" // Optional, default "auto"
    }
  },
  "finetuning_file_id": "File ID for finetuning file",
  "finetuning_job_id": "Finetuning job ID",
  "model_id": "Model ID after finetuning",
  "created_at": "datetime",
  "golden_sqls": "array[ids]", // Default is None, meaning use all golden records
  "metadata": "dict | None" // Optional, default None
}
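A create-job sketch, assuming the request body accepts the writable fields shown above (db_connection_id, alias, base_llm, golden_sqls, metadata) while the file, job, and model ids are filled in by the engine; all values below are placeholders:

```python
import json
import urllib.request

payload = {
    "db_connection_id": "0123456789abcdef",  # placeholder id
    "alias": "my-finetuned-model",
    "base_llm": {
        "model_provider": "openai",          # currently the only provider
        "model_name": "gpt-3.5-turbo",       # or gpt-4
        "model_parameters": {"n_epochs": 3}, # other parameters keep defaults
    },
    # "golden_sqls": [...]  # omit to train on all golden records
}

req = urllib.request.Request(
    "http://localhost/api/v1/finetunings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would start the job; not called here.
```

Because the job is asynchronous, a client would poll GET api/v1/finetunings/{finetuning_id} until status leaves queued, running, or validating_files.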
Error Codes#
Certain errors are accompanied by an error code and an explanatory message. These errors trigger an HTTP 400 response code.
DB connection error code response example:
{
  "error_code": "invalid_database_uri_format",
  "message": "Invalid URI format: foo",
  "description": null,
  "detail": {
    "alias": "foo",
    "use_ssh": false,
    "connection_uri": "gdfgdgAABl5e-dfg_-wErFJdFZeVXwnmew_dfg__WU-dfgdfa=="
  }
}
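Clients can branch on the machine-readable error_code rather than parsing the message text. A minimal sketch using a body like the example above (the detail contents are abbreviated):

```python
import json

# A 400 response body shaped like the example above.
body = json.dumps({
    "error_code": "invalid_database_uri_format",
    "message": "Invalid URI format: foo",
    "description": None,
    "detail": {"alias": "foo", "use_ssh": False},
})

error = json.loads(body)
if error["error_code"] == "invalid_database_uri_format":
    # Surface the human-readable message plus the offending alias.
    print(f"{error['message']} (alias: {error['detail']['alias']})")
```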