API#

The Dataherald Engine exposes RESTful APIs that can be used to:

  • 🔌 Connect to and manage connections to databases

  • 🔑 Add context to the engine by scanning the databases and adding database-level instructions, table and column descriptions, and golden records

  • 🙋‍♀️ Ask natural language questions of the relational data

Our APIs have resource-oriented URLs and use standard HTTP verbs and response codes. The core resources are described below.

Database Connections#

The database-connections object allows you to define connections to your relational data stores.

Related endpoints are:

Database connection resource example:

{
    "alias": "string",
    "use_ssh": false,
    "connection_uri": "string",
    "path_to_credentials_file": "string",
    "llm_api_key": "string",
    "ssh_settings": {
        "db_name": "string",
        "host": "string",
        "username": "string",
        "password": "string",
        "remote_host": "string",
        "remote_db_name": "string",
        "remote_db_password": "string",
        "private_key_password": "string",
        "db_driver": "string"
    }
}
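
As a rough illustration, the resource above could be assembled in Python before it is sent to the engine. This is only a sketch: the helper name `build_db_connection` and the example URI are hypothetical, and deriving `use_ssh` from the presence of `ssh_settings` is an assumption rather than documented behavior. Only the field names come from the resource example.

```python
import json
from typing import Any, Dict, Optional


def build_db_connection(
    alias: str,
    connection_uri: str,
    ssh_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a database-connection payload (hypothetical helper)."""
    payload: Dict[str, Any] = {
        "alias": alias,
        # Assumption: use_ssh is true exactly when SSH settings are supplied.
        "use_ssh": ssh_settings is not None,
        "connection_uri": connection_uri,
    }
    if ssh_settings is not None:
        payload["ssh_settings"] = ssh_settings
    return payload


# Placeholder URI; substitute the URI of your own database.
payload = build_db_connection("my_db", "postgresql://user:password@localhost:5432/mydb")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the JSON body of a request made with whichever HTTP client you prefer.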

Table Descriptions#

The table-descriptions object is used to add context about the tables and columns in the relational database. These are then used to help the LLM build valid SQL to answer natural language questions.

Related endpoints are:

Table description resource example:

{
    "columns": [{}],
    "db_connection_id": "string",
    "description": "string",
    "examples": [{}],
    "table_name": "string",
    "table_schema": "string"
}
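
One way to keep these fields organized on the client side is a small dataclass that mirrors the resource. The class below is a sketch, not part of the engine: the class name, the empty-string defaults, and the sample table name are assumptions, and the column and example entries are left as opaque dictionaries because their shape is not spelled out above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TableDescription:
    """Client-side mirror of the table description resource (hypothetical)."""

    db_connection_id: str
    table_name: str
    description: str = ""
    table_schema: str = ""
    # The entry shape is not specified in the resource example,
    # so columns and examples stay as plain dictionaries.
    columns: List[Dict[str, Any]] = field(default_factory=list)
    examples: List[Dict[str, Any]] = field(default_factory=list)


table = TableDescription(
    db_connection_id="<db_connection_id>",
    table_name="orders",  # illustrative table name
    description="One row per customer order.",
)
print(asdict(table))
```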

Database Instructions#

The database-instructions object is used to set constraints on the SQL that is generated by the LLM. These instructions help the LLM build valid SQL that answers natural language questions in line with your business rules.

Related endpoints are:

Instruction resource example:

{
    "db_connection_id": "string",
    "instruction": "string"
}
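
For instance, a business rule could be serialized into this resource as follows. The helper name `build_instruction`, the empty-text check, and the sample rule are all illustrative assumptions; only the two field names are taken from the resource example.

```python
import json


def build_instruction(db_connection_id: str, instruction: str) -> str:
    """Serialize an instruction resource to a JSON string (hypothetical helper)."""
    # Assumption: an empty instruction is never useful, so reject it early.
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    return json.dumps(
        {"db_connection_id": db_connection_id, "instruction": instruction}
    )


body = build_instruction(
    "<db_connection_id>",
    "Only count orders whose status column is 'completed'.",  # sample rule
)
print(body)
```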

Prompts#

The prompt object is used to ask questions or pass any natural language text to the LLM.

Related endpoints are:

Prompt resource example:

{
    "id": "str",
    "db_connection_id": "str",
    "text": "str",
    "created_at": "datetime",
    "metadata": "dict | None"
}

SQL generations#

Given a prompt, Dataherald AI agents can generate SQL queries to efficiently answer the question or provide the necessary information.

Related endpoints are:

SQL generation resource example:

{
    "id": "str",
    "prompt_id": "str",
    "finetuning_id": "str",
    "evaluate": "bool",
    "llm_config": {
        "llm_name": "str",
        "api_base": "str"
    },
    "sql": "str",
    "status": "str",
    "completed_at": "datetime",
    "tokens_used": "int",
    "confidence_score": "float",
    "error": "str",
    "created_at": "datetime",
    "metadata": "dict | None"
}
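
A client will typically want to check a returned SQL generation before running its query. The sketch below parses the resource into a dataclass and applies a simple acceptance rule. The names `SQLGeneration`, `parse_sql_generation`, and `is_usable`, the 0.5 threshold, and the idea that a missing `error` means success are all assumptions made for illustration. The possible `status` values are not listed above, so the code does not interpret them.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SQLGeneration:
    """Subset of the SQL generation resource used by this sketch."""

    id: str
    prompt_id: str
    status: str
    sql: Optional[str] = None
    confidence_score: Optional[float] = None
    error: Optional[str] = None


def parse_sql_generation(resource: Dict[str, Any]) -> SQLGeneration:
    """Pick the fields of interest out of a decoded JSON response."""
    return SQLGeneration(
        id=resource["id"],
        prompt_id=resource["prompt_id"],
        status=resource["status"],
        sql=resource.get("sql"),
        confidence_score=resource.get("confidence_score"),
        error=resource.get("error"),
    )


def is_usable(gen: SQLGeneration, min_confidence: float = 0.5) -> bool:
    """Hypothetical rule: no error, some SQL, and enough confidence."""
    score = gen.confidence_score or 0.0
    return gen.error is None and bool(gen.sql) and score >= min_confidence
```

The threshold is a design choice for the caller; a stricter application might require a higher score or a manual review step.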

NL generations#

For each SQL generation, the LLMs can generate a natural language response based on the SQL query results.

Related endpoints are:

NL generation resource example:

{
    "id": "str",
    "sql_generation_id": "str",
    "llm_config": {
        "llm_name": "str",
        "api_base": "str"
    },
    "text": "str",
    "created_at": "datetime",
    "metadata": "dict | None"
}
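
The prompt, SQL generation, and NL generation resources reference each other through `prompt_id` and `sql_generation_id`. As a sketch of how a client might join already-fetched resources into question, query, and answer triples (the function name `link_generations` and the output keys are hypothetical):

```python
from typing import Any, Dict, List


def link_generations(
    prompts: List[Dict[str, Any]],
    sql_generations: List[Dict[str, Any]],
    nl_generations: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Join the three resource types on their id fields."""
    prompts_by_id = {p["id"]: p for p in prompts}
    sql_by_id = {s["id"]: s for s in sql_generations}
    rows = []
    for nl in nl_generations:
        sql_gen = sql_by_id.get(nl["sql_generation_id"])
        if sql_gen is None:
            continue  # NL generation whose SQL generation was not fetched
        prompt = prompts_by_id.get(sql_gen["prompt_id"])
        if prompt is None:
            continue  # SQL generation whose prompt was not fetched
        rows.append(
            {
                "question": prompt["text"],
                "sql": sql_gen.get("sql"),
                "answer": nl["text"],
            }
        )
    return rows
```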

Finetuning jobs#

The finetuning object is used to finetune the LLM on your data. This is an asynchronous process that uploads your golden records to the model provider's servers and creates a finetuning job. The finetuned model is then used inside an agent to generate SQL queries.

Related endpoints are:

Finetuning resource example:

{
    "id": "finetuning-job-id",
    "db_connection_id": "database_connection_id",
    "alias": "model name",
    "status": "finetuning_job_status", // Possible values: queued, running, succeeded, validating_files, failed, or cancelled
    "error": "The error message if the job failed", // Optional, default is None
    "base_llm": {
        "model_provider": "model_provider_name", // Currently, only 'openai'
        "model_name": "model_name", // Supported: gpt-3.5-turbo, gpt-4
        "model_parameters": {
            "n_epochs": "int or string", // Optional, default 3
            "batch_size": "int or string", // Optional, default 1
            "learning_rate_multiplier": "int or string" // Optional, default "auto"
        }
    },
    "finetuning_file_id": "File ID for finetuning file",
    "finetuning_job_id": "Finetuning job ID",
    "model_id": "Model ID after finetuning",
    "created_at": "datetime",
    "golden_sqls": "array[ids]", // Default is None, meaning use all golden records
    "metadata": "dict | None" // Optional, default None
}
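
Because finetuning is asynchronous, a client usually inspects the `status` field of a fetched job to decide what to do next. The sketch below uses the status values listed in the resource example. The function name `summarize_job`, the message wording, and treating `succeeded`, `failed`, and `cancelled` as final states are assumptions.

```python
from typing import Any, Dict

# Status values taken from the finetuning resource example.
KNOWN_STATUSES = {
    "queued",
    "running",
    "succeeded",
    "validating_files",
    "failed",
    "cancelled",
}


def summarize_job(job: Dict[str, Any]) -> str:
    """Turn a finetuning resource into a short human-readable summary."""
    status = job["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    if status == "succeeded":
        return f"model {job['model_id']} is ready"
    if status == "failed":
        return f"job failed: {job.get('error')}"
    if status == "cancelled":
        return "job was cancelled"
    # queued, running, and validating_files are assumed to be in progress.
    return f"job is still in progress ({status})"
```

A polling loop would call a function like this on each refreshed copy of the resource and stop once a final state is reported.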

Error Codes#

Certain errors are accompanied by an error code and an explanatory message. These errors trigger an HTTP 400 response code.

DB connection error code response example:

{
  "error_code": "invalid_database_uri_format",
  "message": "Invalid URI format: foo",
  "description": null,
  "detail": {
    "alias": "foo",
    "use_ssh": false,
    "connection_uri": "gdfgdgAABl5e-dfg_-wErFJdFZeVXwnmew_dfg__WU-dfgdfa=="
  }
}
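
On the client side, a response like the one above could be turned into an exception. This is a sketch only: `EngineError` and `raise_for_error_code` are hypothetical names, and the function expects the caller to pass in the status code and raw body from whichever HTTP client is in use.

```python
import json
from typing import Any, Dict, Optional


class EngineError(Exception):
    """Hypothetical exception wrapping an error-code response."""

    def __init__(
        self,
        error_code: str,
        message: str,
        detail: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.message = message
        self.detail = detail or {}


def raise_for_error_code(status_code: int, body: str) -> None:
    """Raise EngineError if a 400 response carries an error_code field."""
    if status_code != 400:
        return
    try:
        data = json.loads(body)
    except ValueError:
        return  # body is not JSON, so there is no error code to report
    if isinstance(data, dict) and "error_code" in data:
        raise EngineError(
            data["error_code"], data.get("message", ""), data.get("detail")
        )


sample = (
    '{"error_code": "invalid_database_uri_format", '
    '"message": "Invalid URI format: foo", "description": null, '
    '"detail": {"alias": "foo", "use_ssh": false}}'
)
try:
    raise_for_error_code(400, sample)
except EngineError as exc:
    print(exc.error_code)  # invalid_database_uri_format
```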