Create a Database connection
=============================

We currently support connections to the following data warehouses: ``databricks``, ``postgresql``, ``snowflake``, ``bigquery`` and ``AWS Athena``. All sensitive connection data
is encrypted using the key you provide in your .env file before being stored to the application storage. 

You can also specify the engine to connect to the Database through an SSH tunnel, as demonstrated in the second example below.

You can include the API key for the LLM in the request body as an optional parameter. If you do not include the API key, the LLM will use the API key specified in the .env file.

You can find additional details on how to connect to each of the supported data warehouses :ref:`below <Supported Data warehouses>`.

You have the flexibility to configure your own file storage service credentials, such as AWS S3, to manage the storage of all CSV files generated by the /questions or /responses endpoints.

Once the database connection is established, it retrieves the table names and creates `table-descriptions` resources with a status of `NOT_SCANNED`.

**Request this POST endpoint**::

   /api/v1/database-connections

**Request body**

.. code-block:: rst

   {
    "alias": "string",
    "use_ssh": false,
    "connection_uri": "string",
    "schemas": [
      "string"
    ],
    "path_to_credentials_file": "string",
    "llm_api_key": "string",
    "ssh_settings": {
      "host": "string",
      "username": "string",
      "password": "string",
      "private_key_password": "string"
    },
    "file_storage": {
        "name": "string",
        "access_key_id": "string",
        "secret_access_key": "string",
        "region": "string",
        "bucket": "string"
      }
  }

**SSH Parameters**

- Set the **use_ssh** field to **true**
- Set the **connection_uri** field using the following structure::

    <db-driver>://<user>:<password>@<host>/<db-name>`

- Set in **path_to_credentials_file** your SSH key file location, if you use s3 like this::

    "s3://<bucket-name>/<file-name>"

.. csv-table::
   :header: "Name", "Type", "Description"
   :widths: 20, 20, 60

    "host", "string", "The hostname or IP address of the SSH server you need to access"
    "username", "string", "Your username for SSH authentication"
    "password", "string", "Your password for SSH authentication"
    "private_key_password", "string", "The password for the id_rsa private key file, if it is password-protected"


**File Storage Parameters**

.. csv-table::
   :header: "Name", "Type", "Description"
   :widths: 20, 20, 60

    "name", "string", "Set S3, it is required"
    "access_key_id", "string", "Your AWS access key, it is encrypted internally"
    "secret_access_key", "string", "Your AWS secret access key, it is encrypted internally"
    "region", "string", "Your bucket region"
    "bucket", "string", "Your bucket name"

**Responses**

HTTP 201 code response

.. code-block:: rst

    {
      "id": "64f251ce9614e0e94b0520bc",
      "alias": "string_999",
      dialect: "postgresql",
      "use_ssh": true,
      "connection_uri": "gAAAAABk8lHQNAUn5XARb94Q8H1OfHpVzOtzP3b2LCpwxUsNCe7LGkwkN8FX-IF3t65oI5mTzgDMR0BY2lzvx55gO0rxlQxRDA==",
      "path_to_credentials_file": "string",
      "llm_api_key": "gAAAAABlCz5TeU0ym4hW3bf9u21dz7B9tlnttOGLRDt8gq2ykkblNvpp70ZjT9FeFcoyMv-Csvp3GNQfw66eYvQBrcBEPsLokkLO2Jc2DD-Q8Aw6g_8UahdOTxJdT4izA6MsiQrf7GGmYBGZqbqsjTdNmcq661wF9Q==",
      "ssh_settings": {
        "host": "string",
        "username": "string",
        "password": "gAAAAABk8lHQAaaSuoUKxddkMHw7jerwFmUeiE3hL6si06geRt8CV-r43fbckZjI6LbIULWPZ4HlQUF9_YpfaYfM6FarQbhDUQ==",
        "private_key_password": "gAAAAABk8lHQWilFpIbCADvunHGYFMqgoPKIml_WRXf5Yuowqng28DVsq6-sChl695y5D_mWrr1I3hcJCZqkmhDqpma6iz3PKA=="
      },
      "file_storage": {
        "name": "S3",
        "access_key_id": "gAAAAABk8lHQAaaSuoUKxddkMHw7jerwFmUeiE3hL6si06geRt8CV-r43fbckZjI6LbIULWPZ4HlQUF9_YpfaYfM6FarQbhDUQ==",
        "secret_access_key": "gAAAAABk8lHQAaaSuoUKxddkMHw7jerwFmUeiE3hL6si06geRt8CV-r43fbckZjI6LbIULWPZ4HlQUF9_YpfaYfM6FarQbhDUQ==",
        "region": "us-east-1",
        "bucket": "my-bucket"
      }
    }

HTTP 400 code response (if the db connection fails it returns a 400 error), :doc:`here <api.error_codes>` you can find
all the error codes

.. code-block:: rst

    {
      "error_code": "string",
      "message": "string",
      "description": "string",
      "detail": {
        "alias": "string",
        "use_ssh": false,
        "connection_uri": "string"
      }
    }

**Example 1**

Without a SSH connection

.. code-block:: rst

   curl -X 'POST' \
      '<host>/api/v1/database-connections' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "alias": "my_db_alias_identifier",
      "use_ssh": false,
      "connection_uri": "sqlite:///mydb.db"
    }'

**Example 2**

With a SSH connection

.. code-block:: rst

    curl -X 'POST' \
      '<host>/api/v1/database-connections' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "alias": "my_db_alias",
      "use_ssh": true,
      "connection_uri": "postgresql+psycopg2://<user>:<password>@<host>/<db-name>"
      "ssh_settings": {
        "host": "string",
        "username": "string",
        "password": "string",
        "private_key_password": "string",
      }
    }'

**Example 3**

With a SSH connection and LLM credentials

.. code-block:: rst

    url -X 'POST' \
      '<host>/api/v1/database-connections' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "alias": "my_db_alias",
      "use_ssh": true,
      "connection_uri": "postgresql+psycopg2://<user>:<password>@<host>/<db-name>",
      "llm_api_key": "api_key",
      "ssh_settings": {
        "host": "string",
        "username": "string",
        "password": "string",
        "private_key_password": "string"
      }
    }'


.. _Supported Data warehouses: 

Connections to supported Data warehouses
-----------------------------------------

The format of the ``connection_uri`` parameter in the API call will depend on the data warehouse type you are connecting to. 
You can find samples and how to generate them below.

Postgres
^^^^^^^^^^^^

Uri structure::

"connection_uri": postgresql+psycopg2://<user>:<password>@<host>:<port>/<db-name>

Example::

"connection_uri": postgresql+psycopg2://admin:123456@foo.rds.amazonaws.com:5432/my-database

Specify a schema (If it isn't specified by default it uses `public`)::

"connection_uri": postgresql+psycopg2://<user>:<password>@<host>:<port>/<db-name>?options=-csearch_path=<my-schema>

MySQL/MariaDB
^^^^^^^^^^^^

Uri structure::

"connection_uri": mysql+pymysql://<user>:<password>@<host>:<port>/<db-name>

Example::

"connection_uri": mysql+pymysql://admin:123456@foo.rds.amazonaws.com:3306/my-database

Microsoft SQL Server
^^^^^^^^^^^^

Uri structure::

"connection_uri": mssql+pymssql://<user>:<password>@<host>:<port>/<db-name>

Example::

"connection_uri": mssql+pymssql://admin:123456@foo.rds.amazonaws.com:1433/my-database

To specify a schema other than the default dbo, execute the following command::

ALTER USER <your_username> WITH DEFAULT_SCHEMA = <your_schema_name>;

Databricks
^^^^^^^^^^^^

Uri structure::

"connection_uri": databricks://token:<token>@<host>?http_path=<http-path>&catalog=<catalog>&schema=<schema-name>

Example::

"connection_uri": databricks://token:abcd1234abcd1234abcd1234abcd1234@foo-bar.cloud.databricks.com?http_path=sql/protocolv1/o/123456/123-1234-abcdabcd&catalog=foobar&schema=default

Snowflake
^^^^^^^^^^^^

Uri structure::

"connection_uri": snowflake://<user>:<password>@<organization>-<account-name>/<database>/<schema>

Example::

"connection_uri": snowflake://jon:123456@foo-bar/my-database/public

Redshift
^^^^^^^^^^^^

Uri structure::

"connection_uri": redshift+psycopg2://<user>:<password>@<host>:<port>/<database>

Example::

"connection_uri": redshift+psycopg2://jon:123456@host.amazonaws.com:5439/my-database

ClickHouse
^^^^^^^^^^^^

Uri structure::

"connection_uri": clickhouse+http://<user>:<password>@<host>:<port>/<db_name>?protocol=https

Example::

"connection_uri": clickhouse+http://jon:123456@foo-bar:8443/my-database?protocol=https

AWS Athena
^^^^^^^^^^^^

Uri structure::

"connection_uri": awsathena+rest://<aws_access_key_id>:<aws_secret_access_key>@athena.<region_name>.amazonaws.com:443/<schema_name>?s3_staging_dir=<s3_staging_dir>&work_group=primary

Example::

"connection_uri": awsathena+rest://foobar:foobar@athena.us-east-2.amazonaws.com:443/db_test?s3_staging_dir=s3://my-bucket/output/&work_group=primary

MotherDuck
^^^^^^^^^^^^

Uri structure::

"connection_uri": duckdb:///md:<db_name>?motherduck_token=<token>

Example::

"connection_uri": duckdb:///md:foobar?motherduck_token=abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234abcd1234

BigQuery
^^^^^^^^^^^^

To connect to BigQuery you should create a json credential file. Please follow Steps 1-3 under "Configure BigQuery
Authentication in Google Cloud Platform" in
this `tutorial <https://www.privacydynamics.io/docs/connections/bigquery.html>`_.

    Please ensure the service account only has **"Viewer"** permissions.

Once you have your credential json file you can store it inside the project. For example given the credential file `my-db-123456acbd.json` 
in the folder `private_credentials`  you should set in the endpoint param `path_to_credentials_file` the path, for example::

    "path_to_credentials_file": "private_credentials/my-db-123456acbd.json"


and the ``connection_uri`` will be:

Uri structure::

"connection_uri": bigquery://<project>/<database>

Example::

"connection_uri": bigquery://v2-real-estate/K2


**Connecting multi-schemas**

You can connect many schemas using one db connection if you want to create SQL joins between schemas.
Currently only `BigQuery`, `Snowflake`, `Databricks` and `Postgres` support this feature.
To use multi-schemas instead of sending the `schema` in the `connection_uri` set it in the `schemas` param, like this:

**Example**

.. code-block:: rst

   curl -X 'POST' \
      '<host>/api/v1/database-connections' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "alias": "my_db_alias_identifier",
      "use_ssh": false,
      "connection_uri": "snowflake://<user>:<password>@<organization>-<account-name>/<database>",
      "schemas": ["foo", "bar"]
    }'
