Environment Variables
The Dataherald engine reads its configuration from environment variables, several of which must be set for it to work. The following sample from the .env.example file shows the default values.
```
OPENAI_API_KEY =
ORG_ID =
LLM_MODEL = 'gpt-4-32k'
GOLDEN_RECORD_COLLECTION = 'my-golden-records'
PINECONE_API_KEY =
PINECONE_ENVIRONMENT =
API_SERVER = "dataherald.api.fastapi.FastAPI"
SQL_GENERATOR = "dataherald.sql_generator.dataherald_sqlagent.DataheraldSQLAgent"
EVALUATOR = "dataherald.eval.simple_evaluator.SimpleEvaluator"
DB = "dataherald.db.mongo.MongoDB"
VECTOR_STORE = 'dataherald.vector_store.chroma.Chroma'
CONTEXT_STORE = 'dataherald.context_store.default.DefaultContextStore'
DB_SCANNER = 'dataherald.db_scanner.sqlalchemy.SqlAlchemyScanner'
MONGODB_URI = "mongodb://admin:admin@mongodb:27017"
MONGODB_DB_NAME = 'dataherald'
MONGODB_DB_USERNAME = 'admin'
MONGODB_DB_PASSWORD = 'admin'
ENCRYPT_KEY =
S3_AWS_ACCESS_KEY_ID =
S3_AWS_SECRET_ACCESS_KEY =
ONLY_STORE_CSV_FILES_LOCALLY =
DH_ENGINE_TIMEOUT =
UPPER_LIMIT_QUERY_RETURN_ROWS =
```
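Several values in the sample (API_SERVER, SQL_GENERATOR, EVALUATOR, DB, VECTOR_STORE, CONTEXT_STORE, DB_SCANNER) are dotted import paths of the form package.module.ClassName, which select the implementation of each module. The sketch below shows one common Python way to turn such a string into a class with importlib. It assumes the string is split at its last dot into a module path and a class name; load_class is a hypothetical helper for illustration, and Dataherald's own loading code may work differently.

```python
import importlib
from typing import Any


def load_class(dotted_path: str) -> Any:
    """Resolve a string such as 'package.module.ClassName' to the object it names.

    Hypothetical helper: assumes everything before the last dot is an importable
    module and everything after it is an attribute of that module.
    """
    # Tolerate raw values copied from the .env sample, which wraps them in quotes.
    # (dotenv loaders normally strip these quotes already.)
    cleaned = dotted_path.strip().strip("'\"")
    module_path, _, class_name = cleaned.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path!r} has no attribute {class_name!r}") from exc
```

With the dataherald package installed, load_class(os.environ["VECTOR_STORE"]) would return the configured vector store class under this assumption; pointing the variable at a different dotted path is how an alternative implementation would be swapped in.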
Variable Name | Description | Default Value | Required |
---|---|---|---|
OPENAI_API_KEY | The OpenAI API key used by the Dataherald Engine | None | Yes |
ORG_ID | The OpenAI organization ID used by the Dataherald Engine | None | Yes |
LLM_MODEL | The language model used by the Dataherald Engine. Supported values include gpt-4-32k, gpt-4, gpt-3.5-turbo, and gpt-3.5-turbo-16k. | gpt-4-32k | No |
GOLDEN_RECORD_COLLECTION | The name of the MongoDB collection where golden records are stored | my-golden-records | No |
PINECONE_API_KEY | The Pinecone API key | None | Yes, if using the Pinecone vector store |
PINECONE_ENVIRONMENT | The Pinecone environment | None | Yes, if using the Pinecone vector store |
API_SERVER | The implementation of the API module used by the Dataherald Engine | dataherald.api.fastapi.FastAPI | Yes |
SQL_GENERATOR | The implementation of the SQLGenerator module to use | dataherald.sql_generator.dataherald_sqlagent.DataheraldSQLAgent | Yes |
EVALUATOR | The implementation of the Evaluator module to use | dataherald.eval.simple_evaluator.SimpleEvaluator | Yes |
DB | The implementation of the DB module to use | dataherald.db.mongo.MongoDB | Yes |
VECTOR_STORE | The implementation of the Vector Store module to use. Chroma and Pinecone implementations are currently included. | dataherald.vector_store.chroma.Chroma | Yes |
CONTEXT_STORE | The implementation of the Context Store module to use | dataherald.context_store.default.DefaultContextStore | Yes |
DB_SCANNER | The implementation of the DB Scanner module to use | dataherald.db_scanner.sqlalchemy.SqlAlchemyScanner | Yes |
MONGODB_URI | The URI of the MongoDB instance used for application storage | mongodb://admin:admin@mongodb:27017 | Yes |
MONGODB_DB_NAME | The name of the MongoDB database to use | dataherald | Yes |
MONGODB_DB_USERNAME | The username for the MongoDB database | admin | Yes |
MONGODB_DB_PASSWORD | The password for the MongoDB database | admin | Yes |
ENCRYPT_KEY | The key used to encrypt data at rest before it is stored | None | Yes |
S3_AWS_ACCESS_KEY_ID | The AWS access key ID used to access credential files if they are saved to S3 | None | No |
S3_AWS_SECRET_ACCESS_KEY | The AWS secret access key used to access credential files if they are saved to S3 | None | No |
DH_ENGINE_TIMEOUT | The maximum number of seconds the engine waits for a response to be generated. If this limit is exceeded, an exception is raised. | None | No |
UPPER_LIMIT_QUERY_RETURN_ROWS | The upper limit on the number of rows returned by the query engine (equivalent to using LIMIT N in PostgreSQL, MySQL, or SQLite) | None | No |
ONLY_STORE_CSV_FILES_LOCALLY | Set to True to save generated CSV files only locally instead of in S3. Files stored locally should be treated as ephemeral, i.e., they will disappear when the engine is restarted. | None | No |
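The Required column can be turned into a start-up check so that a missing key fails fast instead of surfacing later as an obscure error. The sketch below is a minimal, hypothetical pre-flight validator; none of its names (missing_variables, optional_int, optional_bool, generate_encrypt_key) are part of Dataherald. It assumes that Pinecone is in use whenever VECTOR_STORE contains "pinecone", that DH_ENGINE_TIMEOUT and UPPER_LIMIT_QUERY_RETURN_ROWS hold whole numbers, that ONLY_STORE_CSV_FILES_LOCALLY accepts True/1/yes, and that ENCRYPT_KEY is a Fernet-style key (32 random bytes, URL-safe base64). Treat each of these as an assumption to check against the engine's code.

```python
import base64
import secrets
from typing import List, Mapping, Optional

# Variables marked "Yes" in the Required column of the table.
REQUIRED = [
    "OPENAI_API_KEY", "ORG_ID", "API_SERVER", "SQL_GENERATOR", "EVALUATOR",
    "DB", "VECTOR_STORE", "CONTEXT_STORE", "DB_SCANNER", "MONGODB_URI",
    "MONGODB_DB_NAME", "MONGODB_DB_USERNAME", "MONGODB_DB_PASSWORD",
    "ENCRYPT_KEY",
]
# Required only when the Pinecone vector store is selected.
PINECONE_REQUIRED = ["PINECONE_API_KEY", "PINECONE_ENVIRONMENT"]


def _is_set(env: Mapping[str, str], name: str) -> bool:
    # A line such as "ENCRYPT_KEY =" typically loads as an empty string; treat it as unset.
    return env.get(name, "").strip() != ""


def missing_variables(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    missing = [name for name in REQUIRED if not _is_set(env, name)]
    # Assumption: Pinecone is in use when VECTOR_STORE points at a Pinecone module.
    if "pinecone" in env.get("VECTOR_STORE", "").lower():
        missing += [name for name in PINECONE_REQUIRED if not _is_set(env, name)]
    return missing


def optional_int(env: Mapping[str, str], name: str) -> Optional[int]:
    """Parse an optional whole-number setting; None means the variable is unset."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else None


def optional_bool(env: Mapping[str, str], name: str) -> bool:
    """Parse an optional flag; the accepted spellings are an assumption."""
    return env.get(name, "").strip().lower() in {"true", "1", "yes"}


def generate_encrypt_key() -> str:
    """Return 32 random bytes as URL-safe base64 (the format of a Fernet key)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")
```

At start-up, missing_variables(os.environ) would return an empty list when every required variable is set, and optional_int(os.environ, "DH_ENGINE_TIMEOUT") returns None when no timeout is configured. generate_encrypt_key() produces the same kind of value as cryptography's Fernet.generate_key(), so it can be used once to fill in ENCRYPT_KEY if the engine does expect a Fernet key.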