Evaluation Module

The Evaluation Module runs as a post-processing step after SQL query generation and assigns the generated SQL query a confidence score between 0 and 1. The score can be derived in different ways from LLM confidence and uncertainty values; those topics are outside the scope of this documentation.

There are currently two implementations of the Evaluation Module in the repo: the EvaluationAgent and the SimpleEvaluator.


The “EvaluationAgent” method uses an agent that interacts with a set of tools to generate evaluation scores. The agent has access to the following tools:

  1. InfoSQLDatabaseTool: This tool takes a list of tables as input and returns the schema along with sample rows for the given tables.

  2. QuerySQLDataBaseTool: This tool runs a provided query on the database.

  3. EntityFinder: This tool checks the existence of a given entity within a column.

Because the agent has to call these tools and process their output, evaluating a single query takes around 40 to 50 seconds.
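
To make the three tools concrete, here is a minimal sketch of what each one does, written against Python's built-in sqlite3 module and an in-memory database. The function names, their signatures, and the `orders` table are hypothetical stand-ins for illustration; they are not the repo's actual tool classes.

```python
import sqlite3


def table_info(conn, tables, sample_rows=2):
    """Like InfoSQLDatabaseTool: schema plus sample rows for each table."""
    info = {}
    for table in tables:
        schema = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if schema is None:
            raise ValueError(f"unknown table: {table}")
        # The table name was confirmed to exist above, so it is quoted inline.
        rows = conn.execute(
            f'SELECT * FROM "{table}" LIMIT ?', (sample_rows,)
        ).fetchall()
        info[table] = {"schema": schema[0], "sample_rows": rows}
    return info


def run_query(conn, query):
    """Like QuerySQLDataBaseTool: run the provided query and return its rows."""
    return conn.execute(query).fetchall()


def entity_exists(conn, table, column, entity):
    """Like EntityFinder: check whether a value occurs in a column."""
    row = conn.execute(
        f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1', (entity,)
    ).fetchone()
    return row is not None


# Hypothetical sample data used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO orders (city) VALUES (?)", [("Toronto",), ("Lima",)])

info = table_info(conn, ["orders"])
count = run_query(conn, "SELECT COUNT(*) FROM orders")
found = entity_exists(conn, "orders", "city", "Toronto")
```

An agent would call helpers like these repeatedly while reasoning about a query, which is where most of the evaluation time goes.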


The “SimpleEvaluator” method is implemented using LLMs and is designed to be much faster than the “EvaluationAgent” method. In this method:

  1. A list of common problems associated with SQL queries is provided to the model.

  2. The model checks the generated query against all of these common issues.

  3. The evaluation process doesn’t require interactions with external tools.

Because it skips tool interactions, this method is preferred when speed is crucial.
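
The flow above can be sketched as a single prompt-and-parse round trip. Everything here is an assumption made for illustration: the issue list, the prompt wording, the `Score:` reply format, and the `llm` callable (a stub that stands in for a real model call) are hypothetical and do not reflect the repo's actual prompt.

```python
import re
from typing import Callable

# Illustrative issue list; the real list used by the evaluator may differ.
COMMON_ISSUES = [
    "Uses NOT IN with a subquery that may return NULL values",
    "Uses UNION where UNION ALL was intended",
    "Joins tables on the wrong columns",
    "Compares values of mismatched data types",
]


def build_prompt(question: str, sql: str) -> str:
    """Embed the issue checklist, the question, and the query in one prompt."""
    issues = "\n".join(f"- {issue}" for issue in COMMON_ISSUES)
    return (
        "Check the SQL query below against each of these common issues:\n"
        f"{issues}\n\n"
        f"Question: {question}\n"
        f"SQL: {sql}\n"
        "Reply with 'Score: <number between 0 and 1>'."
    )


def parse_score(reply: str) -> float:
    """Extract the score from the reply and clamp it to the range [0, 1]."""
    match = re.search(r"Score:\s*([0-9]*\.?[0-9]+)", reply)
    if match is None:
        return 0.0  # treat an unparseable reply as no confidence
    return min(1.0, max(0.0, float(match.group(1))))


def simple_evaluate(question: str, sql: str, llm: Callable[[str], str]) -> float:
    """One model call, no tool interactions."""
    return parse_score(llm(build_prompt(question, sql)))


def fake_llm(prompt: str) -> str:
    """Stub model so the sketch runs without any external service."""
    return "No issues found. Score: 0.9"


score = simple_evaluate("How many orders are there?", "SELECT COUNT(*) FROM orders", fake_llm)
```

Since the whole evaluation is one model call, latency is bounded by that call rather than by a multi-step agent loop.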

Abstract Evaluator Class

All implementations of the Evaluation component must inherit from the Evaluator abstract class and implement its abstract methods. The relevant classes and methods are described below.

class Evaluation

Represents an evaluation result with the following attributes:

  • id (str) – The evaluation’s ID.

  • question_id (str) – The associated question’s ID.

  • answer_id (str) – The associated answer’s ID.

  • score (float) – The confidence score, ranging from 0 to 1.
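
As a rough picture of this record, the sketch below models the four attributes with a standard-library dataclass. The use of `dataclass` and the range check in `__post_init__` are assumptions added for illustration; the repo's actual class may be built on a different base and may not validate the score.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    id: str           # the evaluation's ID
    question_id: str  # the associated question's ID
    answer_id: str    # the associated answer's ID
    score: float      # confidence score between 0 and 1

    def __post_init__(self):
        # Assumed validation: reject scores outside the documented range.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")


# Hypothetical IDs used only for this example.
evaluation = Evaluation(id="eval-1", question_id="q-1", answer_id="a-1", score=0.8)
```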

class Evaluator(Component, ABC)

An abstract base class for evaluators.

  • database (SQLDatabase) – The SQLDatabase instance for evaluation.

  • acceptance_threshold (float) – The threshold for accepting generated responses.

  • system (System) – The system containing the evaluator.

get_confidence_score(question, generated_answer, database_connection)

Computes the confidence score for a generated response from the engine. The score is compared against the ACCEPTANCE_THRESHOLD to determine whether the response is acceptable.

  • question (Question) – The natural language question.

  • generated_answer (Response) – The generated SQL query response.

  • database_connection (DatabaseConnection) – The database connection.

Returns:

The confidence score.

Return type:

float

evaluate(question, generated_answer, database_connection)

Abstract method that evaluates a question and generated SQL query pair. Subclasses must implement this method.

  • question (Question) – The natural language question.

  • generated_answer (Response) – The generated SQL query response.

  • database_connection (DatabaseConnection) – The database connection.

Returns:

An Evaluation instance.

Return type:

Evaluation
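
To show how a custom evaluator might plug into this interface, here is a minimal, self-contained sketch. It makes several assumptions that are not confirmed by this page: the `Component` base class is omitted, questions and answers are plain dicts rather than `Question` and `Response` objects, `get_confidence_score` is assumed to delegate to `evaluate`, the default threshold of 0.5 is arbitrary, and `is_acceptable` and `ConstantEvaluator` are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Evaluation:
    id: str
    question_id: str
    answer_id: str
    score: float


class Evaluator(ABC):
    """Simplified stand-in for the abstract base class (Component omitted)."""

    def __init__(self, acceptance_threshold: float = 0.5):
        # 0.5 is an arbitrary default chosen for this sketch.
        self.acceptance_threshold = acceptance_threshold

    def get_confidence_score(self, question, generated_answer, database_connection) -> float:
        # Assumed behavior: reuse evaluate() and return only its score.
        return self.evaluate(question, generated_answer, database_connection).score

    def is_acceptable(self, question, generated_answer, database_connection) -> bool:
        # Hypothetical helper: compare the score against the threshold.
        score = self.get_confidence_score(question, generated_answer, database_connection)
        return score >= self.acceptance_threshold

    @abstractmethod
    def evaluate(self, question, generated_answer, database_connection) -> Evaluation:
        """Subclasses must return an Evaluation for the question/SQL pair."""


class ConstantEvaluator(Evaluator):
    """Toy subclass that assigns every answer the same score."""

    def evaluate(self, question, generated_answer, database_connection) -> Evaluation:
        return Evaluation(
            id="eval-1",
            question_id=question["id"],
            answer_id=generated_answer["id"],
            score=0.8,
        )


evaluator = ConstantEvaluator(acceptance_threshold=0.5)
question = {"id": "q-1", "text": "How many orders are there?"}
answer = {"id": "a-1", "sql_query": "SELECT COUNT(*) FROM orders"}
confidence = evaluator.get_confidence_score(question, answer, None)
```

A real subclass would replace the constant score with logic such as the agent-based or checklist-based approaches described earlier.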
