Quickstart Guide (Part 1)

This guide will help you get started with db-ally. Using a simple example, we will demonstrate how db-ally can query a database through an AI model: OpenAI's GPT generates SQL queries from natural language questions, while SQLAlchemy handles the interaction with the database.

Note

For examples of using db-ally with other data sources and AI models, please refer to our other how-to guides.

We will cover the following topics:

- Installation
- Database Configuration
- View Definition
- OpenAI Access Configuration
- Collection Definition
- Query Execution

Installation

To install db-ally, execute the following command:

pip install dbally

Since we will be using OpenAI's GPT, you also need to install db-ally with the openai extra:

pip install dbally[openai]
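
To confirm that the package and the extra installed correctly, you can inspect the installed distribution (a quick sanity check, not something db-ally itself requires):

pip show dbally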

Database Configuration

In this guide, we will use an example SQLAlchemy database containing a single table named candidates. This table includes columns such as id, name, country, years_of_experience, position, university, skills, and tags. You can download the example database from candidates.db. Alternatively, you can use your own database and models.
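
If you would rather create the table yourself instead of downloading the file, a minimal schema along these lines should work (the column types are an assumption based on the columns listed above):

import sqlalchemy

# Illustrative only: a minimal table matching the columns described above.
metadata = sqlalchemy.MetaData()
candidates_table = sqlalchemy.Table(
    "candidates",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("name", sqlalchemy.String),
    sqlalchemy.Column("country", sqlalchemy.String),
    sqlalchemy.Column("years_of_experience", sqlalchemy.Integer),
    sqlalchemy.Column("position", sqlalchemy.String),
    sqlalchemy.Column("university", sqlalchemy.String),
    sqlalchemy.Column("skills", sqlalchemy.String),
    sqlalchemy.Column("tags", sqlalchemy.String),
)
# metadata.create_all(engine)  # run once after creating the engine below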

To connect to the database using SQLAlchemy, you need an engine and your database models. Start by creating an engine:

from sqlalchemy import create_engine

engine = create_engine('sqlite:///candidates.db')
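
Before moving on, you can verify that the engine actually reaches the file (a quick sanity check, assuming candidates.db sits in the current working directory):

from sqlalchemy import text

with engine.connect() as connection:
    count = connection.execute(text("SELECT COUNT(*) FROM candidates")).scalar()
    print(f"Found {count} rows in the candidates table")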

Next, define an SQLAlchemy model for the candidates table. You can either declare the Candidate model using declarative mapping or generate it using automap. For simplicity, we'll use automap:

from sqlalchemy.ext.automap import automap_base

Base = automap_base()
Base.prepare(autoload_with=engine)
Candidate = Base.classes.candidates
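
If you prefer the declarative mapping mentioned above, a sketch along these lines should be equivalent (the column types are an assumption based on the table description):

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Candidate(Base):
    """Declarative alternative to the automap-generated Candidate class."""

    __tablename__ = "candidates"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    country: Mapped[str]
    years_of_experience: Mapped[int]
    position: Mapped[str]
    university: Mapped[str]
    skills: Mapped[str]
    tags: Mapped[str]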

View Definition

To use db-ally, define the views you want to use. A structured view is a class that specifies what to select from the database and includes methods that the AI model can use to filter rows. These methods are known as "filters".

from dbally import decorators, SqlAlchemyBaseView
import sqlalchemy

class CandidateView(SqlAlchemyBaseView):
    """
    A view for retrieving candidates from the database.
    """

    def get_select(self) -> sqlalchemy.Select:
        """
        Create the initial SQLAlchemy select object, used to build the query.
        """
        return sqlalchemy.select(Candidate)

    @decorators.view_filter()
    def at_least_experience(self, years: int) -> sqlalchemy.ColumnElement:
        """
        Filter candidates with at least `years` of experience.
        """
        return Candidate.years_of_experience >= years

    @decorators.view_filter()
    def senior_data_scientist_position(self) -> sqlalchemy.ColumnElement:
        """
        Filter candidates eligible for a senior data scientist position.
        """
        return sqlalchemy.and_(
            Candidate.position.in_(["Data Scientist", "Machine Learning Engineer", "Data Engineer"]),
            Candidate.years_of_experience >= 3,
        )

    @decorators.view_filter()
    def from_country(self, country: str) -> sqlalchemy.ColumnElement:
        """
        Filter candidates from a specific country.
        """
        return Candidate.country == country

By setting up these filters, you enable the LLM to fetch candidates while optionally applying filters based on experience, country, and eligibility for a senior data scientist position.
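
Because the filters are ordinary SQLAlchemy expressions, you can also preview what a combination of them compiles to without involving the LLM. A minimal sketch, assuming the view is constructed with the engine as shown later in this guide:

view = CandidateView(engine)
query = view.get_select().where(
    sqlalchemy.and_(view.from_country("France"), view.senior_data_scientist_position())
)
print(query)  # prints the SELECT statement with the combined WHERE clause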

Note

The from_country filter defined above supports only exact matches, which is not always ideal. Thankfully, db-ally comes with a solution for this problem - Similarity Indexes, which can be used to find the most similar value from the ones available. Refer to Quickstart Part 2: Semantic Similarity for an example of using semantic similarity when filtering candidates by country.

OpenAI Access Configuration

To use OpenAI's GPT, configure db-ally and provide your OpenAI API key:

from dbally.llm_client.openai_client import OpenAIClient

llm = OpenAIClient(model_name="gpt-3.5-turbo", api_key="...")

Replace ... with your OpenAI API key. Alternatively, you can set the OPENAI_API_KEY environment variable with your API key and omit the api_key parameter altogether.
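
For example, with the key exported in your shell (export OPENAI_API_KEY=...), the client can be created without passing the key explicitly:

import os

# Assumes OPENAI_API_KEY was exported in the shell before running the script.
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY before creating the client"
llm = OpenAIClient(model_name="gpt-3.5-turbo")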

Collection Definition

Next, create a db-ally collection. A collection is an object where you register views and execute queries. It also requires an AI model to use for generating IQL (Intermediate Query Language) queries (in this case, the GPT model defined above).

import dbally

async def main():
    collection = dbally.create_collection("recruitment", llm)
    collection.add(CandidateView, lambda: CandidateView(engine))

Note

While this guide uses a single view, you can create multiple views, registering them with the collection. Based on the query, the AI model will determine which view to use.
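
For instance, a second view over another table could be registered next to CandidateView. The OfferView below is purely hypothetical: it assumes an offers table that the example database does not contain, so treat it only as a sketch of the mechanics:

class OfferView(SqlAlchemyBaseView):
    """Hypothetical second view over an `offers` table (not present in candidates.db)."""

    def get_select(self) -> sqlalchemy.Select:
        return sqlalchemy.select(Base.classes.offers)

collection.add(OfferView, lambda: OfferView(engine))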

Query Execution

Once you have defined and registered the views with the collection, you can run a query. Add the following code to the main function:

result = await collection.ask("Find me French candidates suitable for a senior data scientist position.")

print(f"The generated SQL query is: {result.context.get('sql')}")
print()
print(f"Retrieved {len(result.results)} candidates:")
for candidate in result.results:
    print(candidate)

This code will return a list of French candidates eligible for a senior data scientist position and display them along with the generated SQL query.
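
Any question that the defined filters can express works the same way; for example, this illustrative follow-up (added inside the main function, reusing the same collection):

result = await collection.ask("Find me candidates with at least 5 years of experience.")
print(f"Retrieved {len(result.results)} candidates")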

To finish, run the main function:

import asyncio

if __name__ == "__main__":
    asyncio.run(main())

Now you can run the script and view the results. The database has a single candidate from France who is eligible for a senior data scientist position.

The expected output:
The generated SQL query is: SELECT candidates.name, candidates.country, candidates.years_of_experience, candidates.position, candidates.university, candidates.skills, candidates.tags, candidates.id
FROM candidates
WHERE candidates.country = 'France' AND candidates.position IN ('Data Scientist', 'Machine Learning Engineer', 'Data Engineer') AND candidates.years_of_experience >= 3

Retrieved 1 candidates:
{'name': 'Sophie Dubois', 'country': 'France', 'years_of_experience': 4, 'position': 'Data Engineer', 'university': 'École Polytechnique', 'skills': 'SQL;Python;ETL', 'tags': 'Data Warehousing;Big Data', 'id': 46}

Full Example

Access the full example here: quickstart_code.py

Next Steps

Explore Quickstart Part 2: Semantic Similarity to expand on the example and learn about using semantic similarity.