Collection

Tip

To understand the general idea better, visit the Collection concept page.

dbally.create_collection

create_collection(name: str, llm_client: LLMClient, event_handlers: Optional[List[EventHandler]] = None, view_selector: Optional[ViewSelector] = None, nl_responder: Optional[NLResponder] = None) -> Collection

Create a new Collection that is a container for registering views and the main entrypoint to db-ally features.

Unlike instantiating a Collection directly, this function fills in defaults for the optional dependencies: when they are not supplied, the view selector and NL responder are created from the given LLM client, and the list of event handlers defaults to empty.

Example

    from dbally import create_collection
    from dbally.llm_client.openai_client import OpenAIClient

    collection = create_collection("my_collection", llm_client=OpenAIClient())
PARAMETER DESCRIPTION
name

Name of the collection. It is available to Event handlers and is used to distinguish between different db-ally runs.

TYPE: str

llm_client

LLM client used by the collection to generate views and respond to natural language queries.

TYPE: LLMClient

event_handlers

Event handlers used by the collection during query execution. For example, CLIEventHandler can be used to log events and LangSmithEventHandler to validate system performance.

TYPE: Optional[List[EventHandler]] DEFAULT: None

view_selector

View selector used by the collection to select the best view for the given query. If None, a new instance of LLMViewSelector will be used.

TYPE: Optional[ViewSelector] DEFAULT: None

nl_responder

NL responder used by the collection to respond to natural language queries. If None, a new instance of NLResponder will be used.

TYPE: Optional[NLResponder] DEFAULT: None

RETURNS DESCRIPTION
Collection

a new instance of db-ally Collection

RAISES DESCRIPTION
ValueError

if default LLM client is not configured

Source code in src/dbally/_main.py
def create_collection(
    name: str,
    llm_client: LLMClient,
    event_handlers: Optional[List[EventHandler]] = None,
    view_selector: Optional[ViewSelector] = None,
    nl_responder: Optional[NLResponder] = None,
) -> Collection:
    """
    Create a new [Collection](collection.md) that is a container for registering views and the\
    main entrypoint to db-ally features.

    Unlike instantiating a [Collection][dbally.Collection] directly, this function\
    provides a set of default values for various dependencies like LLM client, view selector,\
    IQL generator, and NL responder.

    ##Example

    ```python
        from dbally import create_collection
        from dbally.llm_client.openai_client import OpenAIClient

        collection = create_collection("my_collection", llm_client=OpenAIClient())
    ```

    Args:
        name: Name of the collection is available for [Event handlers](event_handlers/index.md) and is\
        used to distinguish different db-ally runs.
        llm_client: LLM client used by the collection to generate views and respond to natural language\
        queries.
        event_handlers: Event handlers used by the collection during query executions. Can be used to\
        log events as [CLIEventHandler](event_handlers/cli_handler.md) or to validate system performance as\
        [LangSmithEventHandler](event_handlers/langsmith_handler.md).
        view_selector: View selector used by the collection to select the best view for the given query.\
        If None, a new instance of [LLMViewSelector][dbally.view_selection.llm_view_selector.LLMViewSelector]\
        will be used.
        nl_responder: NL responder used by the collection to respond to natural language queries. If None,\
        a new instance of [NLResponder][dbally.nl_responder.nl_responder.NLResponder] will be used.

    Returns:
        a new instance of db-ally Collection

    Raises:
        ValueError: if default LLM client is not configured
    """
    view_selector = view_selector or LLMViewSelector(llm_client=llm_client)
    nl_responder = nl_responder or NLResponder(llm_client=llm_client)
    event_handlers = event_handlers or []

    return Collection(
        name,
        nl_responder=nl_responder,
        view_selector=view_selector,
        llm_client=llm_client,
        event_handlers=event_handlers,
    )

dbally.Collection

Collection(name: str, view_selector: ViewSelector, llm_client: LLMClient, event_handlers: List[EventHandler], nl_responder: NLResponder, n_retries: int = 3)

Collection is a container for a set of views that can be used by db-ally to answer user questions.

Tip

It is recommended to create new collections using the dbally.create_collection function instead of instantiating this class directly.

PARAMETER DESCRIPTION
name

Name of the collection. It is available to Event handlers and is used to distinguish between different db-ally runs.

TYPE: str

view_selector

When more than one View is registered within a single collection, the ViewSelector picks the View that best fits the query before the IQL query is generated.

TYPE: ViewSelector

llm_client

LLM client used by the collection to generate views and respond to natural language queries.

TYPE: LLMClient

event_handlers

Event handlers used by the collection during query execution. For example, CLIEventHandler can be used to log events and LangSmithEventHandler to validate system performance.

TYPE: List[EventHandler]

nl_responder

Object that translates the raw response from db-ally into natural language.

TYPE: NLResponder

n_retries

The IQL generator may produce invalid IQL. This argument specifies how many times db-ally will try to regenerate it; each failed attempt, together with its error message, is appended to the chat history to guide the next generation.

TYPE: int DEFAULT: 3
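
The retry behaviour described for n_retries can be pictured with a small standalone sketch. It does not use db-ally: `InvalidIQLError`, `generate_with_retries`, the toy generator and validator, and the `(role, content)` history format are all hypothetical stand-ins. The sketch also assumes that n_retries counts total attempts, which this page does not state.

```python
from typing import Callable, List, Tuple


class InvalidIQLError(Exception):
    """Hypothetical stand-in for an IQL validation error."""


def generate_with_retries(
    generate: Callable[[List[Tuple[str, str]]], str],
    validate: Callable[[str], None],
    question: str,
    n_retries: int = 3,
) -> str:
    # Chat history as (role, content) pairs; this format is an assumption.
    history: List[Tuple[str, str]] = [("user", question)]
    last_error: Exception = InvalidIQLError("no attempts were made")
    for _ in range(n_retries):
        candidate = generate(history)
        try:
            validate(candidate)
            return candidate
        except InvalidIQLError as error:
            last_error = error
            # Feed the failed attempt and its error back to guide the next try.
            history.append(("assistant", candidate))
            history.append(("user", f"Invalid IQL: {error}"))
    raise last_error


def toy_generate(history: List[Tuple[str, str]]) -> str:
    # Toy generator: produces broken output until it sees feedback in the history.
    return "filter_by_city('Paris')" if len(history) > 1 else "filter_by_city("


def toy_validate(iql: str) -> None:
    if not iql.endswith(")"):
        raise InvalidIQLError("unbalanced parenthesis")


iql = generate_with_retries(toy_generate, toy_validate, "Who lives in Paris?")
```

Here the first attempt fails validation, the error is appended to the history, and the second attempt succeeds.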

Source code in src/dbally/collection.py
def __init__(
    self,
    name: str,
    view_selector: ViewSelector,
    llm_client: LLMClient,
    event_handlers: List[EventHandler],
    nl_responder: NLResponder,
    n_retries: int = 3,
) -> None:
    """
    Args:
        name: Name of the collection is available for [Event handlers](event_handlers/index.md) and is\
        used to distinguish different db-ally runs.
        view_selector: As you register more than one [View](views/index.md) within a single collection,\
        before generating the IQL query, the View that best fits the query is selected by the\
        [ViewSelector](view_selection/index.md).
        llm_client: LLM client used by the collection to generate views and respond to natural language queries.
        event_handlers: Event handlers used by the collection during query executions. Can be used\
        to log events as [CLIEventHandler](event_handlers/cli_handler.md) or to validate system performance\
        as [LangSmithEventHandler](event_handlers/langsmith_handler.md).
        nl_responder: Object that translates RAW response from db-ally into natural language.
        n_retries: IQL generator may produce invalid IQL. If this is the case this argument specifies\
        how many times db-ally will try to regenerate it. Previous try with the error message is\
        appended to the chat history to guide next generations.
    """
    self.name = name
    self.n_retries = n_retries
    self._views: Dict[str, Callable[[], BaseView]] = {}
    self._builders: Dict[str, Callable[[], BaseView]] = {}
    self._view_selector = view_selector
    self._nl_responder = nl_responder
    self._event_handlers = event_handlers
    self._llm_client = llm_client

name instance-attribute

name = name

n_retries instance-attribute

n_retries = n_retries

T class-attribute instance-attribute

T = TypeVar('T', bound=BaseView)

add

add(view: Type[T], builder: Optional[Callable[[], T]] = None, name: Optional[str] = None) -> None

Register a new View that can be queried through the collection.

PARAMETER DESCRIPTION
view

A class inheriting from BaseView. An object of this type is instantiated during query execution. A class is expected rather than an instance, because otherwise views would have to be implemented as stateless, which would be cumbersome.

TYPE: Type[T]

builder

Optional factory function that will be used to create the View instance. Use it when you need to pass outcome of API call or database connection to the view and it can change over time.

TYPE: Optional[Callable[[], T]] DEFAULT: None

name

Custom name of the view (defaults to the name of the class).

TYPE: Optional[str] DEFAULT: None

RAISES DESCRIPTION
ValueError

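if a view with the given name is already registered, if the view class has constructor arguments without default values and no builder is given, or if the builder does not return an instance of the view class.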

Example of custom builder usage

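    import requests

    def build_dogs_df_view():
        # The builder runs whenever a view instance is needed, so the data is fetched fresh
        dogs_df = requests.get("https://dog.ceo/api/breeds/list")
        return DogsDFView(dogs_df)

    collection.add(DogsDFView, build_dogs_df_view)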
Source code in src/dbally/collection.py
def add(self, view: Type[T], builder: Optional[Callable[[], T]] = None, name: Optional[str] = None) -> None:
    """
    Register new [View](views/index.md) that will be available to query via the collection.

    Args:
        view: A class inheriting from BaseView. Object of this type will be initialized during\
        query execution. We expect Class instead of object, as otherwise Views must have been implemented\
        stateless, which would be cumbersome.
        builder: Optional factory function that will be used to create the View instance. Use it when you\
        need to pass outcome of API call or database connection to the view and it can change over time.
        name: Custom name of the view (defaults to the name of the class).

    Raises:
        ValueError: if view with the given name is already registered or views class possess some non-default\
        arguments.

    **Example** of custom `builder` usage

    ```python
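        import requests

        def build_dogs_df_view():
            # The builder runs whenever a view instance is needed, so the data is fetched fresh
            dogs_df = requests.get("https://dog.ceo/api/breeds/list")
            return DogsDFView(dogs_df)

        collection.add(DogsDFView, build_dogs_df_view)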
    ```
    """
    if name is None:
        name = view.__name__

    if name in self._views or name in self._builders:
        raise ValueError(f"View with name {name} is already registered")

    non_default_args = any(
        p.default == inspect.Parameter.empty for p in inspect.signature(view).parameters.values()
    )
    if non_default_args and builder is None:
        raise ValueError("Builder function is required for views with non-default arguments")

    builder = builder or view

    # instantiate view to check if the builder is correct
    view_instance = builder()
    if not isinstance(view_instance, view):
        raise ValueError(f"The builder function for view {name} must return an instance of {view.__name__}")

    self._views[name] = view
    self._builders[name] = builder
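
The check for non-default arguments in the source above relies on `inspect.signature`. A minimal standalone sketch of that check follows; the two view-like classes, `requires_builder`, and `build_view` are hypothetical names used only for illustration and do not inherit from BaseView.

```python
import inspect


class NoArgsView:
    def __init__(self) -> None:
        self.source = "static"


class NeedsConnectionView:
    def __init__(self, connection: str) -> None:
        self.connection = connection


def requires_builder(view_class: type) -> bool:
    # True when at least one constructor parameter has no default value.
    return any(
        p.default is inspect.Parameter.empty
        for p in inspect.signature(view_class).parameters.values()
    )


def build_view() -> NeedsConnectionView:
    # Hypothetical factory; a real one might open a database connection here.
    return NeedsConnectionView(connection="sqlite://")


needs_builder = requires_builder(NeedsConnectionView)
no_builder_needed = not requires_builder(NoArgsView)
```

A class such as `NeedsConnectionView` could only be registered together with a builder like `build_view`, while `NoArgsView` could be registered on its own.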

get

get(name: str) -> BaseView

Returns an instance of the view with the given name

PARAMETER DESCRIPTION
name

Name of the view to return

TYPE: str

RETURNS DESCRIPTION
BaseView

View instance

RAISES DESCRIPTION
NoViewFoundError

If there is no view with the given name

Source code in src/dbally/collection.py
def get(self, name: str) -> BaseView:
    """
    Returns an instance of the view with the given name

    Args:
        name: Name of the view to return

    Returns:
        View instance

    Raises:
         NoViewFoundError: If there is no view with the given name
    """

    if name not in self._views:
        raise NoViewFoundError

    return self._builders[name]()
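
Because the source above calls the registered builder on every lookup, each call to get yields a new view instance. The following standalone sketch illustrates that consequence; `CounterView`, `builders`, and `get_view` are hypothetical stand-ins, and `KeyError` is used in place of NoViewFoundError.

```python
from typing import Callable, Dict


class CounterView:
    """Hypothetical view-like class used only for illustration."""

    def __init__(self) -> None:
        self.calls = 0


# Mirrors the idea of a name-to-builder mapping; here the class itself is the builder.
builders: Dict[str, Callable[[], CounterView]] = {"CounterView": CounterView}


def get_view(name: str) -> CounterView:
    if name not in builders:
        raise KeyError(name)  # the real method raises NoViewFoundError
    # The builder is invoked on every lookup, so callers never share an instance.
    return builders[name]()


first = get_view("CounterView")
second = get_view("CounterView")
first.calls += 1
```

State set on `first` does not leak into `second`, which is why views registered as classes do not need to be stateless.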

list

list() -> Dict[str, str]

Lists all registered view names and their descriptions

RETURNS DESCRIPTION
Dict[str, str]

Dictionary of view names and descriptions

Source code in src/dbally/collection.py
def list(self) -> Dict[str, str]:
    """
    Lists all registered view names and their descriptions

    Returns:
        Dictionary of view names and descriptions
    """
    return {
        name: (textwrap.dedent(view.__doc__).strip() if view.__doc__ else "") for name, view in self._views.items()
    }
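
The descriptions returned by list come from the class docstrings, cleaned up with `textwrap.dedent` and `strip`. A minimal standalone sketch of that transformation, using two hypothetical classes in place of registered views:

```python
import textwrap


class CandidatesView:
    """
    Candidates who applied for open positions.
    Supports filtering by city and seniority.
    """


class UndocumentedView:
    pass


views = {"CandidatesView": CandidatesView, "UndocumentedView": UndocumentedView}

# Same expression as in the source above: dedent and strip the docstring,
# or fall back to an empty string when the class has none.
descriptions = {
    name: (textwrap.dedent(view.__doc__).strip() if view.__doc__ else "")
    for name, view in views.items()
}
```

Since these descriptions are what the view selector receives, a clear docstring on each view class is worth the effort.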

ask async

ask(question: str, dry_run: bool = False, return_natural_response: bool = False) -> ExecutionResult

Ask question in a text form and retrieve the answer based on the available views.

Question answering consists of the following steps:
  1. View Selection
  2. IQL Generation
  3. IQL Parsing
  4. Query Building
  5. Query Execution
PARAMETER DESCRIPTION
question

question posed in natural language, e.g. "What job offers for Data Scientists do we have?"

TYPE: str

dry_run

if True, only generate the query without executing it

TYPE: bool DEFAULT: False

return_natural_response

if True, the natural-language response is included in the answer. This has no effect when dry_run is True, because a natural response requires query results.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
ExecutionResult

ExecutionResult object representing the result of the query execution.

RAISES DESCRIPTION
ValueError

if collection is empty

IQLError

if incorrect IQL was generated n_retries times.

ValueError

if incorrect IQL was generated n_retries times.

Source code in src/dbally/collection.py
async def ask(self, question: str, dry_run: bool = False, return_natural_response: bool = False) -> ExecutionResult:
    """
    Ask question in a text form and retrieve the answer based on the available views.

    Question answering is composed of following steps:
        1. View Selection
        2. IQL Generation
        3. IQL Parsing
        4. Query Building
        5. Query Execution

    Args:
        question: question posed using natural language representation e.g\
        "What job offers for Data Scientists do we have?"
        dry_run: if True, only generate the query without executing it
        return_natural_response: if True (and dry_run is False as natural response requires query results),
                                 the natural response will be included in the answer

    Returns:
        ExecutionResult object representing the result of the query execution.

    Raises:
        ValueError: if collection is empty
        IQLError: if incorrect IQL was generated `n_retries` times.
        ValueError: if incorrect IQL was generated `n_retries` times.
    """
    start_time = time.monotonic()

    event_tracker = EventTracker.initialize_with_handlers(self._event_handlers)

    await event_tracker.request_start(RequestStart(question=question, collection_name=self.name))

    # select view
    views = self.list()

    if len(views) == 0:
        raise ValueError("Empty collection")
    if len(views) == 1:
        selected_view = next(iter(views))
    else:
        selected_view = await self._view_selector.select_view(question, views, event_tracker)

    view = self.get(selected_view)

    start_time_view = time.monotonic()
    view_result = await view.ask(
        query=question,
        llm_client=self._llm_client,
        event_tracker=event_tracker,
        n_retries=self.n_retries,
        dry_run=dry_run,
    )
    end_time_view = time.monotonic()

    textual_response = None
    if not dry_run and return_natural_response:
        textual_response = await self._nl_responder.generate_response(view_result, question, event_tracker)

    result = ExecutionResult(
        results=view_result.results,
        context=view_result.context,
        execution_time=time.monotonic() - start_time,
        execution_time_view=end_time_view - start_time_view,
        view_name=selected_view,
        textual_response=textual_response,
    )

    await event_tracker.request_end(RequestEnd(result=result))

    return result
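
The view selection step in the source above has three branches: an empty collection raises, a single view is used directly, and only multiple views involve the selector. A standalone sketch of that branching follows; `pick_view` and `keyword_selector` are hypothetical, and the keyword matching merely stands in for an LLM-based selector.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def pick_view(
    question: str,
    views: Dict[str, str],
    select_view: Callable[[str, Dict[str, str]], Awaitable[str]],
) -> str:
    if len(views) == 0:
        raise ValueError("Empty collection")
    if len(views) == 1:
        # A single registered view needs no selector call.
        return next(iter(views))
    return await select_view(question, views)


async def keyword_selector(question: str, views: Dict[str, str]) -> str:
    # Hypothetical selector: first view whose description shares a word with the question.
    words = set(question.lower().split())
    for name, description in views.items():
        if words & set(description.lower().split()):
            return name
    return next(iter(views))


views = {
    "CandidatesView": "job candidates and their skills",
    "OffersView": "open job offers by city",
}
selected = asyncio.run(pick_view("Which offers are in Berlin?", views, keyword_selector))
only = asyncio.run(pick_view("anything", {"OffersView": "open job offers"}, keyword_selector))
```

With one registered view the selector is never awaited, so no selection cost is incurred for single-view collections.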

get_similarity_indexes

get_similarity_indexes() -> Dict[AbstractSimilarityIndex, List[Tuple[str, str, str]]]

List all similarity indexes from all structured views in the collection.

RETURNS DESCRIPTION
Dict[AbstractSimilarityIndex, List[Tuple[str, str, str]]]

Dictionary with similarity indexes as keys and, as values, lists of the places where they are used (each represented by a tuple containing the view name, method name, and argument name).

Source code in src/dbally/collection.py
def get_similarity_indexes(self) -> Dict[AbstractSimilarityIndex, List[Tuple[str, str, str]]]:
    """
    List all similarity indexes from all structured views in the collection.

    Returns:
        Dictionary with similarity indexes as keys and values containing lists of places where they are used
        (represented by a tuple containing view name, method name and argument name)
    """
    indexes: Dict[AbstractSimilarityIndex, List[Tuple[str, str, str]]] = {}
    for view_name in self._views:
        view = self.get(view_name)

        if not isinstance(view, BaseStructuredView):
            continue

        filters = view.list_filters()
        for filter_ in filters:
            for param in filter_.parameters:
                if param.similarity_index:
                    indexes.setdefault(param.similarity_index, []).append((view_name, filter_.name, param.name))
    return indexes
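
The shape of the returned dictionary may be easier to see with concrete data. The sketch below reproduces only the grouping step with `setdefault`; `FakeIndex` and the `usages` list are hypothetical stand-ins for real similarity indexes and filter parameters.

```python
from typing import Dict, List, Tuple


class FakeIndex:
    """Hypothetical hashable stand-in for a similarity index."""

    def __init__(self, name: str) -> None:
        self.name = name


city_index = FakeIndex("cities")

# (view name, filter method name, parameter name, index or None)
usages = [
    ("CandidatesView", "from_city", "city", city_index),
    ("OffersView", "located_in", "city", city_index),
    ("OffersView", "min_salary", "amount", None),
]

indexes: Dict[FakeIndex, List[Tuple[str, str, str]]] = {}
for view_name, method_name, param_name, index in usages:
    if index:
        # One key per index object; every place that uses it is appended to its list.
        indexes.setdefault(index, []).append((view_name, method_name, param_name))
```

An index shared by several views appears once as a key, with every location that uses it collected in the value list.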

update_similarity_indexes async

update_similarity_indexes() -> None

Update all similarity indexes from all structured views in the collection.

RAISES DESCRIPTION
IndexUpdateError

if updating any of the indexes fails. The exception provides failed_indexes attribute, a dictionary mapping failed indexes to their respective exceptions. Indexes not present in the dictionary were updated successfully.

Source code in src/dbally/collection.py
async def update_similarity_indexes(self) -> None:
    """
    Update all similarity indexes from all structured views in the collection.

    Raises:
        IndexUpdateError: if updating any of the indexes fails. The exception provides `failed_indexes` attribute,
            a dictionary mapping failed indexes to their respective exceptions. Indexes not present in
            the dictionary were updated successfully.
    """
    indexes = self.get_similarity_indexes()
    update_coroutines = [index.update() for index in indexes]
    results = await asyncio.gather(*update_coroutines, return_exceptions=True)
    failed_indexes = {
        index: exception for index, exception in zip(indexes, results) if isinstance(exception, Exception)
    }
    if failed_indexes:
        failed_locations = [loc for index in failed_indexes for loc in indexes[index]]
        description = ", ".join(
            f"{view_name}.{method_name}.{param_name}" for view_name, method_name, param_name in failed_locations
        )
        raise IndexUpdateError(f"Failed to update similarity indexes for {description}", failed_indexes)
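
The failure collection in the source above depends on `asyncio.gather` with `return_exceptions=True`. A standalone sketch of that pattern follows; `FakeIndex` and `update_all` are hypothetical stand-ins and do not touch db-ally.

```python
import asyncio
from typing import Dict, List


class FakeIndex:
    """Hypothetical stand-in for a similarity index with an async update()."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def update(self) -> None:
        if self.fail:
            raise RuntimeError(f"cannot refresh {self.name}")


async def update_all(indexes: List[FakeIndex]) -> Dict[FakeIndex, Exception]:
    # return_exceptions=True returns exceptions as results instead of raising the first one.
    results = await asyncio.gather(*(index.update() for index in indexes), return_exceptions=True)
    # Results come back in input order, so zip pairs each index with its outcome.
    return {
        index: result
        for index, result in zip(indexes, results)
        if isinstance(result, Exception)
    }


good, bad = FakeIndex("cities"), FakeIndex("countries", fail=True)
failed = asyncio.run(update_all([good, bad]))
```

Every update runs to completion, and only the failing indexes end up in the mapping, which corresponds to the failed_indexes attribute described above.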

dbally.data_models.execution_result.ExecutionResult dataclass

ExecutionResult(results: List[Dict[str, Any]], context: Dict[str, Any], execution_time: float, execution_time_view: float, view_name: str, textual_response: Optional[str] = None)

Represents the collection-level result of the query execution.

PARAMETER DESCRIPTION
results

List of dictionaries containing the results of the query execution, each dictionary represents a row in the result set with column names as keys. The exact structure of the result set depends on the view that was used to execute the query, which can be obtained from the view_name attribute.

TYPE: List[Dict[str, Any]]

context

Dictionary containing additional metadata about the query execution.

TYPE: Dict[str, Any]

execution_time

Time taken to execute the entire query, including view selection and all other operations, in seconds.

TYPE: float

execution_time_view

Time the selected view took to execute the query, in seconds.

TYPE: float

view_name

Name of the view that was used to execute the query.

TYPE: str

textual_response

Optional text response that can be used to display the query results in a human-readable format.

TYPE: Optional[str] DEFAULT: None
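
As an illustration of how these fields might be consumed, the sketch below uses a stand-in dataclass with the same field names and types as documented above. `ExecutionResultSketch` is not the library class, and the sample rows and timings are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ExecutionResultSketch:
    """Stand-in mirroring the documented fields; not the library class."""

    results: List[Dict[str, Any]]
    context: Dict[str, Any]
    execution_time: float
    execution_time_view: float
    view_name: str
    textual_response: Optional[str] = None


result = ExecutionResultSketch(
    results=[{"name": "Ada", "city": "London"}, {"name": "Linus", "city": "Helsinki"}],
    context={},
    execution_time=1.25,
    execution_time_view=0.75,
    view_name="CandidatesView",
)

# Each row is a dict keyed by column name.
names = [row["name"] for row in result.results]
# Time spent outside the view, such as view selection.
overhead = result.execution_time - result.execution_time_view
# Fall back to a row count when no natural-language response was requested.
summary = result.textual_response or f"{len(result.results)} rows from {result.view_name}"
```

Because the row structure depends on the view that answered, checking view_name before interpreting the columns is a reasonable habit.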

results instance-attribute

results: List[Dict[str, Any]]

context instance-attribute

context: Dict[str, Any]

execution_time instance-attribute

execution_time: float

execution_time_view instance-attribute

execution_time_view: float

view_name instance-attribute

view_name: str

textual_response class-attribute instance-attribute

textual_response: Optional[str] = None

dbally.collection.IndexUpdateError

IndexUpdateError(message: str, failed_indexes: Dict[AbstractSimilarityIndex, Exception])

Bases: Exception

Exception for when updating any of the Collection's similarity indexes fails.

Provides a dictionary mapping failed indexes to their respective exceptions as the failed_indexes attribute.

PARAMETER DESCRIPTION
failed_indexes

Dictionary mapping failed indexes to their respective exceptions.

TYPE: Dict[AbstractSimilarityIndex, Exception]

Source code in src/dbally/collection.py
def __init__(self, message: str, failed_indexes: Dict[AbstractSimilarityIndex, Exception]) -> None:
    """
    Args:
        failed_indexes: Dictionary mapping failed indexes to their respective exceptions.
    """
    self.failed_indexes = failed_indexes
    super().__init__(message)

failed_indexes instance-attribute

failed_indexes = failed_indexes