How-To Use Elastic to Store Similarity Index#
ElasticsearchStore can be used as a store in SimilarityIndex. This guide shows how to run a similarity search using Elasticsearch. In the example, the Elasticsearch engine is provided by the official Docker image. Two approaches are available for similarity search: Elastic Search Store and Elastic Vector Search. Elastic Search Store uses embeddings and kNN search to find similar values, while Elastic Vector Search performs semantic search, using the ELSER (Elastic Learned Sparse EncodeR) model to encode and search the data.
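To make the embeddings-plus-kNN idea concrete, here is a minimal, self-contained sketch in plain Python. It does not call Elasticsearch or dbally: the `embed`, `cosine`, and `knn` helpers are hypothetical names, and the character-bigram "embedding" is only a toy stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): counts of character bigrams, not a real model."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn(query: str, candidates: list[str], k: int = 1) -> list[str]:
    """Return the k candidates whose embeddings are most similar to the query's."""
    query_vector = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]


countries = ["United States", "Canada", "Mexico"]
nearest = knn("Canad", countries)
```

In this sketch the misspelled query "Canad" shares most of its bigrams with "Canada", so that value ranks first. A real store would delegate both the embedding and the nearest-neighbour search to the embedding client and to Elasticsearch.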
Prerequisites#
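Download and deploy the Elasticsearch Docker image. Note that for Elastic Vector Search, the Elasticsearch Docker container requires at least 8GB of RAM and license activation to use Machine Learning capabilities. The `docker run` command below limits the container to 2GB of memory (`-m 2GB`), so raise that limit if you plan to use Elastic Vector Search.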
docker network create elastic
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.13.4
docker run --name es01 --net elastic -p 9200:9200 -it -m 2GB docker.elastic.co/elasticsearch/elasticsearch:8.13.4
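Copy the generated elastic password and enrollment token. These credentials are shown only the first time you start Elasticsearch. If you lose them, you can regenerate them with the elasticsearch-reset-password and elasticsearch-create-enrollment-token tools shipped in the container. The following commands copy the http_ca.crt certificate from the container to your local machine and verify that you can connect to the Elasticsearch cluster.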
docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .
curl --cacert http_ca.crt -u elastic:$ELASTIC_PASSWORD https://localhost:9200
To manage the Elasticsearch engine, create a Kibana container.
By default, the Kibana management dashboard is deployed at link
For vector search, it is necessary to enroll in an appropriate subscription level or trial version that supports machine learning.
Additionally, the ELSER model must be downloaded, which can be done through Kibana. Instructions can be found in the hosted Kibana instance under the following tabs:
- downloading and deploying the model: Analytics -> Machine Learning -> Trained Model,
- vector search configuration: Search -> Elastic Search -> Vector Search.

Finally, install the dbally elasticsearch extension.
Implementing a SimilarityIndex#
To use similarity search, you need to define a data fetcher and a data store.
Data fetcher#
from dbally.similarity import SimilarityFetcher

class DummyCountryFetcher(SimilarityFetcher):
    async def fetch(self):
        return ["United States", "Canada", "Mexico"]
Data store#
Elastic store similarity search works on embeddings. To create the embeddings, an embedding client is passed to the store as an argument. You can use one of the dbally embedding clients, such as LiteLLMEmbeddingClient.
from dbally.embeddings.litellm import LiteLLMEmbeddingClient
embedding_client=LiteLLMEmbeddingClient(api_key="your-api-key")
Then define your ElasticsearchStore:
from dbally.similarity.elasticsearch_store import ElasticsearchStore
data_store = ElasticsearchStore(
    index_name="country_similarity",
    host="https://localhost:9200",
    ca_cert_path="path_to_cert/http_ca.crt",
    http_user="elastic",
    http_password="password",
    embedding_client=embedding_client,
)
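Rather than hard-coding the password, the connection settings can be read from the environment. The following is a minimal sketch under stated assumptions: `elastic_store_settings` is a hypothetical helper, `ELASTIC_PASSWORD` matches the variable used in the curl check above, and the other variable names (`ELASTIC_INDEX`, `ELASTIC_HOST`, `ELASTIC_CA_CERT`, `ELASTIC_USER`) are made up for illustration. The returned keys mirror the keyword arguments shown in this guide.

```python
import os


def elastic_store_settings(env=os.environ) -> dict:
    """Collect the store's connection settings from environment variables.

    All variable names except ELASTIC_PASSWORD are hypothetical.
    """
    password = env.get("ELASTIC_PASSWORD")
    if not password:
        # Fail early instead of sending an empty password to the cluster.
        raise RuntimeError("Set ELASTIC_PASSWORD before creating the store")
    return {
        "index_name": env.get("ELASTIC_INDEX", "country_similarity"),
        "host": env.get("ELASTIC_HOST", "https://localhost:9200"),
        "ca_cert_path": env.get("ELASTIC_CA_CERT", "http_ca.crt"),
        "http_user": env.get("ELASTIC_USER", "elastic"),
        "http_password": password,
    }


# Example with an explicit mapping so the sketch runs without a real environment.
settings = elastic_store_settings({"ELASTIC_PASSWORD": "example-only"})
```

The resulting dictionary could then be unpacked into the store constructor, for example `ElasticsearchStore(**elastic_store_settings(), embedding_client=embedding_client)`.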
After this setup, you can initialize the SimilarityIndex:
from dbally.similarity import SimilarityIndex
country_similarity = SimilarityIndex(
    fetcher=DummyCountryFetcher(),
    store=data_store,
)
You can then update the index and find the closest matches in the same way as with the built-in similarity indices.

To use Elastic Vector Search instead, download and deploy the ELSER v2 model and create an ingest pipeline. Either way, the resulting index maps user input to the closest matching stored value. For example, if the index stores 'USA', a user may type 'United States' and the index would return 'USA'.
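The fetch, store, and lookup flow behind a similarity index can be sketched without any external services. Everything below is a simplified stand-in, not dbally's implementation: `InMemoryStore`, `CountryFetcher`, and `TinyIndex` are hypothetical classes, the method names (`fetch`, `store`, `find_similar`, `update`, `similar`) are assumed to mirror the dbally interfaces, and `difflib` string matching replaces the kNN query that Elasticsearch would run. Returning the raw input when nothing matches is also an assumption of this sketch.

```python
import asyncio
import difflib


class InMemoryStore:
    """Hypothetical stand-in for ElasticsearchStore: keeps values in a list."""

    def __init__(self) -> None:
        self._values: list[str] = []

    async def store(self, data: list[str]) -> None:
        # Replace the stored values with the freshly fetched ones.
        self._values = list(data)

    async def find_similar(self, text: str):
        # String matching stands in for an embedding-based kNN query.
        matches = difflib.get_close_matches(text, self._values, n=1, cutoff=0.6)
        return matches[0] if matches else None


class CountryFetcher:
    """Same data as DummyCountryFetcher, without the dbally base class."""

    async def fetch(self) -> list[str]:
        return ["United States", "Canada", "Mexico"]


class TinyIndex:
    """Hypothetical, simplified counterpart of SimilarityIndex."""

    def __init__(self, fetcher, store) -> None:
        self.fetcher = fetcher
        self.store = store

    async def update(self) -> None:
        # Pull the current values and hand them to the store.
        await self.store.store(await self.fetcher.fetch())

    async def similar(self, text: str) -> str:
        # Fall back to the raw input when nothing is close enough.
        found = await self.store.find_similar(text)
        return found if found is not None else text


async def demo() -> list[str]:
    index = TinyIndex(CountryFetcher(), InMemoryStore())
    await index.update()
    return [await index.similar("Canad"), await index.similar("Atlantis")]


results = asyncio.run(demo())
```

Here the misspelled "Canad" resolves to "Canada", while "Atlantis" has no close match and is returned unchanged. With the real classes, the same two steps would be an update of the index followed by a lookup for each user-provided value.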