Using LangChain with Nuclia

LangChain is a great Python framework for building powerful NLP applications on top of language models.

In this post, we will see how LangChain can be used to collect data from various sources and make it searchable using Nuclia.

Install

First, you need to install LangChain and the Nuclia Python library.

pip install langchain
pip install nuclia

You also need to create a free Nuclia account on nuclia.cloud.

Once your account is created, go to your Knowledge Box dashboard and navigate to the API keys entry on the left menu to generate an API key. Make sure to select the Contributor role.
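Rather than hardcoding the key in your scripts, you can store it in an environment variable and read it from Python. This is just a convenience, and the variable name NUCLIA_API_KEY below is a convention of this post, not something required by Nuclia:

import os

# Read the API key from an environment variable, set beforehand with e.g.:
#   export NUCLIA_API_KEY="YOUR-API-KEY"
API_KEY = os.environ["NUCLIA_API_KEY"]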

In this example, we will use LangChain to collect data from Wikipedia and make it searchable using Nuclia.

Let’s first install the required dependency.

pip install wikipedia

Then, we can create a simple script that collects information about Hedy Lamarr from Wikipedia using LangChain’s Wikipedia document loader (WikipediaLoader) and pushes the extracted texts to Nuclia.

(If you do not know who Hedy Lamarr is, this is the perfect opportunity to learn about her: she is my favorite inventor, and her life is fascinating!)

from langchain.document_loaders import WikipediaLoader
from langchain.vectorstores.nucliadb import NucliaDB

API_KEY = "YOUR-API-KEY"

# Load the Wikipedia pages matching the query as LangChain documents
documents = WikipediaLoader(query="Hedy Lamarr").load()

# Connect to your Nuclia Knowledge Box and push the extracted texts
ndb = NucliaDB(knowledge_box="YOUR-KNOWLEDGE-BOX-ID", local=False, api_key=API_KEY)
ndb.add_texts([doc.page_content for doc in documents])
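Note that we only push the plain text here. Depending on the version of the LangChain integration you are using, add_texts may also accept a metadatas argument, so you could try passing the Wikipedia metadata along with each text. Treat the following as an assumption to verify against your version:

# Assumes your version of the NucliaDB vector store supports a `metadatas` argument
ndb.add_texts(
    [doc.page_content for doc in documents],
    metadatas=[doc.metadata for doc in documents],
)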

Once the script has run, you can go to your Knowledge Box dashboard and see the newly added documents in the Resources section.

It may take a few minutes for the documents to be indexed and searchable.

Once they are indexed, you can use LangChain to search among all the indexed paragraphs.

from langchain.vectorstores.nucliadb import NucliaDB

API_KEY = "YOUR-API-KEY"

ndb = NucliaDB(knowledge_box="YOUR-KNOWLEDGE-BOX-ID", local=False, api_key=API_KEY)

# Retrieve the 10 paragraphs most similar to the question
results = ndb.similarity_search("What did Hedy Lamarr invent?", k=10)
for result in results:
    print(result.page_content)
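From there, you could go one step further and feed the retrieved paragraphs to an LLM to get a direct answer. Here is a minimal sketch using LangChain’s classic question-answering chain; it assumes you have an OpenAI API key configured (OPENAI_API_KEY), and it is only one of many possible setups:

from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

# A simple "stuff" QA chain: all retrieved paragraphs are passed to the LLM in one prompt
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")

# Answer the question using the paragraphs retrieved from Nuclia
answer = chain.run(input_documents=results, question="What did Hedy Lamarr invent?")
print(answer)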

Indexing a website from its sitemap

In this example, we will use LangChain to index a full website in Nuclia.

LangChain has a SitemapLoader that extracts all the URLs listed in a website’s sitemap and loads the corresponding pages.

It requires lxml to parse the sitemap. Let’s install it:

pip install lxml

This script will load all the pages mentioned in the sitemap and push them to Nuclia.

from langchain.document_loaders.sitemap import SitemapLoader
from langchain.vectorstores.nucliadb import NucliaDB

API_KEY = "YOUR-API-KEY"

# Load every page listed in the sitemap as a LangChain document
sitemap_loader = SitemapLoader(web_path="https://nuclia.com/post-sitemap.xml")
documents = sitemap_loader.load()

# Push the page contents to your Nuclia Knowledge Box
ndb = NucliaDB(knowledge_box="YOUR-KNOWLEDGE-BOX-ID", local=False, api_key=API_KEY)
ndb.add_texts([doc.page_content for doc in documents])
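If you only want a subset of the pages, SitemapLoader also accepts a filter_urls argument (a list of regular expression patterns) to restrict which URLs are loaded. The pattern below is only an illustration, adjust it to your own site:

# Only load the URLs matching the given pattern (hypothetical example)
sitemap_loader = SitemapLoader(
    web_path="https://nuclia.com/post-sitemap.xml",
    filter_urls=["https://nuclia.com/blog/"],
)
documents = sitemap_loader.load()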


Want to know more?

If you want to learn more about how we can help you implement this, please use this form or join our community on Discord for technical support.

See you soon!