
NucliaDB, the open-source AI search database.

NucliaDB is the open-source multi-modal database built by Nuclia, designed specifically for AI search and RAG (retrieval-augmented generation).

NucliaDB, unlike other vector databases, is ultra-focused on knowledge search on top of unstructured data. Within NucliaDB, you’ll find four different indexes: a full-text index, a paragraph index, a Knowledge Graph, and a Vector index.

At query time, NucliaDB delivers results based on these four indexes, ensuring the best possible outcomes.
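To make the idea of combining several indexes concrete, here is a minimal sketch that merges ranked results from different indexes into a single list. It uses reciprocal rank fusion, a common general-purpose technique; this page does not describe how NucliaDB itself merges results, so treat the function, the index names, and the document keys below as illustrative assumptions only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document keys into one ranking.

    rankings: dict mapping an index name to a list of keys, best first.
    k: smoothing constant; 60 is a commonly used default for this technique.
    """
    scores = defaultdict(float)
    for ranked_keys in rankings.values():
        for rank, key in enumerate(ranked_keys, start=1):
            # A key earns more credit the higher it ranks in each index.
            scores[key] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by key for determinism.
    return sorted(scores, key=lambda key: (-scores[key], key))


# Hypothetical per-index results (keys are made up for illustration).
rankings = {
    "full_text": ["doc_a", "doc_b", "doc_c"],
    "paragraph": ["doc_b", "doc_a"],
    "vector": ["doc_b", "doc_d", "doc_a"],
}
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

A document that appears near the top of several indexes ends up ahead of one that ranks well in only one of them.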


You can store data in your own 100% isolated NucliaDB instance, which acts as a centralized knowledge repository that can power diverse AI applications on top of your unstructured data.

NucliaDB guarantees 100% data governance and can be deployed in Nuclia’s cloud or your own infrastructure. Both solutions offer complete control over your data and are 100% secure.

Get started!

Getting started with NucliaDB is easy. You can install it locally using Docker or pip. Once it's up and running, install the nucliadb-dataset and nucliadb-sdk libraries to start working with it.

1. Install NucliaDB and run it locally

pip install nucliadb
nucliadb

2. Create your first KnowledgeBox

A KnowledgeBox is a data container in NucliaDB. You can create one with just a few lines of code and start filling it with data.

from nucliadb_sdk.utils import create_knowledge_box

my_kb = create_knowledge_box("my_new_kb")

3. Upload data

from nucliadb_sdk import Entity, File  # assumed to be exported at package level
from nucliadb_sdk.knowledgebox import KnowledgeBox
from sentence_transformers import SentenceTransformer

# Encode the text locally so its vector can be stored alongside it
encoder = SentenceTransformer("all-MiniLM-L6-v2")
resource_id = my_kb.upload(
    key="mykey1",
    binary=File(data=b"asd", filename="data"),
    text="I'm Sierra, a very happy dog",
    labels=["emotion/positive"],
    entities=[Entity(type="NAME", value="Sierra", positions=[(4, 9)])],
    vectors={"all-MiniLM-L6-v2": encoder.encode(["I'm Sierra, a very happy dog"])[0]},
)
# The resource can be retrieved either by its ID or by its key
my_kb[resource_id] == my_kb["mykey1"]

4. Search

4.1.  Semantic search

from nucliadb_sdk.knowledgebox import KnowledgeBox
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

query_vectors = encoder.encode(["To be in love"])[0]

results = my_kb.search(
    vector=query_vectors,
    vectorset="all-MiniLM-L6-v2",
    min_score=0.25,
)

Iterate over the results: 

for result in results:
    print(f"Text: {result.text}")
    print(f"Labels: {result.labels}")
    print(f"Score: {result.score}")
    print(f"Key: {result.key}")
    print(f"Score Type: {result.score_type}")
    print("------")

The results: 

Text: love is tough
Labels: ['negative']
Score: 0.4688602387905121
Key: a027ee34f3a7489d9a264b9f3d08d3a5
Score Type: COSINE
------
Text: he is heartbroken
Labels: ['negative']
Score: 0.27540814876556396
Key: 25bc7b22b4fb4f64848a1b7394fb69b1
Score Type: COSINE
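The COSINE score type above refers to cosine similarity between the query vector and each stored vector. As a rough illustration of what such a score means and how a min_score threshold prunes weak matches, here is a small pure-Python sketch. The helper names and the toy two-dimensional vectors are hypothetical, and whether NucliaDB treats the threshold as inclusive is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def filter_by_min_score(query, candidates, min_score):
    """Return (key, score) pairs scoring at least min_score, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in candidates.items()]
    kept = [(key, score) for key, score in scored if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy vectors; real sentence embeddings have hundreds of dimensions.
query = [1.0, 0.0]
candidates = {
    "close": [1.0, 1.0],
    "same": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
}
hits = filter_by_min_score(query, candidates, min_score=0.25)
for key, score in hits:
    print(f"{key}: {score:.4f}")
```

Because cosine similarity ignores vector length, "same" scores highest even though its magnitude differs from the query, while the orthogonal vector falls below the threshold and is dropped.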

4.2. Full-text search

from nucliadb_sdk.knowledgebox import KnowledgeBox

results = my_kb.search(
    text="dog"
    )

Iterate over the results: 

for result in results:
    print(f"Resource key: {result.key}")
    print(f"Text: {result.text}")
    print(f"Score type: {result.score_type}")
    print(f"Score: {result.score}")
    print(f"Labels: {result.labels}")

The results:

Resource key: 4f1f570398c543e0b8c3b86e87ee2fbd
Text: Dog in catalan is gos
Score type: BM25
Score: 0.8871671557426453
Labels: ['neutral']
Resource key: 665e85f0fb2e4b2fbde8b4957b7462c1
Text: I'm Sierra, a very happy dog
Score type: BM25
Score: 0.7739118337631226
Labels: ['positive']
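The BM25 score type above is a classic keyword-relevance formula. The sketch below implements the textbook Okapi BM25 scoring over a tiny in-memory corpus to show why a shorter document containing the query term tends to rank above a longer one. It is not NucliaDB's implementation: the function name, the tokenization, and the parameters k1 and b are assumptions, and the raw scores it produces are not expected to match the values printed above.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization penalizes documents longer than average.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


texts = ["Dog in catalan is gos", "I'm Sierra, a very happy dog", "love is tough"]
# Naive tokenization for illustration: lowercase, drop commas, split on spaces.
documents = [text.lower().replace(",", "").split() for text in texts]
scores = bm25_scores(["dog"], documents)
for text, score in zip(texts, scores):
    print(f"{score:.4f}  {text}")
```

With this toy corpus, the shorter sentence containing "dog" scores above the longer one, and the sentence without the term scores zero.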

4.3. Search by label

results = my_kb.search(
    filter=["emotion/positive"]
)

Iterate over the results:

for result in results:
    print(f"Resource key: {result.key}")
    print(f"Text: {result.text}")
    print(f"Labels: {result.labels}")

The results:

Resource key: f1de1c1e3fac43aaa53dcdc54ffd07fc
Text: I'm Sierra, a very happy dog
Labels: ['positive']
Resource key: b445359d434b47dfb6a37ca45c14c2b3
Text: what a delighful day
Labels: ['positive']
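The filter uses the form "emotion/positive", while the output shows only "positive". A plausible reading is that the part before the slash names a group of labels and the part after it is the label itself. The sketch below illustrates that reading on plain Python data. The helper names are hypothetical, the in-memory dictionary stands in for a KnowledgeBox, and requiring all filter labels to match is an assumption rather than documented NucliaDB behavior.

```python
def split_label(label):
    """Split "labelset/label" into (labelset, label); labelset is None if absent."""
    labelset, sep, name = label.partition("/")
    if not sep:
        return None, label
    return labelset, name


def matches_filter(resource_labels, wanted):
    """True if every wanted label is present on the resource (assumed AND semantics)."""
    return set(wanted) <= set(resource_labels)


# Hypothetical stand-in for stored resources: key -> full labels.
resources = {
    "mykey1": ["emotion/positive"],
    "mykey2": ["emotion/negative"],
}
hits = [
    key
    for key, labels in resources.items()
    if matches_filter(labels, ["emotion/positive"])
]
for key in hits:
    short_labels = [split_label(label)[1] for label in resources[key]]
    print(f"Resource key: {key}")
    print(f"Labels: {short_labels}")
```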

Main features

It’s a cloud-native database

Install NucliaDB on multiple cloud storage providers, such as
Amazon S3, Google Cloud Storage, Azure File Storage, or
Alibaba file cloud storage.

Ultra-high read performance

NucliaDB offers ultra-high read performance to serve queries at scale.

Multimodel indexing

One database, multiple indexes.

Open Source

NucliaDB is an open-source project that welcomes contributions from external developers.