Next.js is a React framework that lets you build server-side rendered applications. It is a great tool for building static websites, and it also supports dynamic websites by combining server-side rendering and static generation.
What are your options if you want to offer a search feature in your application?
With Next.js and NodeJS, you can certainly implement a naive search engine that finds a word in any page of your website. But if you expect anything smarter than exact, case-insensitive word matching, it will be tough. Now imagine that your website contains videos, PDFs, or audio files and you want their content to be searchable as well… That’s where Nuclia comes in!
Nuclia is an API that can index and process any kind of data, including audio and video files, to give applications powerful search capabilities. Nuclia uses natural language processing and machine learning to understand the searcher’s intent and return more relevant results.
Using the Nuclia widget: as simple as copy/paste
To add the Nuclia search feature to your Next.js application, you first need a Nuclia account. You can create one here.
Nuclia stores content in knowledge boxes. When you create your account, Nuclia automatically creates a default knowledge box for you.
Once your account is created, you are redirected to the Nuclia dashboard, where you can manage your knowledge box. Since you want visitors to run searches on your website, you must make your knowledge box public: click the Publish button at the top right of the page.
If you go to the “Widgets” entry in the left menu, you can create a new search widget for your Next.js application.
Let’s call it nextjs-search-widget. Change the mode from input to form, and save it.
It generates a code snippet similar to:
<script src="https://cdn.nuclia.cloud/nuclia-widget.umd.js"></script>
<nuclia-search
  knowledgebox="YOUR-KNOWLEDGE-BOX-ID"
  zone="europe-1"
  widgetid="nextjs-search-widget"
  type="form"
></nuclia-search>
You just need to copy/paste it into your Next.js application to get your search feature up and running.
Indexing page contents automatically
Since your knowledge box is still empty, this feature is rather disappointing for now 🙂
Of course, you can use the Nuclia Dashboard to index files or web pages yourself. That’s fine for testing purposes, but it would be much better if your Next.js pages were indexed automatically.
Let’s write a NodeJS script that collects all the Markdown files in your Next.js application and indexes them in your knowledge box.
First, install the Nuclia SDK and the dependencies needed to use it in NodeJS:
npm install @nuclia/core localstorage-polyfill isomorphic-unfetch
# OR
yarn add @nuclia/core localstorage-polyfill isomorphic-unfetch
Here is how a typical NodeJS script can use the Nuclia API:
const { Nuclia } = require("@nuclia/core");
require("localstorage-polyfill");
require("isomorphic-unfetch");
const nuclia = new Nuclia({
  backend: "https://nuclia.cloud/api",
  zone: "europe-1",
  knowledgeBox: "<YOUR-KB-ID>",
  apiKey: "<YOUR-API-KEY>",
});
// code to push data to Nuclia (detailed later)
As you can see, you need to provide a Nuclia API key. An API key is required whenever you add or modify content in a knowledge box. You can get one from the Nuclia Dashboard, in the “API keys” section:
- Create a new Service Access (name it nodejs-upload, for example) with the Contributor role.
- Click the + sign to generate a new token for this service access.
- Copy the generated token and paste it into your NodeJS script.
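Rather than pasting the token directly into the script, you may prefer to read it from environment variables so it never ends up in version control. Here is a minimal sketch; the helper getNucliaConfig and the variable names NUCLIA_KB_ID, NUCLIA_API_KEY and NUCLIA_ZONE are hypothetical choices, not something Nuclia requires:

```javascript
// Hypothetical helper: builds the options object passed to `new Nuclia(...)`
// from environment variables instead of hard-coded secrets.
// The variable names are our own convention, not defined by Nuclia.
function getNucliaConfig(env = process.env) {
  const knowledgeBox = env.NUCLIA_KB_ID;
  const apiKey = env.NUCLIA_API_KEY;
  if (!knowledgeBox || !apiKey) {
    throw new Error("NUCLIA_KB_ID and NUCLIA_API_KEY must be set");
  }
  return {
    backend: "https://nuclia.cloud/api",
    // Same default zone as in this article
    zone: env.NUCLIA_ZONE || "europe-1",
    knowledgeBox,
    apiKey,
  };
}
```

With this helper, the script would create its client with new Nuclia(getNucliaConfig()) and be run as NUCLIA_KB_ID=<YOUR-KB-ID> NUCLIA_API_KEY=<YOUR-API-KEY> node upload-posts.js.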
Then you can write a script named upload-posts.js that indexes all the Markdown files from ./pages/posts:
const fs = require("fs");
const path = require("path");
const { Nuclia } = require("@nuclia/core");
require("localstorage-polyfill");
require("isomorphic-unfetch");
const nuclia = new Nuclia({
  backend: "https://nuclia.cloud/api",
  zone: "europe-1",
  knowledgeBox: "<YOUR-KB-ID>",
  apiKey: "<YOUR-API-KEY>",
});
const uploadPosts = (kb) => {
  // Get posts
  const postsDir = path.join(process.cwd(), "pages", "posts");
  const posts = fs.readdirSync(postsDir);
  posts
    .filter((post) => post.endsWith(".mdx"))
    .forEach((post) => {
      const postPath = path.join(postsDir, post);
      const postContent = fs.readFileSync(postPath, "utf8");
      const postTitle = postContent.split("\n")[0].replace("# ", "");
      const postSlug = post.replace(".mdx", "");
      // Upload post to Nuclia
      const resource = {
        title: postTitle,
        slug: postSlug,
        texts: {
          text: {
            format: "MARKDOWN",
            body: postContent,
          },
        },
      };
      kb.createResource(resource).subscribe({
        next: () => console.log(`Uploaded ${postSlug} to Nuclia`),
        error: (err) => console.error(`Error with ${postSlug}`, err),
      });
    });
};
nuclia.db.getKnowledgeBox().subscribe((kb) => uploadPosts(kb));
This script does the following:
- it iterates over the .mdx files in ./pages/posts,
- for each file, it reads the Markdown content and takes the title from the first line,
- and then it uploads the post to Nuclia using the createResource method.
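Taking the title from the very first line only works if every post starts with a "# Title" heading. If some posts begin with YAML front matter (a block delimited by --- lines) or a blank line, the title will be wrong. Here is a slightly more defensive sketch; the helper name getPostTitle is ours, and it assumes titles are level-1 ATX headings:

```javascript
const path = require("path");

// Hypothetical helper: derive a title for a post.
// 1. Skip a leading YAML front-matter block delimited by "---" lines, if any.
// 2. Return the text of the first level-1 heading ("# Title").
// 3. Otherwise fall back to the file name without its extension (the slug).
function getPostTitle(fileName, content) {
  const lines = content.split(/\r?\n/);
  let start = 0;
  if (lines[0] !== undefined && lines[0].trim() === "---") {
    const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
    // Only treat it as front matter if the closing delimiter exists
    if (end !== -1) start = end + 1;
  }
  for (const line of lines.slice(start)) {
    // "# " followed by text; "## Subtitle" does not match
    const match = /^#\s+(.*\S)/.exec(line);
    if (match) return match[1];
  }
  return path.parse(fileName).name;
}
```

In upload-posts.js, the postTitle line would then become getPostTitle(post, postContent).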
You can run this script with:
node upload-posts.js
Now if you check your Nuclia Dashboard, you should see your posts uploaded in your knowledge box!
If they are marked with a yellow dot in the resource list, they are still being processed. Once processing is complete, the dot turns green and the resource can be found from the search widget.
Indexing external links and media files
Nuclia can index any kind of data, not just text. Let’s say you have some posts containing links to YouTube videos or to local PDF files.
It would be nice to make their content searchable too.
So why not find every link to local files or external web pages in the blog posts, and index them too?
First, you need to find the links in the Markdown files. Inline links are written as [some-title](some-url), so you can use a regular expression to extract them:
const markdownLinks = [...postContent.matchAll(/\[.*?\]\((.*?)\)/g)].map(
  (match) => match[1]
);
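This regular expression captures everything between the parentheses, so a link with an optional title, such as [docs](https://example.com "Docs"), yields the URL followed by the quoted title. It also matches image syntax (![alt](src)), which is convenient here since embedded media are exactly what you want to index. It does not see reference-style links ([text][ref]) or autolinks (<https://example.com>). A sketch that keeps only the URL part, assuming the URLs in your posts contain no spaces or closing parentheses (extractInlineLinks is a hypothetical name):

```javascript
// Sketch: extract inline link/image targets from Markdown, dropping an
// optional "title" part and optional <...> around the URL.
// Assumes URLs contain no whitespace or ")". Reference-style links and
// autolinks are deliberately not handled.
function extractInlineLinks(markdown) {
  const pattern = /\[.*?\]\(\s*<?([^\s)>]+)>?(?:\s+["'(][^)]*)?\)/g;
  return [...markdown.matchAll(pattern)].map((match) => match[1]);
}
```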
Then there are two cases:
- The link starts with http: it is an external link, so you add it to the resource as a link field.
- The link starts with /media: it is a media file, so you add it to the resource as a file field.
Link fields can be added directly in the creation payload, just like you did with the text field in the previous step:
const links = markdownLinks
  .filter((link) => link.startsWith("http"))
  .reduce((all, link, index) => {
    all[`link-${index}`] = { uri: link };
    return all;
  }, {});
const resource = {
  title: postTitle,
  slug: postSlug,
  texts: {
    text: {
      format: "MARKDOWN",
      body: postContent,
    },
  },
  links,
};
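If you prefer to keep the upload loop short, the filter/reduce on external links and the selection of local media paths (which cannot go in the creation payload and must be uploaded separately) can be factored into one pure helper. This is a sketch: the name splitLinks is ours, the /media prefix follows this article’s convention, and removing duplicate URLs is an extra choice so the same page is not indexed twice:

```javascript
// Hypothetical helper: split extracted Markdown URLs into
//  - links: external http(s) URLs keyed as link-0, link-1, ... for the payload
//  - localFiles: site-relative media paths to upload after creation
function splitLinks(urls, mediaPrefix = "/media") {
  const unique = [...new Set(urls)]; // drop duplicate URLs, keep first order
  const links = {};
  unique
    .filter((url) => /^https?:\/\//.test(url))
    .forEach((uri, index) => {
      links[`link-${index}`] = { uri };
    });
  const localFiles = unique.filter((url) => url.startsWith(mediaPrefix));
  return { links, localFiles };
}
```

Usage: const { links, localFiles } = splitLinks(markdownLinks); then pass links in the resource payload as above.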
Note: the nice thing about link fields is that Nuclia automatically chooses what to index: for a regular web page, it indexes the text content, but for a YouTube video, it indexes the video itself, not the YouTube page.
By contrast, file fields cannot be added directly because they are binary, so you need to fetch the resource once it has been created and then call its upload() method.
As this involves chaining asynchronous operations, you also need to install rxjs:
npm install rxjs
# OR
yarn add rxjs
Then you can write the following code:
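// At the top of upload-posts.js
const { forkJoin, of, switchMap } = require("rxjs");

// Local media paths linked from the post, e.g. /media/some-file.pdf
const localFiles = markdownLinks.filter((link) => link.startsWith("/media"));
kb.createResource(resource, true)
  .pipe(
    switchMap((data) =>
      localFiles.length === 0
        ? of(true) // nothing to upload
        : // Fetch the newly created resource to attach the files to it
          kb.getResource(data.uuid, [], []).pipe(
            switchMap((resource) =>
              forkJoin(
                localFiles.map((file) => {
                  // Files referenced as /media/... live in ./public/media/...
                  const filePath = path.join(process.cwd(), "public", file);
                  const buffer = fs.readFileSync(filePath);
                  // Copy exactly this file's bytes: buffer.buffer alone can be
                  // a larger shared memory pool
                  const fileContent = buffer.buffer.slice(
                    buffer.byteOffset,
                    buffer.byteOffset + buffer.byteLength
                  );
                  const fileName = file.split("/").pop();
                  return resource.upload(fileName, fileContent);
                })
              )
            )
          )
    )
  )
  // Nothing is sent until the Observable is subscribed to
  .subscribe({
    next: () => console.log(`Uploaded ${postSlug} to Nuclia`),
    error: (err) => console.error(`Error with ${postSlug}`, err),
  });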
And now, by running the script again, you should see your posts with their links and media files indexed in Nuclia.
That’s it! You can now search in your blog posts from the search widget!
Implementing a custom search component
OK, so far you have seen how to use the Nuclia SDK to index data in Nuclia, and how to use the Nuclia search widget to provide a search experience to your users.
But what if this widget is not the perfect fit for your app?
You can definitely implement your own search component with React and use the Nuclia SDK to query the API.
Let’s create a new component in ./components/Search.js containing a minimal search input and a list of results (see the full code example at the end of the article).
You can access your knowledge box with:
import { Nuclia } from "@nuclia/core";

// Read-only access: no API key here
const kb = new Nuclia({
  backend: "https://nuclia.cloud/api",
  zone: "europe-1",
  knowledgeBox: "<YOUR-KB-ID>",
}).knowledgeBox;
Note that you do not need an API key here because you are not going to create or update any resource (and actually you should never put a contributor key in your client code).
Then you can call the search() method from the onChange handler of your input field to query the API:
import { useCallback, useState } from "react";

// Inside the Search component:
const [query, setQuery] = useState("");
const [results, setResults] = useState([]);
const onChange = useCallback((e) => {
  const query = e.target.value;
  setQuery(query);
  kb.search(query).subscribe((results) =>
    setResults(
      // One entry per matching sentence, with the title of its resource
      results?.sentences?.results?.map((result) => ({
        text: result.text,
        title: results.resources?.[result.rid]?.title || "No title",
      })) || []
    )
  );
}, []);
Note that the search() method returns an Observable, so you need to subscribe to it to get the results. If you prefer Promises, the Nuclia SDK also provides them through the asyncKnowledgeBox wrapper.
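As written, onChange sends one search request per keystroke, and responses may arrive out of order. A common mitigation is to debounce the call so the search only runs once the user pauses typing. Below is a minimal, framework-agnostic debounce sketch; the 300 ms default delay is an arbitrary choice. If you already use rxjs, its debounceTime and switchMap operators address the same problem and also drop outdated requests.

```javascript
// Minimal debounce: returns a function that postpones calling `fn`
// until `delayMs` milliseconds have passed without a new call.
// Only the arguments of the last call are used.
function debounce(fn, delayMs = 300) {
  let timer = null;
  const debounced = (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, delayMs);
  };
  // Lets a component cancel a pending call, e.g. when it unmounts
  debounced.cancel = () => {
    clearTimeout(timer);
    timer = null;
  };
  return debounced;
}
```

In the Search component, you could keep calling setQuery directly in onChange for immediate input feedback, and delegate the request to a debounced function created once (for example with useMemo), calling its cancel() method in a useEffect cleanup.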
Now you can display the results in your component:
<div className={styles.container}>
  {results.map((result, index) => (
    <div key={`result-${index}`}>
      <h4>{result.title}</h4>
      <p>{result.text}</p>
    </div>
  ))}
</div>
Enriching the search results
In the previous step, you only displayed the sentences returned by the Nuclia API. These are the sentences that semantically match your query (“semantically” means the Nuclia search engine picked them not because they contain the query words, but because their meaning is close to the meaning of your query).
But Nuclia returns much more information about the resources, which can be useful in your search results. For example, it returns paragraphs (the fuzzy search results, which match your query words even when they are misspelled or appear in a derived form), named entities (a.k.a. NER: the important concepts mentioned in the resources, such as people, dates, places, and organizations), relations between resources (based on their entities), thumbnails, and many other useful metadata. All of this is extracted automatically from your content as soon as it is pushed to the Nuclia API.
Let’s play a bit with the entities.
Change the onChange handler as follows:
import { ReadableResource } from "@nuclia/core";

const onChange = useCallback((e) => {
  const query = e.target.value;
  setQuery(query);
  kb.search(query, [], {
    // Ask for extracted metadata too: named entities are part of it
    show: ["basic", "extracted"],
    extracted: ["text", "metadata"],
  }).subscribe((results) => {
    const sentences = results?.sentences?.results?.map((result) => ({
      text: result.text,
      rid: result.rid,
    }));
    // Group the matching sentences by resource ID (rid)
    const resultsByRID = (sentences || []).reduce((acc, result) => {
      const data = results?.resources?.[result.rid];
      if (!data) {
        return acc; // skip sentences whose resource data is missing
      }
      if (!acc[result.rid]) {
        // Wrap the raw resource data to use the SDK helper methods
        const resource = new ReadableResource(data);
        const ner = resource.getNamedEntities();
        acc[result.rid] = {
          title: resource.title,
          sentences: [result.text],
          ner,
        };
      } else {
        acc[result.rid].sentences.push(result.text);
      }
      return acc;
    }, {});
    setResults(Object.values(resultsByRID));
  });
}, []);
What’s new here?
- First, you call the search() method with options so that it also retrieves the extracted metadata (NERs are part of this metadata).
- Then, you iterate over the sentences and group them by resource ID, because a resource can have several sentences matching your query.
- For each resource, you create a ReadableResource instance (a wrapper around the raw resource data returned by the API) and call its getNamedEntities() method to get the resource’s NERs.
Now you are able to display the NERs in your results!
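How you render them is up to you. As an illustration, assuming getNamedEntities() returns an object whose keys are entity families and whose values are arrays of entity strings (check the SDK typings for the exact shape), a small hypothetical helper can flatten it into de-duplicated labels that your JSX can map over:

```javascript
// Hypothetical helper. Assumed input shape (verify against the SDK typings):
//   { [family: string]: string[] }
// Returns sorted, de-duplicated "FAMILY: value" labels, capped at `max`.
function entityLabels(entities, max = 10) {
  const labels = new Set();
  for (const [family, values] of Object.entries(entities || {})) {
    for (const value of values || []) {
      labels.add(`${family}: ${value}`);
    }
  }
  return [...labels].sort().slice(0, max);
}
```

For example, each result could render entityLabels(result.ner) as a list of small tags under its title.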
Conclusion
In this article, you have seen how to integrate Nuclia into a Next.js site just by copy/pasting the search widget snippet, how to use the Nuclia SDK to index Markdown content along with its linked pages and media files, and how to implement a custom search component with React.
There are plenty of other exciting things you can do with Nuclia, so don’t hesitate to check the Nuclia documentation if you want to know more!
The full code example discussed here is available on GitHub.