Writing full text search in Javascript for Next.js static site

There are so many ways to create search functionality for next.js web application. But today we are going to implement our own search engine for next.js static site. We will use inverted index for indexing the data.

Inverted index

If you have been working with Elasticsearch, or Apache Solr you probably have been heard about inverted index, it's data structure to implement full text search. So basically with inverted index we are storing a list of word that could possibly used to query our documents.

So let's imagine we have this list of documents.

[
    {
        "title":"Working with inverted index for full text search"
    },
    {
        "title":"How to use inverted index for simple website"
    }
]

If we use traditional way to perform search to this document the step would be.

  • Loop to documents
  • Check for every word to match with the query
  • Return document with matching query

The code would be like this

function traditional_search(query) {
    let result = []
    documents.forEach(item => {
        if (item.title.includes(query)) {
            result.push(item)
        }
    })
    return result
}

Using this implementation solved our problem of finding documents with matching queries "How". However, what if we need to search for other attributes, for example, there is only title but what if there are five more attributes that need scanning and the document contains more than 100 items? This solution doesn't scale.

Searching with Inverted Index

We can simplify this process with inverted index data structure. So basically inverted index is HashMap data structure that store reference to document by word.

Let's extract this documents to inverted index with Javascript.

let index = {}

function add(str, id) {
    if (!index[str]) {
        index[str] = [id]
    } else {
        index[str].push(id)
    }
}

documents.forEach((item, index) => {
    item.title.split(" ").forEach(word => {
        add(word, index + 1)
    })
})

The inverted index for this document will look like this.

{
    "Working":[1],
    "with":[1],
    "inverted":[1,2],
    "index":[1,2],
    "for":[1,2],
    "full":[1],
    "text":[1],
    "search":[1],
    "How":[2],
    "to":[2],
    "use":[2],
    "simple":[2],
    "website":[2]
}

By using this index, we do not have to loop over each attribute data in each document in order to find out if there are matches to our query in the documents.

Now we can implement simple search to this documents.

function search(query) {
    let result = []
    query.split(" ").forEach(word => {
        let res = index[word]
        if (res) {
            res.forEach(id => {
                let doc = documents[id-1]
                if (doc) {
                    result.push(doc)
                }
            })
        }
    })
    return result
}

const result = search("How")

// The result
[{"title":"How to use inverted index for simple website"}]

So we get search working perfectly with this inverted index, but there is a problem to this implementation. Right now we can search for example using keyword "How" and we can find document reference for that word, but if we search using query "Ho" it will not return any document because we don't have that word in our index.

To solve this issue you can just adding more word to the index for example you can split the string to n-gram string.

"How" => ["Ho", "ow" ...]

Using Inverted Index in Next.js

As I mentioned before, we will implement this search engine in next.js. In fact, I use this search engine on this page. Since this blog is static site it is make sense to generate the index on build time and the load it the page as json.

So here is example data for my cheatsheet page index.

[
  {
    "lang": "go",
    "slug": "ParseFloat64",
    "id": 1
  },
  ...
]

And the index.

{
  "pa": [1],
  "ar": [1,4],
  "rs": [1],
  "fl": [1],
  "lo": [1],
  "oa": [1],
  "fo": [2],
  ...
]

As we have the data now we can just import to the component.

import store from "../../../cheatsheets/store.json"
import index from "../../../cheatsheets/index.json"
import indexer from "../../lib/index"
indexer.loadIndex(store, index)

As the data is fetched locally, we do not need to debounce the event from the input form.

const [result, search] = useState([])

<input onChange={(e) => {
        search(() => {
            return indexer.search(e.target.value)
        })
    }}
/>

That's it for now, let me know if you have any questions. Thank you.