Library with hanging bulbs

Photo credit: SI Janko Ferlic (Unsplash license)

In my previous post, I mentioned MeiliSearch. Here is a short article on how to integrate it with my theme.

Edit: This post was edited for MeiliSearch version 0.13.

Part I: MeiliSearch

MeiliSearch is written in Rust. Possible options for hosting it include a VPS, serverless hosting, and Docker. Here is the link to the official installation guide.

Steps for setting up MeiliSearch

  1. Get a release binary from the official website (or use the install script shown after step 2)
  2. Start up the service
./meilisearch-linux-amd64 --db-path ./meili.database --http-addr 0.0.0.0:7700 \
--no-analytics true --no-sentry true --master-key "123-your-own-master-key" \
--env production
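
For step 1, if you would rather not download the binary by hand, the official guide also provides an install script that fetches the latest release binary (a sketch; run it from the directory where you want the binary to land):

curl -L https://install.meilisearch.com | sh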

There are three keys: the master key, the public key, and the private key. If you specify the master key, the public key and private key are generated from it. The public key and private key are not shown in the console log if you start the server in production mode. There are two ways to get them:

  • Start it in development mode to get the keys from the console log
  • After starting in production mode, perform a query to get the public and private keys by providing your master key, like below
curl \
-H "X-Meili-API-Key: 123-your-own-master-key"
-X GET 'http://localhost:7700/keys'
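
For the first option, a minimal sketch of starting the same binary in development mode (identical flags to the production command above, only --env changes; the keys are then printed on startup):

./meilisearch-linux-amd64 --db-path ./meili.database --http-addr 0.0.0.0:7700 \
--no-analytics true --no-sentry true --master-key "123-your-own-master-key" \
--env development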

Each key has a specific usage:

  • The master key is used for starting the MeiliSearch server
  • The public key is used by clients to authenticate with the MeiliSearch server for search queries. docs-searchbar.js uses this key
  • The private key is used by developers to perform administrative tasks on the MeiliSearch server. You should use docs-scraper with this key
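
To make the split concrete, here is a hedged sketch of one call per key, using the v0.13 HTTP endpoints (the docs index uid is the one the scraper will create in Part II):

# Search query with the public key (this is what docs-searchbar.js does)
curl \
-H "X-Meili-API-Key: 123-your-public-key" \
-X GET 'http://localhost:7700/indexes/docs/search?q=meilisearch'

# Administrative task with the private key, e.g. listing all indexes
curl \
-H "X-Meili-API-Key: 123-your-private-key" \
-X GET 'http://localhost:7700/indexes'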

Part II: docs-scraper

docs-scraper is written in Python. Here is the link to the official installation guide. It supports installation with pipenv or Docker.
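
If you would rather use Docker, the run command looks roughly like this (the image name, mount point, and entry command are from my reading of the project README, so treat them as assumptions and verify against the official guide):

docker run -t --rm \
-e MEILISEARCH_HOST_URL='http://127.0.0.1:7700' \
-e MEILISEARCH_API_KEY=Private-key-configured-in-Part-1 \
-v "$(pwd)/docs_scraper-config.json:/docs-scraper/config.json" \
getmeili/docs-scraper:latest pipenv run ./docs_scraper config.json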

I use pipenv to run it:

cd ~
python -m venv docs-scraper-venv
source ~/docs-scraper-venv/bin/activate
pip install pipenv
cd ~
git clone https://github.com/meilisearch/docs-scraper.git docs-scraper
cd docs-scraper

# Note:
# For docs-scraper < v0.10.1, the default Pipfile specified Python 3.6,
# while my distro uses Python 3.8. Luckily, I got it working by changing
# the Python version in the Pipfile from 3.6 to 3.8:
#
#   [requires]
#   python_version = "3.8"
#
# docs-scraper v0.10.1 already specifies Python 3.8.

# Then run the installation command
pipenv install

# You need to configure these environment variables
export MEILISEARCH_API_KEY=Private-key-configured-in-Part-1
export MEILISEARCH_HOST_URL='http://127.0.0.1:7700'
pipenv run ./docs_scraper <path-to-your-config-file>

Here is the sample config, docs_scraper-config.json, that I am using for my website.

{
  "index_uid": "docs",
  "start_urls": [
    "https://www.kappawingman.com"
  ],
  "stop_urls": [
    "https://www.kappawingman.com/archives/",
    "https://www.kappawingman.com/posts/2020/",
    "https://www.kappawingman.com/tags/",
    "https://www.kappawingman.com/categories/",
    "https://www.kappawingman.com/category/",
    "https://www.kappawingman.com/authors/",
    "https://www.kappawingman.com/author/kappa/"
  ],
  "selectors": {
    "lvl0": {
      "selector": ".navbar-nav .active",
      "global": true,
      "default_value": "Kappa ICT Wingman"
    },
    "lvl1": "#content h1",
    "lvl2": "#content h2",
    "text": "#content p, #content li"
  },
  "custom_settings": {
    "synonyms": {
      "static site generator": [
        "ssg"
      ],
      "ssg": [
        "static site generator"
      ]
    },
    "stopWords": [
      "a", "and", "as", "at", "be", "but", "by",
      "do", "does", "doesn't", "for", "from",
      "in", "is", "it", "no", "nor", "not",
      "of", "off", "on", "or",
      "so", "should", "than", "that", "that's", "the",
      "then", "there", "there's", "these",
      "this", "those", "to", "too",
      "up", "was", "wasn't", "what", "what's", "when", "when's",
      "where", "where's", "which", "while", "who", "who's",
      "with", "won't", "would", "wouldn't"
    ]
  },
  "scrap_start_urls": false,
  "nb_hits": 232
}

Thanks to Clémentine, who explained to me how to configure the docs scraper and provided some reference configurations. The related discussion is here.

The above config is specific to my theme. It relies on the '.navbar-nav .active' classes, which match the active item in the top navigation bar, as the lvl0 selector.
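
After a scraper run with this config, a quick way to confirm documents actually landed in the index is the per-index stats endpoint (a sketch; endpoint as of MeiliSearch v0.13, using the private key from Part I):

curl \
-H "X-Meili-API-Key: 123-your-private-key" \
-X GET 'http://localhost:7700/indexes/docs/stats'
# The response includes numberOfDocuments, which should be non-zero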

Edit: For docs-scraper version 0.10.1, you may have a problem running it, like:

$ pipenv run ./docs_scraper docs_scraper-config.json
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
2020-08-07 10:02:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.kappawingman.com> (referer: None)
Traceback (most recent call last):
File "/home/username/venv/docs-scraper-0.10.1/lib64/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/username/venv/docs-scraper-0.10.1/lib64/python3.6/site-packages/scrapy/spiders/init.py", line 93, in parse
raise NotImplementedError('{}.parse callback is not defined'.format(self.class.name))
NotImplementedError: DocumentationSpider.parse callback is not defined

The problem is discussed in GitHub issue #61.

Here is my workaround: use a Scrapy version < 2.3.0.

$ pip uninstall Scrapy
$ pip install Scrapy==2.2.1
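
Alternatively, you can pin the version through pipenv so the Pipfile records it (a sketch):

$ pipenv install "Scrapy==2.2.1"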

Part III: docs-searchbar.js

docs-searchbar.js is written in JavaScript. It runs in the front end, that is, in the client's browser.

The docs-searchbar.js project website provides a simple example of how to use it. I have integrated it into my theme; the related source code and template are on GitHub.

{% if MEILISEARCH %}
<form>
  <div class="form-group">
    <input type="search" class="form-control mt-3" id="search-bar-input"
           placeholder="Search"/>
    <script>
      docsSearchBar({
        hostUrl: "{{ MEILISEARCH_SERVER_URL }}",
        apiKey: "{{ MEILISEARCH_API_KEY }}",
        indexUid: "{{ MEILISEARCH_INDEX_UID }}",
        inputSelector: "#search-bar-input",
        debug: true // Set debug to true if you want to inspect the dropdown
      });
    </script>
  </div>
</form>
{% endif %}

This template is included from the base.html template. If the MEILISEARCH parameter is enabled and not empty (in pelicanconf.py), the template is included, together with the related JavaScript and CSS for docs-searchbar.js from a CDN.

For pelicanconf.py, you need:

MEILISEARCH = True
MEILISEARCH_SERVER_URL = 'http://YourMeilisearchServer:7700'
MEILISEARCH_API_KEY = 'Public-key-mentioned-in-Part-1'
# INDEX_UID should match with the one you used with the docs scraper
MEILISEARCH_INDEX_UID = 'docs'
  • For the SERVER_URL, point it to your MeiliSearch server. For production usage, it is recommended to enable HTTPS and use the standard TCP port 443.
  • Put the public key we mentioned before into the template that calls docs-searchbar.js. You should not expose your master key or private key.

Now regenerate your static web pages with Pelican. You should be able to use MeiliSearch: it is an instant search engine and searches as you type.
