Setting Up A Typesense Website Search Engine

December 6, 2022

laravel typesense

I wanted to set up a search engine for the URLs and pages of my londinium.com site. I looked at the options and found Typesense, an open-source search engine that looked interesting.

I installed it locally with these two commands:

curl -O https://dl.typesense.org/releases/0.23.1/typesense-server-0.23.1-amd64.deb
sudo apt install ./typesense-server-0.23.1-amd64.deb

Now I can run the Typesense server by hand like this:

typesense-server --data-dir=/var/lib/typesense --api-key=API_KEY_HERE

The API key is set in the /etc/typesense/typesense-server.ini file, which is also the config file the systemd service loads.

To check on the service, use this command:

systemctl status typesense-server.service

resulting in:

● typesense-server.service - Typesense Server
     Loaded: loaded (/etc/systemd/system/typesense-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-12-01 05:21:39 GMT; 7h ago
       Docs: https://typesense.org
   Main PID: 1096 (typesense-serve)
      Tasks: 91 (limit: 13630)
     Memory: 89.7M
        CPU: 46.755s
     CGroup: /system.slice/typesense-server.service
             └─1096 /usr/bin/typesense-server --config=/etc/typesense/typesense-server.ini

systemd[1]: Started Typesense Server.
typesense-server[1096]: Log directory is configured as: /var/log/typesense
typesense-server[1096]: E20221201 05:22:17.530272  1778 raft_server.h:62] Peer refresh failed, error: Doing another configuration change

A quick health check over HTTP confirms the server is answering:

curl http://localhost:8108/health
{"ok":true}

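Using the PHP client, I am going to create a simple schema for a websites collection:

<?php

// $client is a Typesense\Client instance, created the same way as in the controller further down.
$websiteSchema = [
  'name' => 'websites',
  'fields' => [
    ['name' => 'url', 'type' => 'string'],
    ['name' => 'title', 'type' => 'string'],
    // The sample data stores description as a single string, so the type is string rather than string[].
    ['name' => 'description', 'type' => 'string']
  ]
  // No default_sorting_field here: it is optional, and as far as I know Typesense expects a numeric field for it, not a plain string such as url.
];

$client->collections->create($websiteSchema);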

Next, import some data from a JSONL file. JSONL (JSON Lines) is a text format with one complete JSON object per line.

This is an example of one row in the file:

{"id":"100275575","nwr":"way","title":"Home | Dover Castle Hostel","description":"Dover Castle, The Dover Castle Hostel is in the perfect location to explore London for budget travellers and groups. Being so close to the heart of London just check in at Dover Castle and enjoy the lively atmosphere with the friendly staff","url":"/way/100275575"}
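If the site data starts out as PHP arrays (for example rows read from a database), a JSONL file can be produced with plain json_encode. This is only a minimal sketch: the helper names toJsonl and fromJsonl are made up for illustration and are not part of the Typesense client.

```php
<?php

/**
 * Encode a list of rows as JSONL: one JSON object per line.
 * (Hypothetical helper, not part of typesense-php.)
 */
function toJsonl(array $rows): string
{
    $lines = [];
    foreach ($rows as $row) {
        // json_encode escapes newlines inside strings, so each row stays on one line.
        $lines[] = json_encode($row, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
    }
    return implode("\n", $lines);
}

/**
 * Decode JSONL text back into a list of associative arrays, skipping blank lines.
 * (Hypothetical helper, useful for checking a file before importing it.)
 */
function fromJsonl(string $jsonl): array
{
    $rows = [];
    foreach (explode("\n", $jsonl) as $line) {
        if (trim($line) === '') {
            continue;
        }
        $rows[] = json_decode(trim($line), true, 512, JSON_THROW_ON_ERROR);
    }
    return $rows;
}

// Example row with the same keys as the sample row above (description shortened).
$exampleRows = [
    [
        'id' => '100275575',
        'nwr' => 'way',
        'title' => 'Home | Dover Castle Hostel',
        'description' => 'Dover Castle, The Dover Castle Hostel is in the perfect location to explore London',
        'url' => '/way/100275575',
    ],
];

file_put_contents(sys_get_temp_dir() . '/websites.jsonl', toJsonl($exampleRows));
```

Decoding the file again with fromJsonl before importing is a cheap way to catch a row that is not valid JSON.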

And to import it:

$websitesData = file_get_contents('websites.jsonl');

$client->collections['websites']->documents->import($websitesData);
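The import call above ignores its return value, so a row that fails to index goes unnoticed. The sketch below assumes that, when given a JSONL string, import() returns a JSONL string with one result per input line, each shaped like {"success":true} or {"success":false,"error":"..."}; that is my understanding of the Typesense import response, so verify it against your client version. The function name importFailures is hypothetical.

```php
<?php

/**
 * List the failed lines in a Typesense import response.
 * Assumption: $response is JSONL with one {"success": ...} object per imported row.
 * (Hypothetical helper, not part of typesense-php.)
 */
function importFailures(string $response): array
{
    $failures = [];
    foreach (explode("\n", $response) as $index => $line) {
        if (trim($line) === '') {
            continue;
        }
        $result = json_decode($line, true);
        $ok = is_array($result) && ($result['success'] ?? false) === true;
        if (!$ok) {
            $failures[] = [
                // Line numbers are 1-based and match the input file when it has no blank lines.
                'line'  => $index + 1,
                'error' => is_array($result)
                    ? ($result['error'] ?? 'unknown error')
                    : 'unparseable response line',
            ];
        }
    }
    return $failures;
}

// Sample response text, written by hand for illustration.
$sampleResponse = "{\"success\":true}\n{\"success\":false,\"error\":\"Bad JSON.\"}";

foreach (importFailures($sampleResponse) as $failure) {
    echo "Row {$failure['line']} failed: {$failure['error']}\n";
}
```

In the import step this would be used as importFailures($client->collections['websites']->documents->import($websitesData)).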

Now to test it, here is a curl command that searches the description field:

curl -H "X-TYPESENSE-API-KEY: API_KEY_HERE" \
"http://localhost:8108/collections/websites/documents/search\
?q=dover&query_by=description"

This returns the following JSON result:

{"facet_counts":[],"found":1,"hits":[{"document":{"description":"Dover Castle, The Dover Castle Hostel is in the perfect location to explore London for budget travellers and groups. Being so close to the heart of London just check in at Dover Castle and enjoy the lively atmosphere with the friendly staff","id":"100275575","nwr":"way","title":"Home | Dover Castle Hostel","url":"/way/100275575"},"highlights":[{"field":"description","matched_tokens":["Dover","Dover"],"snippet":"<mark>Dover</mark> Castle, The <mark>Dover</mark> Castle"}],"text_match":72341265420648449}],"out_of":100,"page":1,"request_params":{"collection_name":"websites","per_page":10,"q":"dover"},"search_cutoff":false,"search_time_ms":304}

And in Laravel, this is the controller:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Typesense\Client;

class SearchController extends Controller
{
    public function index(Request $request)
    {
        echo "search for ";
        $q = $request->input('q');
        echo $q;

        $client = new Client(
            [
              'api_key'         => 'API_KEY_HERE',
              'nodes'           => [
                [
                  'host'     => 'localhost', // For Typesense Cloud use xxx.a1.typesense.net
                  'port'     => '8108',      // For Typesense Cloud use 443
                  'protocol' => 'http',      // For Typesense Cloud use https
                ],
              ],
              'connection_timeout_seconds' => 2,
            ]
        );

        $searchParameters = [
            'q'         => $q,
            'query_by'  => 'description'
        ];

        $result = $client->collections['websites']->documents->search($searchParameters);

        dd($result);

    }
}

which, for a search on london, results in:

array:8 [▼ // app/Http/Controllers/SearchController.php:37
  "facet_counts" => []
  "found" => 13
  "hits" => array:10 [▼
    0 => array:3 [▼
      "document" => array:5 [▶]
      "highlights" => array:1 [▶]
      "text_match" => 72341265420648449
    ]
    1 => array:3 [▼
      "document" => array:5 [▼
        "description" => "Dover Castle, The Dover Castle Hostel is in the perfect location to explore London for budget travellers and groups. Being so close to the heart of London just  ▶"
        "id" => "100275575"
        "nwr" => "way"
        "title" => "Home | Dover Castle Hostel"
        "url" => "/way/100275575"
      ]
      "highlights" => array:1 [▼
        0 => array:3 [▼
          "field" => "description"
          "matched_tokens" => array:1 [▶]
          "snippet" => "perfect location to explore <mark>London</mark> for budget travellers and"
        ]
      ]
      "text_match" => 72341265420648449
    ]
    2 => array:3 [▶]
    3 => array:3 [▶]
    4 => array:3 [▶]
    5 => array:3 [▶]
    6 => array:3 [▶]
    7 => array:3 [▶]
    8 => array:3 [▶]
    9 => array:3 [▶]
  ]
  "out_of" => 100
  "page" => 1
  "request_params" => array:3 [▼
    "collection_name" => "websites"
    "per_page" => 10
    "q" => "london"
  ]
  "search_cutoff" => false
  "search_time_ms" => 415
]
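Going by the structure of the dump above, the fields needed for a results page can be pulled out with a couple of small functions. This is a sketch under assumptions: summariseHits and totalPages are hypothetical names, and only the first highlight of each hit is used.

```php
<?php

/**
 * Reduce a Typesense search result to the fields a results page needs.
 * (Hypothetical helper; key names follow the dump shown above.)
 */
function summariseHits(array $result): array
{
    $rows = [];
    foreach ($result['hits'] ?? [] as $hit) {
        $document = $hit['document'] ?? [];
        $rows[] = [
            'title'   => $document['title'] ?? '',
            'url'     => $document['url'] ?? '',
            // The snippet contains <mark> tags around the matched words.
            'snippet' => $hit['highlights'][0]['snippet'] ?? '',
        ];
    }
    return $rows;
}

/**
 * Work out how many pages of results there are from found and per_page.
 */
function totalPages(array $result): int
{
    $perPage = max(1, (int) ($result['request_params']['per_page'] ?? 10));
    return (int) ceil(($result['found'] ?? 0) / $perPage);
}

// Cut-down copy of the result above, for illustration.
$exampleResult = [
    'found' => 13,
    'hits' => [
        [
            'document' => ['title' => 'Home | Dover Castle Hostel', 'url' => '/way/100275575'],
            'highlights' => [
                ['field' => 'description', 'snippet' => 'perfect location to explore <mark>London</mark> for budget travellers and'],
            ],
        ],
    ],
    'request_params' => ['per_page' => 10],
];

foreach (summariseHits($exampleResult) as $row) {
    echo $row['title'] . ' -> ' . $row['url'] . "\n";
}
echo 'Pages: ' . totalPages($exampleResult) . "\n";
```

In the controller, the dd($result) call could then be replaced by returning a view that receives summariseHits($result) and totalPages($result).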

This is a great start. I am going to publish this and then explore the options for displaying the resulting search JSON.

Questions/Ideas:

  1. Can I search on multiple fields, i.e. url, description and title, at the same time?
  2. Adding geosearch to the schema and the JSON data (see the geosearch docs on typesense.org).
  3. Extracting the results from the JSON and handling the paging.
  4. Looking at the JavaScript InstantSearch integration.
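On the first and third questions: Typesense's query_by parameter accepts a comma-separated list of fields, and the search endpoint takes page and per_page parameters. A minimal sketch of building the parameter array follows; the function name buildSearchParameters and the field order are my own choices, not something from the setup above.

```php
<?php

/**
 * Build search parameters that query several fields and request one page of results.
 * (Hypothetical helper; the returned array has the same shape as $searchParameters in the controller.)
 */
function buildSearchParameters(string $q, int $page = 1, int $perPage = 10): array
{
    return [
        'q'        => $q,
        // Comma-separated field list; fields listed earlier are given more weight by default.
        'query_by' => 'title,description,url',
        // Clamp to sane values so a bad ?page= value cannot produce an invalid request.
        'page'     => max(1, $page),
        'per_page' => max(1, $perPage),
    ];
}

$exampleParameters = buildSearchParameters('dover', 2);

// The same parameters as a query string, e.g. for a curl test against the search endpoint.
echo http_build_query($exampleParameters) . "\n";
```

In the controller this would replace the hand-written $searchParameters array, with the page number read from the request, for example (int) $request->input('page', 1).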

If anyone has any suggestions, links or tips, please comment below or via Twitter.

You can also contact me through the forms on londinium.com or ilminster.net, or via Twitter @andylondon.