Full Text Search & Relevance ranking in Ktor using PostgreSQL & Exposed
Preface
While working on a project of my own, I found myself requiring a “global” search functionality for the mobile app frontend. Since I was already using Ktor with Exposed & PostgreSQL, I decided to look into Postgres’s Full Text Search. Through my research, I found that I had to basically do two things:
Use a QueryBuilder to create the “@@ to_tsquery” Postgres specific query for the FTS and
Implement a custom Exposed column with the appropriate operators in order to keep using Exposed’s DSL and avoiding raw queries for basic column operations.
For the rest of the article, we are going to be using as an example a research lab application, so all table-related operations, triggers & data will be based off of that.
Here is a quick example of a lab definition in Kotlin:
data class ResearchLab(
val id: Int? = null,
val name: String,
val fek: String,
val website: String = "",
val scientificField: String,
val scientificSubfield: String,
val department: String = "",
val division: String = "",
val groupLeader: String = "",
val email: String = "",
val leaderUrl: String = "",
val url: String = "",
val createdAt: Instant? = null,
val updatedAt: Instant? = null
)With all that out of the way, let’s jump straight into the implementation details!
PostgreSQL Setup
First and foremost, we will need to create a ts_vector table in our database that will hold the searchable content for each lab record, along with the relevant lab record ID.
CREATE TABLE IF NOT EXISTS public.labs_tsvector
(
lab_id integer NOT NULL,
tsvector_content tsvector,
CONSTRAINT lab_tsvector_pkey PRIMARY KEY (lab_id)
);Once we have the table, we need a way to keep the searchable data always up-to-date as the various columns in the original labs table change. To do that, we define a trigger function that runs on every insert & update operation in order to recalculate the ts_vector content.
-- Create trigger functions to automatically update the tsvector columns
CREATE OR REPLACE FUNCTION labs_tsvector_trigger_function()
RETURNS trigger AS $$
BEGIN
INSERT INTO labs_tsvector (lab_id, tsvector_content)
VALUES (NEW.id, to_tsvector('english', coalesce(NEW.name, '') || ' ' || coalesce(NEW.group_leader, '') || ' ' || coalesce(NEW.scientific_field, '') || ' ' || coalesce(NEW.department, '') || ' ' || coalesce(NEW.fek, '')))
ON CONFLICT (lab_id) DO UPDATE
SET tsvector_content = EXCLUDED.tsvector_content;
RETURN NEW;
END
$$ LANGUAGE plpgsql;
-- Trigger for labs table
CREATE OR REPLACE TRIGGER labs_tsvector_update
AFTER INSERT OR UPDATE ON labs
FOR EACH ROW
EXECUTE FUNCTION labs_tsvector_trigger_function();Exposed Setup
The database part is now complete and the “real” implementation begins! Since Exposed does not provide such a database-specific feature, such as FTS, we are going to have to figure out a way to simply define the tsvector_content in the labs_tsvector table as a new column type.
Custom TsVector Column
In its essence, the ts_vector content that we store is something like a map of lexical tokens, and so we could treat our column as a String type to simplify our code. Here is the implementation for such a custom column type in Exposed:
import org.jetbrains.exposed.sql.ColumnType
import org.postgresql.util.PGobject
private const val TS_VECTOR_SQL_TYPE = "tsvector"
class TsVectorColumnType : ColumnType<String>(nullable = true) {
override fun sqlType(): String = TS_VECTOR_SQL_TYPE
override fun valueFromDB(value: Any): String? {
return value as String
}
override fun valueToDB(value: String?): Any? {
return PGobject().apply {
type = TS_VECTOR_SQL_TYPE
this.value = value
}.value
}
override fun notNullValueToDB(value: String): Any {
return PGobject().apply {
type = TS_VECTOR_SQL_TYPE
this.value = value
}
}
override fun nonNullValueToString(value: String): String {
return "'$value'"
}
}As you can see, it is simply a String column that uses the “tsvector” postgres data type to create a new PGobject when storing the content to the database, and simply a String? when retrieving data from the database.
Now we can use our new column type in our table definitions by registering the column type like so:
import example.com.db.tables.labs.ResearchLabTable
import example.com.db.utils.TsVectorColumnType
import org.jetbrains.exposed.dao.id.IntIdTable
import org.jetbrains.exposed.sql.ReferenceOption
object ResearchLabsTsVectorTable : IntIdTable("labs_tsvector") {
val researchLab = reference("research_lab_id", ResearchLabTable, onDelete = ReferenceOption.CASCADE)
val tsvectorContent = registerColumn<String>("tsvector_content", TsVectorColumnType()).databaseGenerated()
}Since the actual table’s column is created via the SQL code we discussed earlier, we mark the column as databaseGenerated().
Custom Exposed Operators
We are now ready to move on to defining a custom Exposed operator that will construct the full-text search query. This query will then be used along with Exposed’s select {} function to match the user’s query using the tsvector_content ‘s dictionary.
The idea for the operator’s implementation is simple and can be broken down into simple parts:
The operator will accept the
tsvectorcolumn and the user’s search queryIn its query builder function, it will append the “@@ to_tsquery” Postgres operator on the column and
lastly, follow it up with the user’s search query
Here is the operator class definition:
class TsQueryOp(val tsVectorColumn: Expression<String>, val query: String) : Op<Boolean>() {
override fun toQueryBuilder(queryBuilder: QueryBuilder) {
queryBuilder.append(tsVectorColumn)
queryBuilder.append(" @@ to_tsquery('english', ")
queryBuilder.append(stringLiteral(sanitizeQuery(query))) // Ensures proper escaping of the query string
queryBuilder.append(")")
}
}and the sanitizeQuery function:
fun sanitizeQuery(query: String): String {
// Replace spaces with '&' for logical AND (to match all terms)
return query.trim().replace(Regex("\\s+"), " & ")
}Everything is now set-up and ready for FTS support! We could already stop here and simply return the results of the search to the user in the order they come in from Postgres, but we can do better than that!
Ranking by relevance
Postgres provides another very neat feature via its ts_rank operator, which allows us to rank the results of a ts vector match by how relevant they are to the initial query!
From the documentation:
Ranking attempts to measure how relevant documents are to a particular query, so that when there are many matches the most relevant ones can be shown first. PostgreSQL provides two predefined ranking functions, which take into account lexical, proximity, and structural information; that is, they consider how often the query terms appear in the document, how close together the terms are in the document, and how important is the part of the document where they occur.
For our implementation, we need to create a custom ExpressionWithColumnType in order to provide the query builder and the data type.
Let’s see the implementation:
class CustomTsRankFunction(
private val tsVectorColumn: Column<String>,
private val query: String
) : ExpressionWithColumnType<Double>() {
override fun toQueryBuilder(queryBuilder: QueryBuilder) {
queryBuilder.append("ts_rank(")
queryBuilder.append(tsVectorColumn)
queryBuilder.append(", to_tsquery('greek', ")
queryBuilder.append(stringLiteral(sanitizeQuery(query))) // Ensures proper escaping
queryBuilder.append("))")
}
override val columnType: IColumnType<Double>
get() = DoubleColumnType()
}This “function” will be used to rank the results of the “ts_query” operator, as we can see, based on their relevance. For more information on the ts_rank function, you can take a look at the relevant postgres docs.
Putting it all together
We now have everything that is needed to perform a FTS on the lab data and return the results of the user’s search ranked by relevance. We are going to take a quick look at an example usage of all the aforementioned components and then break it down with an explanation.
Let’s see a sample search function implementation in a repository class:
override suspend fun searchResearchLabs(query: String, page: Int, pageSize: Int): List<ResearchLabSearchResult> {
return suspendTransaction {
val rankAlias = CustomTsRankFunction(ResearchLabsTsVectorTable.tsvectorContent, query).alias("rank")
ResearchLabTable
.join(
ResearchLabsTsVectorTable,
/*
We use an INNER JOIN since we assume the corresponding tsvector row must exist
*/
JoinType.INNER,
ResearchLabTable.id,
ResearchLabsTsVectorTable.researchLab
)
.slice(
ResearchLabTable.id,
ResearchLabTable.name,
ResearchLabTable.scientificField,
ResearchLabTable.scientificSubfield,
ResearchLabTable.createdAt,
ResearchLabTable.updatedAt,
rankAlias
)
.select {
TsQueryOp(ResearchLabsTsVectorTable.tsvectorContent, query)
}
.orderBy(
rankAlias to SortOrder.DESC
)
.limit(pageSize, offset = page * pageSize.toLong())
.map {
val relevance = it[rankAlias]
ResearchLabSearchResult(
id = it[ResearchLabTable.id].value,
name = it[ResearchLabTable.name],
scientificField = it[ResearchLabTable.scientificField],
scientificSubfield = it[ResearchLabTable.scientificSubfield],
createdAt = it[ResearchLabTable.createdAt],
updatedAt = it[ResearchLabTable.updatedAt],
relevance = relevance,
)
}
}
}Let’s break down everything now that we have a complete view of our components’ usage:
Rank Alias
val rankAlias = CustomTsRankFunction(ResearchLabsTsVectorTable.tsvectorContent, query).alias("rank")This line uses a custom function, CustomTsRankFunction, to calculate the relevance ranking based on the tsvectorContent column in the ResearchLabsTsVectorTable table.
The alias("rank") creates an alias, rankAlias, for this rank value to simplify referencing it later.
Combining lab info & FTS content
ResearchLabTable
.join(
ResearchLabsTsVectorTable,
JoinType.INNER,
ResearchLabTable.id,
ResearchLabsTsVectorTable.researchLab
)Here, we simply join the two tables we defined earlier on the lab ID to retrieve the FTS dictionary for each lab.
An INNER JOIN is used to ensure only records with matching tsvectorContent are returned.
Applying the FTS operators
The select clause uses our custom query operator in order to do the FTS:
.select {
TsQueryOp(ResearchLabsTsVectorTable.tsvectorContent, query)
}The TsQueryOp, as we saw earlier, is an operation that:
Compares the
tsvectorContentwith thequeryterm.Ensures only results containing the query term are selected.
Ordering results by relevance
.orderBy(
rankAlias to SortOrder.DESC
)This orderBy clause sorts the results in descending order of relevance (highest-ranked results first). Using rankAlias means, the results are ordered by the relevance score calculated by CustomTsRankFunction.
Other clauses
The slice clause simply selects only the specified columns, while the last clauses limit the results in order to use pagination and then map the records to the DTOs returned to the mobile frontend.
As for the suspendTransaction {} block, it is simply a call to newSuspendedTransaction:
suspend fun <T> suspendTransaction(block: Transaction.() -> T): T =
newSuspendedTransaction(Dispatchers.IO, statement = block)Closing Thoughts
With a few lines of code, we have achieved a pretty fast and efficient FTS by leveraging custom column types, operators & expressions so that we can continue writing FTS-related queries using Exposed’s DSL! We also ranked the results by their relevance, improving the overall user experience by getting the “best” matches to show first for end users.
I’m certain that there are better ways of implementing FTS this way (in terms of efficiency, accuracy of matching, and many other criteria), so feel free to add your comments, suggestions and constructive criticism down below so that I can learn and improve!

