A trie is an effective data structure that could be an excellent addition to your problem-solving toolbox. Let’s see which areas the trie usage is more applicable than a standard set or map implementation. And what other benefits you can get from using it.
I like the name trie more than prefix tree or digital tree, although prefix tree is the most accurate definition. The name trie comes from retrieval but is pronounced as “try,” not “tree.” The idea was to distinguish the trie from the tree.
Try to guess what a trie is. I built trie from 4 words: apple, banana, bastard, ban:
I highly recommend to use a visualization tool from the University of San Francisco to play with a trie.
Trie stores keys that are represented as a sequence (usually a string) compactly. Then you can use it as a set or a hash map to query keys. In addition to that, you get an opportunity to query keys and values by the key prefix.
I do not focus on the optimal algorithm. The goal of this article is to get you familiar with trie and when you can use it.
Let’s design a desired API:
It will be enough to implement these two functions: Insert and Contains, to grasp the idea. And it will be a reasonable basis for the following improvements.
I start by implementing generic rune trie. In Go rune is a superset of ASCII and represents Unicode code points. Think of it as char type in Java or C, but for Unicode.
So, there is Insert function without any optimizations, just the first naive implementation:
The algorithm is simple:
- Set current node value to the root node.
- For each letter of the word: a) Check if the current node contains a letter and if it does not have it, create a new node for the letter and attach it to the current node. b) But if it includes the letter, get the node under this letter and set it as the current node. Return to 2.
- Mark the last processed node as the end of the word.
Contains function is much simpler and seems straightforward:
We traverse a tree by letters until the last node.
Let’s benchmark this implementation. Go has tools to write simple benchmark tests like this one:
And then we run the benchmark:
Go bench tool does not count stack allocations, so I see that Contains function does not contain any heap allocations, which is excellent.
Trie structure enables us to add one more function that is not feasible for a regular hash map — SearchByPrefix:
The function is not optimized, but it shows that this is possible and not so complex. It calls the function defined earlier
Since the trie allows to search words by prefix, it is a good candidate for autocompletion algorithms.
fasthttp library for Go uses trie as an internal implementation for routing. It is not only the fastest (probably) implementation of the routing, but it also does not produce any garbage.
You can dive deeper if you wish.
If you need:
- A set in which you might need to perform a lookup by the prefix.
- Compactly store a lot of similar data.
- You build the structure once and query it after.
Trie might be a good candidate for you.
For example, you build a set of domains, including subdomains in memory at the application startup, and then only query it. You might consider a trie for this task.
But anyway, always check your performance assumptions with benchmarks.
If you do not need to lookup by prefix and your data structure is changing a lot, a hash map or a hash set is a better alternative.
Known trie implementations
I found two exciting libraries with an exemplary implementation of trie:
You can also use mine, but it is not optimized and created for learning purposes
Originally published at https://scalabledeveloper.com.