Developer Andrew Quinn achieved a 300× size reduction for Taskusanakirja, a Finnish-English dictionary app, by replacing a 3 GB SQLite database with a 10 MB finite state transducer (FST) binary. The optimization enabled the app to store 40-60 million inflected word forms while maintaining fast search-as-you-type functionality.
Finite State Transducers Exploit Finnish Language Structure
The breakthrough leveraged a key property of finite state transducers: unlike tries, which share only prefixes, FSTs share both prefixes and suffixes. Finnish is heavily agglutinative: words are built by attaching suffixes to stems, so thousands of words share identical inflectional endings. This suffix sharing makes the language well suited to FST compression.
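To make the prefix-versus-suffix distinction concrete, the following is a minimal sketch using only the Rust standard library. It builds a plain trie and then merges identical subtrees to count the states of a minimized automaton. The four word forms and all function names are illustrative assumptions; this is not Quinn's code and not how the 'fst' crate constructs its automaton internally.

```rust
use std::collections::{BTreeMap, HashMap};

/// A plain trie node: words share prefixes only.
#[derive(Default)]
struct TrieNode {
    is_final: bool,
    children: BTreeMap<char, TrieNode>,
}

fn build_trie(words: &[&str]) -> TrieNode {
    let mut root = TrieNode::default();
    for word in words {
        let mut node = &mut root;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.is_final = true;
    }
    root
}

fn count_trie_nodes(node: &TrieNode) -> usize {
    1 + node.children.values().map(count_trie_nodes).sum::<usize>()
}

/// A state is identified by its finality plus its (label, target id) edges.
type Signature = (bool, Vec<(char, usize)>);

/// Assigns one id per distinct subtree; identical subtrees get the same id,
/// which is what lets shared suffixes collapse into shared states.
fn minimize(node: &TrieNode, registry: &mut HashMap<Signature, usize>) -> usize {
    let edges: Vec<(char, usize)> = node
        .children
        .iter()
        .map(|(ch, child)| (*ch, minimize(child, registry)))
        .collect();
    let next_id = registry.len();
    *registry.entry((node.is_final, edges)).or_insert(next_id)
}

fn count_minimal_states(root: &TrieNode) -> usize {
    let mut registry = HashMap::new();
    minimize(root, &mut registry);
    registry.len()
}

fn main() {
    // Illustrative forms: "in/from the house", "in/from the car".
    let words = ["talossa", "talosta", "autossa", "autosta"];
    let trie = build_trie(&words);
    println!("trie nodes: {}", count_trie_nodes(&trie));
    println!("minimized states: {}", count_minimal_states(&trie));
}
```

For these four forms the trie needs 19 nodes while the minimized automaton needs 10, because everything from the shared endings onward is stored once rather than once per stem.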
Quinn's initial trie-based approach maxed out at approximately 400,000 items in 50 MB. The SQLite Full Text Search solution worked functionally but produced a 3 GB database, an unacceptable download size for a mobile application. The FST approach solved the capacity and size constraints at the same time.
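The figures above can be turned into a rough per-entry cost. The sketch below assumes decimal megabytes, takes 50 million as the midpoint of the 40-60 million range, and assumes the SQLite database and the FST cover the same set of word forms, which the article does not state precisely. The function name is hypothetical.

```rust
/// Average bytes per stored entry, using decimal megabytes (1 MB = 1_000_000 bytes).
fn bytes_per_entry(size_mb: f64, entries: f64) -> f64 {
    size_mb * 1_000_000.0 / entries
}

fn main() {
    // Trie: about 400,000 items in 50 MB.
    let trie = bytes_per_entry(50.0, 400_000.0);
    // SQLite: 3 GB, assumed to hold about 50 million forms.
    let sqlite = bytes_per_entry(3_000.0, 50_000_000.0);
    // FST: 10 MB, assumed to hold the same 50 million forms.
    let fst = bytes_per_entry(10.0, 50_000_000.0);

    println!("trie:   {trie:.1} bytes per item");
    println!("sqlite: {sqlite:.1} bytes per form");
    println!("fst:    {fst:.1} bytes per form");
    println!("sqlite/fst size ratio: {:.0}x", 3_000.0 / 10.0);
}
```

Under these assumptions the trie costs about 125 bytes per item and SQLite about 60 bytes per form, while the FST averages about 0.2 bytes per form, which is only possible because most of each word is shared with other words.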
Minimal Rust Implementation Produces Static Binary
The implementation required a minimal Rust program using the 'fst' crate. Quinn extracted data from the existing SQLite database and compiled it into a finite state automaton. Because the dictionary data is static and doesn't require runtime updates, the approach avoided FSTs' typical weakness with dynamic insertions.
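The build-once, query-many shape described above can be sketched without reproducing the 'fst' crate's API. The stand-in below uses only the standard library: a sorted, deduplicated word list that is never modified after construction, queried by prefix for search-as-you-type. `StaticIndex` and its methods are hypothetical names, and a sorted list gives none of the FST's compression; the sketch only illustrates why static data makes this design workable. The sorting step mirrors the 'fst' crate's requirement that keys be inserted in lexicographic order.

```rust
/// Hypothetical stand-in for a compiled, read-only dictionary index.
struct StaticIndex {
    words: Vec<String>, // sorted, deduplicated, never mutated after build
}

impl StaticIndex {
    /// One-time build step, analogous to compiling the extracted rows.
    fn build(mut words: Vec<String>) -> Self {
        words.sort();
        words.dedup();
        StaticIndex { words }
    }

    /// Returns up to `limit` words that start with `prefix`.
    fn prefix_search(&self, prefix: &str, limit: usize) -> Vec<&str> {
        // Binary search for the first word that is not less than the prefix.
        let start = self.words.partition_point(|w| w.as_str() < prefix);
        self.words[start..]
            .iter()
            .take_while(|w| w.starts_with(prefix))
            .take(limit)
            .map(|w| w.as_str())
            .collect()
    }
}

fn main() {
    // Illustrative rows, as if extracted from an existing database.
    let rows = vec![
        "talossa".to_string(),
        "talo".to_string(),
        "autossa".to_string(),
        "talosta".to_string(),
        "talo".to_string(),
    ];
    let index = StaticIndex::build(rows);
    println!("{:?}", index.prefix_search("talo", 10));
}
```

Adding a word to such an index means rebuilding it, which is acceptable only because the dictionary ships as fixed data with each release.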
The final Pro version 2.0 came to approximately 20 MB in total, roughly a third of the smallest size the free version 1.0 ever reached. The static data structure kept search fast, with no perceptible delay during user interaction.
Pragmatic Engineering Process Revealed Optimal Solution
Quinn noted that the project illustrated a pragmatic engineering philosophy: doing the "bad easy thing" first, the SQLite hack, built up the knowledge that led to the FST approach. The working but suboptimal solution provided the operational experience needed to identify the better technical path.
The post, shared on Hacker News on May 10, 2026, received 95 points and 12 comments, reflecting developer interest in data structure optimization techniques.
Key Takeaways
- Finite state transducers reduced a Finnish-English dictionary database from 3 GB (SQLite) to 10 MB, achieving 300× compression
- FSTs compress both prefixes and suffixes, making them ideal for agglutinative languages like Finnish where thousands of words share inflectional endings
- The implementation used Rust with the 'fst' crate, compiling dictionary data into a static finite state automaton
- The final Pro version 2.0 came to approximately 20 MB in total, roughly a third the size of the free version 1.0, while storing 40-60 million inflected word forms
- Because the dictionary data is static, no runtime updates are needed, which sidesteps FSTs' typical weakness with dynamic insertions while keeping search fast