Self-hosted LanguageTool for grammar checking

More power to self-hosters

I finally found a way to catch the grammar and spelling errors in my articles: a self-hosted LanguageTool instance.

The war on grammar and spelling

I’ve been struggling with this problem for quite a while. Most, if not all, of the articles I’ve posted have grammar or spelling mistakes in them. The text editor I use, Neovim, has a built-in spell checker, but it’s rudimentary and can’t check grammar. There’s a limit to the errors I can catch with my own eyes, so mistakes kept slipping through.

I had to find a spelling and grammar checker that:

  1. Is available offline. I’m not going to make a Grammarly account and manually copy and paste text every time
  2. Is FOSS, if possible
  3. Does not require much work to use each time

Finding a solution was tough at first. FOSS spell checkers are abundant, but they are somewhat inconvenient to use, and I couldn’t find any FOSS grammar checkers. The alternative was a proprietary service like Grammarly. Sure, those have all the nice features and “just work”. But they aren’t FOSS, they require an internet connection, and they restrict users who don’t pay.

LanguageTool to the rescue

LanguageTool is basically a FOSS version of Grammarly, which makes it immediately better. However, the real killer feature of LanguageTool is that you can self-host it. This means that you can have LanguageTool running on your own machine, which avoids any restrictions that the default public cloud server would impose. This also means that LanguageTool works offline, so it fits my needs perfectly. I’ve tested LanguageTool on a few articles, and it caught lots of mistakes that I overlooked. It works, and even has n-gram data to catch subtle mistakes.

Installation

I decided to install LanguageTool on my home server, not my actual desktop. This was to keep my own system less cluttered with dependencies, but the program can be run anywhere. The commands below are what I used to get everything ready. My server runs Debian 11 and is headless, so don’t blindly copy and paste the commands below if you use a different OS.

LanguageTool needs at least Java 8, so install Java first. The command below installs OpenJDK 11, which meets that requirement.

sudo apt install openjdk-11-jre-headless
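To confirm that the installed Java meets the "at least Java 8" requirement, something like the sketch below could work. The function names are my own, and it assumes `java -version` prints a quoted version string such as "11.0.12" or the older "1.8.0_292" style.

```shell
#!/bin/sh
# Hypothetical helpers; only the "at least Java 8" requirement is from the article.

# Extract the major version from a Java version string.
# Legacy scheme: "1.8.0_292" means Java 8. Modern scheme: "11.0.12" means Java 11.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

# Read the quoted version string that `java -version` prints to stderr.
installed_java_version() {
    java -version 2>&1 | grep 'version "' | head -n 1 | cut -d'"' -f2
}

# Succeed if the installed Java is at least the given major version.
java_at_least() {
    [ "$(java_major "$(installed_java_version)")" -ge "$1" ]
}
```

Running `java_at_least 8 && echo "Java is new enough"` after the install gives a quick yes or no.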

Then download the server program and extract it to some directory.

wget https://languagetool.org/download/LanguageTool-stable.zip
unzip LanguageTool-stable.zip -d language-tool
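The archive may unpack into a subdirectory named after the release, so the server jar is not necessarily directly under language-tool/. That layout is an assumption on my part and may differ between releases. A small helper like this hypothetical one finds the directory that actually holds languagetool-server.jar:

```shell
#!/bin/sh
# Hypothetical helper: print the directory containing languagetool-server.jar
# somewhere under the given root (default: language-tool).
find_lt_dir() {
    lt_jar=$(find "${1:-language-tool}" -type f -name languagetool-server.jar | head -n 1)
    if [ -z "$lt_jar" ]; then
        return 1    # no jar found under the root
    fi
    dirname "$lt_jar"
}
```

With that in place, `cd "$(find_lt_dir language-tool)"` lands in the right directory regardless of the release name.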

Test if the server runs. cd into the extracted directory and run:

java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --allow-origin

The command below tests a locally installed server. If LanguageTool is running on another machine, append --public to the server command above and replace localhost in the curl command with the server’s address. If the server is running, it returns a JSON response.

curl -d "language=en-US" -d "text=a simple test" http://localhost:8081/v2/check
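The same endpoint can check a whole article file instead of a one-line string. This is only a sketch: the function names and defaults are mine, and it relies on curl's `name@file` form of `--data-urlencode` to read and URL-encode the file.

```shell
#!/bin/sh
# Hypothetical helpers; the /v2/check endpoint and its language/text
# parameters are the ones used in the curl test above.

# Build the check URL for a host and port (defaults: localhost, 8081).
lt_check_url() {
    printf 'http://%s:%s/v2/check\n' "${1:-localhost}" "${2:-8081}"
}

# Send a file's contents to the server and print the raw JSON reply.
# Usage: lt_check_file FILE [LANGUAGE] [HOST] [PORT]
lt_check_file() {
    curl --silent --fail \
        --data "language=${2:-en-US}" \
        --data-urlencode "text@$1" \
        "$(lt_check_url "${3:-localhost}" "${4:-8081}")"
}
```

For example, `lt_check_file article.md` prints the server's reply for article.md, where article.md stands in for whatever file you want checked.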

To connect a client to the server, install the LanguageTool browser add-on. Go to the extension’s Settings, then Advanced settings, and switch the server option from ‘Cloud server’ to ‘local’ or ‘other’, depending on where the server is installed. If it is on another machine, the address should be http://ip.address.of.server:8081/v2. The v2 at the end is required. Everything should be ready now: when you type into a text box in a browser tab, LanguageTool checks the text. It works offline, is easy to access, and is fully under your control.
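If the add-on does not seem to reach a remote server, it can help to test the address from the client machine first. The sketch below builds the address the add-on expects and asks the server for its language list through the /v2/languages endpoint. The function names are made up, and 192.168.1.10 in the usage note is a placeholder address.

```shell
#!/bin/sh
# Hypothetical helpers for checking the add-on address from a client machine.

# Build the base address to enter in the add-on, including the trailing /v2.
lt_addon_url() {
    printf 'http://%s:%s/v2\n' "$1" "${2:-8081}"
}

# Succeed if the server answers on /v2/languages within five seconds.
lt_reachable() {
    curl --silent --fail --max-time 5 --output /dev/null \
        "$(lt_addon_url "$1" "${2:-8081}")/languages"
}
```

`lt_reachable 192.168.1.10 && lt_addon_url 192.168.1.10` prints the address to paste into the add-on only if the server responded.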

Extra: n-gram

Optionally, you can download n-gram data for a few languages. This allows LanguageTool to catch subtler errors, such as “don’t forget to put on the breaks”. A list of all n-gram data sets can be found on this page. To download one, append the name of the data set to https://languagetool.org/download/ngram-data/ and pass the resulting URL to wget. For example, the command below downloads the English data. Each data set is a few gigabytes in size, so downloading might take several minutes.

wget https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
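Since the files are large, a resumable download can save some pain if the connection drops. A minimal sketch, with helper names of my own choosing, that builds the URL from a data set name and uses wget's `--continue` option:

```shell
#!/bin/sh
# Hypothetical helpers; the base URL is the one from the article.
NGRAM_BASE="https://languagetool.org/download/ngram-data"

# Build the download URL for a data set file name.
ngram_url() {
    printf '%s/%s\n' "$NGRAM_BASE" "$1"
}

# Download a data set, resuming a partial file if one already exists.
fetch_ngrams() {
    wget --continue "$(ngram_url "$1")"
}
```

`fetch_ngrams ngrams-en-20150817.zip` then behaves like the command above, except that rerunning it picks up where an interrupted download stopped.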

The data needs to be extracted into a directory with the code of the language. For example, if you extract the English zip file to a directory n-gram/, that n-gram directory must have en as a subdirectory. French data should be in n-gram/fr/, and so on.

I recommend keeping the n-gram data in the same directory as LanguageTool. So if LanguageTool is in ~/language-tool/, extract the data to ~/language-tool/n-gram/en, ~/language-tool/n-gram/de, and so on.

unzip ngrams-en-20150817.zip -d language-tool/n-gram/en
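Because the server depends on that directory layout, a quick check after extracting can catch a misplaced folder. The helpers below are hypothetical and only test what is described above: that the n-gram directory has a subdirectory named after each language code.

```shell
#!/bin/sh
# Hypothetical helpers for checking the n-gram directory layout.

# Succeed if NGRAM_DIR contains a subdirectory for the language code.
# Usage: has_ngram_lang NGRAM_DIR LANG_CODE
has_ngram_lang() {
    [ -d "$1/$2" ]
}

# List the language codes (subdirectory names) found in NGRAM_DIR.
ngram_langs() {
    for lang_dir in "$1"/*/; do
        if [ -d "$lang_dir" ]; then
            basename "$lang_dir"
        fi
    done
}
```

For instance, `has_ngram_lang language-tool/n-gram en || echo "English data is not where the server expects it"` flags a wrong extraction path.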

To use the n-gram data, add --languageModel to the Java command from before, with the path to the n-gram directory as its argument. If the n-gram data is in a subdirectory of the LanguageTool installation, this is the command to run:

java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --allow-origin --languageModel n-gram
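To avoid retyping that line, the command can be assembled by a small function. This is a sketch with a made-up function name. It only prints the command, and it assumes the n-gram path contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the server command for a port and an optional
# n-gram directory. Flags are the ones used in the article.
# Usage: lt_server_cmd [PORT] [NGRAM_DIR]
lt_server_cmd() {
    lt_cmd="java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port ${1:-8081} --allow-origin"
    if [ -n "${2:-}" ]; then
        lt_cmd="$lt_cmd --languageModel $2"
    fi
    echo "$lt_cmd"
}
```

From inside the LanguageTool directory, `sh -c "$(lt_server_cmd 8081 n-gram)"` starts the server with the n-gram data, and `lt_server_cmd` alone shows the plain command.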

That’s the full installation. Now you have a local grammar and spell checking server without any strings attached. If you want to take this a step further, consider making your LanguageTool server accessible over the public internet. Then you can connect any computer with the browser add-on to your server from anywhere.