Post Image
By Matt GaidicaJuly 20, 2012In Uncategorized

Lazy Levenshtein: Using Abbreviations and Spellchecked Inputs in Ruby

I have been spending a lot of time writing Ruby programs that take in data through the terminal. One of the problems is that mis-spelling something can cause the program to crash, and I want to be as quick as possible when doing data entry.

One of my programs asks which server environment I would like to use before I start messing with any data (development, integration, staging, production). It would be great if all of the following abbreviations or misspellings would choose the development environment, and keep the program rolling:

  • dev

  • development

  • devel

  • deevleopmnt

You get the idea- abbreviations and spellchecking from known inputs. To accomplish this I leverage the Levenshtein distance algorithm, more commonly known as “edit distance”. This algorithm compares two strings and returns an integer that is equal to the amount of edits needed to transform the first string into the second.

Here is the Github Gist for Lazy Levenshtein, with the sample code below so we can dig through it.

The three parameters are the input itself, an array of possible matches, and a boolean that tells the method whether or not you want to match abbreviations. The method sets up a Levenshtein comparison for each potential match (using the Ruby Amatch library), and scores the comparison. We are playing golf here, because the lowest score wins the game!

The method also reverses the array in the main loop, which puts priority to the first items in the array if there happens to be a tie between matches. Unlike typical “spellcheck”, this method will never return “not found”, it will always return a match, and if the “matches” array is empty, it simply returns the provided input.

This has helped me make inputting much faster with smarter defaults, and given me the piece of mind that my misspellings will always turn into known/safe values.

svgArticle Analysis: Matching People's Names to Email Addresses
svg
svgMulti-environment post_build_hook using tddium, Heroku, and Ruby Sinatra