Technologies used: Node.js, Git
For a number of projects, I've wanted to convert regular text to phonemes, whether for rhyme detection, recognizing certain speech patterns, or other purposes. Several projects implement cmudict, which is helpful in the majority of cases, but inevitably you come across a word that isn't in cmudict: proper nouns, slang, typos, it could be anything. There was a gap that needed filling, so with the help of some friends I created a package that addresses it.
Phonemify takes any string, splits it into words on spaces and hyphens/dashes, and attempts to convert each word to ARPAbet using cmudict. If a word is not present in cmudict, it is translated using the Naval Research Laboratory's Automatic Translation of English Text to Phonetics by Means of Letter-to-Sound Rules. Converting letters to sounds is far from perfect in English, but this algorithm is, in my view, the best available implementation of that approach. Because the rules are used only as a fallback for words missing from cmudict, I believe the package gets the best of both worlds: a very high success rate without any machine learning or heavier natural language processing techniques. It also means all the work can be done on one server, with no API requests sent over the internet, and because the package is open source it is completely free and customizable.
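To make the lookup-then-fallback flow concrete, here is a minimal sketch in Node.js. It is not Phonemify's actual source or API: `toPhonemes`, `letterToSound`, `TINY_DICT`, and `LETTER_SOUNDS` are hypothetical names, the four-entry map only stands in for cmudict, and the fallback is a toy one-phoneme-per-letter table rather than the NRL rule set.

```javascript
// Hypothetical stand-in for cmudict: lowercase word -> ARPAbet string.
const TINY_DICT = new Map([
  ["hello", "HH AH0 L OW1"],
  ["world", "W ER1 L D"],
  ["well", "W EH1 L"],
  ["known", "N OW1 N"],
]);

// Toy fallback table: one phoneme per letter. This is an assumption made
// for illustration only; the real NRL rules are context-sensitive.
const LETTER_SOUNDS = {
  a: "AE", b: "B", c: "K", d: "D", e: "EH", f: "F", g: "G", h: "HH",
  i: "IH", j: "JH", k: "K", l: "L", m: "M", n: "N", o: "AA", p: "P",
  q: "K", r: "R", s: "S", t: "T", u: "AH", v: "V", w: "W", x: "K S",
  y: "Y", z: "Z",
};

// Map each letter to its toy phoneme, skipping characters with no entry.
function letterToSound(word) {
  return [...word]
    .map((ch) => LETTER_SOUNDS[ch])
    .filter(Boolean)
    .join(" ");
}

// Split on whitespace and hyphens/dashes, then try the dictionary first
// and fall back to the letter rules only for unknown words.
function toPhonemes(text, dict = TINY_DICT) {
  return text
    .toLowerCase()
    .split(/[\s\-\u2013\u2014]+/)
    .map((token) => token.replace(/[^a-z']/g, "")) // strip punctuation
    .filter((word) => word.length > 0)
    .map((word) => ({
      word,
      phonemes: dict.get(word) ?? letterToSound(word),
      source: dict.has(word) ? "dictionary" : "rules",
    }));
}

console.log(toPhonemes("Hello well-known Zork"));
```

In this sketch "hello", "well", and "known" are resolved from the dictionary, while "Zork" is not in the map and falls through to the letter table. Recording the `source` alongside each result is a design choice of the sketch: it makes it easy to see which words were guessed rather than looked up.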