Soundex [16.04.03] (updated 20.01.05)
This code is my first stab at doing something in the Io language (http://www.iolanguage.com/about/), which is a small-footprint scripting language. The Soundex algorithm is used in database lookup applications and genealogy research; it finds certain classes of similar-sounding names. Name prefixes and zero-padding are not implemented.
Implementation
#------------------------------------------------------------------------------
#
# soundex.io -- generate soundex encodings.
#
# 1. Keep the first letter.
#
# 2. Drop the vowels A E I O U and the letters W Y H.
#
# 3. Encode the remaining letters, up to 3 digits, as follows:
#
#      B P F V            1
#      C S K G J Q X Z    2
#      D T                3
#      L                  4
#      M N                5
#      R                  6
#
# 4. Double letters are treated as one.
#
# 5. Consecutive letters with the same encoding are treated as one.
#
#
# Tov Are Jacobsen April, 2003 <mail@tovare.com>
#
# Language: http://www.iolanguage.com
# Version: 2005-01-19 (Jon Kleiser)
#------------------------------------------------------------------------------
Soundex := Object clone do(
    encmap := Map clone do(
        atPut("B", 1)
        atPut("P", 1)
        atPut("F", 1)
        atPut("V", 1)
        atPut("C", 2)
        atPut("S", 2)
        atPut("K", 2)
        atPut("G", 2)
        atPut("J", 2)
        atPut("Q", 2)
        atPut("X", 2)
        atPut("Z", 2)
        atPut("D", 3)
        atPut("T", 3)
        atPut("L", 4)
        atPut("M", 5)
        atPut("N", 5)
        atPut("R", 6)
    )
)
#--------------------------------------------
#
# Converts a string into soundex encoding.
#
# Arguments:
# name String containing a surname
#
#--------------------------------------------
Soundex encode := method(name,
    name = name upper
    ename := ""
    name foreach(i, s,
        if (i == 0) then (
            # Keep the first letter as-is
            ename = ename append(s asCharacter)
        ) else (
            # Drop all unmapped characters
            code := encmap at(s asCharacter)
            if (code != Nil,
                # Skip a code that repeats the last character of ename
                if (code asString != ename at(-1) asCharacter,
                    ename = ename append(code)
                    # Stop at one letter plus three digits
                    if (ename length == 4, return ename)
                )
            )
        )
    )
    return ename
)
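Zero-padding is left out of the implementation above, so a short name such as "Lee" is returned as just "L". A minimal sketch of how padding could be bolted on follows. It assumes the same 2005-era Io string API that the code above relies on (`length`, and an `append` that returns the new string). The name `pad` is hypothetical and not part of the original soundex.io.

```io
# Load the Soundex object defined above (same call the test script uses).
doFile(launchPath appendPath("soundex.io"))

# Hypothetical helper, not part of the original soundex.io.
# Assumes String length and a String append that returns the new string,
# exactly as Soundex encode uses them.
Soundex pad := method(code,
    # Add trailing zeros until the code is one letter plus three digits
    while (code length < 4,
        code = code append("0")
    )
    return code
)
```

Under those assumptions, `Soundex pad(Soundex encode("Lee"))` should give "L000" instead of "L". Codes that are already four characters long pass through unchanged.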
Test code
#
#
# testSoundex.io - Test the Soundex.io implementation.
#
# Tov Are Jacobsen 2003 <mail@tovare.com>
#
#
doFile(launchPath appendPath("soundex.io"))
"The soundex coding is used to encode surnames, so that similar" linePrint
"sounding names have identical codes." linePrint
surnames := List clone do(
    add("Jensen")
    add("Jonassen")
    add("Jonasen")
    add("Jacobsen")
    add("Jakobsen")
    add("Fischer")
    add("Fisher")
    add("Gustavsen")
    add("Gustafsson")
    add("Gustafsen")
    add("Handeland")
)
surnames foreach(j, name,
    write(Soundex encode(name), " (", name, ")\n")
)
References
- The Soundex Indexing System, NARA (U.S. National Archives & Records Administration): http://www.archives.gov/publications/general-info-leaflets/55.html