Tov Are Jacobsen

21.01.2005

Soundex in the Io Language

Soundex [16.04.03] (updated 20.01.05)

This is my first stab at doing something in the Io language (www.iolanguage.com), a small-footprint scripting language. The Soundex algorithm is used in database lookup applications and genealogy research. It maps similar-sounding names to the same short code, so that Jacobsen and Jakobsen both become J212. This implementation does not handle name prefixes or zero-padding.

Implementation

#------------------------------------------------------------------------------
#
# soundex.io -- generate soundex encodings.
#
# 1. Keep the first letter.
#
# 2. Drop the vowels A E I O U and the letters W Y H.
#
# 3. Encode up to three of the remaining letters as:
#
#      B P F V            1
#      C S K G J Q X Z    2
#      D T                3
#      L                  4
#      M N                5
#      R                  6
#
# 4. Double letters are treated as one.
#
# 5. Adjacent letters with the same encoding are treated as one.
#
#
# Tov Are Jacobsen April, 2003 <mail@tovare.com>
#
# Language: http://www.iolanguage.com
# Version:  2005-01-19 (Jon Kleiser)
#------------------------------------------------------------------------------

Soundex := Object clone do(
	encmap := Map clone do(
		atPut("B", 1)
		atPut("P", 1)
		atPut("F", 1)
		atPut("V", 1)
		atPut("C", 2)
		atPut("S", 2)
		atPut("K", 2)
		atPut("G", 2)
		atPut("J", 2)
		atPut("Q", 2)
		atPut("X", 2)
		atPut("Z", 2)
		atPut("D", 3)
		atPut("T", 3)
		atPut("L", 4)
		atPut("M", 5)
		atPut("N", 5)
		atPut("R", 6)
	)
)

#=========================================================================
#
# Converts a string into soundex encoding.
#
# Arguments:
#    name     String containing a surname
#
#=========================================================================

Soundex encode := method(name,
	name = name upper
	ename := ""
	name foreach(i, s,
		if (i == 0) then (
			# Keep the first letter as it is
			ename = ename append(s asCharacter)
		) else (
			# Drop all unmapped characters
			code := encmap at(s asCharacter)
			if (code != Nil,
				# Skip a code that repeats the previous one
				if (code asString != ename at(-1) asCharacter,
					ename = ename append(code)
					# Stop at one letter plus three digits
					if (ename length == 4, return ename)
				)
			)
		)
	)
	return ename
)
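
The listing above returns codes shorter than four characters for names with few consonants, because zero-padding is left out. A minimal sketch of a padding step could look like the following. The `pad` method is a hypothetical addition, not part of the original soundex.io. It relies only on the `length` and `append` calls the listing already uses, so it assumes the same 2005-era Io release; other Io versions may name these differently.

```io
# Hypothetical helper, not part of the original soundex.io.
# Appends trailing zeros until the code is four characters long.
# Assumes append returns the extended string, as the listing above does.
Soundex pad := method(code,
	padded := code
	while (padded length < 4,
		padded = padded append("0")
	)
	return padded
)
```

Tracing the rules by hand rather than running the code, Fischer encodes to F26, so `Soundex pad(Soundex encode("Fischer"))` would be expected to give F260, the code conventional Soundex assigns to that name.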

Test code

#=============================================================================
#
# testSoundex.io - Test the soundex.io implementation.
#
# Tov Are Jacobsen 2003 <mail@tovare.com>
#
#=============================================================================

doFile(launchPath appendPath("soundex.io"))

"The soundex coding is used to encode surnames, so that similar" linePrint
"sounding names have identical codes." linePrint

surnames := List clone do(
	add("Jensen")
	add("Jonassen")
	add("Jonasen")
	add("Jacobsen")
	add("Jakobsen")
	add("Fischer")
	add("Fisher")
	add("Gustavsen")
	add("Gustafsson")
	add("Gustafsen")
	add("Handeland")
)

surnames foreach(j, name,
	write(Soundex encode(name), " (", name, ")\n")
)

References

The Soundex Indexing System
NARA - U.S. National Archives & Records Administration