April 16, 2003

Soundex in the Io Language

Soundex [16.04.03] (updated 20.01.05)

This code is my first stab at doing something in the Io Language (http://www.iolanguage.com/about/), which is a small-footprint scripting language. The Soundex algorithm is used in database lookup applications and genealogy research to find certain classes of similar-sounding names. Name prefixes and zero-padding are not implemented.

Implementation

#------------------------------------------------------------------------------
#
# soundex.io -- generate soundex encodings.
#
# 1. Keep the first letter.
#
# 2. Drop the vowels A E I O U and the letters W Y H.
#
# 3. Encode up to 3 of the remaining letters as digits:
#
#      B P F V            1
#      C S K G J Q X Z    2
#      D T                3
#      L                  4
#      M N                5
#      R                  6
#
# 4. Double letters are treated as one.
#
# 5. Adjacent letters with the same encoding are treated as one.
#
#
# Tov Are Jacobsen April, 2003 <mail@tovare.com>
# 
# Language: http://www.iolanguage.com
# Version:  2005-01-19 (Jon Kleiser)
#------------------------------------------------------------------------------

Soundex := Object clone do(
	encmap := Map clone do(
		atPut("B", 1)
		atPut("P", 1)
		atPut("F", 1)
		atPut("V", 1)
		atPut("C", 2)
		atPut("S", 2)
		atPut("K", 2)
		atPut("G", 2)
		atPut("J", 2)
		atPut("Q", 2)
		atPut("X", 2)
		atPut("Z", 2)
		atPut("D", 3)
		atPut("T", 3)
		atPut("L", 4)
		atPut("M", 5)
		atPut("N", 5)
		atPut("R", 6)
	)
)

#--------------------------------------------
#
# Converts a string into soundex encoding.
# 
# Arguments:
#    name     String containing a surname
#
#--------------------------------------------

Soundex encode := method(name,
	name = name upper
	ename := ""
	name foreach(i, s,
		if (i == 0) then (
			ename = ename append(s asCharacter)
		) else (
			# Drop all unmapped characters
			code := encmap at(s asCharacter)
			if (code != Nil,
				if (code asString != ename at(-1) asCharacter,
					ename = ename append(code)
					if (ename length == 4, return ename)
				)
			)
		)
	)
	return ename
)

Test code

#
#
# testSoundex.io - Test the Soundex.io implementation.
#
# Tov Are Jacobsen 2003 <mail@tovare.com>
#
#

doFile(launchPath appendPath("soundex.io"))

"The soundex coding is used to encode surnames, so that similar" linePrint
"sounding names have identical codes." linePrint

surnames := List clone do(
	add("Jensen")
	add("Jonassen")
	add("Jonasen")
	add("Jacobsen")
	add("Jakobsen")
	add("Fischer")
	add("Fisher")
	add("Gustavsen")
	add("Gustafsson")
	add("Gustafsen")
	add("Handeland")
)

surnames foreach(j, name,
	write(Soundex encode(name)," (",name, ")\n")
)
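
For readers without an Io interpreter at hand, the following is a minimal sketch of the same steps in Python, intended only as a cross-check of the codes the test should print. It mirrors the logic of Soundex encode as written above, including its habit of comparing each new digit with the last character already emitted. The function name soundex and the pad parameter are assumptions made for this sketch and are not part of soundex.io; pad illustrates one possible way to add the zero-padding that the Io version leaves out.

```python
# Letter-to-digit table copied from encmap in soundex.io.
ENCMAP = {}
for letters, digit in (("BPFV", "1"), ("CSKGJQXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        ENCMAP[letter] = digit


def soundex(name, pad=False):
    """Mirror Soundex encode; pad=True is a hypothetical zero-padding extension."""
    name = name.upper()
    if not name:
        return ""
    code = name[0]  # rule 1: keep the first letter
    for ch in name[1:]:
        digit = ENCMAP.get(ch)  # unmapped letters (vowels, W, Y, H) are dropped
        # Skip a digit that repeats the last character already emitted.
        if digit is not None and digit != code[-1]:
            code += digit
            if len(code) == 4:
                break
    # Assumption: pad short codes on the right with zeros to four characters.
    return code.ljust(4, "0") if pad else code


for surname in ("Jensen", "Jacobsen", "Fischer", "Handeland"):
    print(soundex(surname), soundex(surname, pad=True), surname)
```

With this sketch, Fischer and Fisher both come out as F26 without padding and F260 with it, while Jensen, Jonassen and Jonasen all map to J525.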

© Tov Are Jacobsen 1997-2021