Way to find words that differ by only one consonant in a large word list

Question

I have a list of almost 5000 "fantasy" words written in ASCII text. Some of the words are as follows:

txintoq
txiqbal
txiqfun
txiqwek
txiqyal
txiyton
txonmiq
txoqwul
txoqxik

I want to design an algorithm that checks/verifies that there are no two words in the list that differ by only one "similar consonant". So I'll define "sets of similar consonants" like this (for now):

zs
xj
pb
td
kg

There may be 3 or more consonants in a set, but I'll just

P粉238433862 · Answer

Choose one consonant in each group to be the "representative" of that group. Then, build a map that groups words together such that they become identical when their consonants are replaced by their representative consonants.

Important note: this method only works when the consonant groups form equivalence classes. In particular, consonant similarity must be transitive. If 'b' and 'p' are similar and 'b' and 'v' are similar, but 'p' and 'v' are not, then the method is invalid, because no single representative can stand for both pairs without also merging 'p' and 'v'.
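If similarity is not transitive, one possible fallback is to skip representatives entirely: for each word, build every variant obtained by swapping a single consonant for one of its similar consonants, and check whether that variant is also in the list. The following is only a sketch under assumptions; the function name find_one_off_clashes, the pair notation ('bp' meaning b and p are similar) and the sample words are made up for illustration.

```python
from collections import defaultdict

def find_one_off_clashes(words, similar_pairs):
    """Return sorted pairs of words that differ by exactly one similar consonant.

    similar_pairs is a list of two-letter strings such as 'bp'; the relation
    does not have to be transitive (hypothetical input format).
    """
    # similar[c] = set of consonants considered similar to c
    similar = defaultdict(set)
    for a, b in similar_pairs:
        similar[a].add(b)
        similar[b].add(a)

    word_set = set(words)
    clashes = set()
    for word in words:
        for i, c in enumerate(word):
            for r in similar.get(c, ()):
                # Replace only the consonant at position i
                variant = word[:i] + r + word[i + 1:]
                if variant in word_set:
                    clashes.add(tuple(sorted((word, variant))))
    return sorted(clashes)

# b~p and b~v, but p and v are NOT similar
print(find_one_off_clashes(['dolbar', 'dolpar', 'dolvar', 'txintoq'], ['bp', 'bv']))
# [('dolbar', 'dolpar'), ('dolbar', 'dolvar')]
```

Each word produces only a handful of variants, and set lookups take constant time on average, so this should stay cheap for a list of about 5000 words. Note that 'dolpar' and 'dolvar' are not reported, since p and v were not declared similar.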

Here is Python code for the representative-consonant method; I'll leave the JavaScript version to you.

f maps each consonant to the representative consonant of its group;
d maps each representative word (a word with every consonant replaced by its representative) to the list of original words that share it.

bigwordlist = '''dolbar
dolpar
jumaq
txindan
txintan
txintoq
txiqbal
txiqfun
txiqwek
txiqyal
txinton
txonmiq
txoqwul
txoqxik
xumaq'''.splitlines()

consonant_groups = '''zs
xj
pb
td
kg'''.splitlines()

f = {}
for g in consonant_groups:
    for c in g:
        f[c] = g[0]

print(f)
# {'z': 'z', 's': 'z', 'x': 'x', 'j': 'x', 'p': 'p', 'b': 'p', 't': 't', 'd': 't', 'k': 'k', 'g': 'k'}
    
d = {}
for word in bigwordlist:
    key = ''.join(f.get(c, c) for c in word)
    d.setdefault(key, []).append(word)

print(d)
# {'tolpar': ['dolbar', 'dolpar'], 'xumaq': ['jumaq', 'xumaq'], 'txintan': ['txindan', 'txintan'], 'txintoq': ['txintoq'], 'txiqpal': ['txiqbal'], 'txiqfun': ['txiqfun'], 'txiqwek': ['txiqwek'], 'txiqyal': ['txiqyal'], 'txinton': ['txinton'], 'txonmiq': ['txonmiq'], 'txoqwul': ['txoqwul'], 'txoqxik': ['txoqxik']}

Finally, we can see which words are similar:

print([g for g in d.values() if len(g) > 1])
# [['dolbar', 'dolpar'], ['jumaq', 'xumaq'], ['txindan', 'txintan']]
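One caveat: a group collects every word with the same representative form, so two words in a group may differ in more than one similar consonant (for example, 'dolbar' and 'tolpar' would land together). If only pairs that differ in exactly one position matter, the groups can be filtered afterwards. This is a rough sketch; the helper name one_consonant_apart and the sample groups are hypothetical.

```python
from itertools import combinations

def one_consonant_apart(groups):
    """Return pairs from each group that differ in exactly one position.

    Words in one group share a representative form, so they have the same
    length and can be compared position by position.
    """
    pairs = []
    for group in groups:
        for a, b in combinations(group, 2):
            if sum(x != y for x, y in zip(a, b)) == 1:
                pairs.append((a, b))
    return pairs

# Hypothetical groups, shaped like the values of d with more than one word
groups = [['dolbar', 'dolpar', 'tolpar'], ['jumaq', 'xumaq']]
print(one_consonant_apart(groups))
# [('dolbar', 'dolpar'), ('dolpar', 'tolpar'), ('jumaq', 'xumaq')]
```

Here 'dolbar' and 'tolpar' are dropped because they differ in two positions (d/t and b/p).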
