使用优化的Levenshtein算法查找最接近的邻居

我最近发布了一个有关优化算法以计算Levenshtein距离的问题,而答复使我进入Wikipedia上有关Levenshtein距离的文章。

文章提到,如果最大距离上有一个 k ,则可能来自给定查询的结果可能是运行时间从 O(mn) 减少到 O(kn)mn

是字符串。我查了一下算法,但是我真的不知道如何实现它。我希望在这里获得一些线索。

优化是“可能的改进”下的#4。

让我感到困惑的部分是,我们只需要计算以主对角线为中心(主对角线定义为坐标(i,i))的宽度为 2k + 1 的对角线条。

如果有人可以提供帮助/见解,我将不胜感激。如果需要,我可以在此处将关于算法的完整描述发布在书中作为答案。

回答:

我已经做过很多次了。我这样做的方式是对可能更改的游戏树进行深度优先的递归树遍历。我用来修剪树的预算为 k 。有了这个例程,首先我以k = 0,然后k =

1,然后k = 2来运行它,直到我被击中或者我不想再走高了。

char* a = /* string 1 */;

char* b = /* string 2 */;

int na = strlen(a);

int nb = strlen(b);

bool walk(int ia, int ib, int k){

/* if the budget is exhausted, prune the search */

if (k < 0) return false;

/* if at end of both strings we have a match */

if (ia == na && ib == nb) return true;

/* if the first characters match, continue walking with no reduction in budget */

if (ia < na && ib < nb && a[ia] == b[ib] && walk(ia+1, ib+1, k)) return true;

/* if the first characters don't match, assume there is a 1-character replacement */

if (ia < na && ib < nb && a[ia] != b[ib] && walk(ia+1, ib+1, k-1)) return true;

/* try assuming there is an extra character in a */

if (ia < na && walk(ia+1, ib, k-1)) return true;

/* try assuming there is an extra character in b */

if (ib < nb && walk(ia, ib+1, k-1)) return true;

/* if none of those worked, I give up */

return false;

}

添加了解释特里搜索的说明:

// definition of trie-node:

struct TNode {

TNode* pa[128]; // for each possible character, pointer to subnode

};

// simple trie-walk of a node

// key is the input word, answer is the output word,

// i is the character position, and hdis is the hamming distance.

void walk(TNode* p, char key[], char answer[], int i, int hdis){

// If this is the end of a word in the trie, it is marked as

// having something non-null under the '\0' entry of the trie.

if (p->pa[0] != null){

if (key[i] == '\0') printf("answer = %s, hdis = %d\n", answer, hdis);

}

// for every actual subnode of the trie

for(char c = 1; c < 128; c++){

// if it is a real subnode

if (p->pa[c] != null){

// keep track of the answer word represented by the trie

answer[i] = c; answer[i+1] = '\0';

// and walk that subnode

// If the answer disagrees with the key, increment the hamming distance

walk(p->pa[c], key, answer, i+1, (answer[i]==key[i] ? hdis : hdis+1));

}

}

}

// Note: you have to edit this to handle short keys.

// Simplest is to just append a lot of '\0' bytes to the key.

现在,要将其限制为预算,如果hdis太大,则拒绝下降。

以上是 使用优化的Levenshtein算法查找最接近的邻居 的全部内容, 来源链接: utcz.com/qa/420305.html

回到顶部