How to find similar words in a vector of strings in R?

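Strings in a character vector sometimes contain spelling mistakes. Extracting the words that resemble each other helps catch such mistakes, because a group of similar words is likely to contain both the correct and the misspelled form of the same word. This can be done by combining the agrep and lapply functions: lapply passes each element of the vector to agrep as the pattern, and agrep returns every element of the same vector that approximately matches it.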
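As a rough sketch of what that combination does, the call can be written out explicitly. The vector `words` and the helper `find_similar` are hypothetical names introduced only for illustration. The sketch relies on agrep's default `max.distance = 0.1`, which is a fraction of the pattern length rounded up to a whole number of edits.

```r
# Hypothetical sample data, not taken from the examples in this article
words <- c("apple", "aple", "banana")

# Illustrative helper: for every word, return all elements of the vector
# that agrep considers an approximate match (default max.distance = 0.1)
find_similar <- function(v) {
  lapply(v, function(pattern) agrep(pattern, v, value = TRUE))
}

res <- find_similar(words)
print(res)

# The compact form passes each word as agrep's first argument (pattern)
# and the whole vector as its second argument (x)
compact <- lapply(words, agrep, words, value = TRUE)
```

With `value = TRUE`, agrep returns the matching strings themselves; without it, it returns their positions in the vector.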

Example 1

x1<-c("India","United Kingdoms","Indiaa","Egyypt","United Kingdom","Turkey","Egypt","Belaarus","Belarus")

lapply(x1,agrep,x1,value=TRUE)

Output

[[1]]

[1] "India" "Indiaa"

[[2]]

[1] "United Kingdoms" "United Kingdom"

[[3]]

[1] "India" "Indiaa"

[[4]]

[1] "Egyypt" "Egypt"

[[5]]

[1] "United Kingdoms" "United Kingdom"

[[6]]

[1] "Turkey"

[[7]]

[1] "Egyypt" "Egypt"

[[8]]

[1] "Belaarus" "Belarus"

[[9]]

[1] "Belaarus" "Belarus"
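Each group in this output appears once per member, and words without a near match (such as "Turkey") form groups of one. A possible follow-up step, shown here only as a sketch, is to drop the repeated groups and the singletons; the variable names `matches` and `groups` are assumptions, not part of the original example.

```r
x1 <- c("India", "United Kingdoms", "Indiaa", "Egyypt", "United Kingdom",
        "Turkey", "Egypt", "Belaarus", "Belarus")

matches <- lapply(x1, agrep, x1, value = TRUE)

# Sort each group so the same set of words always compares equal,
# then drop the repeated groups
groups <- unique(lapply(matches, sort))

# Keep only groups with more than one member, i.e. likely spelling variants
groups <- groups[lengths(groups) > 1]
print(groups)
```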

Example 2

x2<-c("Al-hadi","Umair","Omar","Alhadi","Shanti","Shant","Umaer","Peter","Rahul","Pattrick","Peeter","Rahuls")

lapply(x2,agrep,x2,value=TRUE)

Output

[[1]]

[1] "Al-hadi" "Alhadi"

[[2]]

[1] "Umair" "Umaer"

[[3]]

[1] "Omar"

[[4]]

[1] "Al-hadi" "Alhadi"

[[5]]

[1] "Shanti" "Shant"

[[6]]

[1] "Shanti" "Shant"

[[7]]

[1] "Umair" "Umaer"

[[8]]

[1] "Peter" "Peeter"

[[9]]

[1] "Rahul" "Rahuls"

[[10]]

[1] "Pattrick"

[[11]]

[1] "Peter" "Peeter"

[[12]]

[1] "Rahul" "Rahuls"

Example 3

x3<-c("Alabamaa","New Yorky","New Yok","Alabma","Florida","Illinois","Texas","Illinoise")

lapply(x3,agrep,x3,value=TRUE)

Output

[[1]]

[1] "Alabamaa"

[[2]]

[1] "New Yorky"

[[3]]

[1] "New Yorky" "New Yok"

[[4]]

[1] "Alabamaa" "Alabma"

[[5]]

[1] "Florida"

[[6]]

[1] "Illinois" "Illinoise"

[[7]]

[1] "Texas"

[[8]]

[1] "Illinois" "Illinoise"
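Note that the results in this example are not symmetric: "Alabma" lists "Alabamaa", but "Alabamaa" does not list "Alabma". This is because agrep looks for an approximate match of the pattern inside each target string, and its default tolerance depends on the pattern length. If symmetric groups are preferred, one alternative (a different technique from the one used above) is to compare whole strings with adist, which computes pairwise edit distances. The cutoff `max_edits` below is an assumed value chosen for illustration.

```r
x3 <- c("Alabamaa", "New Yorky", "New Yok", "Alabma",
        "Florida", "Illinois", "Texas", "Illinoise")

# Pairwise Levenshtein (edit) distances between whole strings
d <- adist(x3)

# Assumed cutoff: treat strings within 2 edits of each other as similar
max_edits <- 2

# For every word, collect all words whose distance is within the cutoff
similar <- lapply(seq_along(x3), function(i) x3[d[i, ] <= max_edits])
names(similar) <- x3
print(similar)
```

Because the distance matrix is symmetric, any two words that are grouped together appear in each other's list. The cutoff should be tuned to the data: a value that is too large will start grouping unrelated short words.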

