How can I run a function once on every pair?

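I asked a question here, how can I group based on similarity in strings, which turned out to be very hard to tackle. I have since come across a good idea that I would like to try.

Here is my idea and my data (the same data as in that question):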

df <- structure(list(label = structure(c(5L, 6L, 7L, 8L, 3L, 1L, 2L,

9L, 10L, 4L), .Label = c(" holand", " holandindia", " Holandnorway",

" USAargentinabrazil", "Afghanestan ", "Afghanestankabol", "Afghanestankabolindia",

"indiaAfghanestan ", "USA", "USAargentina "), class = "factor"),

value = structure(c(5L, 4L, 1L, 9L, 7L, 10L, 6L, 3L, 2L,

8L), .Label = c("1941029507", "2367321518", "2849255881",

"2913128511", "2927576083", "4550996370", "457707181.9",

"637943892.6", "796495286.2", "89291651.19"), class = "factor")), .Names = c("label",

"value"), class = "data.frame", row.names = c(NA, -10L))

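1- I try to count the number of letters in the string in each row.
2- I try to run adist on each pair.

If the adist output is around 1, the two strings belong to one group; if not, they are in two different groups.

To solve the above, I need to know how to run adist on all the strings in the first column of my data.

So my questions are the following:

1- Is there a function that does the opposite of adist?
2- How can I run adist on all combinations, always going from the longest string to the shortest? For example,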

adist("Afghanestankabolindia","Afghanestan") 

adist("Afghanestankabolindia","Afghanestankabol")

adist("Afghanestankabolindia","indiaAfghanestan")

adist("Afghanestankabolindia","Holandnorway")

adist("Afghanestankabolindia","holand")

adist("Afghanestankabolindia","holandindia")

.

.

.

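The tricky part is that each pair should be computed only once, relative to a reference. For example, it should compute only the distance between

Afghanestankabolindia and Afghanestan

and not between

Afghanestan and Afghanestankabolindia

That is, the reference is always the longer string.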

Answer:

I am not sure about your desired output format, but I think this does what you want:

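# Labels as plain character strings
ref = as.character(df$label)
# All pairs, with the labels sorted longest first so V1 is always the reference
all_combs = as.data.frame(t(combn(ref[order(nchar(ref), decreasing = TRUE)], 2)), stringsAsFactors = FALSE)
# Edit distance for each pair
all_combs$val = mapply(adist, all_combs$V1, all_combs$V2, USE.NAMES = FALSE)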

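First we create all the combinations, sorting the ref vector so that the first element of each pair is always the longer one (i.e. the reference). Then we use mapply to compute adist for every combination.

Output: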
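To see why the sorting step matters, here is a minimal sketch on three strings taken from the data. The surrounding whitespace is trimmed here for readability, which is an assumption, and the names `x`, `x_sorted`, and `pairs` are only illustrative. Because combn() keeps elements in their input order within each pair, sorting by decreasing nchar() first means the longer string always lands in the first column.

```r
# Three labels from the question, whitespace trimmed for readability (assumption).
x <- c("Afghanestan", "Afghanestankabolindia", "Afghanestankabol")

# Sort longest first; combn() preserves this order inside each pair.
x_sorted <- x[order(nchar(x), decreasing = TRUE)]

# combn() returns one pair per column, so transpose to get one pair per row.
pairs <- as.data.frame(t(combn(x_sorted, 2)), stringsAsFactors = FALSE)

# adist() returns a 1 x 1 matrix for two single strings; mapply() simplifies
# the results into a plain vector with one distance per row.
pairs$val <- mapply(adist, pairs$V1, pairs$V2, USE.NAMES = FALSE)

print(pairs)
```

Each of the three unordered pairs appears exactly once, and in every row V1 is at least as long as V2.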

   V1                    V2                 val
1  Afghanestankabolindia USAargentinabrazil  15
2  Afghanestankabolindia indiaAfghanestan    15
3  Afghanestankabolindia Afghanestankabol     5
4  Afghanestankabolindia Holandnorway        17
5  Afghanestankabolindia USAargentina        17
6  Afghanestankabolindia Afghanestan         10
7  Afghanestankabolindia holandindia         13
8  Afghanestankabolindia holand              16
9  Afghanestankabolindia USA                 21
10 USAargentinabrazil    indiaAfghanestan    16
11 USAargentinabrazil    Afghanestankabol    13
12 USAargentinabrazil    Holandnorway        14
13 USAargentinabrazil    USAargentina         7
14 USAargentinabrazil    Afghanestan         15
15 USAargentinabrazil    holandindia         13
16 USAargentinabrazil    holand              16
17 USAargentinabrazil    USA                 16
18 indiaAfghanestan      Afghanestankabol    10
19 indiaAfghanestan      Holandnorway        14
...
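As an alternative sketch, not part of the original answer: when adist() is called with a single character vector it returns the full pairwise distance matrix, so a similar pair table can be built without mapply() by keeping only the upper triangle of that matrix. The names `ref_sorted`, `d`, `idx`, and `pairs_alt` are illustrative, and the short `labels` vector is a stand-in for `as.character(df$label)`.

```r
# Stand-in for as.character(df$label); a few labels from the question (assumption).
labels <- c("USA", "Afghanestankabolindia", "holand", "Afghanestankabol")

# Longest string first, so earlier positions are never shorter than later ones.
ref_sorted <- labels[order(nchar(labels), decreasing = TRUE)]

# Called with one vector, adist() computes all pairwise distances at once.
d <- adist(ref_sorted)

# The upper triangle holds every unordered pair exactly once, with
# row index < column index, so the row string is the longer (reference) one.
idx <- which(upper.tri(d), arr.ind = TRUE)

pairs_alt <- data.frame(
  V1  = ref_sorted[idx[, "row"]],
  V2  = ref_sorted[idx[, "col"]],
  val = d[idx],
  stringsAsFactors = FALSE
)

print(pairs_alt)
```

The rows come out in column-major order of the matrix rather than in combn() order, so sort the result if the row order matters.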

Hope this helps!

Source: utcz.com/qa/257502.html
