How to find the number of characters in each row of a string column in R?
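Suppose an R data frame has a string column whose values mix letters and digits, and we want to count the letters in each row of that column. The nchar function can be combined with the gsub function for this: gsub removes every character that is not a letter, and nchar counts what remains, as the examples below show.
Because R is case-sensitive, the pattern passed to gsub must use the right case: [^A-Z] keeps only uppercase letters, and [^a-z] keeps only lowercase letters.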
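Before moving to the data frames, here is a minimal sketch of the idea on a single value. The string `s` and the variable names are made up for illustration and do not appear in the examples below.

```r
# Hypothetical single value used only for illustration
s <- "Ab12cD"

# Total number of characters, digits included
nchar(s)                              # 6

# Delete everything that is not an uppercase letter, then count
upper_only <- gsub("[^A-Z]", "", s)   # "AD"
nchar(upper_only)                     # 2

# Delete everything that is not a lowercase letter, then count
lower_only <- gsub("[^a-z]", "", s)   # "bc"
nchar(lower_only)                     # 2
```

The two patterns give different answers on the same input, which is why the case used in the character class matters.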
Example 1
The following snippet creates a sample data frame:
x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1
This creates the following data frame:
              x
1          A01K
2         140AL
3         A142R
4        A255SW
5       A2474EZ
6        CA214N
7          C14O
8         CGSLT
9        DC23QW
10   D2411RWEDE
11     FL233EGV
12 G36521VCLPBA
13       G54TRU
14       H214FI
15        245IA
16       ID3699
17         IL01
18       IFDFDN
19    K2254FDES
20   KY244RLPKJ
To count the uppercase letters in each row of column x, add the following code to the snippet above:
x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1$No_of_Chars<-nchar(gsub("[^A-Z]","",df1$x))
df1
If you execute all the snippets above as a single program, it generates the following output:
              x No_of_Chars
1          A01K           2
2         140AL           2
3         A142R           2
4        A255SW           3
5       A2474EZ           3
6        CA214N           3
7          C14O           2
8         CGSLT           5
9        DC23QW           4
10   D2411RWEDE           6
11     FL233EGV           5
12 G36521VCLPBA           7
13       G54TRU           4
14       H214FI           3
15        245IA           2
16       ID3699           2
17         IL01           2
18       IFDFDN           6
19    K2254FDES           5
20   KY244RLPKJ           7
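Note that No_of_Chars counts letters only. If the total length or the number of digits is also of interest, the same nchar and gsub combination can be adapted. The sketch below uses a few values from the x column; the data frame name `df` and the extra column names are assumptions made for illustration.

```r
# A few values taken from the x column above; df is a hypothetical name.
# stringsAsFactors = FALSE keeps x as character in R versions before 4.0.
df <- data.frame(x = c("A01K", "140AL", "CGSLT", "D2411RWEDE"),
                 stringsAsFactors = FALSE)

# Total characters per row, letters and digits together
df$Total_Chars <- nchar(df$x)

# Digits only: delete everything that is not 0-9, then count
df$No_of_Digits <- nchar(gsub("[^0-9]", "", df$x))

# Uppercase letters only, using the same pattern as Example 1
df$No_of_Letters <- nchar(gsub("[^A-Z]", "", df$x))

df
```

For these values the letter and digit counts add up to the total, which is a quick sanity check that the patterns cover every character in the column.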
Example 2
The following snippet creates a sample data frame:
y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2
This creates the following data frame:
                         y
1              ala5412bama
2               ala1475ska
3       american11022samoa
4              arizona3652
5             arkan1475sas
6           califor2365nia
7             co1475lorado
8          0014connecticut
9            dela25366ware
10 district257of22columbia
11            florid02535a
12            57412georgia
13               gu25987am
14             hawaii36250
15              20057idaho
16          i369852llinois
17          indiana0146563
18                3255iowa
19           kansas3682701
20            kentucky2574
To count the lowercase letters in each row of column y, add the following code to the snippet above:
y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2$No_of_Chars<-nchar(gsub("[^a-z]","",df2$y))
df2
If you execute all the snippets above as a single program, it generates the following output:
                         y No_of_Chars
1              ala5412bama           7
2               ala1475ska           6
3       american11022samoa          13
4              arizona3652           7
5             arkan1475sas           8
6           califor2365nia          10
7             co1475lorado           8
8          0014connecticut          11
9            dela25366ware           8
10 district257of22columbia          18
11            florid02535a           7
12            57412georgia           7
13               gu25987am           4
14             hawaii36250           6
15              20057idaho           5
16          i369852llinois           8
17          indiana0146563           7
18                3255iowa           4
19           kansas3682701           6
20            kentucky2574           8
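Both examples rely on every letter in the column having the same case. When a column mixes uppercase and lowercase letters, one possible approach is to put both ranges in the character class. The helper name `count_letters` and the vector `z` below are hypothetical and used only for illustration.

```r
# count_letters is a hypothetical helper name, not a base R function
count_letters <- function(v) {
  # as.character() guards against factor columns;
  # [^A-Za-z] deletes everything except ASCII letters of either case
  nchar(gsub("[^A-Za-z]", "", as.character(v)))
}

# Made-up mixed-case values
z <- c("Ala5412Bama", "NY10001", "12345")
count_letters(z)   # 7 2 0
```

A value with no letters at all returns 0, because gsub leaves an empty string for nchar to count.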