Flex/Bison - My regular expression is not matching occurrences of two or more X's, e.g. XXY-1 or XXY-1

I am using flex and bison to create a parser for a made-up programming language. There will be valid and invalid variable names.

XXXX XY-1 // valid 

XXXXX Z // valid

XXX Y // valid

XXX 5Aet // invalid

XXXX XXAB-Y // invalid

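The X's at the start simply specify the size of the variable. The variable 5Aet is invalid because it starts with a digit. I have successfully matched this case with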

[\_\-0-9][a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

The variable XXAB-Y is invalid because a variable name cannot start with two or more X characters.
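To make the naming rules described so far concrete, here is a minimal C sketch of the same check written by hand. The function name `is_valid_variable_name` is hypothetical and is not part of the original lexer. The allowed character set (uppercase letters, digits, and `-`) is an assumption taken from the valid-name pattern in the lexer.l snippet.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original lexer.
 * Returns 1 when `name` follows the rules described in the question,
 * 0 otherwise. */
int is_valid_variable_name(const char *name)
{
    size_t i;

    /* Must start with an uppercase letter.
     * This rejects "5Aet", "_A", "-A" and the empty string. */
    if (name[0] < 'A' || name[0] > 'Z')
        return 0;

    /* Must not start with two or more X characters (rejects "XXAB-Y"). */
    if (name[0] == 'X' && name[1] == 'X')
        return 0;

    /* Assumed character set for the rest: uppercase letters, digits, '-'. */
    for (i = 1; name[i] != '\0'; i++) {
        char c = name[i];
        if (!((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-'))
            return 0;
    }
    return 1;
}
```

Called on the examples from the question, this accepts XY-1, Z and Y, and rejects 5Aet and XXAB-Y.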

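I have tried to match names like XXAB-Y with a regular expression, but without success. I have tried various combinations of the expressions below, but none of them work. The variable keeps being matched as valid.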

[X]{2,}[A-Z0-9\-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

[X]{2,0}[\_\-0-9][a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

lexer.l snippet

[\t ]+ // ignore whitespaces 

\n // Ignore new line

[\"][^"]*[\"] yylval.string = strdup(yytext); return TERM_STR;

";" return TERM_SEPARATOR;

"." return TERM_FULLSTOP;

[0-9]+ yylval.integer = atoi(yytext); return TERM_INT;

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

[\_\-0-9]+[a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

[A-Z][A-Z0-9\-]* yylval.string = strdup(yytext); return TERM_VARIABLE_NAME;

[X]+ yylval.integer = yyleng; return TERM_SIZE;

. return TERM_INVALID_TOKEN;

parser.y snippet

program: 

/* empty */ |

begin middle_declarations body grammar_s end {

printf("\nParsing complete\n");

exit(0);

};

begin:

TERM_BEGINING TERM_FULLSTOP;

body:

TERM_BODY TERM_FULLSTOP;

end:

TERM_END TERM_FULLSTOP;

middle_declarations:

/* empty */ |

// Left recursive to allow for many declarations

middle_declarations declaration TERM_FULLSTOP;

declaration:

TERM_SIZE TERM_VARIABLE_NAME {

createVar($1, $2);

}

|

TERM_SIZE TERM_INVALID_VARIABLE_NAME {

printInvalidVarName($2);

};

grammar_s:

/* empty */ |

grammar_s grammar TERM_FULLSTOP;

grammar:

add | move | print | input;

add:

TERM_ADD TERM_INT TERM_TO TERM_VARIABLE_NAME {

addIntToVar($2, $4);

}

|

TERM_ADD TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME {

addVarToVar($2, $4);

}

;

move:

TERM_MOVE TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME {

moveVarToVar($2, $4);

}

|

TERM_MOVE TERM_INT TERM_TO TERM_VARIABLE_NAME {

moveIntToVar($2, $4);

}

;

print:

/* empty */ |

TERM_PRINT rest_of_print {

printf("\n");

};

rest_of_print:

/* empty */ |

rest_of_print other_print;

other_print:

TERM_VARIABLE_NAME {

printVarValue($1);

}

|

TERM_SEPARATOR {

// do nothing

}

|

TERM_STR {

printf("%s", $1);

}

;

input:

// Fullstop declares grammar

TERM_INPUT other_input;

other_input:

/* empty */ |

// Input var1

TERM_VARIABLE_NAME {

inputValues($1);

}

|

// Can be input var1; var2;...varN

other_input TERM_SEPARATOR TERM_VARIABLE_NAME {

inputValues($2);

}

;

Debug output:

Starting parse 

Entering state 0

Reading a token: Next token is token TERM_BEGINING (1.1:)

Shifting token TERM_BEGINING (1.1:)

Entering state 1

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 4

Reducing stack by rule 3 (line 123):

$1 = token TERM_BEGINING (1.1:)

$2 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm begin (1.1:)

Stack now 0

Entering state 3

Reducing stack by rule 6 (line 131):

-> $$ = nterm middle_declarations (1.1:)

Stack now 0 3

Entering state 6

Reading a token: Next token is token TERM_SIZE (1.1:)

Shifting token TERM_SIZE (1.1:)

Entering state 8

Reading a token: Next token is token TERM_VARIABLE_NAME (1.1:)

Shifting token TERM_VARIABLE_NAME (1.1:)

Entering state 13

Reducing stack by rule 8 (line 137):

$1 = token TERM_SIZE (1.1:)

$2 = token TERM_VARIABLE_NAME (1.1:)

-> $$ = nterm declaration (1.1:)

Stack now 0 3 6

Entering state 10

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 15

Reducing stack by rule 7 (line 134):

$1 = nterm middle_declarations (1.1:)

$2 = nterm declaration (1.1:)

$3 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm middle_declarations (1.1:)

Stack now 0 3

Entering state 6

Reading a token: Next token is token TERM_SIZE (1.1:)

Shifting token TERM_SIZE (1.1:)

Entering state 8

Reading a token: Next token is token TERM_VARIABLE_NAME (1.1:)

Shifting token TERM_VARIABLE_NAME (1.1:)

Entering state 13

Reducing stack by rule 8 (line 137):

$1 = token TERM_SIZE (1.1:)

$2 = token TERM_VARIABLE_NAME (1.1:)

-> $$ = nterm declaration (1.1:)

Stack now 0 3 6

Entering state 10

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 15

Reducing stack by rule 7 (line 134):

$1 = nterm middle_declarations (1.1:)

$2 = nterm declaration (1.1:)

$3 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm middle_declarations (1.1:)

Stack now 0 3

Entering state 6

Reading a token: Next token is token TERM_SIZE (1.1:)

Shifting token TERM_SIZE (1.1:)

Entering state 8

Reading a token: Next token is token TERM_VARIABLE_NAME (1.1:)

Shifting token TERM_VARIABLE_NAME (1.1:)

Entering state 13

Reducing stack by rule 8 (line 137):

$1 = token TERM_SIZE (1.1:)

$2 = token TERM_VARIABLE_NAME (1.1:)

-> $$ = nterm declaration (1.1:)

Stack now 0 3 6

Entering state 10

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 15

Reducing stack by rule 7 (line 134):

$1 = nterm middle_declarations (1.1:)

$2 = nterm declaration (1.1:)

$3 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm middle_declarations (1.1:)

Stack now 0 3

Entering state 6

Reading a token: Next token is token TERM_BODY (1.1:)

Shifting token TERM_BODY (1.1:)

Entering state 7

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 11

Reducing stack by rule 4 (line 126):

$1 = token TERM_BODY (1.1:)

$2 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm body (1.1:)

Stack now 0 3 6

Entering state 9

Reducing stack by rule 10 (line 145):

-> $$ = nterm grammar_s (1.1:)

Stack now 0 3 6 9

Entering state 14

Reading a token: Next token is token TERM_PRINT (1.1:)

Shifting token TERM_PRINT (1.1:)

Entering state 20

Reducing stack by rule 22 (line 180):

-> $$ = nterm rest_of_print (1.1:)

Stack now 0 3 6 9 14 20

Entering state 34

Reading a token: Next token is token TERM_STR (1.1:)

Shifting token TERM_STR (1.1:)

Entering state 41

Reducing stack by rule 26 (line 194):

$1 = token TERM_STR (1.1:)

-> $$ = nterm other_print (1.1:)

Stack now 0 3 6 9 14 20 34

Entering state 44

Reducing stack by rule 23 (line 182):

$1 = nterm rest_of_print (1.1:)

$2 = nterm other_print (1.1:)

-> $$ = nterm rest_of_print (1.1:)

Stack now 0 3 6 9 14 20

Entering state 34

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Reducing stack by rule 21 (line 176):

$1 = token TERM_PRINT (1.1:)

$2 = nterm rest_of_print (1.1:)

"hEllo"

-> $$ = nterm print (1.1:)

Stack now 0 3 6 9 14

Entering state 25

Reducing stack by rule 14 (line 150):

$1 = nterm print (1.1:)

-> $$ = nterm grammar (1.1:)

Stack now 0 3 6 9 14

Entering state 22

Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 35

Reducing stack by rule 11 (line 147):

$1 = nterm grammar_s (1.1:)

$2 = nterm grammar (1.1:)

$3 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm grammar_s (1.1:)

Stack now 0 3 6 9

Entering state 14

Reading a token: Next token is token TERM_END (1.1:)

Shifting token TERM_END (1.1:)

Entering state 16

Reading a token: Next token is token TERM_FULLSTOP (1.1:)

Shifting token TERM_FULLSTOP (1.1:)

Entering state 27

Reducing stack by rule 5 (line 129):

$1 = token TERM_END (1.1:)

$2 = token TERM_FULLSTOP (1.1:)

-> $$ = nterm end (1.1:)

Stack now 0 3 6 9 14

Entering state 21

Reducing stack by rule 2 (line 113):

$1 = nterm begin (1.1:)

$2 = nterm middle_declarations (1.1:)

$3 = nterm body (1.1:)

$4 = nterm grammar_s (1.1:)

$5 = nterm end (1.1:)

Sample input:

BeGiNInG. 

X XXAB-.

XX XXX7.

XX XXXY.

BoDY.

print "hEllo".

EnD.

Answer:

[X]{2,}[A-Z0-9\-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

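should work fine, and it works fine for me. However, it could be simplified to

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME; 

since any further X characters will match the [A-Z0-9-] character class. (Note that it is not necessary to write \- inside a character class; a plain - will do, as long as it is either the first or the last thing in the character class.)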

That pattern (like yours) also matches just XX, but the [X]+ pattern wins when it appears earlier in the Flex input file.
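The way flex chooses between competing rules (longest match first, then the rule listed earliest on a tie) can be imitated with a small C sketch. This is only a simulation built on POSIX `regcomp`/`regexec`, not flex itself, and the names `struct rule`, `pick_token`, `size_first`, `invalid_first` and the `SIM_*` enumerators are all hypothetical. The patterns are rewritten as anchored POSIX extended regular expressions, which is an assumption about how the three flex rules translate.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical token codes for the simulation (not the bison tokens). */
enum sim_token { SIM_NONE, SIM_SIZE, SIM_INVALID_NAME, SIM_NAME };

/* One simulated scanner rule: an anchored POSIX ERE and its token. */
struct rule {
    const char *pattern;   /* starts with '^' so it matches at the input start */
    enum sim_token token;
};

#define RULE_COUNT 3

/* Ordering A: the size rule is listed first. */
const struct rule size_first[RULE_COUNT] = {
    { "^X+",              SIM_SIZE },
    { "^XX[A-Z0-9-]*",    SIM_INVALID_NAME },
    { "^[A-Z][A-Z0-9-]*", SIM_NAME },
};

/* Ordering B: the invalid-name rule is listed first. */
const struct rule invalid_first[RULE_COUNT] = {
    { "^XX[A-Z0-9-]*",    SIM_INVALID_NAME },
    { "^[A-Z][A-Z0-9-]*", SIM_NAME },
    { "^X+",              SIM_SIZE },
};

/* Pick a token the way flex resolves competing rules: the longest
 * match wins, and on equal length the rule listed first wins.
 * Returns SIM_NONE if nothing matches or a pattern fails to compile. */
enum sim_token pick_token(const struct rule *rules, size_t count,
                          const char *input)
{
    enum sim_token best = SIM_NONE;
    regoff_t best_len = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        regex_t re;
        regmatch_t m;

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED) != 0)
            return SIM_NONE;
        /* Strictly longer only, so earlier rules keep ties. */
        if (regexec(&re, input, 1, &m, 0) == 0 && m.rm_eo > best_len) {
            best_len = m.rm_eo;
            best = rules[i].token;
        }
        regfree(&re);
    }
    return best;
}
```

With `size_first`, the input XX yields `SIM_SIZE` because all three rules match two characters and the first one listed keeps the tie, while XXAB- yields `SIM_INVALID_NAME` because that rule matches five characters against two for the size rule. With `invalid_first`, a bare XX is reported as `SIM_INVALID_NAME`, which is why the position of [X]+ in the file matters.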

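{2,0} is not a valid interval expression, because 0 is less than 2. To specify "2 or more X", write X{2,} (or [X]{2,} if you prefer; "X"{2,} also works). That should have produced an error message from flex, with the result that no lexical scanner was generated. (However, an old one may still be lying around, which could cause confusion.)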
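The difference between the two interval forms can be checked outside flex with a short C sketch. It uses POSIX `regcomp` as a stand-in, so it shows how a POSIX extended regex compiler treats the intervals; flex has its own pattern compiler and its own diagnostics. The helper names `compiles_as_ere` and `fully_matches` are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` compiles as a POSIX
 * extended regular expression, 0 if regcomp rejects it. */
int compiles_as_ere(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    regfree(&re);
    return 1;
}

/* Hypothetical helper: returns 1 if `pattern` matches `text`.
 * The caller anchors the pattern with ^ and $ to require a full match.
 * Returns 0 on no match or if the pattern does not compile. */
int fully_matches(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Here `compiles_as_ere("X{2,}")` succeeds while `compiles_as_ere("X{2,0}")` fails, because the lower bound exceeds the upper bound. With the anchored pattern `^X{2,}[A-Z0-9-]*$`, XXAB-Y matches and XY-1 does not.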

