如何在源代码中找到所有注释?
注释有两种样式,C样式和C ++样式,如何识别它们?
/* comments */// comments
我可以随意使用任何方法和第3库。
回答:
为了可靠地在Java源文件中找到所有注释,我不会使用regex,而是使用真正的词法分析器(aka Tokenizer)。
Java的两个流行选择是:
- JFlex:http://jflex.de
- ANTLR:http://www.antlr.org
与流行的看法相反,ANTLR也可用于 仅 创建词法分析器而不使用语法分析器。
这是ANTLR快速演示。您需要在同一目录中包含以下文件:
- antlr-3.2.jar
- JavaCommentLexer.g(语法)
- Main.java
- Test.java(有效(!)的Java源文件,带有奇异注释)
回答:
lexer grammar JavaCommentLexer;options {
filter=true;
}
SingleLineComment
: FSlash FSlash ~('\r' | '\n')*
;
MultiLineComment
: FSlash Star .* Star FSlash
;
StringLiteral
: DQuote
( (EscapedDQuote)=> EscapedDQuote
| (EscapedBSlash)=> EscapedBSlash
| Octal
| Unicode
| ~('\\' | '"' | '\r' | '\n')
)*
DQuote {skip();}
;
CharLiteral
: SQuote
( (EscapedSQuote)=> EscapedSQuote
| (EscapedBSlash)=> EscapedBSlash
| Octal
| Unicode
| ~('\\' | '\'' | '\r' | '\n')
)
SQuote {skip();}
;
fragment EscapedDQuote
: BSlash DQuote
;
fragment EscapedSQuote
: BSlash SQuote
;
fragment EscapedBSlash
: BSlash BSlash
;
fragment FSlash
: '/' | '\\' ('u002f' | 'u002F')
;
fragment Star
: '*' | '\\' ('u002a' | 'u002A')
;
fragment BSlash
: '\\' ('u005c' | 'u005C')?
;
fragment DQuote
: '"'
| '\\u0022'
;
fragment SQuote
: '\''
| '\\u0027'
;
fragment Unicode
: '\\u' Hex Hex Hex Hex
;
fragment Octal
: '\\' ('0'..'3' Oct Oct | Oct Oct | Oct)
;
fragment Hex
: '0'..'9' | 'a'..'f' | 'A'..'F'
;
fragment Oct
: '0'..'7'
;
回答:
import org.antlr.runtime.*;public class Main {
public static void main(String[] args) throws Exception {
JavaCommentLexer lexer = new JavaCommentLexer(new ANTLRFileStream("Test.java"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
for(Object o : tokens.getTokens()) {
CommonToken t = (CommonToken)o;
if(t.getType() == JavaCommentLexer.SingleLineComment) {
System.out.println("SingleLineComment :: " + t.getText().replace("\n", "\\n"));
}
if(t.getType() == JavaCommentLexer.MultiLineComment) {
System.out.println("MultiLineComment :: " + t.getText().replace("\n", "\\n"));
}
}
}
}
回答:
\u002f\u002a <- multi line comment startmulti
line
comment // not a single line comment
\u002A/
public class Test {
// single line "not a string"
String s = "\u005C" \242 not // a comment \\\" \u002f \u005C\u005C \u0022;
/*
regular multi line comment
*/
char c = \u0027"'; // the " is not the start of a string
char q1 = '\u005c''; // == '\''
char q2 = '\u005c\u0027'; // == '\''
char q3 = \u0027\u005c\u0027\u0027; // == '\''
char c4 = '\047';
String t = "/*";
\u002f\u002f another single line comment
String u = "*/";
}
现在,要运行演示,请执行以下操作:
bart@hades:~/Programming/ANTLR/Demos/JavaComment$ java -cp antlr-3.2.jar org.antlr.Tool JavaCommentLexer.gbart@hades:~/Programming/ANTLR/Demos/JavaComment$ javac -cp antlr-3.2.jar *.java
bart@hades:~/Programming/ANTLR/Demos/JavaComment$ java -cp .:antlr-3.2.jar Main
并且您将看到以下内容打印到控制台:
MultiLineComment :: \u002f\u002a <- multi line comment start\nmulti\nline\ncomment // not a single line comment\n\u002A/SingleLineComment :: // single line "not a string"
SingleLineComment :: // a comment \\\" \u002f \u005C\u005C \u0022;
MultiLineComment :: /*\n regular multi line comment\n */
SingleLineComment :: // the " is not the start of a string
SingleLineComment :: // == '\''
SingleLineComment :: // == '\''
SingleLineComment :: // == '\''
SingleLineComment :: \u002f\u002f another single line comment
回答:
当然,您可以使用正则表达式自己创建一种词法分析器。但是,以下演示不处理源文件中的Unicode文字:
回答:
/* <- multi line comment startmulti
line
comment // not a single line comment
*/
public class Test2 {
// single line "not a string"
String s = "\" \242 not // a comment \\\" ";
/*
regular multi line comment
*/
char c = '"'; // the " is not the start of a string
char q1 = '\''; // == '\''
char c4 = '\047';
String t = "/*";
// another single line comment
String u = "*/";
}
回答:
import java.util.*;import java.io.*;
import java.util.regex.*;
public class Main2 {
private static String read(File file) throws IOException {
StringBuilder b = new StringBuilder();
Scanner scan = new Scanner(file);
while(scan.hasNextLine()) {
String line = scan.nextLine();
b.append(line).append('\n');
}
return b.toString();
}
public static void main(String[] args) throws Exception {
String contents = read(new File("Test2.java"));
String slComment = "//[^\r\n]*";
String mlComment = "/\\*[\\s\\S]*?\\*/";
String strLit = "\"(?:\\\\.|[^\\\\\"\r\n])*\"";
String chLit = "'(?:\\\\.|[^\\\\'\r\n])+'";
String any = "[\\s\\S]";
Pattern p = Pattern.compile(
String.format("(%s)|(%s)|%s|%s|%s", slComment, mlComment, strLit, chLit, any)
);
Matcher m = p.matcher(contents);
while(m.find()) {
String hit = m.group();
if(m.group(1) != null) {
System.out.println("SingleLine :: " + hit.replace("\n", "\\n"));
}
if(m.group(2) != null) {
System.out.println("MultiLine :: " + hit.replace("\n", "\\n"));
}
}
}
}
如果运行Main2
,则会在控制台上打印以下内容:
MultiLine :: /* <- multi line comment start\nmulti\nline\ncomment // not a single line comment\n*/SingleLine :: // single line "not a string"
MultiLine :: /*\n regular multi line comment\n */
SingleLine :: // the " is not the start of a string
SingleLine :: // == '\''
SingleLine :: // another single line comment
以上是 如何在源代码中找到所有注释? 的全部内容, 来源链接: utcz.com/qa/408109.html