Analisis statik ialah alat teguh yang membantu pembangun mengawal kualiti kod. Mari cuba bangunkan penganalisis mudah untuk Lua menggunakan Java dan lihat apa yang ada di bawah tudung penganalisis statik.
Kami akan menulis penganalisis statik untuk Lua di Java.
Kenapa Lua? Sintaksnya mudah, dan tidak perlu diselitkan dengan butiran. Ia juga merupakan bahasa yang baik, terutama jika dibandingkan dengan JavaScript. Kenapa Java? Ia mempunyai susunan teknologi serba lengkap untuk kami, dan Java juga merupakan bahasa yang mesra pengguna untuk pembangunan.
Tidak :) Walau bagaimanapun, artikel itu membincangkan cara kami membangunkan juruterbang penganalisis tulen di hackathon dalaman. Jadi, tajuk itu bukan sekadar umpan klik—kami benar-benar mempunyai 48 jam. Pasukan kami mempunyai tiga pembangun, jadi jika anda ingin mencuba solo ini, bersedialah untuk meluangkan sedikit masa lagi.
Memandangkan ini hanyalah juruterbang, kami akan merujuk penganalisis kami sebagai "Mun" dalam artikel.
Sebelum kita bermula, lebih baik kita sedar apa itu penganalisis dan memetakan skop kerja. Secara keseluruhannya, ia adalah jelas: ambil kod dan rungut jika terdapat sesuatu yang tidak kena di sini. Apa sebenarnya yang kita perlukan? Kami berminat dengan aspek analisis statik berikut:
Bunyinya membosankan, tetapi ini hanya teras penganalisis. Btw, pengkompil mempunyai perkara yang sama di bahagian hadapan. Walau bagaimanapun, bahagian paling rumit (penjanaan kod) hanya bersembunyi di bahagian belakang.
Apakah rancangan untuk teras? Skopnya adalah seperti berikut:
Sememangnya, terlalu banyak untuk 48 jam:) Ada yang perlu ditinggalkan, ada yang diperkemas, dan ada yang digunakan semula. Semasa kami meneruskan artikel, kami akan mempertimbangkan kes sedemikian.
Untuk mendapatkan pemahaman yang lebih baik, mari kita susun skema:
Itu sudah cukup untuk bermula. Anda boleh mengetahui lebih lanjut tentang analisis statik di tapak web kami.
Jadi, lexer dan parser adalah asas penganalisis, kita tidak boleh pergi lebih jauh tanpa mereka. Jika kita mahukan sesuatu yang melebihi ungkapan biasa.
Ia agak mudah dengan lexer: adalah menyusahkan untuk mengendalikan kod sumber sebagai teks. Jadi, kita perlu menterjemahkannya ke dalam perwakilan pertengahan, iaitu membahagikannya kepada token. Yang berkata, lexer itu harus tidak pandai. Ini mesti dimiliki untuknya kerana semua perkara yang paling sukar dan paling kucar-kacir pergi ke penghurai.
Kemudian mari kita beralih ke penghuraikan. Ia mengambil token masuk, menghitungnya dan membina AST. Berikut ialah latar belakang ringkas.
Bahasa mempunyai tatabahasa dan terdapat pelbagai jenis penghurai untuk digunakan dengannya. Jika kita menulisnya sendiri, adalah lebih baik untuk menulis penghurai LL(1) mudah untuk tatabahasa tanpa konteks. Seperti yang kami katakan sebelum ini, Lua mempunyai tatabahasa intuitif jadi ini sudah memadai.
LL bermaksud rentetan input dibaca dari kiri ke kanan dan output kiri ke kanan dibina untuknya. Secara umum, tidak cukup hanya melihat token semasa, (LL(0)) , jadi kita mungkin perlu melihat token k di hadapan. Penghurai sedemikian dipanggil LL(k).
Namun, kami tidak memerlukannya, kerana kami tidak akan menulis apa-apa. kenapa? Kami hanya mempunyai 48 jam dan tiada masa untuk mencipta penghurai—terutamanya jika anda tidak pernah membangunkannya.
Apakah alternatif yang kita ada untuk menulis lexer dan parser kita? Penjana! Ini ialah keseluruhan kelas utiliti yang hanya memerlukan kami memberikan fail tatabahasa yang diterangkan khas sebagai input untuk menjana penghurai.
Kami telah memilih ANTLR v4. Alat ini juga ditulis dalam Java, yang menjadikannya sangat mudah untuk digunakan. Selama bertahun-tahun pembangunan, ia telah mula berjalan dengan baik.
Isunya terselit di sini juga—kita perlu tahu cara menulis fail tatabahasa untuk penghurai. Nasib baik, tidak sukar untuk mencari pilihan siap sedia di GitHub, jadi kami hanya mengambilnya dari sini.
Setelah kami mengkonfigurasi projek dengan ANTLR, mari kita teruskan untuk membina pepohon sintaks abstrak.
Bercakap tentang pokok: ANTLR direka bentuk dengan baik untuk menggambarkan analisis kod. Sebagai contoh, berikut ialah pengiraan faktorial:
function fact (n) if n == 0 then return 1 else return n * fact(n-1) end end print("enter a number:") a = io.read("*number") print(fact(a))
Kita boleh dapatkan pokok berikut:
Kami berpendapat ia mungkin lebih dekat dengan pokok parse berkenaan dengan pengelasan. Kita boleh berhenti di sini dan mula bekerja dengannya.
Tetapi kami tidak akan berbuat demikian dan menukarnya kepada AST. kenapa? Lebih mudah bagi kami, sebagai pasukan Java, untuk bekerja dengan pokok yang serupa dengan pokok dalam penganalisis kami. Anda boleh membaca lebih lanjut mengenai pembangunan Java di sini. Hierarki kelas sampel daripada Spoon kelihatan seperti ini:
Nah, cukup untuk menangguhkan kerja manual, sudah tiba masanya untuk menulis kod. Kami tidak menunjukkan keseluruhan kod sepenuhnya—ia tidak masuk akal kerana ia besar dan tidak sedap dipandang. Untuk memberi anda sedikit gambaran tentang cara pemikiran kami, saya akan meninggalkan beberapa coretan kod di bawah spoiler untuk orang yang berminat.
Kami mula mengendalikan dari atas pokok:
public void enterStart_(LuaParser.Start_Context ctx) { _file = new CtFileImpl(); _file.setBlocks(new ArrayList<>()); for (var chunk : ctx.children) { if (chunk instanceof LuaParser.ChunkContext) { CtBlock block = getChunk((LuaParser.ChunkContext)chunk); if (block == null) continue; block.setParent(_file); _file.getBlocks().add(block); } } } private CtBlock getChunk(LuaParser.ChunkContext ctx) { for (var block : ctx.children) { if (block instanceof LuaParser.BlockContext) { return getBlock((LuaParser.BlockContext)block); } } return null; } private CtBlock getBlock(LuaParser.BlockContext ctx) { CtBlock block = new CtBlockImpl(); block.setLine(ctx.start.getLine()); block.setColumn(ctx.start.getCharPositionInLine()); block.setStatements(new ArrayList<>()); for (var statement : ctx.children) { if (statement instanceof LuaParser.StatContext) { var statements = getStatement((LuaParser.StatContext)statement); for (var ctStatement : statements) { ctStatement.setParent(block); block.getStatements().add(ctStatement); } } } return block; }
Kami hanya melalui pokok dari atas ke bawah, membina pokok kami di sepanjang jalan. Lambat laun, kita akan sampai ke nod terminal. Berikut ialah parameter fungsi:
private List<CtParameter> parseParameters( LuaParser.NamelistContext ctx, CtElement parent ) { var parameters = new ArrayList<CtParameter>(); for (var child : ctx.children) { if (Objects.equals(child.toString(), ",")) continue; var parameter = new CtParameterImpl(); parameter.setParameterName(child.toString()); parameter.setParent(parent); parameters.add(parameter); } return parameters; }
Ia juga nampaknya tidak begitu mencabar—kita hanya menukar satu objek kepada objek lain. Mari kita bungkus penyenaraian kod di sini juga. Kami berharap bahawa cara ia berfungsi adalah jelas.
Terus terang, pendekatan ini tidak membawa kepada keuntungan besar dalam jangka masa panjang. Daripada menukar teks kepada pokok, kami menukar satu pokok kepada pokok yang lain. Kedua-dua tugas ini agak membosankan. Apakah pilihan yang kita ada?
Bahagian tersebut jelas tidak lengkap jika kami tidak membawa contoh analisis kod untuk AST kami. Anda boleh meneliti cetakan cantik pokok itu untuk pengiraan faktorial yang sama di bawah spoiler.
CtGlobal: CtFile: CtFunction: func Parameters: CtParameter: n CtBlock: CtIf: Condition: CtBinaryOperator: Equals Left: CtVariableRead: n Right: CtLiteral: 0 Then block CtBlock: CtReturn: CtLiteral: 1 Else block CtBlock: CtReturn: CtBinaryOperator: Multiplication Left: CtVariableRead: n Right: CtInvocation: fact Arguments: CtBinaryOperator: Minus Left: CtVariableRead: n Right: CtLiteral: 1 CtInvocation: print Arguments: CtLiteral: "enter a number:" CtAssignment: Left: CtVariableWrite: a Right: CtInvocation: CtFieldRead: Target: io Field: read Arguments: CtParameter: "*number" CtInvocation: print Arguments: CtInvocation: fact Arguments: CtVariableRead: a
Mari tamatkan di sini dengan ulasan ringkas tentang istilah. Walaupun adalah lebih baik untuk memanggil entiti kita sebagai penterjemah pokok, mari kita panggil ia sebagai penghurai buat masa ini.
Seterusnya, kami akan membangunkan ciri yang menjalankan analisis dan digunakan untuk membina peraturan diagnostik. Ia adalah lintasan pokok. Secara keseluruhan, mudah untuk memahami cara melaksanakan lelaran pokok. Walau bagaimanapun, kita perlu melakukan beberapa perkara yang berguna dengan nod pokok. Corak Pelawat berada di atas pentas.
Adakah anda masih ingat corak menarik daripada buku GoF klasik yang mempunyai pelaksanaan yang menarik tetapi senario penggunaan yang agak samar-samar? Jadi, kami mempunyai kes penanda aras tentang cara ia digunakan dalam keadaan sebenar. Untuk memastikan artikel itu ringkas, saya tidak akan menunjukkan cara ia dilaksanakan dalam buku, tetapi saya akan menunjukkan cara kami melakukannya dalam penganalisis.
Mari kita mulakan dengan mudah, rentas pokok. Kami mentakrifkan kelas CtScanner dan menambah dua kaedah imbasan untuk satu item dan koleksinya.
public <T extends CtElement> void scan(T element) { if (element != null) { element.accept(this); } } public <T extends CtElement> void scan(List<T> elements) { if (elements != null) { for (var element : elements) { scan(element); } } }
Adakah anda melihat ini menerima daripada CtElement? Dalam kes kami, mana-mana kelas yang mewarisi antara muka CtVisitable adalah untuk melaksanakan kaedah void accept(CtAbstractVisitor visitor). Kita akan bercakap tentang CtAbstractVisitor sedikit kemudian. Kini, sudah cukup untuk mengetahui bahawa CtScanner diwarisi daripadanya.
Beginilah rupa terima dalam CtAssignment:
@Override public void accept(CtAbstractVisitor visitor){ visitor.visitCtAssignment(this); }
Ya, ia adalah sekeping kek. Nod memanggil kaedah mereka dalam pelawat. Dalam CtScanner kami harus ada kaedah untuk setiap kelas nod pokok:
@Override public void visitCtIf(CtIf ctIf) { scan(ctIf.getCondition()); scan((CtStatement) ctIf.getThenStatement()); scan((CtStatement) ctIf.getElseStatement()); } @Override public <T> void visitCtLiteral(CtLiteral<T> literal) { } @Override public void visitCtStatement(CtStatement statement) { } // ....
Sekarang mari kembali ke CtAbstractVisitor, antara muka yang diekstrak daripada CtScanner kami. Ia termasuk kaedah untuk melawati nod pokok—tetapi hanya kaedah ini, tiada kaedah imbasan. Dalam pelaksanaan kaedah pelawat, kami sama ada meninggalkan ruang letak untuk beban lampau pada masa hadapan—jika ia adalah nod terminal—atau kami terus membuka lipatan nod pokok dengan melakukan penurunan rekursif di sepanjangnya. Itu sahaja yang kita perlu tahu untuk meneruskan.
It'd seem that's all with the core. For example, now our analyzer can catch simple errors like the variable self-assignment. We call it the signature analysis. To provide more advanced analysis, we'd like to obtain more data about what's going on in the code. The parser has done its job, it's time to create new entities.
So far, we've completely followed the way of compilers regarding of the analyzer structure, but now our path diverges. If Lua had a compiler, it'd start analysis to ensure that the code is syntactically correct. Although our tools will continue to overlap in features.
For our needs, let's create SemanticResolver for two purposes:
However, we'll call it from the parser along with its traversal. No decorator this time.
To keep the app structure seamless, we'll contain all the semantic data in the same AST nodes. First, let's define the necessary properties in the CtVariableAccess interface:
CtBlock getScope(); void setScope(CtBlock block); TypeKind getTypeKind(); void setTypeKind(TypeKind type);
Let's start with scope; this will be our key tool for defining the variable along with its name. First, we define the variable entity inside SemanticResolver. To keep it short, we'll show you only the interface, but the gist should be clear:
public static class Variable { public Variable(String identifier); public String getIdentifier(); public CtBlock getScope(); public void setScope(CtBlock block); public void setType(TypeKind type); public TypeKind getType(); // Methods use only identifier @Override public boolean equals(Object o); @Override public int hashCode(); }
Let's also further define the stack for variable scopes:
private final Stack<Pair<CtBlock, HashSet<Variable>>> stack = new Stack<>(); private final CtGlobal global; public SemanticResolver(CtGlobal global) { pushStack(global); this.global = stack.peek(); } public void pushStack(CtBlock block) { stack.push(Pair.of(block, new HashSet<>())); } public void popStack() { stack.pop(); }
The stack entry consists of a tuple of scope and variables set registered there. The stack operation is mundane. Here's how it works in the parser:
private CtBlock getBlock(LuaParser.BlockContext ctx) { CtBlock block = new CtBlockImpl(); resolver.pushStack(block); // .... resolver.popStack(); return block; }
We have to register the variables somehow. If a variable is local, it's simple—let's take the current scope and pass the variable there:
public CtBlock registerLocal(Variable variable) { var scope = stack.pop(); variable.setScope(scope.getLeft()); scope.getRight().add(variable); stack.push(scope); return scope.getLeft(); }
If the local keyword isn't used, the variable will be either global or be declared somewhere above. So, we first go through the stack and double-check if it exists:
public CtBlock registerUndefined(Variable variable) { var pair = lookupPair(variable); pair.getRight().add(variable); return pair.getLeft(); } public Pair<CtBlock, HashSet<Variable>> lookupPair(Variable variable) { var buf = new Stack<Pair<CtBlock, HashSet<Variable>>>(); Pair<CtBlock, HashSet<Variable>> res = null; while (!stack.isEmpty()) { var scope = stack.pop(); buf.push(scope); if (scope.getRight().contains(variable)) { res = scope; break; } } while (!buf.isEmpty()) { stack.push(buf.pop()); } if (res == null) { return global; } return res; }
We can set the scope to variables when we have the entry:
private CtVariableWrite getVariableWriteInternal( ParseTree ctx, boolean isLocal ) { var node = new CtVariableWriteImpl(); node.setVariableName(ctx.getChild(0).toString()); CtBlock scope; if (isLocal) { scope = resolver.registerLocal( new SemanticResolver.Variable(node.getVariableName())); } else { scope = resolver.registerUndefined( new SemanticResolver.Variable(node.getVariableName())); } node.setScope(scope); return node; }
And defining it for readings:
private CtExpression getExpression(LuaParser.ExpContext ctx) { // .... if (child instanceof LuaParser.PrefixexpContext) { // .... var scope = resolver.lookupScope( new SemanticResolver.Variable(variableRead.getVariableName()) ); variableRead.setScope(scope); return variableRead; } // .... }
We won't show the lookupScope code. It's a single-line wrapper over lookupPair, which we can see above. Here, we can end with the variable scope. We'll check the mechanism in the diagnostic rule on a separate section. Now we continue working on the semantic parsing. Now, let's move to the variable type determination.
How to determine the variable type? Indeed, we obtain them from the literals. Let's further define the type and the enumeration for them:
public interface CtLiteral<T> extends CtExpression, CtVisitable { // .... void setTypeKind(TypeKind kind); TypeKind getTypeKind(); }
public enum TypeKind { Undefined, Number, String, Boolean, Nil }
Thus, the data type can be numeric, string, logical, and nil. However, it'll be undefined by default. The split of undefined and nil may seem far-fetched, but it's okay for the pilot.
We store the literal type only in the tree node, doing it from the parser:
private <T> CtLiteralImpl<T> createLiteral( // .... TypeKind type, ) { // .... literal.setTypeKind(type); return literal; }
However, the variable type will be both in the tree and in SemanticResolver. So, we can request it during further traversal and the AST building:
private ArrayList<CtAssignment> parseAssignments(LuaParser.StatContext ctx) { // .... for (int i = 0; i < variables.size(); i++) { var assignment = new CtAssignmentImpl(); var variable = variables.get(i); // .... variable.setTypeKind(resolver.lookupType(variable.getVariableName())); resolver.setType( variable.getVariableName(), variable.getScope(), SemanticResolver.evaluateExpressionType(expression) ); } return assignments; }
There is no error in the operator precedence. Let the stored variable have the type from its past assignment. It'll facilitate our work in the future. As for the methods used here, there's nothing incredible about the lookupType implementation—it's basically the same as lookupPair. There's nothing complex about setType:
public void setType(String variable, CtBlock scope, TypeKind type) { var opt = stack.stream() .filter(x -> Objects.equals(x.getLeft(), scope)) .findFirst(); if (opt.isPresent()) { var pair = opt.get(); var newVar = new Variable(variable); var meta = pair.getRight() .stream() .filter(x -> x.equals(newVar)) .findFirst(); meta.ifPresent(value -> value.setType(type)); } }
However, evaluateExpressionType is trickier. Computing variable types in dynamic languages can be a bit of a hassle. Just look at the jokes about JavaScript and string concatenation. However, firstly, Lua has a separate operator '..', and secondly, we're trying to keep the process simple. So, we'll only determine whether all the operands are the same type. We'll use the familiar CtScanner.
public static TypeKind evaluateExpressionType(CtExpression expression) { Mutable<TypeKind> type = new MutableObject<>(null); var typeEvaluator = new CtScanner() { private boolean stop = false; @Override public void scan(CtElement el) { if (stop) { return; } if (el instanceof CtVariableRead || el instanceof CtLiteral<?>) { var newType = el instanceof CtVariableRead ? ((CtVariableRead) el).getTypeKind() : ((CtLiteral<?>) el).getTypeKind(); if (newType.equals(TypeKind.Undefined)) { type.setValue(TypeKind.Undefined); stop = true; return; } else if (type.getValue() == null) { type.setValue(newType); } else if (!type.getValue().equals(newType)) { type.setValue(TypeKind.Undefined); stop = true; return; } } super.scan(el); } }; typeEvaluator.scan(expression); return type.getValue(); }
In parseAssignments, we've set the assignment type to the (CtVariableWrite) variable but forgot about reading (CtVariableRead). Let's fix it:
private CtExpression getExpression(LuaParser.ExpContext ctx) { // .... if (child instanceof LuaParser.PrefixexpContext) { // .... variableRead.setTypeKind( resolver.lookupType(variableRead.getVariableName()) ); var scope = resolver.lookupScope( new SemanticResolver.Variable(variableRead.getVariableName())); variableRead.setScope(scope); return variableRead; } // .... }
We've completed the semantic analysis and almost ready to start searching for bugs.
First, we make two quick stops. While the topic of the data-flow analysis deserves a series of articles, it'd be wrong to skip over it. Here, we won't dive deep into theory but just try to memorize the values set by literals.
First, let's fall into the sin of self-copying and define the variable entity for DataFlow again but in a simpler way. Again, we'll only show you the interface:
private static class Variable { private Variable(String identifier, CtBlock scope); // Methods use identifier and scope @Override public boolean equals(Object o); @Override public int hashCode(); }
Here's the rest of the class content:
public class DataFlow { private static class Variable { // .... } Map<Variable, Object> variableCache = new HashMap<>(); public void scanDataFlow(CtElement element) { if (element instanceof CtAssignment) { CtAssignment variableWrite = (CtAssignment) element; if (variableWrite.getAssignment() instanceof CtLiteral<?>) { var assigned = variableWrite.getAssigned(); var variable = new Variable( assigned.getVariableName(), assigned.getScope() ); variableCache.put( variable, getValue(variableWrite.getAssignment()) ); } } } public Object getValue(CtExpression expression) { if (expression instanceof CtVariableRead) { CtVariableRead variableRead = (CtVariableRead) expression; var variable = new Variable( variableRead.getVariableName(), variableRead.getScope() ); return variableCache.getOrDefault(variable, null); } else if (expression instanceof CtLiteral<?>) { return ((CtLiteral<?>) expression).getValue(); } return null; } }
It's quite simple: in scanDataFlow, we associate the value with the variable, and in getValue, we extract that value for the set node. Everything is simple because we don't factor in branching, loops, or even expressions. Why don't we factor them in? Branching is the very topic that deserves its own series of articles. What about expressions? Well, we didn't make it in two days. Given what we've achieved in the last two days, we think this task is feasible. We just postpone it as a homework.
That's all. It's clear that such a solution is far from a real product, but we have laid some foundation. Then we can either try to enhance the code—and we'll introduce data-flow analysis via AST. Or we can redo everything properly and build the control-flow graph.
We've done the class implementation, but we haven't discussed how to use it. We've written the class, but what shall we do with it? Let's talk about it in the following section. Now we just say that DataFlow works right before calling the diagnostic rules. Those are called when traversing the finished AST. Thus, the rules will have access to the current variable values. This is called Environment; you can see it in your debugger.
Welcome to the last section regarding the core. We've already had the AST that is full of semantic data, as well as the data-flow analysis that is just waiting to be run. It's a good time to put it all together and set the stage for our diagnostic rules.
How is the analysis performed? It's simple: we get on the topmost tree node, and then we start recursive traversal. That is, we need something that will traverse the tree. We have CtScanner for that. Based on it, we define MunInvoker:
public class MunInvoker extends CtScanner { private final List<MunRule> rules = new ArrayList<>(); private final Analyzer analyzer; public MunInvoker(Analyzer analyzer) { this.analyzer = analyzer; rules.add(new M6005(analyzer)); rules.add(new M6020(analyzer)); rules.add(new M7000(analyzer)); rules.add(new M7001(analyzer)); } @Override public <T extends CtElement> void scan(T element) { if (element != null) { analyzer.getDataFlow().scanDataFlow(element); rules.forEach(element::accept); super.scan(element); } } }
You can notice a few unknown things in the code:
Otherwise, the class operation shouldn't raise any questions. Each time we enter any tree node, the analyzer rule is called, and the variable values are evaluated right before it. Next, the traversal continues in line with to the CtScanner algorithm.
So, we have the analyzer core prototype—it's a good time to start analyzing something.
The base for our rules is ready—it's the CtAbstractVisitor class. The analysis goes as follows: the rule overloads a few visitors and analyzes the data contained in the AST nodes. Let's extend CtAbstractVisitor with the abstract MunRule class, which we use to create rules. In this class, we also define the addRule method that generates warnings.
Speaking of warnings: what data do they need? First is a warning message to demonstrate users what they may have been mistaken about. Second, the user needs to know where and what the analyzer warns. So, let's add data about the file where the analyzer has detected the troublesome code block and the location of this code fragment.
Here's what the MunRule class looks like:
public abstract class MunRule extends CtAbstractVisitor { private Analyzer analyzer; public void MunRule(Analyzer analyzer) { this.analyzer = analyzer; } protected Analyzer getAnalyzer() { return analyzer; } protected void addRule(String message, CtElement element) { var warning = new Warning(); warning.message = message; WarningPosition pos = new WarningPosition( Analyzer.getFile(), element.getLine(), element.getColumn() + 1 ); warning.positions.add(pos); analyzer.addWarning(warning); } public DataFlow getDataFlow() { return analyzer.getDataFlow(); } }
The WarningPosition and Warning classes are just data stores, so we won't list them. We'll talk about addWarning now.
The last thing to prepare is how we're going to view the diagnostic rules. To do this, we combine all our features together, using the already mentioned Analyzer class. Actually, here it is:
public class Analyzer { private DataFlow dataFlow = new DataFlow(); public DataFlow getDataFlow() { return dataFlow; } public CtElement getAst(String pathToFile) throws IOException { InputStream inputStream = new FileInputStream(pathToFile); Lexer lexer = new LuaLexer(CharStreams.fromStream(inputStream)); ParseTreeWalker walker = new ParseTreeWalker(); var listener = new LuaAstParser(); walker.walk(listener, new LuaParser( new CommonTokenStream(lexer) ).start_()); return listener.getFile(); } protected void addWarning(Warning warning) { Main.logger.info( "WARNING: " + warning.code + " " + warning.message + " (" + warning.positions.get(0).line + ", " + warning.positions.get(0).column + ")"); } public MunInvoker getMunInvoker() { return new MunInvoker(this); } public void analyze(String pathToFile) { try { var top = getAst(pathToFile); var invoker = getMunInvoker(); invoker.scan(top); } catch (IOException ex) { Main.logger.error("IO error: " + ex.getMessage()); } } }
To explain how it works, we'll give you an example of the whole analysis process:
We've got the prep work—it's time to start writing diagnostic rules.
We've decided to start writing the rules with a simple one: PVS-Studio has the Java diagnostic rule, V6005, where the analyzer check if a variable is assigned to itself. It can simply be copied and slightly adapted to our tree. Since our analyzer is called Mun, we start the numbers of our diagnostic rules with M. Let's create the M6005 class, extending MunRule, and override the visitCtAssignment method in it. The following check will be located in the method:
public class M6005 extends MunRule { private void addRule(CtVariableAccess variable) { addRule("The variable is assigned to itself.", variable); } @Override public void visitCtAssignment(CtAssignment assignment) { if (RulesUtils.equals(assignment.getAssigned(), assignment.getAssignment())) { addRule(assignment.getAssigned()); } } }
The used RulesUtils.equals method is a wrapper and overload for another equals method that checks the name and scope:
public static boolean equals(CtVariableAccess left, CtVariableAccess right) { return left.getVariableName().equals(right.getVariableName()) && left.getScope().equals(right.getScope()); }
We need to check the scope, because the next code fragment isn't an assignment of the variable to itself:
local a = 5; begin local a = a; end
Now, we can test the diagnostic rule on some simple code and see if it works. The following example will issue the warning on the marked line: "M6005 The variable is assigned to itself":
local a = 5; local b = 3; if (b > a) then a = a; <= end
Well, we've warmed up, so let's move on. The analyzer has already had the primitive data-flow analysis (DataFlow) that we can and should use. Again, let's look at one of our existing diagnostic rules, V6020, where the analyzer checks for the zero division. We try to adapt it for our analyzer. The divisor can be either the variable as zero or a literal as zero. We need to access the cache of variables to check their value.
Here's the simple implementation of such a diagnostic rule:
public class M6020 extends MunRule { private void addWarning(CtElement expression, String opText) { addRule(String.format( "%s by zero. Denominator '%s' == 0.", opText, expression instanceof CtLiteral ? ((CtLiteral) expression).getValue() : ((CtVariableRead) expression).getVariableName() ), expression ); } @Override public void visitCtBinaryOperator(CtBinaryOperator operator) { BinaryOperatorKind opKind = operator.getKind(); if (opKind != BinaryOperatorKind.DIV && opKind != BinaryOperatorKind.MOD) { return; } apply(operator.getRightHandOperand(), opKind == BinaryOperatorKind.MOD); } private void apply(CtExpression expr, boolean isMod) { Object variable = getDataFlow().getValue(expr); if (variable instanceof Integer) { if ((Integer) variable == 0) { String opText = isMod ? "Mod" : "Divide"; addWarning(expr, opText); } } } }
We can see that the diagnostic rule works on simple examples and issues the following warning: "M6020 Divide by zero. Denominator 'b' == 0" on these lines:
local a = 5; local b = 0; local c = a / b; <= local d = a / 0; <=
If you have made the expression evaluator as your homework, you may try the diagnostic rule on this code:
local b = 7; local b = b - 7; local c = a / b;
Since we're writing the Lua analyzer, we need to write diagnostic rules for this language. Let's start simple.
Lua is a dynamic scripting language. Let's use this feature and write the diagnostic rule that allows us to catch overwritten types.
We'll also need to pick a new number for the diagnostic rule. Earlier we just copied them from the Java analyzer. Now, it seems like it's time to start a new thousand—the seventh. Who knows what diagnostic rule it'll be in PVS-Studio, but at the time of writing this article, it will be Lua.
Knowing about types, we facilitate our case: we need to check that the left and right assignments are of different types. Note that we ignore Undefined for the left part and Nil for the right part. The code looks like this:
public class M7000 extends MunRule { @Override public void visitCtAssignment(CtAssignment assignment) { var assigned = assignment.getAssigned(); var exprType = SemanticResolver.evaluateExpressionType( assignment.getAssignment()); if (assigned.getTypeKind().equals(TypeKind.Undefined) || exprType.equals(TypeKind.Nil) ) { return; } if (!assigned.getTypeKind().equals(exprType)) { addRule( String.format( "Type of the variable %s is overridden from %s to %s.", assigned.getTypeKind().toString(), exprType.toString() assigned, ); } } }
It's time to check the diagnostic rule in real cases. In the following example, our analyzer issues the warning only on the last line:
local a = "string"; if (true) then local a = 5; end a = 5; <=
The analyzer warning: "M7000 Type of the variable a is overridden from Integer to String".
Let's finish smoothly. The Lua plugin for VS Code has a diagnostic rule that detects global lowercase variables. This check can help detect the forgotten local identifiers. Let's implement the same diagnostic rule in our analyzer.
Here, just as before, we'll need to use the variable scope data that is obtained via the semantic analysis. We just find where the variable was declared. That's also where the variable is assigned a value. Then check its scope and name. If the variable is global and starts with a lowercase, the analyzer will warn. Easy-peasy.
Let's create a new class and override the visitCtAssignment method in it again. That way, we can look for the problematic global variables:
public class M7001 extends MunRule { @Override public void visitCtAssignment(CtAssignment assignment) { var variable = assignment.getAssigned(); var firstLetter = variable.getVariableName().substring(0, 1); if (variable.getScope() instanceof CtGlobal && !firstLetter.equals(firstLetter.toUpperCase())) { addRule("Global variable in lowercase initial.", variable); } } }
Let's check the diagnostic rule work. It issues the warning: "M7001 Global variable in lowercase initial." It's issued on the second line in the code snippet below:
function sum_numbers(b, c) a = b + c; <= return a; end local a = sum_numbers(10, 5);
Alright, we've written the diagnostic rules, we've done with the analyzer (and it even works). Breathe out. Now we can enjoy the fruits of our labors.
Above we've already shown the code that allows us to view warnings in a console or file. Here is how the process of working with the analyzer looks like in the console. We run the analyzer using the command:
java -jar mun-analyzer.jar -input "C:\munproject\test.lua"
And we get:
However, first, the analyzer is a tool, and it should be user-friendly. Instead of working with the console, it'd be much more convenient to work with a plugin.
Es ist wieder Zeit zum Ausleihen. PVS-Studio verfügt bereits über Plugins für viele verschiedene IDEs, beispielsweise für Visual Studio Code und IntelliJ IDEA. Sie können Warnungen anzeigen und durch sie navigieren. Da Analyseberichte standardisierte Formate haben, können wir den JSON-Berichtsgenerierungsalgorithmus einfach von unserem Java-Analysator ausleihen. Der Algorithmus ist umfangreich und langweilig, daher werden wir ihn nicht zeigen. Wir müssen es immer noch über die Befehlszeile ausführen, jedoch mit dem Argument* -output „D:git/MunProject/report.json“*.
Dann können wir den Bericht in IntelliJ IDEA oder VS Code öffnen und uns die Warnungen des Lua-Analysators ansehen:
Süß! Jetzt können wir den Analysator für seinen vollen Zweck nutzen, ohne dass die Benutzerfreundlichkeit darunter leidet.
Haben wir also den vollwertigen Analysator geschrieben? Äh, nicht ganz. Am Ende dieser langen Reise haben wir den realsten Piloten, der alle Etappen durchläuft. Der Wachstumsspielraum ist jedoch enorm:
Und noch viel, viel mehr. Mit anderen Worten: Der Weg vom Piloten zum vollwertigen Werkzeug ist ziemlich steinig. Daher konzentriert sich PVS-Studio auf die bestehenden Richtungen: C#, C, C++ und Java statt auf neue. Wenn Sie in einer dieser Sprachen schreiben, können Sie unseren Analysator ausprobieren.
Der Artikel ist am Ende viel länger geworden, als wir gedacht hatten, also hinterlassen Sie bitte einen Kommentar, wenn Sie am Ende angekommen sind :) Wir würden uns über Ihr Feedback freuen.
Wenn Sie sich für das Thema interessieren, können Sie sich über die Entwicklung von Analysegeräten für unsere Sprachen informieren:
Atas ialah kandungan terperinci Bagaimana untuk membangunkan penganalisis kod dalam beberapa jam. Untuk maklumat lanjut, sila ikut artikel berkaitan lain di laman web China PHP!