
Google's case study on building static code analysis tools


Software bugs cost developers and software companies a great deal of time and money. In 2014, for example, the ("goto fail") bug in a widely used SSL protocol implementation caused invalid SSL certificates to be accepted, and a date-formatting bug caused widespread service outages on Twitter. Errors like these can often be detected by static analysis; indeed, they can be spotted quickly when reading the code or its documentation, and yet they still reached production.

Previous work has reported experiences applying bug-detection tools to software development. But despite these success stories, engineers do not always use static analysis tools, or actively ignore the warnings the tools produce, for the following reasons:

  • Not integrated. The tool is not integrated into the developer's workflow, or takes too long to run;

  • Not actionable. The warnings are not actionable;

  • Not trustworthy. Users no longer trust the results because of false positives;

  • Not manifest in practice. The reported bug is theoretically possible, but the defect does not actually manifest in practice;

  • Too expensive to fix. Fixing the detected defect is too costly or otherwise risky;

  • Not understood. Users do not understand what the warning means or why it was raised.

The following article describes how we drew on Google's earlier experience and lessons with FindBugs for Java analysis, as well as on the academic literature, to finally build a static analysis infrastructure that software engineers at Google use every day. Using input from engineers, Google's tools detect thousands of issues per day that engineers fix before the problematic code is merged into the company-wide code repository.

In terms of tool scope, we focus on integrating static analysis into Google's core development process and serving the majority of Google developers. Many static analysis techniques are dwarfed by the two billion lines of code deployed at Google, so running complex analyses at that scale has not been a high priority.

Of course, developers outside Google working in specialized fields (such as aerospace or medical devices) may use specific static analysis tools and workflows, and developers working on particular kinds of projects (such as kernel code or device drivers) may require specific analysis methods. There have been many excellent results in static analysis. We do not believe the experiences and insights we report are unique, but we firmly believe that organizing and sharing our work on improving Google's code quality and development experience is worthwhile.

Definition of terms. We use the following terminology: an analysis tool runs one or more "checkers" over source code and identifies "defects" that may represent software faults. We consider an issue an "effective false positive" if the developer does not take positive action after seeing it. If an analysis incorrectly reports a defect, but the developer proactively changes the code anyway to improve readability or maintainability, that is not an effective false positive. If an analysis reports an actual code error, but the developer did not understand the issue and therefore took no action, that is an effective false positive. We use this distinction to emphasize the importance of the developer's perspective: it is the developer, not the tool author, who perceives and directly determines a tool's false positive rate.

How Google compiles and builds software

Below we outline the key points of Google's software development process. At Google, almost all development tools (other than the development environment itself) are centralized and standardized. Much of the infrastructure was built from scratch by internal teams, which retains the flexibility to experiment.

Source code control and code ownership. Google has developed and uses a single source-code management system, with a single branch storing (almost) all of Google's proprietary code. Developers use a "trunk-based" development approach that limits branching; branches are usually cut per release rather than per feature. Any engineer can change any code with the approval of the code's owners. Code ownership is path-based; the owners of a directory have the same rights over its subdirectories.

System construction. All code in the Google codebase is compiled with a hermetic version of the Bazel build tool: all inputs must be explicitly declared and stored in source control, so that builds are easy to distribute and parallelize. Java rules in Google's build system depend on a source-controlled JDK and Java compiler, and these binaries can be updated for all users by quickly rolling out a new version. Builds are usually from source (at head), and binary components are rarely checked in. Because all developers use the same build system, any code is expected to compile without errors.

Analysis tools. The static analysis tools used at Google are generally not complex. Google's infrastructure does not support running interprocedural or whole-program analysis at this scale, nor does it use advanced static analysis techniques (such as separation logic) at scale. Even a simple checker requires analysis infrastructure that supports integration into the workflow. The analyzer types deployed as part of the general development process include:

  • Style checkers (such as Checkstyle, Pylint, and Golint);

  • Extended bug-finding compilers (such as Error Prone, ClangTidy, Clang Thread Safety Analysis, go vet, and the Checker Framework), including abstract-syntax-tree pattern-matching tools, type-based checkers, and analyzers that detect unused variables;

  • Analyzers that call production services (for example, to check whether an employee mentioned in a code comment still works at Google);

  • Analyzers that check properties of build outputs (such as the size of an output binary).

Google's C++ linter can catch the "goto fail" vulnerability by checking whether the body of an if statement is enclosed in braces, and a pattern-matching checker identifies the date-formatting error, so the code that crashed Twitter would not compile at Google. Google developers also use dynamic analysis tools such as AddressSanitizer to find buffer overruns and ThreadSanitizer to find data races. These tools run during testing and sometimes even against production traffic.
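As a rough, hypothetical illustration (rendered in Java rather than C, and not Google's actual lint rule), the class of defect behind "goto fail" is a conditional without braces whose indentation suggests more statements are guarded than really are; a purely syntactic check that requires braces catches it mechanically:

    // Hypothetical example: the second assignment looks guarded by the if, but is not.
    public class UnbracedIfDemo {
        static int verify(boolean checksumOk, boolean signatureOk) {
            int err = 0;
            if (!checksumOk)
                err = -1;
                err = -1;   // accidentally duplicated; runs unconditionally
            if (!signatureOk)
                err = -1;
            return err;     // always -1, so the outcome of both checks is ignored
        }

        public static void main(String[] args) {
            System.out.println(verify(true, true)); // prints -1 even though both checks pass
        }
    }

A style rule that insists on braces around every if body flags this without needing to understand the program's semantics.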

Integrated development environment (IDE). An obvious entry point for surfacing static analysis issues early in development is the IDE. But Google developers use a wide variety of editors, so it is hard to reliably reach all developers before they invoke the build tool. While Google does ship analyses integrated into popular in-house IDEs, mandating a particular IDE capable of running the analyses would be a long and arduous road.

Testing. Almost all Google code has corresponding tests, from unit tests to large-scale integration tests. Test activities are first-class concepts in the build system and, like compilation, are hermetic and distributed. For most projects, developers write and maintain the tests for their own code; projects typically do not have a separate testing or QA group.

Google's continuous build-and-test system runs the tests on every code submission and provides timely feedback on build failures or failing tests caused by a developer's change. It also supports testing a change before it is committed, to avoid breaking projects that depend on it.

Code review. Every change submitted to Google's codebase first goes through code review. While any developer can change any part of Google's code, an owner of that code must review and approve the change before it is merged; in addition, even code owners have their changes reviewed before committing them. Reviews are conducted through a centralized, web-based tool that is tightly integrated with the rest of the development infrastructure, and static analysis results can be displayed there.

Code release. Google teams release frequently, and most of the release verification and deployment process is automated through a "push on green" approach, meaning a laborious manual release-verification process is impractical. If a Google engineer discovers a bug in production, a rollback or a new release can be deployed to production servers at relatively low cost compared with the cost of a service disruption.

Learning from FindBugs

During an early exploratory phase from 2008 to 2010, Google's static analysis work focused on Java analysis with FindBugs: a standalone tool created by William Pugh of the University of Maryland and David Hovemeyer of York College of Pennsylvania that analyzes compiled Java class files and extracts patterns of code structure that can cause bugs. As of January 2018, FindBugs was only a command-line tool used by a very small number of engineers at Google. A small Google team called "BugBot" worked with Pugh, the original author, on three major attempts to integrate FindBugs into the Google development process.

We learned the following lessons from these attempts:

Try 1: Bug dashboard. Initially in 2006, FindBugs was integrated into a centralized tool to scan the entire Google code base every night, logging the results for engineers to view through the dashboard. Although FindBugs found hundreds of errors in Google's Java code base, the dashboard had little effect because the error message dashboard was separated from the daily development process and could not be integrated with other existing static analysis results.

Attempt 2: Filing bug reports.

Next, the BugBot team began manually triaging the new issues found each night and filing bug reports for the most important ones. In May 2009, hundreds of Google engineers took part in a company-wide "Fixit" week focused on addressing FindBugs warnings. A total of 3,954 warnings were reviewed (42% of the 9,473 total), but only 16% of those (640) were actually fixed, even though 44% of the reviewed results (1,746) had been filed in the bug tracker. Although the Fixit confirmed that many of the issues FindBugs found were real code defects, a large fraction were not important enough to justify fixing. Manually triaging issues and filing bug reports is hard to sustain at scale.

Attempt 3: Integrate into code review. Next, the BugBot team built and deployed a system in which FindBugs ran automatically when a reviewer was notified that a review was pending, with the results posted as comments on the code review; the code review team had already done the same for coding standards/style issues. Google developers could suppress false positives and filter FindBugs results by confidence. The tool further attempted to show only new FindBugs warnings, but sometimes misclassified existing issues as new. This integration ended when the code review tool was replaced in 2011, for two reasons: the high rate of effective false positives caused developers to lose confidence in the tool, and developers' freedom to customize filtering meant everyone saw a different, inconsistent view of the analysis results.

Incorporated into the compilation process

While the FindBugs experiments were under way, Google's C++ development process kept improving by adding new checks to the Clang compiler. The Clang team implemented new compiler checks, complete with suggested-fix information, used ClangMR to run the updated compiler over the entire Google codebase in a distributed fashion, and applied the fixes to code with existing problems. Once the codebase had been cleansed of an issue, the Clang team enabled the new check as a compiler error (rather than a warning, which the Clang team found Google developers would ignore) that aborts the build and must be addressed before the code can pass. The Clang team has been very successful at improving codebase quality with this strategy.

We followed this idea and built a simple, easy-to-use Java static analysis tool based on pattern matching on top of the javac compiler, called Error Prone. The first check introduced, called PreconditionsCheckNotNull, detects calls where the arguments to a precondition check have been transposed so that the string message, rather than the value, is checked for nullness — for example checkNotNull("uid was null", uid) instead of checkNotNull(uid, "uid was null").
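As a minimal, hypothetical sketch of the pattern (using Guava's Preconditions; the variable name is invented for the example):

    import static com.google.common.base.Preconditions.checkNotNull;

    public class CheckNotNullDemo {
        public static void main(String[] args) {
            String uid = null;

            // Intended usage: the value first, the message second. This throws a
            // NullPointerException with message "uid was null" when uid is null.
            // checkNotNull(uid, "uid was null");

            // The bug PreconditionsCheckNotNull flags: the arguments are transposed, so the
            // string literal (which can never be null) is checked and uid is silently ignored.
            checkNotNull("uid was null", uid);   // always passes; the null uid slips through
            System.out.println("reached despite uid being null");
        }
    }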

To launch a checker like PreconditionsCheckNotNull without breaking any ongoing builds, the Error Prone team first runs the check over the entire codebase using a javac-based MapReduce program — analogous to ClangMR and built with FlumeJava — called JavacFlume. JavacFlume emits a set of suggested fixes, diffs them, and then applies those fixes across the entire codebase. The Error Prone team uses the internal tool Rosie to split large-scale changes into smaller changes, each affecting only a single project; these are tested and sent to the appropriate teams for code review. Teams review only the fixes that apply to their own code, and only when they approve them does Rosie commit the actual changes. In the end all the fixes for existing problems were approved and all existing defects were resolved, and only then did the team turn on the check as a compiler error.

When we surveyed developers who received these patches, 57% of those who received a fix checked into their code were happy to receive it, and 41% were neutral. Only 2% reacted negatively, saying, "This will only increase my workload."

The value of compiler checking

Compiler errors appear early in the development process and are integrated into the developer workflow. We found that expanding the set of compiler checks was effective at improving code quality at Google. Because the checks in Error Prone are written internally against javac's abstract syntax tree rather than bytecode (unlike FindBugs), developers outside the team can add checks relatively easily. Leveraging these external contributions is critical to increasing Error Prone's overall impact: as of January 2018, 162 authors had contributed 733 checkers.
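To give a flavor of what "writing a check against javac's abstract syntax tree" involves, here is a toy, hypothetical sketch using the standard com.sun.source Tree API that javac exposes; Error Prone's actual plugin API (BugChecker, matchers, suggested fixes) is richer than this and differs in detail, and hooking the scanner into a compilation task is omitted:

    import com.sun.source.tree.MethodInvocationTree;
    import com.sun.source.tree.Tree;
    import com.sun.source.util.TreeScanner;

    // A toy AST visitor that flags calls to checkNotNull whose first argument is a
    // string literal -- the shape of the transposed-arguments bug described above.
    public class LiteralFirstArgScanner extends TreeScanner<Void, Void> {
        @Override
        public Void visitMethodInvocation(MethodInvocationTree call, Void unused) {
            if (call.getMethodSelect().toString().endsWith("checkNotNull")
                    && !call.getArguments().isEmpty()
                    && call.getArguments().get(0).getKind() == Tree.Kind.STRING_LITERAL) {
                // A real checker would report a diagnostic with a suggested fix here;
                // this sketch just prints the offending call.
                System.out.println("Possible transposed arguments: " + call);
            }
            return super.visitMethodInvocation(call, unused);
        }
    }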

The sooner you report a problem, the better

Google's centralized build system records every build and every build result, so we could identify all users who had seen an error message within a given time window. We sent a survey to developers who had recently encountered a compiler error and to developers who had received a patch fixing the same kind of issue. Google developers perceive issues flagged at compile time (as opposed to patches for already-merged code) as catching more important bugs; for example, survey participants deemed 74% of the issues flagged at compile time to be "real problems," compared with 21% of the issues found in merged code. In addition, survey participants rated 6% of the issues found at compile time (versus 0% of those found in merged code) as "critical." This result can be explained by "survivor bias": by the time code has been checked in, significant bugs are more likely to have been caught by more expensive means such as testing and code review. Pushing as many checks as possible into the compiler is a sure way to avoid those costs.

Standards for compiler checks

To scale our work, and because breaking the build is a significant intervention, we defined criteria for enabling checks in the compiler, set deliberately at a high bar. A compiler check at Google should be easy to read, actionable, and easy to fix (wherever possible the error message should include a generally applicable suggested fix); produce no effective false positives (the analysis should never break a build for code that is actually correct); and report only genuine bugs rather than style or coding-convention issues. The main goal of an analyzer that meets these criteria is not merely to detect problems but to automatically fix the resulting compiler errors throughout the codebase. These criteria, however, limit the set of checks the Error Prone team can enable at compile time; many problems that cannot be detected precisely or fixed universally remain ahead of us.

Displaying alerts during code review

Once the Error Prone team had built the infrastructure needed to detect issues at compile time and had proven that the approach works, we wanted to surface more high-impact bugs than our compiler error checks cover, and to provide analysis results for languages other than Java and C++. The second integration point for static analysis results is Google's code review tool, Critique; static analysis results are shown in Critique via Tricorder, Google's program analysis platform. As of January 2018, Google's C++ and Java builds compile free of these errors, and every analysis result is surfaced either as a compiler error or during code review.

Standards for code review checks

Unlike compile-time checks, analysis results shown during code review are allowed an effective false positive rate of up to 10%. The feedback given during code review is not expected to be perfect in every case, and developers evaluate the suggested fixes before adopting them. Checkers run at the code review stage at Google should meet the following criteria:

Be understandable. The finding is clear and easy for any engineer to understand;

Be actionable and easy to fix. The fix may require more time, thought, or effort than a compiler check would, and the finding should include guidance on how to address the problem;

Have an effective false positive rate of less than 10%. Developers should feel the checker points at a real bug at least 90% of the time;

Have significant potential impact on code quality. The issue may not keep the program from running correctly, but developers should take it seriously and choose to fix it.

Some problems are severe enough to flag in the compiler, but reducing their effective false positive rate or developing an automatic fix is not feasible; for example, fixing some problems may require refactoring the code. Enabling such checks as compiler errors would require manually cleaning up every existing occurrence first, which is not feasible in a codebase as large as Google's. Showing these checks in code review prevents new problems from being introduced while letting developers decide whether and how to fix them. Code review is also a good time to report relatively unimportant issues, such as convention violations or opportunities to simplify code. In our experience, reporting such issues at compile time is poorly received and makes rapid iteration and debugging harder; for example, a detector for unreachable code paths can get in the way of debugging a block of code. During code review, however, developers are carefully preparing their code to be finished; they are in a receptive frame of mind and more open to issues of readability and stylistic detail.
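As a hypothetical illustration of that last point, a developer who temporarily short-circuits a code path while hunting a bug would be blocked outright if an unreachable-path check ran as a compiler error, but is merely nudged if the finding appears later as a review comment (names below are invented):

    public class DebugShortCircuit {
        static int compute(int x) {
            boolean forceFastPath = true;   // temporary while debugging the slow path
            if (forceFastPath) {
                return x;                   // does the bug reproduce without the slow path?
            }
            return slowPath(x);             // effectively dead for now; a compile-time error
                                            // here would block this quick experiment
        }

        static int slowPath(int x) { return x * 2; }

        public static void main(String[] args) {
            System.out.println(compute(21));
        }
    }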

Tricorder

Tricorder is designed to be easily extensible and to support many different kinds of program analysis tools, including both static and dynamic analyses. We show in Tricorder some Error Prone checkers that cannot be enabled as compiler errors. Error Prone has also spawned a new set of C++ analyses, integrated with Tricorder, called ClangTidy. Tricorder analyzers report results for more than 30 languages, support simple syntactic analyses such as style checkers, can leverage compiler information for Java, JavaScript, and C++, and can integrate directly with production data (for example, about currently running jobs). Tricorder continues to be successful at Google because it is a plugin model that supports an ecosystem of analyzer writers, because it highlights actionable fixes during code review, and because it provides a feedback channel that improves the analyzers and ensures analyzer developers act on the feedback.

Enable users to contribute. As of January 2018, Tricorder included 146 analyzers, 125 of them contributed from outside the Tricorder team, plus seven plugin systems for hundreds of additional checks (Error Prone and ClangTidy are two of them).

Empower reviewers to enforce fixes. Tricorder checkers provide fix suggestions in the code review tool, visible to both reviewers and authors. A reviewer can ask the author to fix defective code by clicking a "Please fix" button on an analysis result. Reviewers typically do not approve a change for merging until all of their comments, manual and tool-generated alike, have been addressed.

Iterate on feedback from users. In addition to the "Please fix" button, Tricorder provides a "Not useful" button that reviewers or authors can click to indicate they disagree with a finding. A click automatically files a bug in the bug tracker, routed to the team that owns the analyzer. The Tricorder team follows up on these "Not useful" clicks and tracks the ratio of "Please fix" to "Not useful" clicks; if an analyzer's "Not useful" ratio exceeds 10%, the Tricorder team disables it until the author improves it. While the Tricorder team rarely disables an analyzer permanently, it has disabled some (in a few scenarios) until the authors removed or reworked checkers that proved noisy and unhelpful.

The filed bugs often improve the analyzers, which in turn greatly increases developer satisfaction with them. For example, in 2014 the Error Prone team developed a check that flags passing too many arguments to a printf-like function in Guava; Guava's printf-like functions do not accept all printf specifiers, only %s. About once a week the Error Prone team received a "Not useful" bug claiming the analysis was wrong because the number of format placeholders in the flagged code matched the number of arguments actually passed. In every case the user was trying to use a placeholder other than %s, so the analysis was in fact correct. The team therefore changed the diagnostic text to state directly that the function accepts only %s placeholders, and the bug reports about that check stopped.
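A minimal sketch of that situation, using Guava's Preconditions.checkArgument (whose message template, like Guava's other printf-like methods, substitutes only %s; the method and values are invented for the example):

    import static com.google.common.base.Preconditions.checkArgument;

    public class FormatPlaceholderDemo {
        static void setCount(int count) {
            // Looks balanced (one specifier, one argument), but %d is not substituted by Guava,
            // so from the checker's point of view there are zero placeholders and one extra argument.
            checkArgument(count >= 0, "count must be non-negative, was %d", count);

            // What the check wants: Guava's message templates accept only %s.
            // checkArgument(count >= 0, "count must be non-negative, was %s", count);
        }

        public static void main(String[] args) {
            setCount(3);
            setCount(-1);   // throws IllegalArgumentException; the %d stays literal in the message
        }
    }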

Scale of Tricorder usage. As of January 2018, Tricorder analyzed roughly 50,000 code review changes per day, with three analysis runs per second during peak hours. Reviewers clicked "Please fix" more than 5,000 times per day, and authors applied automated fixes about 3,000 times per day. Tricorder analyzers received 250 "Not useful" clicks per day.

The success of analysis during code review shows that it occupies a "sweet spot" in Google's developer workflow. Results shown earlier, at compile time, must meet a bar for quality and accuracy that not every analyzer capable of finding serious problems can reach; results shown later, after review and merge, run into developers' increased resistance to making changes. Developers are reluctant to touch code that has already been tested and released and are much less likely to address low-risk or less important issues there. Many other analysis projects in software development organizations (such as Facebook's Infer analysis for Android/iOS apps) likewise emphasize code review as a key entry point for reporting analysis results.

Expanding the analyzers

As Google developers came to accept Tricorder analyzer results, they kept asking for more analyzers. Tricorder addresses this in two ways: by allowing customization at the project level and by presenting analysis results at additional points in the development process. In this section we also discuss some of the reasons Google has not yet made more sophisticated analysis part of its core development process.

Project-level customization

Not all requested analyzers are equally valuable across the entire Google codebase; for example, some analyzers carry higher false positive rates, and a checker with a correspondingly high false positive rate may need project-specific configuration to be useful at all. Such analyzers are worthwhile only for the right teams.

To meet these needs, our goal was to make Tricorder customizable. Our earlier experience with customization in FindBugs had not gone well: user-level customization caused divergence within and between teams and reduced tool usage. Because each user could see a different view of the issues, there was no way to ensure that everyone working on the same project saw a particular issue. If one developer removes all unused imports from their team's code, that change is quickly undone if even one other developer is inconsistent about removing unused imports.

To avoid such problems, Tricorder allows configuration only at the project level, ensuring that everyone making changes to a given project sees a consistent view of the analysis results for that project. This consistency of view enabled several kinds of analyzers, including those that:

Produce binary results. For example, Tricorder includes an analyzer for protocol buffer definitions that identifies backwards-incompatible changes. It is used by developer teams that persist protocol buffers in serialized form, but is merely annoying for teams that do not store data that way. Another example is an analyzer that suggests using Guava or newer Java idioms, which makes no sense for projects that cannot use those libraries or language features;

Require specific setup or in-code annotations. For example, teams can use the Checker Framework's nullness analysis only if their code is appropriately annotated (a small annotated example appears after this list). Another analyzer, when configured accordingly, checks the growth in binary size and number of function calls of a particular Android binary and warns developers when the growth is unexpected or is approaching a limit;

Support domain-specific languages (DSLs) and team-specific coding guidelines. Some Google software development teams have developed small DSLs and want to run the associated checkers; other teams have established their own best practices for readability and maintainability and want those checks enforced;

Are highly resource-intensive. Some hybrid analyses that incorporate results from dynamic analysis provide high value to certain teams but are too costly or too slow to run for everyone.
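As promised above, a small, hypothetical example of the kind of in-code annotation the nullness analysis relies on (the class is invented for illustration; the annotation is the Checker Framework's org.checkerframework.checker.nullness.qual.Nullable):

    import org.checkerframework.checker.nullness.qual.Nullable;

    public class UserDirectory {
        // The annotation records that lookupEmail may return null; a caller that dereferences
        // the result without a null check is flagged by the nullness checker.
        public @Nullable String lookupEmail(String username) {
            return username.isEmpty() ? null : username + "@example.com";
        }

        public int emailLength(String username) {
            String email = lookupEmail(username);
            // Without this null check, the checker reports a possible NullPointerException
            // at the dereference below.
            return (email == null) ? 0 : email.length();
        }
    }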

As of January 2018, there were roughly 70 optional analyses within Google, and 2,500 projects had enabled at least one of them. Dozens of teams across the company were actively developing new analyzers, most of them outside the developer-tools group.

Other Workflow Integration Points

As developers' trust in these tools has grown, they have also asked for further integration into their workflows. Tricorder now provides analysis results through a command-line tool, through checks that run before code is committed, and in the code-browsing tool.

Command-line support. The Tricorder team added command-line support for developers who essentially act as code janitors, frequently browsing and cleaning up various analyzer warnings in their team's codebase. These developers know the kinds of fixes each analyzer produces and have a high level of trust in specific analyzers, so they can use the command-line tool to automatically apply all of a given analysis's fixes and produce clean changes;

Commit-blocking checks. Some teams want specific analyzers to block commits rather than merely appear in the code review tool. Requests for the ability to block commits typically come from teams with highly tailored checkers guaranteed to have no false positives, often checks for custom DSLs or libraries;

Results in the code browser. The code browser works best for showing the scale of a problem across a large project (or the whole codebase). For example, analysis results shown while browsing code that uses a deprecated API can convey how much migration work remains; and some security and privacy analyses are global and require a specialized team to review the results before deciding whether there is a real issue. Because analysis results are not displayed by default, the code browser lets specific teams enable an analysis view, scan the entire codebase, and review the results without distracting other developers. If an analysis result has an associated fix, the developer can apply it with a single click in the code browser. The code browser is also well suited to showing results of analyses over production data, since that data is not available until the code has been committed and is running.

Complex Analysis

All of the static analysis deployed widely at Google is relatively simple, although some teams do run interprocedural analysis with project-specific frameworks targeted at specific domains (such as Android apps). Interprocedural analysis at Google scale is technically feasible, but very challenging to implement. As mentioned above, all Google code is stored in a single monolithic source repository, so conceptually any code in the repository can be part of any binary; one can therefore imagine a situation where the analysis results for a particular code review would require analyzing the entire repository. Although Facebook's Infer focuses on interprocedural analysis and has scaled separation-logic-based analyzers to multi-million-line codebases, scaling such an analyzer to Google's multi-billion-line repository would still require significant engineering effort. As of January 2018, implementing a more sophisticated analysis system was not a priority for Google:

Large investment required. The upfront infrastructure investment would be prohibitive;

Work needed to reduce false positive rates. Analysis teams would have to develop techniques to significantly lower false positive rates for many analyzers and/or strictly limit which error messages are shown, as Facebook Infer does;

More still to implement. Analysis teams still have plenty of "simple" analyzers left to implement and integrate;

High upfront cost. We find these "simple" analyzers to be highly cost-effective, a core motivation of the FindBugs work. By comparison, even determining the cost-benefit tradeoff for more sophisticated checkers is expensive when the upfront cost is high.

Note that this cost-benefit calculus may be very different for developers outside Google who work in specialized areas (such as aerospace or medical devices) or on specific kinds of projects (such as device drivers or mobile applications).

Lessons learned

Our experience trying to integrate static analysis into Google workflows taught us the following valuable lessons:

It's easy to find bugs. When a codebase is large enough, it contains almost every imaginable code pattern. Even in mature codebases with thorough test coverage and rigorous code review, bugs creep in. Some problems are not apparent from local inspection, and some errors are introduced by seemingly harmless refactorings. For example, consider the following code snippet, which uses a field f of type long:

    result = 31 * result + (int) (f ^ (f >>> 32));

Imagine what happens if a developer changes the type of f to int. The code still compiles, but the right shift by 32 becomes a no-op (for an int, the shift distance is taken modulo 32), the field is XORed with itself, and its contribution to the hash becomes the constant 0: f no longer affects the value produced by the hashCode method. Any tool that can compute the type of f can correctly detect a shift of an int by more than 31 bits; we fixed 31 occurrences of this bug in Google's codebase and enabled the check as a compiler error in Error Prone.

Because finding bugs is easy, Google uses simple tools to detect bug patterns, and analysis writers then fine-tune the checks based on the results of running them over Google's code.
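A tiny, self-contained demonstration of the effect (variable names are invented for the example):

    public class HashShiftDemo {
        public static void main(String[] args) {
            int f = 0x12345678;
            // For an int, the shift distance is taken modulo 32, so f >>> 32 == f
            // and f ^ (f >>> 32) is always 0: the field stops contributing to the hash.
            System.out.println(f ^ (f >>> 32));          // prints 0 for any value of f

            long g = 0x1234567890ABCDEFL;
            // With a long, the idiom mixes both 32-bit halves as intended.
            System.out.println((int) (g ^ (g >>> 32)));  // a value that depends on all of g
        }
    }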

Most developers will not go out of their way to use static analysis tools. Like many commercial tool vendors, Google initially relied on its FindBugs deployment: engineers could choose to visit a central dashboard to see the issues found in their projects, but few of them actually did. Finding bugs that were already merged into the codebase (and possibly already deployed and running without user-visible problems) was too late. To ensure that most or all engineers see static analysis warnings, the analysis tools must be integrated into the workflow and enabled by default for everyone. Projects such as Error Prone do not provide a bug dashboard; instead they extend the compiler with additional checkers and surface analysis results during code review.

Developer feelings are crucial. In our experience, and in the accumulated literature, many attempts to integrate static analysis into a software development organization fail. At Google, management does not generally mandate that engineers use static analysis tools; engineers working on static analysis must demonstrate their impact with valid real-world data. For a static analysis project to succeed, developers must perceive that they benefit from it and enjoy using it.

To build a successful analysis platform, we build tools that provide high value to developers. The Tricorder team carefully reviews which issues actually get fixed, surveys developers to understand how they feel, makes it easy to file bugs against the analysis tools, and uses all this data to keep improving. Developers need to build trust in analysis tools: if a tool wastes their time with false positives and feedback on low-priority issues, they will lose confidence and ignore its results.

Go beyond finding bugs; fix them. A typical way to promote a static analysis tool is to list a huge number of problems in a codebase, hoping to prompt action by pointing out potential errors to correct or future bugs to prevent. But if developers are not incentivized to act, that hoped-for outcome never materializes. This is a fundamental flaw: analysis tools measure their usefulness by the number of problems they identify, while process integration breaks down when only a handful of bugs actually get fixed. By contrast, Google's static analysis teams take responsibility for the corresponding fixes as well as for finding the bugs, and treat closing that loop as the criterion for success. Focusing on fixing errors ensures the tools provide actionable recommendations and minimize false positives. In many cases, fixing errors automatically is as easy as finding them with automated tools, and even for hard problems, research over the past five years has produced new techniques for automatically creating fixes for static analysis findings.

Analyzer development requires collective effort. Although a particular static analysis tool needs expert developers to write the analysis, those experts may not know which checks would have the most impact; moreover, analyzer experts are usually not domain experts (in, say, particular APIs, languages, or security). With the FindBugs integration, only a handful of Googlers knew how to write a new checker, so the small BugBot team had to do all the work itself. This limited how quickly new checks could be added and effectively prevented others from contributing their domain knowledge. Teams like Tricorder now focus on lowering the bar for developer-contributed checks, without requiring prior static analysis experience. The Google tool Refaster, for example, lets developers write a checker by giving before-and-after examples of code snippets. Since contributors are often motivated after debugging faulty code themselves, the new checks save developer time over and over again.

Conclusion

Our experience is that integration into the development process is the key to the adoption of static analysis tools. While checker authors may believe developers should be delighted to be shown lists of defects in the code they write, we have not found that such lists actually motivate developers to fix those defects. As analysis tool developers, we must define and measure effectiveness in terms of defects actually corrected, rather than in numbers reported to developers. This means our responsibility extends far beyond the analysis tools themselves.

We advocate a system focused on pushing workflow integration as early as possible. Whenever possible, enable a checker as a compiler error. To avoid breaking builds, the tool authors first fix all existing instances of the issue in the codebase, which lets us raise the quality of Google's codebase one step at a time. Because we present the error in the compiler, developers encounter it immediately after writing the code, while they can still change course cheaply. To make this possible, we developed infrastructure for running analyses and generating fixes across the enormous Google codebase. We also benefit from code review and commit automation that allow changes spanning hundreds of files and, not least, from an engineering culture that tolerates changes to legacy code because improving the code outweighs aversion to the risk of modification.

Code review is the natural entry point for displaying analysis warnings before code is committed. To ensure that developers are receptive to the results, Tricorder shows issues only while the developer is modifying the code, before the change is committed, and the Tricorder team applies a set of criteria to select which warnings to show. Tricorder further collects statistics in the code review tool, which are used to detect analyzers that generate large numbers of unhelpful warnings and to find the root causes.

To keep warnings from being ignored, we worked to win and keep the trust of Google engineers. We found that Google developers have a strong prior to ignore static analysis, and any report with an unsatisfactory false positive rate gives them a reason to do nothing. Analysis teams vet a check against descriptive, objective criteria before surfacing its results as errors or warnings, so developers are rarely overwhelmed, confused, or annoyed by analysis results. Surveys and feedback channels are important quality-control mechanisms in this process. Now that developers trust the analysis results, the Tricorder team is handling requests to weave more analyses even further into Google developer workflows.

We have built a successful static analysis infrastructure at Google that prevents hundreds of bugs from entering the Google codebase every day, both at compile time and during code review. We hope that others can benefit from our experience and successfully integrate static analysis into their own workflows.

