The system software that processes high-level language source programs into target programs is the "compiler". A compiler is a translation program that translates a source program written in a high-level programming language into an equivalent target program in machine language format. The compiler's work of translating a source program into a target program is divided into five stages: lexical analysis, syntax analysis, intermediate code generation, code optimization, and target code generation. The most important of these are lexical analysis and syntax analysis, together known as source program analysis; if any syntax errors are found during analysis, an error message is reported.
The operating environment of this tutorial: Windows 7 system, Dell G3 computer.
System software that can process a source program written in a high-level language into a target program is a "compiler".
A compiler, also called a compiling program, is a translation program that translates a source program written in a high-level programming language into an equivalent target program in machine language format. Compilers are translation programs implemented using a generative approach. A compiler takes a source program written in a high-level programming language as input and produces a target program expressed in assembly language or machine language as output. The compiled target program usually then goes through a run stage, in which it executes with the support of runtime routines, processes the initial data, and computes the required results.
The compiler must analyze the source program and then synthesize it into the target program. First, check the correctness of the source program and decompose it into several basic components; secondly, establish corresponding equivalent target program parts based on these basic components. In order to complete these tasks, the compiler must create some tables during the analysis phase and transform the source program into an intermediate language form so that it can be easily referenced and processed during analysis and synthesis.
Characteristics of the compiler:
The main data structures used during analysis and synthesis include symbol tables, constant tables, and intermediate language programs. The symbol table consists of the identifiers used in the source program together with their attributes, which include the kind (such as variable, array, structure, function, or procedure), the data type (such as integer, real, string, complex, or label), and other information required by the target program. The constant table consists of the constants used in the source program, including the machine representation of each constant and the target program address assigned to it. An intermediate language program is an intermediate form of the program introduced before the source program is translated into the target program. The choice of its representation depends on how the compiler will use and process it later. Commonly used intermediate language forms include Polish notation, triples, quadruples, and indirect triples.
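As a rough illustration of the symbol table described above, the following Python sketch stores each identifier with its kind, data type, and an assigned address. The names SymbolEntry, SymbolTable, declare, and lookup, as well as the sequential address scheme, are assumptions made for this example and do not come from any particular compiler.

```python
from dataclasses import dataclass


@dataclass
class SymbolEntry:
    name: str        # identifier as written in the source program
    kind: str        # "variable", "array", "function", ...
    data_type: str   # "integer", "real", "string", ...
    address: int     # hypothetical target-program address


class SymbolTable:
    """Maps identifiers to their attributes (illustrative only)."""

    def __init__(self):
        self._entries = {}
        self._next_address = 0

    def declare(self, name, kind, data_type, size=1):
        # Reject a second declaration of the same identifier.
        if name in self._entries:
            raise KeyError(f"duplicate declaration of {name!r}")
        entry = SymbolEntry(name, kind, data_type, self._next_address)
        self._entries[name] = entry
        # Assumption: addresses are handed out sequentially, one unit per element.
        self._next_address += size
        return entry

    def lookup(self, name):
        return self._entries[name]


table = SymbolTable()
table.declare("count", "variable", "integer")
table.declare("scores", "array", "real", size=10)
table.declare("total", "variable", "real")
print(table.lookup("total"))
```

A constant table could be built the same way, keyed by the constant's value instead of an identifier.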
The analysis part of compilation is carried out in three steps: lexical analysis, syntax analysis, and semantic analysis. Lexical analysis is performed by a lexical analyzer (also called a scanner), whose task is to identify words (identifiers, constants, reserved words, operators, punctuation marks, and so on), create the symbol table and constant table, and convert the source program into an internal form that the compiler can easily analyze and process. The syntax analyzer is the core part of the compiler. Its main task is to check whether the source program conforms to the grammar rules of the language. If it does not, a syntax error message is output; if it does, the grammatical structure of the source program is decomposed and an internal program in intermediate language form is constructed. The purpose of syntax analysis is to determine how words form statements and how statements form programs. The semantic analyzer further checks the semantic correctness of syntactically legal program structures. Its purpose is to ensure that identifiers and constants are used correctly, to collect and save the necessary information in the symbol table or the intermediate language program, and to perform the corresponding semantic processing.
The working process of a compiler
A compiler, also called a compilation system, is a language processor that translates procedure-oriented source programs written in high-level languages into target programs. The compiler's process of translating a source program into a target program is divided into five stages: lexical analysis, syntax analysis, intermediate code generation, code optimization, and target code generation. The most important of these are lexical analysis and syntax analysis, together known as source program analysis. If syntax errors are found during analysis, an error message is reported.
(1) Lexical analysis
The task of lexical analysis is to process words made up of characters: it scans the source program character by character from left to right, produces word symbols (tokens) one by one, and transforms the source program from a character string into an intermediate program made up of a string of word symbols. A program that performs lexical analysis is called a lexer or scanner.
Each word symbol recognized by the scanner is generally represented as a pair: the word category and the value of the word itself. Word categories are usually encoded as integers. If a category contains only one word symbol, the category code alone fully identifies that symbol. If a category contains many word symbols, each of them must be given its own value in addition to the category code.
Lexical analyzers are generally constructed in one of two ways: by hand or by automatic generation. A hand-written analyzer can be designed from a state diagram, while an automatically generated one can be implemented with a deterministic finite automaton.
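The scanner described above can be sketched by hand in Python as follows. It reads the source character by character from left to right and emits (category, value) pairs. The integer category codes, the keyword set, and the operator set are assumptions chosen for this example, not part of any real language definition.

```python
# Hypothetical integer category codes; the numbering is an assumption.
KEYWORD, IDENT, NUMBER, OPERATOR = 1, 2, 3, 4

KEYWORDS = {"if", "while", "return"}
OPERATORS = set("+-*/=()<>;")
DIGITS = set("0123456789")


def scan(source):
    """Return a list of (category, value) pairs for the given source text."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                      # skip whitespace between words
        elif ch in DIGITS:
            start = i
            while i < len(source) and source[i] in DIGITS:
                i += 1
            tokens.append((NUMBER, int(source[start:i])))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            # Reserved words and identifiers share a shape but differ in category.
            tokens.append((KEYWORD if word in KEYWORDS else IDENT, word))
        elif ch in OPERATORS:
            tokens.append((OPERATOR, ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(scan("while x1 = 42;"))
```

Each branch of the loop corresponds to one path through a small state diagram: a start state, a "reading digits" state, and a "reading letters" state.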
(2) Syntax analysis
The syntax analyzer of the compiler takes word symbols as input and analyzes whether the string of word symbols forms grammatical units that conform to the grammar rules, such as expressions, assignments, and loops, and finally whether it constitutes a valid program. It checks, according to the grammar rules of the language, whether each statement has the correct logical structure. The program is the final grammatical unit. The grammar rules of a programming language can be described by a context-free grammar.
There are two methods of syntax analysis: top-down analysis and bottom-up analysis. Top-down analysis starts from the start symbol of the grammar and derives downward until the sentence is obtained. Bottom-up analysis uses the shift-reduce method. The basic idea is to use a last-in, first-out stack of symbols and shift the input symbols onto the stack one by one; when the top of the stack forms a candidate (right-hand side) of some production, that part of the stack is reduced to the symbol on the left-hand side of the production.
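The shift-reduce idea can be illustrated with a deliberately naive Python sketch. The toy grammar, the names PRODUCTIONS, reduce_once, and parse, and the greedy "longest right-hand side first" rule are all assumptions for this example; practical bottom-up parsers (such as LR parsers) use parsing tables to decide when to shift and when to reduce.

```python
# Toy grammar (an assumption chosen for illustration):
#   E -> E + T | T
#   T -> ( E ) | n
# Productions are listed longest right-hand side first so that the greedy
# reducer below prefers "E + T" over a bare "T".
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["n"]),
]


def reduce_once(stack):
    """Reduce the top of the stack by the first matching production, if any."""
    for left, right in PRODUCTIONS:
        if stack[-len(right):] == right:
            del stack[-len(right):]     # pop the right-hand side
            stack.append(left)          # push the left-hand symbol
            return left
    return None


def parse(tokens):
    """Return (accepted, steps) for a list of terminal symbols."""
    stack, steps = [], []
    for token in tokens:
        stack.append(token)             # shift the next input symbol
        steps.append(("shift", token))
        while True:
            left = reduce_once(stack)
            if left is None:
                break
            steps.append(("reduce", left))
    # The input is accepted when everything has been reduced to the start symbol.
    return stack == ["E"], steps


accepted, steps = parse(["n", "+", "(", "n", "+", "n", ")"])
print(accepted)
```

The greedy strategy happens to work for this small grammar; it is not a general parsing algorithm.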
(3) Intermediate code generation
Intermediate code is an internal representation of the source program, also called an intermediate language. Its function is to make the structure of the compiler logically simpler and clearer, and in particular to make optimization of the target code easier to implement. The intermediate code is the intermediate language program, and the complexity of the intermediate language lies between that of the source language and that of machine language. There are many forms of intermediate language; common ones are reverse Polish notation, quadruples, triples, and trees.
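To make two of these forms concrete, the Python sketch below converts an infix expression into reverse Polish notation and then into quadruples of the form (operator, left operand, right operand, result). The function names to_rpn and to_quadruples and the temporary names t1, t2, ... are assumptions for this example, and the input is assumed to be a well-formed token list with balanced parentheses.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_rpn(tokens):
    """Convert infix tokens to reverse Polish notation (left-associative operators)."""
    output, operators = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing this one.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()             # discard the "("
        else:
            output.append(token)        # operand
    while operators:
        output.append(operators.pop())
    return output


def to_quadruples(rpn):
    """Turn an RPN token list into (op, left, right, result) quadruples."""
    quads, stack = [], []
    for token in rpn:
        if token in PRECEDENCE:
            right = stack.pop()
            left = stack.pop()
            temp = f"t{len(quads) + 1}"  # hypothetical temporary name
            quads.append((token, left, right, temp))
            stack.append(temp)
        else:
            stack.append(token)
    return quads


rpn = to_rpn(["a", "+", "b", "*", "(", "c", "-", "d", ")"])
print(rpn)
print(to_quadruples(rpn))
```

For the expression a + b * (c - d), the subtraction is emitted first because it is the innermost operation, and each later quadruple refers to earlier results through its temporary name.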
(4) Code optimization
Code optimization means applying a series of equivalent transformations to the program so that more efficient target code can be generated from the transformed program. Equivalent means that the results of running the program are unchanged. Efficient mainly means that the target code runs in less time and occupies less storage space. Such transformations are called optimization.
There are two types of optimization. The first is applied to the intermediate code after syntax analysis and does not depend on the specific computer; the second is performed while generating the target code and depends to a large extent on the specific computer. The first type can be divided, according to the scope of the program involved, into three levels: local optimization, loop optimization, and global optimization.
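One simple machine-independent local optimization is constant folding, sketched below in Python on quadruples of the form (operator, left, right, result). The function name fold_constants is hypothetical, and the sketch assumes that every result name is a compiler temporary assigned exactly once and used only within this block, so a folded quadruple can be dropped safely. Division is left out to avoid folding a division by zero.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(quads):
    """Evaluate quadruples whose operands are both integer constants."""
    known = {}      # temporaries whose value is known at compile time
    result = []
    for op, left, right, dest in quads:
        # Replace operands by their constant value when it is already known.
        left = known.get(left, left)
        right = known.get(right, right)
        if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
            known[dest] = FOLDABLE[op](left, right)   # no code needed at run time
        else:
            result.append((op, left, right, dest))
    return result


quads = [
    ("*", 2, 3, "t1"),
    ("+", "x", "t1", "t2"),
    ("-", 10, 4, "t3"),
    ("*", "t2", "t3", "t4"),
]
print(fold_constants(quads))
```

The transformed block computes the same value for t4 with two quadruples instead of four, which is the sense of "equivalent" and "efficient" used above.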
(5) Target code generation
Target code generation is the last stage of compilation. The target code generator converts the intermediate code produced by syntax analysis, or the optimized intermediate code, into target code. There are three forms of target code:
① Machine language code that can be executed immediately, with all addresses already resolved;
② Machine language modules awaiting linking; when execution is required, a linking loader links them with certain runtime routines and converts them into executable machine language code;
③ Assembly language code, which must be assembled by an assembler to become executable machine language code.
The target code generation stage should consider three issues that directly affect the speed of the target code: first, how to generate shorter target code; second, how to make full use of the computer's registers and reduce the number of times the target code accesses memory; third, how to make full use of the characteristics of the computer's instruction set to improve the quality of the target code.
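The first two concerns can be illustrated with a small Python sketch that turns quadruples into assembly-like text. The instruction names LOAD, STORE, ADD, SUB, MUL, and DIV, the single register R0, and the function name generate are all hypothetical and do not correspond to a real instruction set. The generator remembers which value is currently in the register and skips the LOAD when the left operand is already there, which shortens the code and saves a memory access.

```python
# Hypothetical one-register instruction set used only for this sketch.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(quads):
    """Emit assembly-like lines for (op, left, right, result) quadruples."""
    code = []
    in_register = None      # name whose value is currently held in R0
    for op, left, right, dest in quads:
        if in_register != left:
            code.append(f"LOAD R0, {left}")      # fetch left operand from memory
        code.append(f"{OPCODES[op]} R0, {right}")
        code.append(f"STORE R0, {dest}")         # keep the result available in memory
        in_register = dest
    return code


quads = [("-", "c", "d", "t1"), ("*", "t1", "b", "t2"), ("+", "t2", "a", "t3")]
for line in generate(quads):
    print(line)
```

For these three quadruples the sketch emits seven instructions rather than the nine a generator without register tracking would produce; a fuller code generator would also try to avoid storing temporaries that are never read from memory.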