In Linux, flex is a lexical analysis tool that recognizes lexical patterns in text. Flex reads a given input file, or standard input if no file name is given, to obtain a description of the scanner to be generated.
The operating environment of this tutorial: Linux 5.9.8, Dell G3 computer.
flex: Lexical analyzer
flex is a lexical analyzer generator. It turns a .l file into a .c program file, that is, it generates a lexical analyzer. The generated analyzer reads the input, matches it against regular expressions, and then performs the corresponding actions to implement the program's function. In other words, flex takes care of accepting input from outside the program.
Flex is a tool that generates scanners capable of recognizing lexical patterns in text. Flex reads the given input file, or standard input if no filename is given, to obtain a description of the scanner to be generated. This description, called rules, consists of pairs of regular expressions and C code. The output of Flex is a C source file, lex.yy.c, in which the yylex() function is defined. Compiling this file produces an executable. When the executable runs, it analyzes its input, looking for a match for each regular expression; whenever a match is found, it executes the C code associated with that regular expression. Flex is not a GNU project, but GNU has written a manual for Flex.
Usage
Install flex
sudo apt-get install flex    # or download and install a suitable release package for your system
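You can verify the installation afterwards (the exact version string depends on your distribution):

flex --version    # e.g. prints "flex 2.6.4"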
Then create a new text file and enter the following content:
%%
[0-9]+    printf("?");
#         return 0;
.         ECHO;
%%
int main(int argc, char* argv[]) {
    yylex();
    return 0;
}
int yywrap() {
    return 1;
}
Save this file as hide-digits.l. Note that each %% in this file must be at the beginning of its line (there must not be any spaces before %%).
After that, enter in the terminal:
flex hide-digits.l
A lex.yy.c file now appears in the directory. Compile and run this C file:
gcc -o hide-digits lex.yy.c
./hide-digits
Then type anything in the terminal and press Enter. You will see that every character other than a digit is echoed as-is, while each run of digit characters is replaced with a ?. Finally, type # and the program exits. For example:
eruiewdkfj
eruiewdkfj
1245
?
fdsaf4578
fdsaf?
...
#
When flex is run on the command line, the second command-line argument (here hide-digits.l) is the pattern file supplied to flex. This pattern file mainly contains token-matching patterns written by the user as regular expressions. flex translates these regular expressions into a C function named yylex and writes it to the lex.yy.c file; this function can be viewed as a finite state automaton.
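As a side note, the generated scanner does not have to read standard input: flex exposes a global variable, FILE *yyin, that yylex() reads from (it defaults to stdin). Here is a minimal hypothetical variant of the main function in the third section of hide-digits.l that scans a file named on the command line instead:

int main(int argc, char *argv[]) {
    if (argc > 1) {
        yyin = fopen(argv[1], "r");   /* yyin: the FILE* the generated yylex() reads from */
        if (yyin == NULL) {
            perror(argv[1]);
            return 1;
        }
    }
    yylex();                          /* run the generated automaton over yyin */
    return 0;
}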
Now let's look at the code in hide-digits.l in detail. The first section is:
%%
[0-9]+    printf("?");
#         return 0;
.         ECHO;
%%
A flex pattern file is divided into sections by the %% markers. The part between the two %% markers is the rules section, and each line in it is a rule. Each rule consists of a matching pattern and an action: the pattern comes first, written as a regular expression, and the action follows as C code. Whenever the pattern is matched, the C code after it is executed.
flex translates this section into a function named yylex. This function scans the input file (standard input by default); whenever it has read the longest string that completely matches the regular expression of one of the rules, it executes the C code attached to that rule. If that C code contains no return statement, then after executing it yylex keeps running and starts the next round of scanning and matching.
When the patterns of several rules match, yylex picks the rule with the longest match; if several rules match with equal length, the one listed first is chosen.
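A quick illustration of these two tie-breaking rules (a hypothetical rules fragment, not part of hide-digits.l):

"if"                      { printf("KEYWORD\n"); }
[a-zA-Z_][a-zA-Z0-9_]*    { printf("IDENTIFIER\n"); }

For the input ifx, the second rule wins because it matches three characters (longest match); for the input if, both rules match two characters, so the first rule, the one listed earlier, wins. Back to hide-digits.l, its second section is: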
int main(int argc, char *argv[]) {
    yylex();
    return 0;
}
int yywrap() {
    return 1;
}
The main function in the second section is the program's entry point; flex copies this code verbatim to the end of the lex.yy.c file. The yywrap function on the last line is required by flex.
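If you would rather not define yywrap yourself, flex also accepts the noyywrap option in the first section of the .l file. A minimal hypothetical variant of hide-digits.l using it:

%option noyywrap
%%
[0-9]+    printf("?");
#         return 0;
.         ECHO;
%%
int main(int argc, char* argv[]) {
    yylex();
    return 0;
}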
Example
word-spliter.l
%{
#define T_WORD 1
int numChars = 0, numWords = 0, numLines = 0;
%}

WORD        ([^ \t\n\r\a]+)

%%

\n          { numLines++; numChars++; }
{WORD}      { numWords++; numChars += yyleng; return T_WORD; }
<<EOF>>     { return 0; }
.           { numChars++; }

%%

int main() {
    int token_type;
    while (token_type = yylex()) {
        printf("WORD:\t%s\n", yytext);
    }
    printf("\nChars\tWords\tLines\n");
    printf("%d\t%d\t%d\n", numChars, numWords, numLines);
    return 0;
}

int yywrap() {
    return 1;
}
This example uses two global variables provided by flex, yytext and yyleng, which hold the string that was just matched and its length, respectively.
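For instance, a rule like the following (a hypothetical addition, not part of word-spliter.l) would print each match together with its length:

{WORD}      { printf("matched '%s' (%d chars)\n", yytext, yyleng); }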
Compile and execute
flex word-spliter.l
gcc -o word-spliter lex.yy.c
./word-spliter < word-spliter.l

Output:

WORD:   %{
WORD:   #define
...
WORD:   }

Chars   Words   Lines
470     70      27
As you can see, this program is really just a primitive tokenizer: it splits the input file into WORDs and prints them to the terminal, while also counting the number of characters, words, and lines in the input file. A WORD here means a run of consecutive non-whitespace characters.
Extension
To extend this tokenizer into a scanner for a simple language, we need to:
(1) List all the token types that are needed;
(2) Assign each token type a unique number and write its regular expression;
(3) Write a rule (the corresponding pattern and action) for each token type, as in the sketch right after this list.
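For example, applying the three steps to one token type: (1) we need an integer-constant token; (2) we give it the number T_IntConstant (262 in the token.h enum shown further below; any value above 255 works, so that it cannot collide with the character codes used for single-character operators) and the regular expression [0-9]+; (3) the resulting rule is:

[0-9]+              { return T_IntConstant; }

For this example language, the tokens fall into three categories.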
Category 1: single-character operators, 15 in total:
+ * - / % = , ; ! < > ( ) { }
Category 2: two-character operators and keywords, 16 in total:
<=, >=, ==, !=, &&, ||, void, int, while, if, else, return, break, continue, print, readint
Category 3: integer constants, string constants, and identifiers (variable and function names), 3 types in total.
The extended pattern file (saved as scanner.l, the name the makefile below expects):
%{
#include "token.h"
int cur_line_num = 1;
void init_scanner();
void lex_error(char* msg, int line);
%}

/* Definitions, note: \042 is '"' */
INTEGER             ([0-9]+)
UNTERM_STRING       (\042[^\042\n]*)
STRING              (\042[^\042\n]*\042)
IDENTIFIER          ([_a-zA-Z][_a-zA-Z0-9]*)
OPERATOR            ([+*-/%=,;!<>(){}])
SINGLE_COMMENT1     ("//"[^\n]*)
SINGLE_COMMENT2     ("#"[^\n]*)

%%

[\n]                { cur_line_num++; }
[ \t\r\a]+          { /* ignore all spaces */ }
{SINGLE_COMMENT1}   { /* skip single-line comment */ }
{SINGLE_COMMENT2}   { /* skip single-line comment */ }

{OPERATOR}          { return yytext[0]; }

"<="                { return T_Le; }
">="                { return T_Ge; }
"=="                { return T_Eq; }
"!="                { return T_Ne; }
"&&"                { return T_And; }
"||"                { return T_Or; }
"void"              { return T_Void; }
"int"               { return T_Int; }
"while"             { return T_While; }
"if"                { return T_If; }
"else"              { return T_Else; }
"return"            { return T_Return; }
"break"             { return T_Break; }
"continue"          { return T_Continue; }
"print"             { return T_Print; }
"readint"           { return T_ReadInt; }

{INTEGER}           { return T_IntConstant; }
{STRING}            { return T_StringConstant; }
{IDENTIFIER}        { return T_Identifier; }

<<EOF>>             { return 0; }

{UNTERM_STRING}     { lex_error("Unterminated string constant", cur_line_num); }
.                   { lex_error("Unrecognized character", cur_line_num); }

%%

int main(int argc, char* argv[]) {
    int token;
    init_scanner();
    while (token = yylex()) {
        print_token(token);
        puts(yytext);
    }
    return 0;
}

void init_scanner() {
    printf("%-20s%s\n", "TOKEN-TYPE", "TOKEN-VALUE");
    printf("-------------------------------------------------\n");
}

void lex_error(char* msg, int line) {
    printf("\nError at line %-3d: %s\n\n", line, msg);
}

int yywrap(void) {
    return 1;
}
Note that in this file, within flex regular expressions, a string enclosed in double quotes is a literal string: the special characters inside it do not need to be escaped, while the double quote itself must be escaped (written as \" or \042). This is one way in which flex differs from conventional regular expressions.
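A small hypothetical fragment illustrating the difference (not part of scanner.l):

"a+b"                   { printf("the literal three characters a+b\n"); }
a+b                     { printf("one or more a's followed by one b\n"); }
\042[^\042\n]*\042      { printf("a double-quoted string\n"); }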
The numbers of the tokens other than single-character operators are defined in the token.h file below, which also provides a print_token function that prints a token's name given its number.
#ifndef TOKEN_H
#define TOKEN_H

#include <stdio.h>   /* for printf used by print_token */

typedef enum {
    T_Le = 256, T_Ge, T_Eq, T_Ne, T_And, T_Or, T_IntConstant,
    T_StringConstant, T_Identifier, T_Void, T_Int, T_While, T_If,
    T_Else, T_Return, T_Break, T_Continue, T_Print, T_ReadInt
} TokenType;

static void print_token(int token) {
    static char* token_strs[] = {
        "T_Le", "T_Ge", "T_Eq", "T_Ne", "T_And", "T_Or", "T_IntConstant",
        "T_StringConstant", "T_Identifier", "T_Void", "T_Int", "T_While",
        "T_If", "T_Else", "T_Return", "T_Break", "T_Continue", "T_Print",
        "T_ReadInt"
    };
    if (token < 256) {
        printf("%-20c", token);
    } else {
        printf("%-20s", token_strs[token - 256]);
    }
}

#endif
makefile
out: scanner

# note: each command line below must begin with a tab character
scanner: lex.yy.c token.h
	gcc -o $@ $<

lex.yy.c: scanner.l
	flex $<
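With these three files (scanner.l, token.h, and the makefile) in one directory, a typical build and test run might look like the following; test.c is a hypothetical input file containing a small program written in the example language:

make
./scanner < test.c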