This article looks at an "unexpected token" error that comes up when parsing with Participle, a parser library for Go. The cause turns out to be the order of the lexer rules: an earlier rule claims text that a later rule was meant to match. The question and the asker's own solution follow.
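Question
I'm experimenting with Participle to learn how to write a parser, but I can't work out why this token is reported as unexpected.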
// nolint: golint, dupl
package main

import (
	"fmt"
	"io"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

// Lexer rules, listed in the order they were originally written.
var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `^#[^\n]*`},
	{"Ident", `^\w+`},
	{"Int", `\d+`},
	{"String", `("(\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

// Htaccess is the root of the grammar: zero or more directives.
type Htaccess struct {
	Directives []*Directive `@@*`
}

type Directive struct {
	Pos lexer.Position

	ErrorDocument *ErrorDocument `@@`
}

// ErrorDocument matches: ErrorDocument <code> <path>
type ErrorDocument struct {
	Code int    `"ErrorDocument" @Int`
	Path string `@String`
}

var htaccessParser = participle.MustBuild[Htaccess](
	participle.Lexer(htaccessLexer),
	participle.CaseInsensitive("Ident"),
	participle.Unquote("String"),
	participle.Elide("whitespace"),
)

func Parse(r io.Reader) (*Htaccess, error) {
	program, err := htaccessParser.Parse("", r)
	if err != nil {
		return nil, err
	}
	return program, nil
}

func main() {
	v, err := htaccessParser.ParseString("", `ErrorDocument 403 test`)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
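As far as I can tell this looks correct. I expect 403 to be accepted at that position, but I'm not sure why it isn't recognized.
Edit: I changed the lexer to: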
var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"dir", `^\w+`},
	{"int", `\d+`},
	{"str", `("(\\"|[^"])*"|\S+)`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `\s+`},
})
The error went away, but it still prints an empty array and I don't know why. I'm also not sure why using different values in the lexer fixes it.
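Solution
I believe I found the problem: it is the rule order. The Ident rule's `\w` also matches digits, so my integer was being tokenized as an Ident before the Int rule was ever tried.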
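The overlap can be checked with Go's standard `regexp` package alone. This is a minimal sketch, not Participle's own code: `matchAtStart` is a hypothetical helper that anchors a pattern at the start of the input, which is assumed here to approximate how a lexer rule is tried at the current position.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchAtStart returns the text that pattern matches at the very start of
// input, or "" if the pattern does not match there.
// (Hypothetical helper for illustration; not part of Participle.)
func matchAtStart(pattern, input string) string {
	re := regexp.MustCompile(`^(?:` + pattern + `)`)
	return re.FindString(input)
}

func main() {
	// Both the Ident pattern and the Int pattern accept the digits "403",
	// so whichever rule is tried first claims the token.
	fmt.Printf("%q\n", matchAtStart(`\w+`, "403 test")) // "403"
	fmt.Printf("%q\n", matchAtStart(`\d+`, "403 test")) // "403"

	// The Int pattern does not match a word, so listing it first would
	// not steal identifiers.
	fmt.Printf("%q\n", matchAtStart(`\d+`, "ErrorDocument")) // ""
}
```

Since `\d+` only ever matches a subset of what `\w+` matches, putting the more specific rule first is the usual way to resolve this kind of overlap.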
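I also found that I had to split quoted and unquoted strings into separate rules (QuotedString and UnQuotedString); otherwise the unquoted-string pattern would capture the integers. Alternatively, I could restrict that pattern to non-digit characters, but then it would miss things like stringwithnum2.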
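To illustrate why the digit-free alternative falls short, here is a small sketch using only the standard library. The pattern `[^\d\s]+` is an assumption about what "only non-digit values" might look like; it is not taken from the original post.

```go
package main

import (
	"fmt"
	"regexp"
)

// digitFree is a hypothetical unquoted-string pattern that refuses digits
// and whitespace (an assumed reading of the rejected alternative).
var digitFree = regexp.MustCompile(`^[^\d\s]+`)

// matchedPrefix returns the leading part of s that digitFree accepts.
func matchedPrefix(s string) string {
	return digitFree.FindString(s)
}

func main() {
	// The integer is no longer captured by the string pattern...
	fmt.Printf("%q\n", matchedPrefix("403")) // ""

	// ...but a value that merely contains a digit is cut short.
	fmt.Printf("%q\n", matchedPrefix("stringwithnum2")) // "stringwithnum"
}
```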
Here is my solution:
var htaccessLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `(?i)#[^\n]*`},
	{"QuotedString", `"(\\"|[^"])*"`},
	{"Number", `[-+]?(\d*\.)?\d+`},
	{"UnQuotedString", `[^ \t]+`},
	{"Ident", `^[a-zA-Z_]`},
	{"EOL", `[\n\r]+`},
	{"whitespace", `[ \t]+`},
})

type ErrorDocument struct {
	Pos lexer.Position

	Code int    `"ErrorDocument" @Number`
	Path string `(@QuotedString | @UnQuotedString)`
}
This solved my problem, because the lexer now looks for quoted strings first, then numbers, then unquoted strings.
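The effect of that ordering can be traced with a small first-match-wins tokenizer built on the standard `regexp` package. This is a simplified model under stated assumptions, not Participle's lexer: `rule`, `newRule`, and `tokenTypes` are hypothetical names, the rules are copied from the solution above, and each pattern is assumed to be tried in order at the current position.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a token type name with a pattern anchored at the current position.
type rule struct {
	name string
	re   *regexp.Regexp
}

func newRule(name, pattern string) rule {
	return rule{name, regexp.MustCompile(`^(?:` + pattern + `)`)}
}

// Rules in the same order as the solution's lexer.
var rules = []rule{
	newRule("Comment", `(?i)#[^\n]*`),
	newRule("QuotedString", `"(\\"|[^"])*"`),
	newRule("Number", `[-+]?(\d*\.)?\d+`),
	newRule("UnQuotedString", `[^ \t]+`),
	newRule("Ident", `^[a-zA-Z_]`),
	newRule("EOL", `[\n\r]+`),
	newRule("whitespace", `[ \t]+`),
}

// tokenTypes walks the input and, at each position, takes the first rule
// that matches. It returns the types of all non-whitespace tokens.
func tokenTypes(input string) []string {
	var types []string
	for len(input) > 0 {
		matched := false
		for _, r := range rules {
			if m := r.re.FindString(input); m != "" {
				if r.name != "whitespace" {
					types = append(types, r.name)
				}
				input = input[len(m):]
				matched = true
				break
			}
		}
		if !matched {
			return append(types, "invalid")
		}
	}
	return types
}

func main() {
	fmt.Println(tokenTypes(`ErrorDocument 403 "/errors/403.html"`))
	// [UnQuotedString Number QuotedString]
	fmt.Println(tokenTypes(`ErrorDocument 404 notfound2.html`))
	// [UnQuotedString Number UnQuotedString]
}
```

In this model the keyword itself comes out as UnQuotedString rather than Ident, because the UnQuotedString rule is listed before Ident. If the grammar needs to tell keywords apart by token type, the Ident rule would presumably have to move ahead of UnQuotedString.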