


Linguists are back! Start learning from "pronunciation": this time the AI model has to teach itself
Getting computers to understand human language has always been one of the hardest problems in artificial intelligence.
Early natural language processing models usually relied on hand-designed features, which required specialized linguists to write patterns manually. The results were disappointing, and AI research even went through an "AI winter".
Every time I fire a linguist, the performance of the speech recognizer goes up.
-- Frederick Jelinek
With statistical models and large-scale pre-trained models, hand-crafted feature extraction is no longer necessary, but task-specific data annotation is still required. The most critical problem remains: the trained model still does not understand human language.
So should we go back to the original form of language and ask again: how do humans acquire language ability?
Researchers from Cornell University, MIT and McGill University recently published a paper in Nature Communications proposing a framework of algorithmically synthesized models. It starts teaching AI language at the most basic level of human language, morpho-phonology, and builds the morphology of a language directly from sounds.
Paper link: https://www.nature.com/articles/s41467-022-32012-w
Morpho-phonology is a branch of linguistics that focuses on the sound changes that occur when morphemes (the smallest units of meaning) are combined into words, and it attempts to provide a set of rules that predict the regular sound changes of phonemes in a language.
For example, the plural morpheme in English is written -s or -es, but it has three pronunciations: [s], [z] and [əz]. Cats is pronounced /kæts/, dogs is pronounced /dagz/, and horses is pronounced /hɔrsəz/.
When humans learn to pronounce plurals, they first recognize, based on morphology, that the plural suffix is underlyingly /z/. Then, based on phonology, the suffix changes to /s/ or /əz/ depending on the sounds in the stem, for example when the stem ends in a voiceless consonant.
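The two-step process described above (attach an underlying /z/, then adjust it to the stem) can be sketched in a few lines of Python. This is only an illustration: the phoneme classes `SIBILANTS` and `VOICELESS` and the function `pluralize` are hypothetical names and a simplified inventory, not the feature system used in the paper.

```python
# Simplified, illustrative phoneme classes (an assumption, not the paper's inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def pluralize(stem):
    """Attach the underlying plural suffix /z/ to a stem and adjust its pronunciation.

    `stem` is a list of phoneme symbols; the result is a new list.
    """
    last = stem[-1]
    if last in SIBILANTS:
        # After a sibilant, a schwa is inserted: /hɔrs/ + /z/ -> /hɔrsəz/
        return stem + ["ə", "z"]
    if last in VOICELESS:
        # After a voiceless consonant, /z/ is devoiced: /kæt/ + /z/ -> /kæts/
        return stem + ["s"]
    # Otherwise the suffix surfaces unchanged: /dag/ + /z/ -> /dagz/
    return stem + ["z"]


for word in ["kæt", "dag", "hɔrs"]:
    print(word, "->", "".join(pluralize(list(word))))
```

The sibilant check comes first because /s/ is itself voiceless; testing voicelessness first would wrongly give /hɔrss/.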
Other languages have morpho-phonological rules of the same kind. The researchers collected 70 data sets from phonology textbooks covering 58 languages. Each data set contains only dozens to hundreds of words and only a few grammatical phenomena. The experiments show that the method for finding grammatical structure in natural language can also simulate the process by which infants learn language.
By performing hierarchical Bayesian inference on these data sets, the researchers found that the model can acquire new morpho-phonological rules from just one or a few examples, and that it can extract common cross-language patterns and express them in a compact, human-understandable form.
Let the AI model be a "linguist"
Human intelligence is mainly reflected in the ability to build theories about the world. For example, once natural languages had formed, linguists summarized sets of rules that help children learn a specific language more quickly. Current AI models, however, cannot summarize rules and form a theoretical framework that others can understand.
Before building a model, a core problem has to be solved: how to describe a word. For example, learning a word involves understanding its concept, intention, usage, pronunciation and meaning.
When building the lexicon, the researchers represented each word as a ⟨pronunciation, meaning⟩ pair. For example, open is represented as ⟨/opεn/, [stem: OPEN]⟩, the past tense is represented as ⟨/d/, [tense: PAST]⟩, and the combined form opened is represented as ⟨/opεnd/, [stem: OPEN, [tense: PAST]]⟩.
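One possible way to hold such pairs in code is a small record type with a pronunciation and a set of meaning features. The sketch below is an assumption made for illustration: the class `Morpheme`, the function `concatenate`, and the transcription "opεn" are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    """A pair of pronunciation and meaning (hypothetical representation)."""
    form: str        # phonemic form, one character per phoneme
    meaning: tuple   # meaning features as (name, value) pairs


def concatenate(stem, suffix):
    """Combine two morphemes by joining their forms and their meaning features."""
    return Morpheme(stem.form + suffix.form, stem.meaning + suffix.meaning)


# Assumed transcriptions, used only for illustration.
open_stem = Morpheme("opεn", (("stem", "OPEN"),))
past_tense = Morpheme("d", (("tense", "PAST"),))

opened = concatenate(open_stem, past_tense)
print(opened.form, opened.meaning)
```

Concatenation only yields the underlying form; phonological rules would then be applied to that form to obtain the actual pronunciation.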
With the data set in hand, the researchers built a model that explains word changes across a set of such pairs by inferring grammar rules through maximum a posteriori (MAP) inference.
To represent sounds, phonemes (atomic sounds) are encoded as vectors of binary features. For example, /m/ and /n/ are both nasal. Phonological rules are then defined over this feature space.
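A binary feature representation can be sketched as a table from phonemes to feature values, with a helper that returns every phoneme matching a partial specification. The table below is a deliberately tiny, simplified inventory, and `FEATURES` and `natural_class` are hypothetical names chosen for this example.

```python
# A tiny, simplified feature table (an assumption for illustration only).
FEATURES = {
    "m": {"nasal": True,  "voice": True,  "sonorant": True},
    "n": {"nasal": True,  "voice": True,  "sonorant": True},
    "d": {"nasal": False, "voice": True,  "sonorant": False},
    "z": {"nasal": False, "voice": True,  "sonorant": False},
    "t": {"nasal": False, "voice": False, "sonorant": False},
    "s": {"nasal": False, "voice": False, "sonorant": False},
}


def natural_class(**spec):
    """Return the set of phonemes whose features agree with every entry in `spec`."""
    return {
        phoneme
        for phoneme, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    }


print(natural_class(nasal=True))                    # the nasal sounds
print(natural_class(voice=False, sonorant=False))   # the voiceless obstruents
```

With only three features several phonemes share a vector (for example /m/ and /n/); a realistic table would add place features to tell them apart.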
The researchers use a classic rule formalism, context-dependent rewrite rules, sometimes called SPE-style rules after their extensive use in The Sound Pattern of English.
Each rule is written as
(focus)→(structural_change)/(left_trigger)_(right_trigger), which means that whenever the left and right trigger environments occur immediately to the left and right of the focus, the focus phoneme is transformed according to the structural change.
The trigger environments specify conjunctions of features (each denoting a set of phonemes). For example, in English a word-final obstruent is devoiced whenever the phoneme to its left is a voiceless obstruent, so /d/ becomes /t/. The rule is written [-sonorant] → [-voice]/[-voice -sonorant]_#. Applying this rule to walked changes the pronunciation from /wɔkd/ to /wɔkt/.
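To make the rule notation concrete, here is a minimal sketch that applies the devoicing rule [-sonorant] → [-voice]/[-voice -sonorant]_# to a word. The feature table is a small assumed inventory, the function names are hypothetical, and the right trigger is restricted to the word boundary # for simplicity.

```python
# Small assumed feature table; each phoneme has a unique feature vector.
FEATURES = {
    "w": {"voice": True,  "sonorant": True,  "coronal": False, "syllabic": False},
    "ɔ": {"voice": True,  "sonorant": True,  "coronal": False, "syllabic": True},
    "l": {"voice": True,  "sonorant": True,  "coronal": True,  "syllabic": False},
    "k": {"voice": False, "sonorant": False, "coronal": False, "syllabic": False},
    "g": {"voice": True,  "sonorant": False, "coronal": False, "syllabic": False},
    "t": {"voice": False, "sonorant": False, "coronal": True,  "syllabic": False},
    "d": {"voice": True,  "sonorant": False, "coronal": True,  "syllabic": False},
}


def matches(phoneme, spec):
    """True if the phoneme agrees with every feature value in `spec`."""
    return all(FEATURES[phoneme][name] == value for name, value in spec.items())


def change_phoneme(phoneme, change):
    """Overwrite the given features and look up the resulting phoneme."""
    target = {**FEATURES[phoneme], **change}
    for candidate, feats in FEATURES.items():
        if feats == target:
            return candidate
    raise ValueError(f"no phoneme in the table for {target}")


def apply_rule(word, focus, change, left, at_word_end):
    """Apply focus -> change / left _ # (when at_word_end) to a list of phonemes."""
    result = list(word)
    for i, phoneme in enumerate(word):
        if not matches(phoneme, focus):
            continue
        if i == 0 or not matches(word[i - 1], left):
            continue
        if at_word_end and i != len(word) - 1:
            continue
        result[i] = change_phoneme(phoneme, change)
    return result


# [-sonorant] -> [-voice] / [-voice -sonorant] _ #
DEVOICE = dict(
    focus={"sonorant": False},
    change={"voice": False},
    left={"voice": False, "sonorant": False},
    at_word_end=True,
)

print("".join(apply_rule(list("wɔkd"), **DEVOICE)))
print("".join(apply_rule(list("kɔld"), **DEVOICE)))
```

The first call devoices the final /d/ because /k/ is a voiceless obstruent; the second leaves /kɔld/ unchanged because /l/ is a sonorant.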
When such rules are constrained so that they cannot apply cyclically to their own output, the rules and lexicons correspond to 2-way rational functions, which in turn correspond to finite-state transductions. It has been argued that the space of finite-state transductions is expressive enough to cover the known empirical phenomena in morpho-phonology, and that it represents a limit on the descriptive power actually used by phonological theories.
To learn this grammar, the researchers used Bayesian Program Learning (BPL). The grammar rules T are modeled as a program in a programming language that captures the domain-specific constraints of the problem space. The linguistic structure shared by all languages is called universal grammar. This approach can be seen as a modern instance of a long-standing approach in linguistics, and it uses human-understandable generative representations to formalize universal grammar.
Once the problem BPL has to solve is defined, the search space of all programs is infinite, there is no guidance on how to search it, and it lacks the local smoothness that local optimization algorithms such as gradient descent or Markov chain Monte Carlo exploit. The researchers therefore adopted a constraint-based program synthesis strategy, which converts the optimization problem into a combinatorial constraint satisfaction problem and solves it with a Boolean satisfiability (SAT) solver.
These solvers implement an exhaustive but relatively efficient search and guarantee that an optimal solution will be found given enough time. The smallest grammar consistent with some data can be found with the Sketch program synthesizer, subject to an upper bound on the size of the grammar.
In practice, however, the exhaustive search techniques used by SAT solvers cannot scale to the large numbers of rules required to explain large corpora.
To scale the solver to large and complex theories, the researchers took inspiration from a fundamental feature of children acquiring language and scientists building theories.
Children do not learn language overnight; they gradually enrich their grasp of grammar and vocabulary through intermediate stages of language development. Likewise, a complex scientific theory may begin with a simple conceptual core and then gradually develop to encompass more and more phenomena.
Based on these ideas, the researchers designed a program synthesis algorithm that starts from a small program and then repeatedly uses the SAT solver to find small modifications that allow it to explain more and more data. Specifically, it finds a counterexample to the current theory and then uses the solver to exhaustively explore the space of all small modifications to the theory that can accommodate this counterexample.
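The counterexample-driven loop can be illustrated with a toy version. In this sketch a brute-force enumeration stands in for the SAT solver, a "theory" is an ordered tuple of rewrite rules, and a "small modification" is the insertion of one rule from a fixed pool. The rule pool, the data and all function names are assumptions made for the example and do not come from the paper.

```python
import re

# Hypothetical candidate rules: (name, regex pattern, replacement).
RULE_POOL = [
    ("devoice", r"(?<=[ptk])z$", "s"),      # z -> s after a voiceless stop, word-finally
    ("epenthesis", r"(?<=[sz])z$", "əz"),   # insert ə between a sibilant and final z
    ("devoice_all", r"z$", "s"),            # distractor: always devoice final z
    ("delete", r"z$", ""),                  # distractor: delete final z
]

# (underlying form, observed surface form)
DATA = [("kætz", "kæts"), ("dagz", "dagz"), ("hɔrsz", "hɔrsəz")]


def derive(theory, underlying):
    """Apply each rule of the theory in order to the underlying form."""
    form = underlying
    for _name, pattern, replacement in theory:
        form = re.sub(pattern, replacement, form)
    return form


def small_modifications(theory):
    """All theories obtained by inserting one pool rule at any position."""
    for position in range(len(theory) + 1):
        for rule in RULE_POOL:
            yield theory[:position] + (rule,) + theory[position:]


def synthesize(data):
    """Grow a theory one small modification at a time, driven by counterexamples."""
    theory, seen = (), []
    while True:
        counterexample = next(
            (pair for pair in data if derive(theory, pair[0]) != pair[1]), None
        )
        if counterexample is None:
            return theory
        seen.append(counterexample)
        for candidate in small_modifications(theory):
            if all(derive(candidate, u) == s for u, s in seen):
                theory = candidate
                break
        else:
            raise ValueError("no small modification explains the counterexample")


theory = synthesize(DATA)
print([name for name, _pattern, _replacement in theory])
```

On this data the loop first adds the devoicing rule to explain /kæts/ and then inserts the epenthesis rule before it to explain /hɔrsəz/. Because each step only looks at nearby theories, a different rule pool or ordering could leave the loop stuck, which mirrors the loss of the completeness guarantee.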
This heuristic method, however, lacks the completeness guarantee of SAT solving: although it repeatedly calls a complete and exact SAT solver, it is not guaranteed to find an optimal solution. Each repeated call, on the other hand, is much more tractable than optimizing directly over the entire data set. Constraining each new theory to stay close to its predecessor in theory space yields polynomially smaller constraint satisfaction problems and therefore exponentially faster search, because in the worst case the running time of a SAT solver grows exponentially with problem size.
In the experimental evaluation, the researchers collected 70 problems from linguistics textbooks, each of which requires synthesizing a theory of a set of forms from some natural language. The problems vary in difficulty and cover a wide variety of natural-language phenomena.
The languages are also diverse and include tonal languages. For example, in Kerewe (a Bantu language of Tanzania), to count is /kubala/, but to count it is /kukíbála/, where the accent marks indicate high tones.
There are also languages with vowel harmony. For example, Turkish has /el/ and /tʃan/, meaning hand and bell, as well as /el-ler/ and /tʃan-lar/, the plurals hands and bells. There are many other linguistic phenomena as well, such as assimilation and epenthesis.
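The Turkish plural pattern above can be sketched as a rule that picks the suffix vowel from the last vowel of the stem. The vowel sets reflect the standard front and back vowels of Turkish; the function name `turkish_plural` is hypothetical, and the sketch ignores loanword exceptions.

```python
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def turkish_plural(stem):
    """Attach -ler after a front vowel and -lar after a back vowel."""
    for symbol in reversed(stem):
        if symbol in FRONT_VOWELS:
            return stem + "-ler"
        if symbol in BACK_VOWELS:
            return stem + "-lar"
    raise ValueError("stem contains no vowel")


print(turkish_plural("el"))
print(turkish_plural("tʃan"))
```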
In the evaluation, the researchers first measured the model's ability to discover the correct lexicon. Compared with the ground-truth lexicons, the model found grammars that correctly matched the entire lexicon in 60% of the benchmark problems and correctly explained most of the lexicon in 79% of the problems.
Typically, the correct lexicon for each problem is much more specific than the correct rules, and any rules that generate the full data from the correct lexicon must be observationally equivalent to any ground-truth rules one might posit. Agreement with the ground-truth lexicon should therefore serve as a proxy for whether the synthesized rules behave correctly on the data, and this measure should correlate with the quality of the rules.
To test this hypothesis, the researchers randomly selected 15 problems and asked a professional linguist to score the discovered rules. Both recall (the proportion of actual phonological rules that were correctly recovered) and precision (the proportion of recovered rules that are actual rules) were measured. On both measures, the accuracy of the rules is positively correlated with the accuracy of the lexicon.
When the system gets the entire lexicon correct, it rarely introduces irrelevant rules (high precision) and almost always recovers all the correct rules (high recall).
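As a reminder of how the two measures relate, the sketch below computes precision and recall over sets of rules. The rule labels are placeholders invented for the example and are not results from the paper.

```python
def precision_recall(recovered, actual):
    """Return (precision, recall) for a set of recovered rules against the actual rules."""
    correct = recovered & actual
    precision = len(correct) / len(recovered) if recovered else 0.0
    recall = len(correct) / len(actual) if actual else 0.0
    return precision, recall


# Placeholder rule labels, for illustration only.
actual_rules = {"devoicing", "epenthesis", "assimilation"}
recovered_rules = {"devoicing", "epenthesis"}

precision, recall = precision_recall(recovered_rules, actual_rules)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here every recovered rule is correct, so precision is 1.0, but one actual rule is missing, so recall is 2/3.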