Artificial intelligence writing detection tools are unreliable, and the U.S. Constitution is thought to have been written by a robot
News on July 16: Some netizens recently discovered that if you feed the United States' most important legal document, the U.S. Constitution, into tools designed to detect AI writing, you get a surprising result: the U.S. Constitution was almost certainly written by artificial intelligence. Unless James Madison was a time traveler, that is obviously impossible. So why do these AI detection tools make such errors? Ars Technica interviewed several experts, as well as the developer of the AI detection tool GPTZero, to uncover the reasons.
In education, AI writing has stirred considerable controversy. Teachers have long used essays as a tool to assess students' mastery of a subject. But the evidence so far suggests that the AI tools many teachers rely on to detect AI-generated writing are unreliable. Because they produce false positives, AI detection tools such as GPTZero, ZeroGPT, and OpenAI's text classifier cannot be used to determine whether an article was generated by a large language model (LLM).
When a portion of the U.S. Constitution is fed into GPTZero, the tool reports that the passage "was likely written entirely by AI." Over the past six months, screenshots of similar results from other AI detection tools have circulated widely on social media. In fact, the same thing happens if you input text from the Bible. To understand why these tools make such obvious mistakes, we first need to understand how they work.
According to IT House, different AI writing detectors use slightly different detection methods, but the basic principle is the same: an AI model trained on a large body of text (including millions of writing examples), combined with a set of hypothesized rules, judges whether a piece of writing is more likely to have been produced by a human or by an AI.
For example, at the heart of GPTZero is a neural network trained on "a large, diverse corpus of human writing and AI-generated text, with an emphasis on English prose." The system then uses attributes such as "perplexity" and "burstiness" to evaluate and classify the text.
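As a loose schematic of that classification step, the sketch below combines two per-text statistics (perplexity and burstiness, both explained in the following paragraphs) with hand-set cutoffs. The thresholds and the rule itself are hypothetical; GPTZero's real classifier is a trained neural network, not an if-statement.

```python
# Hypothetical schematic of a detector's decision step, NOT GPTZero's
# actual logic. The cutoff values are invented for illustration.
def classify(perplexity: float, burstiness: float,
             ppl_cutoff: float = 60.0, burst_cutoff: float = 3.0) -> str:
    # Low perplexity (predictable text) and low burstiness (uniform
    # sentences) both push the verdict toward "AI-generated".
    if perplexity < ppl_cutoff and burstiness < burst_cutoff:
        return "likely AI-generated"
    if perplexity >= ppl_cutoff and burstiness >= burst_cutoff:
        return "likely human-written"
    return "uncertain"

# Formal legal prose is predictable and uniform, so it can trip the
# first branch even when a human wrote it, which is the failure mode
# this article describes.
print(classify(perplexity=35.0, burstiness=1.2))  # -> "likely AI-generated"
```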
In machine learning, perplexity measures how much a piece of text deviates from what an AI model learned during training. The idea behind measuring perplexity is that when AI models write, they naturally choose the content they are most familiar with from their training data, so the closer the output is to that training data, the lower the perplexity. Humans tend to be more perplexing writers, but humans can also write with low perplexity, especially when imitating the formal style used in law or certain types of academic writing. Moreover, many of the phrases we use are surprisingly common.
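To make the definition concrete, here is a minimal sketch of how a perplexity score can be computed with an off-the-shelf language model. GPTZero's actual model and scoring are not public; the use of GPT-2 via the Hugging Face transformers library below is purely an illustrative assumption.

```python
# Minimal perplexity sketch. Assumption: GPT-2 stands in for whatever
# model a real detector uses; GPTZero's internals are not public.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Ask the model to predict each token from the tokens before it;
    # the returned loss is the average negative log-likelihood per token.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    # Perplexity is the exponential of that average loss: low values mean
    # the text closely matches what the model saw during training.
    return torch.exp(out.loss).item()

print(perplexity("I want a cup of coffee."))      # common phrasing: low score
print(perplexity("Coffee cup a want of I the."))  # scrambled: much higher
```

Formal, formulaic prose such as legal boilerplate tends to score low on exactly this kind of measure, which is the trap the article describes: to a perplexity-based detector, the Constitution's language looks "AI-like."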
As an example, let's try to guess the next word in this sentence: "I want a cup of _____." Most people would fill in the blank with "water," "coffee," or "tea." A language model trained on a large amount of English text would do the same, because these phrases appear frequently in English writing. Any of these completions would yield low perplexity.
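The same intuition can be checked directly: the sketch below (again using GPT-2 as an illustrative stand-in, not GPTZero's model) asks which tokens are most probable after "I want a cup of". The exact rankings depend on the model chosen and are not taken from the article.

```python
# Next-token probabilities for a familiar phrase. Assumption: GPT-2 is
# only a stand-in; any autoregressive language model behaves similarly.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

enc = tokenizer("I want a cup of", return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# Softmax over the vocabulary at the last position gives the model's
# probability distribution for the next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r:>12}  {p.item():.3f}")
# Everyday completions such as " coffee" or " tea" rank high, which is
# why such phrases yield low perplexity for humans and machines alike.
```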
Another property of text that GPTZero measures is "burstiness": the phenomenon in which certain words or phrases appear in rapid succession, or "bursts," within a text. In essence, burstiness evaluates the variability of sentence length and structure throughout a text. Human writers often have a dynamic writing style, producing text with varied sentence lengths and structures, while AI-generated text tends to be more consistent and uniform.

However, burstiness is not a foolproof indicator of AI-generated content. As with perplexity, there are exceptions. Human writers may write in a highly structured, consistent style, resulting in a low burstiness score. Conversely, AI models can be trained to simulate more human-like variability in sentence length and structure, improving their burstiness scores. In fact, as AI language models improve, research shows that their writing looks increasingly like human writing.
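GPTZero does not publish how it computes burstiness, but one simple proxy consistent with the description above is the variability of sentence length. The sketch below uses the standard deviation of words per sentence; this particular formula is an assumption made for illustration, not GPTZero's actual metric.

```python
# Burstiness proxy: standard deviation of sentence lengths. Assumption:
# this stands in for GPTZero's unpublished metric.
import re
import statistics

def burstiness(text: str) -> float:
    # Split on sentence-ending punctuation (a rough heuristic) and count
    # the words in each sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # High variability reads as "bursty" and human-like; uniform sentence
    # lengths read as flat and machine-like under this proxy.
    return statistics.stdev(lengths)

varied = "Short one. Then a much longer, winding sentence follows it. Tiny."
uniform = "The cat sat on the mat. The dog sat on the rug. The bird sat on the box."
print(burstiness(varied))   # larger spread in sentence lengths
print(burstiness(uniform))  # zero: every sentence is the same length
```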