This article is reprinted with the authorization of AI New Media Qubits (public account ID: QbitAI). Please contact the source for reprinting.
A Chinese researcher has unlocked a new way to do scientific research:
Tell AI your research goals, then "feed" the data set into it, and you're done.
This is the latest work from Zhong Ruiqi, a doctoral student at Berkeley, and his colleagues. The tedious process of "collecting evidence" from massive data sets is handed over entirely to GPT-3.
They also found that this method of using AI for scientific research is not only highly efficient, but can also produce "unexpected surprises" that humans have not thought of.
Let GPT-3 help you with scientific research
So why did the researchers want to do research this way?
They found that mining large corpora in depth can indeed yield useful results, but doing it by hand is far too time-consuming and laborious.
So they decided to hand this tedious process over to GPT-3, and named the task "D5":
Goal Driven Discovery of Distributional Differences via Language Descriptions.
In other words, guided by a research goal, the system discovers how two text distributions differ and describes the difference in natural language.
A "D5" task takes just two inputs:
- The research goal
- The two corpora
For example, in one case, the researcher first gave the AI two corpora:
- Corpus A: patient reaction reports after taking drug A
- Corpus B: patient reaction reports after taking drug B
He then stated the research goal to the AI: "I want to know about drug A side effects".
Once the AI receives the task, it starts its analysis right away and eventually reaches a conclusion:
Samples in Corpus A mention "paranoia" more often.
Imagine human researchers doing this work instead: just reading through Corpus A and Corpus B would take a great deal of time, before any comparative analysis even begins.
The D5 task runs this smoothly because the researchers did a lot of work behind the scenes.
For example, they built the OpenD5 meta-dataset, which contains 675 open-ended D5 problems covering fields such as business, social sciences, humanities, health, and machine learning.
Each open-ended problem comes with a corpus pair (Corpus A and Corpus B), with an average of 17,000 samples.
The researchers use 50% of each corpus as the research split and hold out the other 50% for validation.
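The 50/50 split can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_corpus`, the random shuffle, and the fixed seed are all assumptions, since the article only states the proportions.

```python
import random
from typing import List, Tuple


def split_corpus(samples: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a corpus and split it into research and validation halves.

    Hypothetical helper: the article only says 50% is used for research
    and 50% for validation; random shuffling is an assumption.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(samples)       # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Toy stand-in for one corpus (made-up placeholder strings)
corpus_a = [f"report A{i}" for i in range(10)]
research_a, validation_a = split_corpus(corpus_a)
print(len(research_a), len(validation_a))
```

Keeping the validation half unseen while hypotheses are being proposed is what allows a hypothesis to be checked on data it was not derived from.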
On this basis, they built a "D5 system". It works much the way humans draw findings from data, in two stages: first it creatively proposes hypotheses, and then it rigorously tests each hypothesis on the validation set.
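The validation stage can be sketched as comparing how often a hypothesis holds in each held-out corpus. This is a simplified illustration: the keyword check `mentions_paranoia` stands in for whatever model the real system uses to judge a sample, and the function names and toy sentences are invented for the example.

```python
from typing import Callable, Sequence


def satisfaction_rate(samples: Sequence[str], satisfies: Callable[[str], bool]) -> float:
    """Fraction of samples for which the hypothesis holds."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if satisfies(s)) / len(samples)


def validity_gap(val_a: Sequence[str], val_b: Sequence[str],
                 satisfies: Callable[[str], bool]) -> float:
    """How much more often the hypothesis holds in A than in B (held-out data)."""
    return satisfaction_rate(val_a, satisfies) - satisfaction_rate(val_b, satisfies)


def mentions_paranoia(text: str) -> bool:
    # Hypothetical stand-in judge: a plain keyword match, not a language model
    return "paranoia" in text.lower()


# Made-up validation samples, for illustration only
val_a = ["Felt paranoia at night", "Mild headache", "Paranoia and nausea", "No issues"]
val_b = ["Mild headache", "Dry mouth", "Some paranoia", "No issues"]

gap = validity_gap(val_a, val_b, mentions_paranoia)
print(f"gap = {gap:.2f}")  # prints "gap = 0.25"
```

A gap well above zero suggests the hypothesis describes Corpus A more than Corpus B. A real test would also need a significance check, which is omitted here.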
According to this idea, the researchers next conducted an experiment using GPT-3.
They first showed GPT-3 the research goal and some samples from each corpus, and then asked it to come up with a list of hypotheses.
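A prompt of this kind might be assembled as below. This is only a sketch: the wording of the prompt, the function name `build_proposer_prompt`, and the sample count `k` are assumptions, and the call to the language model itself is left out.

```python
import random
from typing import List


def build_proposer_prompt(goal: str, corpus_a: List[str], corpus_b: List[str],
                          k: int = 2, seed: int = 0) -> str:
    """Combine a research goal with a few samples from each corpus.

    Hypothetical template: the article does not give the actual prompt text.
    """
    rng = random.Random(seed)
    picks_a = rng.sample(corpus_a, min(k, len(corpus_a)))
    picks_b = rng.sample(corpus_b, min(k, len(corpus_b)))

    lines = ["Samples from Corpus A:"]
    lines += [f"- {s}" for s in picks_a]
    lines.append("Samples from Corpus B:")
    lines += [f"- {s}" for s in picks_b]
    lines.append(f"Research goal: {goal}")
    lines.append("List hypotheses about how Corpus A differs from Corpus B.")
    return "\n".join(lines)


# Made-up toy corpora, for illustration only
goal = "I want to know about drug A side effects"
corpus_a = ["Felt paranoia at night", "Mild headache", "Trouble sleeping"]
corpus_b = ["Dry mouth", "Mild headache", "No issues"]

prompt = build_proposer_prompt(goal, corpus_a, corpus_b)
print(prompt)
```

The resulting string would then be sent to the language model, and each line of its reply treated as one candidate hypothesis to validate.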
The experiment found that, given the goal description, GPT-3 proposes hypotheses that are more relevant, novel, and meaningful.
Because the OpenD5 data set covers so many fields, Zhong says, their D5 system has a wide range of applications.
There are also some flaws
Zhong is also candid about the D5 system's flaws.
For example, if a corpus contains a lot of slang, colloquialisms, or emotionally loaded words, the "discoveries" the AI produces will be biased.
In short, the AI misreads vocabulary or descriptions that are specific to a particular context, and its analysis goes wrong as a result.
He also said that more flexible corpora and a more scalable system are priorities for their future research.
Still, the research clearly excites him. After all, it brings him one step closer to his dream of doing scientific research with AI.