
"Father of Machine Learning" Mitchell writes: How AI accelerates scientific development and how the United States seizes opportunities



Editor | ScienceAI

Recently, Tom M. Mitchell, a professor at Carnegie Mellon University known as the "Father of Machine Learning," wrote a new AI for Science white paper addressing two questions: How can artificial intelligence accelerate scientific development, and how can the U.S. government help achieve this goal?


ScienceAI has compiled the full text of the original white paper without changing its original meaning. The content is as follows.

The field of artificial intelligence has recently made significant progress, including large language models such as GPT, Claude, and Gemini, raising the possibility that one of AI's most positive impacts may be to greatly accelerate research advances across a variety of scientific fields, from cell biology to materials science to weather and climate modeling to neuroscience. Here we briefly summarize this AI-for-science opportunity and what the U.S. government can do to seize it.


The Opportunity for Artificial Intelligence and Science

The vast majority of scientific research in almost all fields today can be classified as "lone ranger" science.

In other words, a scientist and their research team of a dozen or so researchers come up with an idea, conduct experiments to test it, write up and publish the results, perhaps share their experimental data on the Internet, and then repeat the process.

Other scientists can build on these results by reading the published papers, but this process is error-prone and extremely inefficient for several reasons:

(1) No individual scientist can read all of the papers published in their field, so each is partially blind to other relevant studies. (2) Experiments described in journal publications necessarily omit many details, making it difficult for others to replicate the results and build on them. (3) Analysis of a single experimental data set is typically performed in isolation, failing to incorporate data from related experiments conducted by other scientists (and therefore failing to incorporate valuable information).

Over the next ten years, artificial intelligence can help scientists overcome all three of these problems.

AI can transform this "lone ranger" scientific research model into a "community scientific discovery" model. In particular, AI can be used to create a new type of computer research assistant that helps human scientists overcome these problems by:

  • Analyzing complex data sets (including those built from many experiments conducted in multiple laboratories) rather than conducting isolated analyses on a single, much smaller and less representative data set. Basing analyses on data sets orders of magnitude larger than any human could examine enables more comprehensive and accurate conclusions.
  • Using large language models such as GPT to read and digest every relevant publication in the field, helping scientists form new hypotheses based not only on experimental data from their own and other laboratories, but also on the hypotheses and arguments in the published research literature, leading to more informed hypotheses than would be possible without such natural-language AI tools.
  • Creating "foundation models" trained on many different types of experimental data collected by many labs and scientists, thus bringing the field's growing knowledge together into a single, computer-executable model. These executable foundation models can serve the same purpose as equations such as f = ma: they predict certain quantities from other observed quantities. Unlike classical equations, however, they can capture empirical relationships among hundreds of thousands of variables rather than just a handful (see the sketch after this list).
  • Automating or semi-automating the design of new experiments and their robotic execution, thereby accelerating relevant new experiments and improving the reproducibility of scientific experiments.
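
To make the "executable foundation model" idea concrete, here is a minimal sketch (our illustration, not part of the white paper) of the interface such a model provides: given a data set with many measured variables, learn to predict any one quantity from the others, the way f = ma predicts force from mass and acceleration. The simulated data and the simple ridge-regression predictor are illustrative assumptions; a real scientific foundation model would be a large nonlinear model trained on community-contributed data.

```python
# Minimal sketch: an "executable model" that predicts any measured variable
# from the others. Data, variable count, and the linear predictor are all
# illustrative assumptions, not a real scientific model.
import numpy as np

rng = np.random.default_rng(0)

# Simulated "community" data set: 10,000 experiments, 200 measured variables.
n_experiments, n_vars = 10_000, 200
X = rng.normal(size=(n_experiments, n_vars))
# Inject hidden empirical structure: each variable depends weakly on the rest.
W_true = rng.normal(scale=0.1, size=(n_vars, n_vars))
X = X + X @ W_true

def fit_predictor(X, target_idx, ridge=1.0):
    """Learn to predict one variable from all the others (ridge regression)."""
    mask = np.arange(X.shape[1]) != target_idx
    A, y = X[:, mask], X[:, target_idx]
    # Closed-form ridge solution: w = (A^T A + lambda*I)^(-1) A^T y
    w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y)
    return mask, w

# "Query" the model: predict variable 7 for a new, unseen experiment.
mask, w = fit_predictor(X, target_idx=7)
x_new = rng.normal(size=n_vars)
x_new = x_new + x_new @ W_true
print("predicted:", x_new[mask] @ w, " observed:", x_new[7])
```

The interface, not the linear model, is the point: observe some quantities, query the model for the others. A production model would replace the ridge regression with a network able to fill in any masked subset of hundreds of thousands of variables.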


What scientific breakthroughs might this paradigm shift in scientific practice bring?

Here are a few examples:

  • Reduce the development time and cost of new vaccines for new disease outbreaks by 10x.
  • Accelerating materials research may lead to breakthrough products such as room-temperature superconductors and thermoelectric materials that convert heat into electricity without producing emissions.
  • Combining a never-before-attempted volume and diversity of cell-biology experimental data into a "foundation model" of human cell function, making it possible to quickly simulate the outcomes of many potential experiments before taking the more expensive step of conducting in vivo experiments in the laboratory.
  • Combining experimental data from across neuroscience (from single-neuron recordings to whole-brain fMRI imaging) to build a "foundation model" of the human brain at multiple levels of detail, integrating data of unprecedented scale and diversity into a model that predicts the neural activity the brain uses to encode different types of thoughts and emotions, how those thoughts and emotions are evoked by different stimuli, the effects of drugs on neural activity, and the effectiveness of different treatments for mental disorders.
  • Improving our ability to predict weather, both by tailoring forecasts to highly localized areas (e.g., individual farms) and by extending how far into the future we can forecast accurately.


What can the US government do to seize this opportunity?

Translating this opportunity into reality requires several elements:

Large amounts of experimental data

One lesson of text-based foundation models is that the more data they are trained on, the more capable they become. Experienced scientists likewise know the value of more, and more diverse, experimental data. To achieve orders-of-magnitude progress in science, and to train the kinds of foundation models we want, we need very significant advances in our ability to share and jointly analyze the diverse data sets contributed by the entire scientific community.

The ability to access and read scientific publications by computer

A key part of the opportunity is to change the current situation, in which a scientist is unlikely to read even 1% of the relevant publications in their field, to one in which computers read 100% of those publications, summarize them and their relevance to the scientific question at hand, and provide a conversational interface for discussing their content and implications. This requires not only access to the online literature, but also AI research to build such a "literature assistant."
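
As one illustration of what the retrieval layer of such a "literature assistant" might look like, here is a minimal sketch that ranks a toy corpus of abstracts by relevance to a scientist's question using TF-IDF and cosine similarity. The corpus, the question, and the scoring scheme are all illustrative assumptions; a real assistant would index millions of papers and layer an LLM on top to summarize and discuss what it retrieves.

```python
# Minimal sketch of literature retrieval: rank abstracts by TF-IDF cosine
# similarity to a question. Corpus and question are illustrative toy data.
import math
from collections import Counter

corpus = {
    "paper-1": "room temperature superconductivity in hydride materials",
    "paper-2": "foundation models for protein structure prediction",
    "paper-3": "thermoelectric materials that convert waste heat to electricity",
}
question = "which materials convert heat into electricity"

def tf_idf_vectors(docs):
    """Build TF-IDF vectors over a shared vocabulary; also return the encoder."""
    tokenized = {k: v.lower().split() for k, v in docs.items()}
    df = Counter(w for toks in tokenized.values() for w in set(toks))
    n = len(docs)
    def vec(toks):
        tf = Counter(toks)
        return {w: tf[w] * math.log((1 + n) / (1 + df.get(w, 0))) for w in tf}
    return {k: vec(toks) for k, toks in tokenized.items()}, vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc_vecs, make_vec = tf_idf_vectors(corpus)
q_vec = make_vec(question.lower().split())
ranked = sorted(corpus, key=lambda k: cosine(q_vec, doc_vecs[k]), reverse=True)
print("most relevant:", ranked[0])  # expected: paper-3
```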

Computing and Network Resources

Text-based foundation models such as GPT and Gemini are known for the large amount of computing resources consumed in their development, and developing foundation models in different scientific fields will also require substantial computing resources. However, the computational demands of many AI-for-science efforts are likely to be much smaller than those required to train LLMs such as GPT, and can therefore be met with investments similar to those already being made in government research labs.

For example, AlphaFold, an AI model that has revolutionized protein analysis for drug design, used far less training computation than text-based foundation models like GPT and Gemini. To support data sharing we need large-scale computer networks, but the current Internet already provides a sufficient starting point for transferring large experimental data sets. The hardware cost of supporting AI-driven scientific progress is therefore likely to be quite low compared to the potential benefits.

New Machine Learning and AI Methods

Current machine learning methods are extremely useful for discovering statistical regularities in data sets too large for humans to examine (AlphaFold, for example, was trained on large numbers of protein sequences and their carefully measured 3D structures). A key part of the new opportunity is to extend current machine learning methods, which discover statistical correlations in data, in two important directions: (1) moving from finding correlations to finding causal relationships in the data, and (2) moving from learning only from large structured data sets to learning jointly from large structured data sets and the large research literature; that is, learning as human scientists do, from experimental data together with the hypotheses and arguments others have expressed in natural language. The recent emergence of LLMs with strong capabilities for digesting, summarizing, and reasoning about large text collections could provide the basis for this new class of machine learning algorithms.
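
Here is a minimal sketch (our illustration, not from the white paper) of why direction (1) matters: in observational data, a hidden confounder can make two variables strongly correlated even when neither causes the other, while an intervention, of the kind a planned experiment performs, reveals the true causal effect.

```python
# Minimal sketch: correlation vs. causation. A hidden confounder z drives both
# x and y, so x and y correlate even though x has no causal effect on y.
# Intervening on x (assigning it directly) exposes the true effect: zero.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational data: confounder z drives both "treatment" x and "outcome" y.
z = rng.normal(size=n)
x = z + 0.1 * rng.normal(size=n)          # x has NO causal effect on y
y = 2.0 * z + 0.1 * rng.normal(size=n)

obs_slope = np.polyfit(x, y, 1)[0]
print(f"observational slope: {obs_slope:.2f}")   # ~2.0: spurious

# Interventional data: we set x ourselves (do(x)), breaking the link to z.
z2 = rng.normal(size=n)
x2 = rng.normal(size=n)                   # assigned independently of z2
y2 = 2.0 * z2 + 0.1 * rng.normal(size=n)  # same causal law: y ignores x

int_slope = np.polyfit(x2, y2, 1)[0]
print(f"interventional slope: {int_slope:.2f}")  # ~0.0: true causal effect
```

Planning and executing exactly such interventions is what coupling machine learning with the automated laboratories discussed below would enable at scale.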

What should the government do? The key is to support the four elements above and to rally the scientific community to explore new AI-based methods for advancing their research. The government should therefore consider taking the following actions:


Explore specific opportunities in specific areas of science. Fund multi-institutional research teams in many scientific areas to articulate visions and produce preliminary results demonstrating how AI can significantly accelerate progress in their fields, and what would be needed to scale the approach. This work should not be funded as grants to individual institutions, because the greatest advances may come from integrating data and research from many scientists at many institutions. Instead, it is likely to be most effective if carried out by teams of scientists from many institutions, whose proposed opportunities and approaches inspire engagement from the scientific community at large.

Accelerate the creation of new experimental data sets to train new foundation models, and make the data available to the entire community of scientists:

  • Create data-sharing standards so that one scientist can conveniently use experimental data created by other scientists, laying the foundation for a national data resource in each relevant scientific field. Note that there have been previous successes in developing and using such standards that can serve as starting templates (e.g., the success of data sharing during the Human Genome Project). A sketch of a minimal metadata record follows this list.

  • Create and support data-sharing websites for every relevant field. Just as GitHub has become the go-to site for software developers to contribute, share, and reuse code, a "GitHub for scientific data sets" could serve both as a data repository and as a search engine for discovering the data sets most relevant to a specific topic, hypothesis, or planned experiment.

  • Study how to build incentive mechanisms that maximize data sharing. Scientific fields currently vary widely in the extent to which individual scientists share their data and in the extent to which for-profit organizations make their data available for basic scientific research. Building large, shareable national data resources is integral to the AI-for-science opportunity, and building a compelling incentive structure for data sharing will be key to its success.

  • Where appropriate, fund the development of automated laboratories (e.g., robotic laboratories for chemistry and biology experiments that many scientists can use via the Internet) to conduct experiments efficiently and generate data in a standard format. A major benefit of such laboratories is that they will also drive the development of standards that precisely specify the experimental procedures followed, thereby increasing the reproducibility of experimental results. Just as we can benefit from a GitHub for data sets, we can also benefit from an analogous GitHub for sharing, modifying, and reusing components of experimental protocols.
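
As an illustration of the data-sharing standards mentioned above, here is a sketch of the kind of metadata record a "GitHub for scientific data sets" could require with every upload so that data from different laboratories can be found, combined, and re-analyzed. This is a hypothetical schema of our own, not an existing standard; every field name is an illustrative assumption.

```python
# Hypothetical metadata schema for a shared scientific data repository.
# Field names and the example values are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    dataset_id: str            # stable, citable identifier (e.g., a DOI)
    field_of_science: str      # e.g., "cell biology", "materials science"
    protocol_id: str           # reference to the exact experimental protocol
    instrument: str            # what produced the measurements
    variables: dict[str, str]  # measured quantity -> unit, e.g. {"temp": "K"}
    license: str               # reuse terms, key to the incentive question
    contributors: list[str] = field(default_factory=list)

record = DatasetRecord(
    dataset_id="doi:10.0000/example",   # placeholder identifier
    field_of_science="materials science",
    protocol_id="protocols/thermoelectric-v2",
    instrument="Seebeck coefficient measurement rig",
    variables={"temperature": "K", "seebeck_coefficient": "uV/K"},
    license="CC-BY-4.0",
    contributors=["Lab A", "Lab B"],
)
print(record.dataset_id, sorted(record.variables))
```

Tying each record to a protocol_id is what would let the repository double as the protocol-sharing GitHub described in the last bullet above.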


Creating a new generation of artificial intelligence tools requires:

  • Funding basic AI research specifically aimed at developing methods for scientific research. This should include the development of "foundation models," broadly construed, as tools to accelerate research in different fields and to accelerate the shift from "lone ranger" science to the more powerful "community scientific discovery" paradigm.

  • Specifically supporting research on AI systems that read the research literature, critique stated hypotheses and suggest improvements, and help scientists derive results from the scientific literature in a way that is directly relevant to their current questions.

  • Specifically supporting research that extends machine learning from the discovery of correlations to the discovery of causation, especially in settings where new experiments can be planned and executed to test causal hypotheses.

  • Specifically supporting research that extends machine learning algorithms from taking only large data sets as input to taking both large experimental data sets and the field's full research literature as input, in order to connect the statistical regularities in experimental data with the hypotheses, explanations, and arguments discussed in the literature.

Related content:

https://x.com/tommmitchell/status/1817297827003064715
https://docs.google.com/document/d/1ak_XRk5j5ZHixHUxXeqaiCeeaNxXySOlH1kIeEH3DXE/edit?pli=1

