'Father of Machine Learning' Mitchell writes: How AI accelerates scientific development and how the United States seizes opportunities
Editor | ScienceAI
Recently, Tom M. Mitchell, a professor at Carnegie Mellon University known as the "Father of Machine Learning," wrote a new AI for Science white paper addressing the question: how can artificial intelligence accelerate scientific progress, and how can the U.S. government help achieve this goal?
ScienceAI has compiled the full text of the white paper without changing its original meaning. The content is as follows.
The field of artificial intelligence has recently made significant progress, including large language models such as GPT, Claude, and Gemini. This raises the possibility that one very positive impact of artificial intelligence could be to greatly accelerate research progress across a variety of scientific fields, from cell biology and materials science to weather and climate modeling and neuroscience. Here we briefly summarize this AI-for-science opportunity and what the U.S. government can do to seize it.
The vast majority of scientific research in almost all fields today can be classified as "lone ranger" science.
In other words, a scientist and their research team of a dozen or so researchers come up with an idea, conduct experiments to test it, write up and publish the results, perhaps share their experimental data on the Internet, and then repeat the process.
Other scientists can build on these results by reading the published papers, but this process is error-prone and extremely inefficient for several reasons:
(1) It is impossible for an individual scientist to read all of the articles published in their field, so they are partially blind to other relevant studies; (2) the experiments described in journal publications necessarily omit many details, making it difficult for others to replicate the results and build on them; (3) the analysis of a single experimental dataset is often performed in isolation, without incorporating data from related experiments conducted by other scientists (and therefore without incorporating the valuable information they contain).
Over the next ten years, artificial intelligence can help scientists overcome all three of the problems above.
AI can transform this "lone ranger" model of scientific research into a "community scientific discovery" model. In particular, AI can be used to create a new type of computer research assistant that helps human scientists overcome each of these problems.
What scientific breakthroughs might this paradigm shift in scientific practice bring?
Here are a few examples:
Translating this opportunity into reality requires several elements:
Large amounts of experimental data
One lesson from text-based foundation models is that the more data they are trained on, the more powerful they become. Experienced scientists likewise know the value of having more, and more diverse, experimental data. To achieve orders-of-magnitude progress in science, and to train the kinds of foundation models we want, we need very significant advances in our ability to share and jointly analyze the diverse datasets contributed by the entire scientific community.
The ability to access scientific publications and read them with computers
A key part of the opportunity here is to change the current situation: whereas a scientist is unlikely to read even 1% of the relevant publications in their field, a computer can read 100% of them, summarize them and their relevance to the scientific question at hand, and provide a conversational interface for discussing their content and implications. This requires not only access to the online literature, but also AI research to build such a "literature assistant."
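To make the idea concrete, the sketch below shows what the core loop of such a literature assistant might look like. It is only an illustrative sketch: `search_publications` and `llm` are hypothetical placeholder functions standing in for a real publication index and a real large language model, and the white paper does not prescribe any particular implementation.

```python
# Hypothetical sketch of a "literature assistant" loop. `search_publications`
# and `llm` are placeholders for a real publication index and a real large
# language model API; they are not from the white paper.
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str
    full_text: str


def search_publications(query: str) -> list[Paper]:
    """Placeholder: query a shared index of the field's publications."""
    raise NotImplementedError


def llm(prompt: str) -> str:
    """Placeholder: call a large language model and return its reply."""
    raise NotImplementedError


def literature_assistant(question: str, max_papers: int = 50) -> str:
    """Read every retrieved paper, summarize each one's relevance to the
    scientist's question, then synthesize an overall answer."""
    papers = search_publications(question)[:max_papers]
    per_paper_notes = []
    for paper in papers:
        note = llm(
            f"Question: {question}\n"
            f"Paper title: {paper.title}\n"
            f"Paper text: {paper.full_text}\n"
            "Summarize this paper's findings and explain how they bear on the question."
        )
        per_paper_notes.append(f"{paper.title}: {note}")

    # Synthesize across all papers; a conversational interface would keep
    # this synthesis in context for follow-up questions.
    return llm(
        "Here are notes on the relevant literature:\n"
        + "\n".join(per_paper_notes)
        + f"\n\nUsing these notes, answer the question: {question}"
    )
```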
Computing and Network Resources
Text-based foundation models such as GPT and Gemini are known for the large amount of computing resources consumed during their development, and developing foundation models in different scientific fields will also require substantial computing resources. However, the computational demands of many AI-for-science efforts are likely to be much smaller than those required to train LLMs such as GPT, and can therefore be met with investments similar to those already being made by government research labs.
For example, AlphaFold, the AI model that has revolutionized protein structure analysis for drug design, required far less training computation than text-based foundation models such as GPT and Gemini. Supporting data sharing requires large-scale computer networks, but the current Internet already provides a sufficient starting point for transferring large experimental datasets. The hardware cost of supporting AI-driven scientific advancement is therefore likely to be quite low compared to the potential benefits.
New Machine Learning and AI Methods
Current machine learning methods are extremely useful for discovering statistical regularities in datasets far too large for humans to examine (for example, AlphaFold was trained on large numbers of protein sequences and their carefully measured 3D structures). A key part of the new opportunity is to extend current machine learning methods, which discover statistical correlations in data, in two important directions: (1) moving from finding correlations to finding causal relationships in the data, and (2) moving from learning only from large structured datasets to learning from both large structured datasets and the large research literature; that is, learning the way human scientists do, from experimental data as well as from the hypotheses and arguments expressed in natural language by others. The recent emergence of LLMs with advanced capabilities for digesting, summarizing, and reasoning about large text collections could provide the basis for this new class of machine learning algorithms.
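As a toy illustration of direction (1), the following simulation contrasts a purely correlational estimate with a confounder-adjusted one; the scenario, variable names, and effect sizes are invented for this sketch and do not come from the white paper.

```python
# Toy illustration: correlation vs. causal effect under confounding.
# The simulated scenario is hypothetical and not taken from the white paper.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A hidden confounder (e.g., an unmeasured lab condition) drives both the
# "treatment" x and the outcome y; the true causal effect of x on y is zero.
confounder = rng.normal(size=n)
x = 0.8 * confounder + rng.normal(size=n)
y = 1.2 * confounder + rng.normal(size=n)

# (1) Purely correlational analysis: regress y on x alone.
naive_slope = np.polyfit(x, y, 1)[0]

# (2) Causal-style analysis: adjust for the confounder. Here we can adjust
# only because we simulated the confounder; in real science this is where
# new experiments or causal-discovery methods are needed.
design = np.column_stack([x, confounder, np.ones(n)])
adjusted_slope = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(f"correlational estimate of the x -> y effect: {naive_slope:.3f}")    # spurious, ~0.59
print(f"confounder-adjusted estimate:                {adjusted_slope:.3f}")  # ~0.00
```

The correlational estimate is large even though the true causal effect of x on y is zero; recovering the causal answer requires either measuring the confounder or running a controlled experiment, which is the kind of capability the text argues new machine learning methods should acquire.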
What should the government do? The key is to support the four elements above and to bring the scientific community together to explore new AI-based methods for advancing their research. The government should therefore consider taking the following actions:
Explore specific opportunities in specific areas of science. Fund multi-institutional research teams in many scientific areas to articulate visions and preliminary results showing how AI can be used to significantly accelerate progress in their fields, and what is needed to scale this approach. This work should not be funded through grants to individual institutions, because the greatest advances may come from integrating data and research from many scientists at many institutions. Instead, it is likely to be most effective if carried out by teams of scientists from many institutions who propose opportunities and approaches that engage the scientific community at large.
Accelerate the creation of new experimental datasets to train new foundation models, and make the data available to the entire community of scientists:
Create data-sharing standards so that one scientist can conveniently use experimental data created by other scientists, and lay the foundation for a national data resource in each relevant scientific field. Note that there have been previous successes in developing and using such standards that can provide starting templates for new standards efforts (for example, the success of data sharing during the Human Genome Project). A minimal sketch of such a shared record appears below, after these data-related actions.
Create and support data-sharing websites for every relevant field. Just as GitHub has become the go-to site for software developers to contribute, share, and reuse code, a "GitHub for scientific datasets" could serve as both a data repository and a search engine for discovering the datasets most relevant to a particular topic, hypothesis, or planned experiment.
Study how to build incentive mechanisms to maximize data sharing. Currently, scientific fields vary widely in the extent to which individual scientists share their data and the extent to which for-profit organizations use their data for basic scientific research. Building a large, shareable national data resource is integral to the scientific opportunity for AI, and building a compelling incentive structure for data sharing will be key to success.
Where appropriate, fund the development of automated laboratories (e.g., robotic laboratories for experiments in chemistry, biology, and other fields that many scientists can use via the Internet) to conduct experiments efficiently and to generate data in a standard format. A major benefit of creating such laboratories is that they will also promote the development of standards that precisely specify the experimental procedures to be followed, thereby increasing the reproducibility of experimental results. Just as we can benefit from a GitHub for datasets, we can also benefit from a related GitHub for sharing, modifying, and reusing components of experimental protocols.
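As one illustration of what a standardized, shareable record might contain, here is a minimal sketch of a dataset entry that bundles the data's location, units, license, and the experimental protocol that produced it. The field names and structure are assumptions made for this sketch; they are not an existing standard from the white paper or any data-sharing initiative.

```python
# Minimal, hypothetical schema for a shared experimental-dataset record.
# Field names and structure are illustrative assumptions, not an existing standard.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProtocolStep:
    action: str        # e.g., "incubate"
    parameters: dict   # e.g., {"temperature_c": 37, "duration_min": 90}


@dataclass
class DatasetRecord:
    dataset_id: str            # stable identifier for citation and reuse
    title: str
    contributors: list[str]
    measurement_units: dict    # column name -> unit, to avoid ambiguity
    data_url: str              # where the raw data files live
    license: str               # terms under which others may reuse the data
    protocol: list[ProtocolStep] = field(default_factory=list)


record = DatasetRecord(
    dataset_id="example-0001",  # hypothetical entry
    title="Example growth-rate measurements",
    contributors=["Lab A", "Lab B"],
    measurement_units={"growth_rate": "1/hour", "temperature": "celsius"},
    data_url="https://example.org/datasets/example-0001",
    license="CC-BY-4.0",
    protocol=[ProtocolStep("prepare_culture", {"medium": "LB"}),
              ProtocolStep("incubate", {"temperature_c": 37, "duration_min": 90})],
)

# Serializing to JSON gives a machine-readable record that a "GitHub for
# datasets" could index and that automated labs could emit directly.
print(json.dumps(asdict(record), indent=2))
```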
Creating a new generation of artificial intelligence tools requires:
Funding basic AI research on methods developed specifically for scientific research. This should include developing "foundation models" in the broad sense, as tools to accelerate research in different fields and to accelerate the shift from "lone ranger" science to the more powerful "community scientific discovery" paradigm.
Specifically supporting research on AI systems that assist scientists by reading the research literature, critiquing stated assumptions and suggesting improvements, and helping scientists derive results from the scientific literature that are directly relevant to their current questions.
Specifically supporting research that extends machine learning from discovering correlations to discovering causal relationships, especially in settings where new experiments can be planned and executed to test causal hypotheses.
Specifically supporting research that extends machine learning algorithms from taking only large datasets as input to taking both large experimental datasets and the full research literature of a field as input, so that the hypotheses, explanations, and arguments they generate are grounded both in the statistical regularities of the experimental data and in the discussions found in the research literature.
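As a final illustrative sketch, the fragment below shows one way an algorithm might take both an experimental dataset and a literature summary as input when proposing hypotheses. The `llm` function is a hypothetical placeholder, and the overall pattern is an assumption made for illustration rather than a method described in the white paper.

```python
# Hypothetical sketch of a learner that takes both experimental data and the
# research literature as input when proposing hypotheses. `llm` is a
# placeholder; the pattern is illustrative, not from the white paper.
import numpy as np


def llm(prompt: str) -> str:
    """Placeholder: call a large language model and return its reply."""
    raise NotImplementedError


def propose_hypotheses(x: np.ndarray, y: np.ndarray,
                       literature_summary: str) -> str:
    """Combine a statistical regularity found in the data with what the
    published literature already says, and ask for candidate causal
    hypotheses plus experiments that could test them."""
    correlation = float(np.corrcoef(x, y)[0, 1])  # regularity found in the data
    return llm(
        f"Observed regularity: variables X and Y have correlation {correlation:.2f}.\n"
        f"Relevant literature (summarized): {literature_summary}\n"
        "Propose causal hypotheses consistent with both the data and the "
        "literature, and suggest experiments that would distinguish them."
    )
```

The point of this design is simply that the statistical regularity and the natural-language background knowledge enter the same reasoning step, rather than being analyzed in isolation.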