The United States plans to spend US$2.6 billion on artificial intelligence research infrastructure and expects to complete the construction of NAIRR within 6 years
Artificial intelligence is a strategic technology leading a new round of technological revolution and industrial transformation. Multiple studies and data sets show that the United States leads the world in basic research, technological innovation, and industrial application of artificial intelligence. On indicators such as the number of high-level artificial intelligence papers, the number of top scholars, the number of artificial intelligence enterprises, and the scale of investment, it is ahead of other countries.
The U.S. government attaches great importance to the innovation and development of artificial intelligence technology. Under the National AI Initiative Act of 2020, Congress required the National Science Foundation (NSF) and the White House Office of Science and Technology Policy (OSTP) to form a working group to study and formulate a plan for national artificial intelligence research resources. In January 2023, the working group released the construction roadmap for the National Artificial Intelligence Research Resource (NAIRR) infrastructure. The roadmap aims to consolidate the United States' competitive advantage in artificial intelligence, expand access to key artificial intelligence and educational resources for all parties in the United States, and further drive U.S. artificial intelligence innovation and economic prosperity.
The background and significance of the construction of NAIRR in the United States
Construction background
The U.S. government believes that its lead in artificial intelligence is being challenged and that its competitive advantage is at risk of being weakened. It identifies two main problems. First, investment in artificial intelligence R&D and educational resources is unevenly distributed. From an investment perspective, research data shows that private-sector investment in artificial intelligence in the United States more than doubled from 2020 to 2021, yet the number of new artificial intelligence companies is declining. From a talent perspective, the racial and gender distribution of U.S. artificial intelligence doctoral graduates differs considerably from that of the overall population, which will restrict the innovation and development of artificial intelligence. Second, scientific research institutions have insufficient computing and data resources. In terms of computing power, the most advanced computing platforms are owned by industry-leading private institutions, and scientific research institutions lack computing platforms to support artificial intelligence research and development. In terms of data, the main data resources for training artificial intelligence models are owned by private institutions and large Internet platforms. Although the U.S. government continues to open up its data, the available data is still insufficient for artificial intelligence research.
The working group pointed out that a lack of sufficient artificial intelligence research resources will limit the U.S. artificial intelligence innovation ecosystem and cause top talent to move from academic research institutions to a small number of resource-rich companies. If this trend persists over the long term, it will affect the competitiveness and innovation capacity of the United States. In January 2023, after 18 months of public consultation and discussion, the working group formally proposed a construction plan and planned to apply for US$2.6 billion in construction, operation and maintenance funds. It plans to complete the construction of NAIRR in four phases within 6 years, focusing on four major goals: gathering resources to promote research innovation, increasing the diversity of talent, strengthening basic resource capabilities, and promoting the development of trustworthy artificial intelligence.
Significance
NAIRR, as an artificial intelligence research infrastructure, is open to American research institutes and universities, students, non-profit organizations and other institutions, and provides computing resources, high-quality data, educational tools and other basic research resources. The platform is expected to become a key hub for U.S. artificial intelligence research cooperation and to consolidate the country's international competitive advantage.
In terms of ecosystem building, the U.S. government will rely on NAIRR to bring together the relevant government departments and scientific research institutions to carry out joint research and resource construction in artificial intelligence, forming a broad cooperative ecosystem. NAIRR services and functions are shown in Figure 1.
In terms of data, NAIRR will aggregate data from federal government departments and cooperate on data services with institutions across the industry. The first aim is to promote the aggregation, development and utilization of large-scale artificial intelligence data resources. NAIRR will gather and connect the large-scale data resources already opened up by U.S. federal agencies, academic research institutions and technology giants, becoming the largest artificial intelligence data resource service platform in the United States. For example, the U.S. National Institutes of Health has released more than 36 PB of gene sequencing data, and the U.S. National Oceanic and Atmospheric Administration has released more than 10 PB of weather and environmental data. The second aim is to improve artificial intelligence data management and governance. Artificial intelligence data sets are highly fragmented: each data set supports specialized tasks and research fields, and there are no unified standards for data annotation and data governance, which makes data management difficult. NAIRR will promote unified standards for data aggregation, standardize data description formats, and promote the aggregation of data resources from multiple parties. The third aim is to promote the development and utilization of data resources through multi-party collaboration. The operating entity will run an artificial intelligence data set community and encourage it to actively develop and build valuable data resources for NAIRR to use. The operating entity will also provide data search services to make it easier to query open data from federal agencies and data resources from third-party service providers.
In terms of computing power, NAIRR will work with major U.S. artificial intelligence cloud computing companies to build a computing platform. It plans to connect to the cloud platforms of technology giants such as Google, Microsoft, and Amazon, as well as the cloud platforms of federal agencies such as the National Science Foundation and the National Institutes of Health. The platform will offer different service tiers and content for universities, research institutions, students, and start-ups, covering a variety of services and resources such as data, computing power, test beds, and software tools. Once completed, NAIRR's computing resources will include a supercomputer capable of training machine learning models with at least one trillion parameters, as well as cloud computing resources, CPUs, GPUs and high-speed networks.
After the NAIRR infrastructure is established and operating stably, it will, on the one hand, continue to expand cooperation with government departments and private institutions, broaden the scope of platform services and users, and share successful experience. On the other hand, the platform will promote the formulation of relevant standards and specifications, take part in international exchanges and cooperation, serve as a basic platform for the United States and its allies and partners, and promote cooperative research and data sharing.
American NAIRR Construction Plan
The United States plans to use a systematic approach to mobilize the federal government and private institutions to work together to establish an artificial intelligence research resource infrastructure for academic research.
The first is to plan and build a platform governance system with multi-party participation. The proposed NAIRR governance structure is shown in Figure 2. The plan recommends a governance system in which multiple government departments participate, with a series of responsible bodies (a steering committee, a management committee, a project management office, an operating entity, and advisory committees) coordinating their work. The steering committee, composed of representatives from federal government departments and agencies, is the highest national-level decision-making body for NAIRR's overall planning and strategic goals, and it represents the departments in promoting national investment of resources in artificial intelligence. The management committee guides and manages the platform's operating entity and provides funds and related resources. The plan proposes that NSF assume the responsibilities of the management committee. The project management office works with the steering committee on the day-to-day management and evaluation of the operating entity. The U.S. Congress has approved funding for the project management office to support project management, portal development and deployment, joint support, training and user support. The operating entity is independent of government departments. It is responsible for setting NAIRR's specific development goals, organizing platform construction and day-to-day operation, and formulating a transparent, fair and reasonable resource allocation system that meets the needs of artificial intelligence research institutions and users. Advisory committees made up of experts from multiple fields, namely a scientific committee, a technical committee, an ethics committee, and a user committee, provide decision-making support for the construction of NAIRR.
The second is to provide dedicated funds for NAIRR infrastructure construction and operation. The construction plan proposes to apply for US$2.6 billion in funding over six years. Of this, US$2.25 billion will be used to purchase platform computing power, software tools and data resources from service providers, US$370 million will cover the day-to-day expenses of the operating entity, and an additional US$30 million will be used to evaluate how the infrastructure is operating. All federal agencies involved in artificial intelligence research and development should take part in NAIRR's project management. Resources funded by federal departments' artificial intelligence R&D investment can still be procured and developed by each agency alone or jointly, but they should be managed and made available through the NAIRR infrastructure.
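The budget split above is easier to compare as shares of the total. The following is a minimal Python sketch that uses only the three figures quoted in this article; the dictionary labels and variable names are illustrative choices, not official NAIRR terminology.

```python
# Budget line items as quoted in this article, in millions of US dollars.
# The labels are illustrative, not official NAIRR budget categories.
BUDGET_MILLIONS = {
    "resources purchased from service providers": 2250,
    "operating entity day-to-day expenses": 370,
    "evaluation of infrastructure operation": 30,
}
HEADLINE_MILLIONS = 2600  # the "US$2.6 billion" headline figure
YEARS = 6                 # planned funding period


def budget_shares(items):
    """Return each line item's share of the summed line items."""
    total = sum(items.values())
    return {name: amount / total for name, amount in items.items()}


total = sum(BUDGET_MILLIONS.values())
shares = budget_shares(BUDGET_MILLIONS)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"sum of line items: {total} million")
print(f"difference from headline: {total - HEADLINE_MILLIONS} million")
print(f"average per year (headline): {HEADLINE_MILLIONS / YEARS:.0f} million")
```

As quoted, the three line items add up to US$2.65 billion, slightly more than the rounded headline figure of US$2.6 billion, and purchased resources account for roughly 85% of the total.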
The third is to build the NAIRR infrastructure in stages, expand computing resources as needed, and promote the aggregation of data resources. Platform construction is divided into four stages: project initiation, construction, trial operation and continuous operation. In the trial operation stage the platform will be able to support 50,000 users and to aggregate and use existing data from federal agencies and private institutions. Once it is operating stably, it will support 150,000 users and establish a broader community for data resource cooperation. NAIRR will develop data resources and make data easier to use by formulating data aggregation standards, developing data cooperatively, and providing data search services.
Under the new situation, the importance of building basic research resources for artificial intelligence has become increasingly prominent
New artificial intelligence technologies and applications are constantly emerging. Researching and training the new generation of large artificial intelligence models, represented by the large language model behind ChatGPT, requires computing and data resources on a larger scale, and the investment needed for a single R&D effort has increased significantly. The computing platform threshold for training large artificial intelligence models is extremely high, and ordinary institutions cannot afford the huge R&D and operating expenses. OpenAI research points out that the computing power required to train artificial intelligence models has grown exponentially: from 2012 to 2018, the computing power consumed in training AI models increased 300,000-fold. Training GPT-3 requires 3,640 petaflop/s-days of computing power (that is, running at 1 petaFLOP/s for 3,640 days), and the cost is estimated to reach US$1.4 million per training run. Some organizations estimate that the initial investment in ChatGPT was about US$800 million.
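To give the compute figures above a sense of scale, here is a minimal Python sketch of the arithmetic. It converts petaflop/s-days into a total number of floating-point operations and derives the doubling time implied by 300,000-fold growth. The function names are illustrative, and the six-year span is an assumption read off the "2012 to 2018" wording rather than an exact measurement window.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
FLOPS_PER_PETAFLOP = 1e15


def pfs_days_to_flops(pfs_days):
    """Total floating-point operations in the given petaflop/s-days."""
    return pfs_days * FLOPS_PER_PETAFLOP * SECONDS_PER_DAY


def doubling_time_months(growth_factor, years):
    """Months per doubling if compute grew by growth_factor over the span."""
    doublings = math.log2(growth_factor)
    return years * 12 / doublings


# 3,640 petaflop/s-days quoted for GPT-3 training.
gpt3_flops = pfs_days_to_flops(3640)
print(f"GPT-3 training compute: {gpt3_flops:.2e} floating-point operations")

# 300,000-fold growth, assuming it is spread over six full years.
months = doubling_time_months(300_000, years=6)
print(f"implied doubling time: {months:.1f} months")
```

Under these assumptions, the GPT-3 figure corresponds to about 3.14 × 10^23 floating-point operations, and 300,000-fold growth over six years implies that training compute doubled roughly every four months.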
In terms of artificial intelligence data sets, the development of large pre-trained models has further increased the size of the data sets required for training. Data set sizes have grown from millions or tens of millions of items in the past to hundreds of millions. The data sets currently used to train large models come mainly from the Internet, including sources such as Wikipedia, social networking sites, public journals, books, papers, and code. Some studies have pointed out that "training data will become one of the biggest constraints on the industrialization of large models. At a deeper level, large models still face various governance problems with training data: data collection and labeling are time-consuming, laborious and costly; data quality is difficult to guarantee; data diversity is insufficient to cover the 'long tail' and edge cases; and the acquisition, use and sharing of specific data raise issues such as privacy protection and data bias." Research by foreign scholars suggests that the overall stock of language data is growing at a rate of 7%, while the growth of high-quality language data is constrained by factors such as population size and economic development and is growing at a rate of 4% to 5%. High-quality data for training large language models will be "exhausted" by 2027.
Summary
Computing power and data resources are the basic supporting elements of artificial intelligence research. As artificial intelligence enters the "large model" era, computing power and data capabilities have become limiting factors for researching and training algorithms and models. The NAIRR infrastructure being built in the United States helps address the new challenges facing the innovation and development of artificial intelligence technology, and it offers a useful reference for China. China should strengthen overall planning and coordination, accelerate the construction of computing infrastructure and basic data resources, develop the data element market, encourage the aggregation and circulation of data resources, and promote basic research and application innovation in artificial intelligence.
Authors: Lu Yapeng and Wang Weiguo, Data Research Center, China Academy of Information and Communications Technology
Editor/Format: Gai Beibei
Reviewer: Shu Wenqiong
Producer: Liu Qicheng