How AI is changing data center design

王林
2023-10-11 14:49:10

With global spending on AI systems set to double from 2023 to 2026, it’s clear that data center capacity will increase rapidly to meet demand.

Surprisingly, however, many data center operators have put the brakes on new projects and slowed investment over the past year, with London's vacant capacity falling by 6.3% in 2022-23.

What’s behind this counterintuitive trend? To explain this, we need to understand some of the issues surrounding AI computing and the infrastructure that supports it.

How AI is changing data center infrastructure

Data centers have historically been built around racks of CPUs that handle traditional computing workloads. AI computing instead requires GPU-driven racks, which consume more power, release more heat, and take up more space than the equivalent CPU capacity.

In practice, this means that AI computing capacity often requires more power connections or alternative cooling systems.

Because this is embedded infrastructure, built into the fabric of the data center complex, replacing it is often very costly, if not completely economically unfeasible.

Operators must therefore decide up front how much space in a new data center to dedicate to AI and how much to traditional computing: the "split". Getting the split wrong can leave data center operators with permanently underutilized and unprofitable capacity.
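To make the risk of a wrong split concrete, the following is a minimal sketch of the underlying arithmetic, not a planning tool. The function name `split_outcome`, the 10,000 kW site size, and the 3,000 kW AI demand figure are hypothetical values chosen purely for illustration; none of them comes from the article or from market data.

```python
def split_outcome(total_kw: float, ai_share: float, ai_demand_kw: float):
    """Return (stranded_kw, unmet_kw) for a given AI/traditional split.

    All inputs are hypothetical:
      total_kw     - total power capacity of the site
      ai_share     - fraction of that capacity built out for AI (0.0 to 1.0)
      ai_demand_kw - AI demand that actually materializes
    """
    ai_capacity_kw = total_kw * ai_share
    # AI capacity that was built but finds no AI workload to fill it.
    stranded_kw = max(0.0, ai_capacity_kw - ai_demand_kw)
    # AI demand the site cannot serve because too little was built for it.
    unmet_kw = max(0.0, ai_demand_kw - ai_capacity_kw)
    return stranded_kw, unmet_kw


if __name__ == "__main__":
    # Compare three candidate splits against one assumed demand level.
    for share in (0.25, 0.5, 0.75):
        stranded, unmet = split_outcome(10_000, share, 3_000)
        print(f"AI share {share:.0%}: stranded {stranded:.0f} kW, unmet {unmet:.0f} kW")
```

Because the power and cooling for the AI portion are embedded in the building, the stranded figure in this sketch cannot simply be reassigned to traditional workloads later, which is why over-committing is so expensive.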

The risk of misjudging the split is exacerbated by the fact that the AI market is still in its infancy. According to Gartner, AI is currently at a point in the hype cycle where expectations are inflated. As a result, many operators are choosing to stay on the sidelines during the design phase rather than commit too early to a heavy investment in AI computing in new data center projects.

Take a comprehensive approach during the design phase

However, operators are acutely aware that they can delay investment for only so long before they lose market share and competitive advantage. With many fundamental elements of data center infrastructure being rewritten in real time, committing to a design is a tall order.

To meet the need to be first movers while offsetting risks, operators need to design their data centers for maximum efficiency and resiliency in the era of AI computing, which requires a new holistic design approach.

1. Involve more stakeholders

Whatever split between AI and traditional computing operators decide on, data center sites with AI computing capabilities are expected to be significantly more complex than traditional facilities. More complexity often means more points of failure, especially as AI computing makes more demands than traditional computing.

So, to guarantee uptime and reduce the risk of costly problems over the life of the site, teams need to be more thorough in the data center planning phase.

At the beginning of a project, and especially during the design phase, input and expertise should be sought from the wider team. In addition to power and cooling specialists, designers should engage with operations, cabling, and security teams early to understand potential bottlenecks and sources of failure.

2. Integrating AI into data center operations

Since operators now run AI computing on site, they should use its capabilities to drive new efficiencies in their own operations. AI has long been used in data centers, and the technology can carry out workflows with very high precision and quality. For example, AI can help with:

  • Data analysis: AI can quickly and accurately analyze large amounts of data to provide valuable insights and decision support.
  • Customer service: AI can improve the customer service experience by automatically answering frequently asked questions, providing personalized suggestions, and solving problems.
  • Prediction and optimization: AI can use big data and machine learning algorithms to predict future trends and optimize operational strategies.
  • Security assurance: AI can identify and block potential network attacks and provide real-time security monitoring and early warning.
  • Automated processes: AI can automate repetitive tasks, improve work efficiency, and reduce human error.

By making full use of AI, operators can improve operational efficiency, provide better services, and gain an advantage in a highly competitive market.

Within the data center itself, typical tasks include:

  • Temperature and humidity monitoring.
  • Security system operation.
  • Power consumption monitoring and distribution.
  • Hardware fault detection and predictive maintenance.

By proactively using this technology at every stage of the data center lifecycle, operators can significantly improve the efficiency and robustness of their operations. AI is well suited to the new challenges posed by the novel and complex layouts of these next-generation data centers, for example through fault detection and predictive maintenance.
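As an illustration of the monitoring tasks listed above, here is a minimal sketch of flagging unusual temperature readings with a rolling z-score. This is a simple statistical stand-in, not the method any particular operator or product uses, and far simpler than a production predictive-maintenance model. The function name `find_anomalies`, the window size, the threshold, and the sample readings are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate sharply from the recent window.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations from the mean of the previous `window` readings.
    """
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # A perfectly flat window has sigma == 0; this sketch skips the
            # check in that case instead of dividing by zero.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        recent.append(value)
    return anomalies


if __name__ == "__main__":
    # Hypothetical rack inlet temperatures in degrees Celsius.
    temps_c = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 27.5, 22.0, 22.1]
    print(find_anomalies(temps_c))  # prints [6]
```

The same pattern could be applied to humidity or power-draw series. A real deployment would also need to handle sensor dropouts and slow drift, which this sketch ignores.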

3. Avoid false economies

During peak periods, such as training runs or when running enterprise-level models in production, AI places a greater load on the data center. During these periods, AI computing tends to significantly exceed traditional expectations for power consumption, cooling requirements, and data throughput.

At the most basic level, this means the underlying materials in the data center are under greater pressure, and if those materials or components are not of high quality, they are more likely to fail. Because AI computing dramatically increases the number of components and connections at a site, cheaper, lower-quality materials that work well in traditional sites could bring a data center running AI computing to a standstill.
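The effect of component count on reliability can be sketched with basic probability. The example below assumes, as a deliberate simplification, that components fail independently and that any single failure causes an outage; real sites have redundancy, so treat the numbers as illustrative only. The function name and the failure probabilities are hypothetical.

```python
def site_failure_probability(component_failure_prob: float, component_count: int) -> float:
    """Probability that at least one of `component_count` components fails.

    Assumes independent failures with the same probability per component and
    no redundancy (a simplification made for this sketch).
    """
    return 1.0 - (1.0 - component_failure_prob) ** component_count


if __name__ == "__main__":
    # Hypothetical per-component failure probabilities over some fixed period.
    cheap, quality = 0.001, 0.0001
    for count in (1_000, 5_000):
        p_cheap = site_failure_probability(cheap, count)
        p_quality = site_failure_probability(quality, count)
        print(f"{count} components: cheap {p_cheap:.3f}, quality {p_quality:.3f}")
```

Under these assumptions, the chance of at least one failure grows quickly as the component count rises, so a failure rate that is tolerable at a small traditional site can become a serious problem at a denser AI site.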

To avoid false economies, operators should not buy lower-quality materials, such as substandard cables, to save money. These materials are prone to failure and need more frequent replacement. Most seriously, the failure of substandard materials and components often results in downtime or slowdowns at a site, hurting its profitability. Operators should therefore select materials carefully to ensure that they are of reliable quality.

While the infrastructure requirements of AI computing may be the main reason operators are delaying investment today, this will not remain the case in the long term.

As market uncertainty recedes, companies will converge on a "moderate zone" for the split between traditional computing and AI computing in their data centers.

In the meantime, companies need every possible advantage in running their sites, and they need to keep learning and maturing as the situation evolves.

This means designing holistically from the start, leveraging AI itself to discover new efficiencies for their sites, and investing in high-quality components that can handle the greater demands of AI computing.

Statement:
This article is reproduced from 51cto.com. If there is any infringement, please contact admin@php.cn to have it removed.