The first high-level language for GPUs: massive parallelism that feels like writing Python, with 8,500 stars already

王林
2024-06-07 12:03:58

After nearly 10 years of sustained effort and in-depth research into the core of computer science, a long-held dream has been realized: running a high-level language on GPUs.

Last weekend, a programming language called Bend sparked heated discussions in the open source community, and the number of stars on GitHub has exceeded 8,500.

GitHub: https://github.com/HigherOrderCO/Bend

Bend is a massively parallel, high-level programming language. It is still in the research stage, but the ideas behind it have surprised people. With Bend you can write parallel code for multi-core CPUs and GPUs without being a C/CUDA expert with 10 years of experience; it just feels like Python!

Yes, Bend adopts a Python-like syntax.

Unlike low-level alternatives such as CUDA and Metal, Bend has the feel and features of expressive languages such as Python and Haskell, including fast object allocation, higher-order functions with full closure support, and unrestricted recursion. It nevertheless runs on massively parallel hardware, with near-linear speedup based on core count, and is powered by the HVM2 runtime.

The main contributor to the project, Victor Taelin, is from Brazil. He shared the main features and development ideas of Bend on the X platform.

First of all, Bend is not aimed at modern machine learning algorithms: these are highly regular (matrix multiplication), work on pre-allocated memory, and are usually already implemented as well-optimized CUDA kernels.

Bend's big advantage lies in real applications, because "real applications" usually don't have the budget for hand-written GPU kernels. Ask yourself: who writes a website in CUDA? And even if someone wanted to, it would not be feasible, because:

1. A real application needs to import functions from many different libraries, and there is no way to write CUDA kernels for all of them;

2. Real applications have dynamic functions and closures;

3. Real applications allocate large amounts of memory dynamically and unpredictably.

Bend is a new attempt at this problem and can be quite fast in some cases, but writing a large language model in it is definitely not feasible today.

The author compared the old approach with the new one using the same algorithm, a tree-based bitonic sort that involves JSON allocation and manipulation. Node.js took 3.5 seconds (Apple M3 Max) and Bend took 0.5 seconds (NVIDIA RTX 4090).

Yes, Bend currently needs an entire GPU to beat Node.js on a single core. On the other hand, this is a brand-new approach being compared with a JIT compiler that a big company (Google) has been optimizing for 16 years. There is plenty of room to grow.

How to use

On GitHub, the author briefly introduces the usage process of Bend.

First, install Rust. If you want to use the C runtime, install a C compiler (such as GCC or Clang); if you want to use the CUDA runtime, install the CUDA toolkit (CUDA and nvcc) version 12.x. Bend currently supports only NVIDIA GPUs.

Then, install HVM2 and Bend:

cargo +nightly install hvm
cargo +nightly install bend-lang

Finally, write a Bend file and run it with one of the following commands:

bend run    <file.bend> # uses the Rust interpreter (sequential)
bend run-c  <file.bend> # uses the C interpreter (parallel)
bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)

You can also use gen-c and gen-cu to compile Bend into a standalone C/CUDA file for the best performance. However, gen-c and gen-cu are still in their infancy and far less mature than state-of-the-art compilers such as GCC and GHC.

Parallel Programming in Bend

Here are examples of expressions that can and cannot be run in parallel in Bend. For example, the expression:

(((1 + 2) + 3) + 4)

cannot be run in parallel because + 4 depends on + 3, which in turn depends on (1+2). And the expression:

((1 + 2) + (3 + 4))

can be run in parallel because (1+2) and (3+4) are independent. In other words, Bend runs code in parallel whenever the computation itself allows it, that is, whenever sub-expressions do not depend on one another.
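The difference between the two expressions can be made concrete by comparing the total number of additions with the length of the longest dependency chain. The following is a minimal Python sketch, not Bend code; the nested-tuple encoding and the helper names `evaluate`, `work`, and `depth` are assumptions made purely for illustration.

```python
# Expressions are nested tuples (left, right); leaves are plain integers.
# This encoding and the helper names are illustrative assumptions.

def evaluate(expr):
    """Evaluate an addition tree sequentially."""
    if isinstance(expr, int):
        return expr
    left, right = expr
    return evaluate(left) + evaluate(right)

def work(expr):
    """Total number of additions in the expression."""
    if isinstance(expr, int):
        return 0
    left, right = expr
    return work(left) + work(right) + 1

def depth(expr):
    """Longest chain of additions that must run one after another."""
    if isinstance(expr, int):
        return 0
    left, right = expr
    return max(depth(left), depth(right)) + 1

chain = (((1, 2), 3), 4)       # (((1 + 2) + 3) + 4)
balanced = ((1, 2), (3, 4))    # ((1 + 2) + (3 + 4))

for name, expr in [("chain", chain), ("balanced", balanced)]:
    print(name, evaluate(expr), work(expr), depth(expr))
```

Both trees perform three additions and evaluate to 10, but the chain has depth 3 while the balanced tree has depth 2, so only the balanced form leaves room for two additions to run at the same time.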

Let’s look at a more complete code example:

# Sorting Network = just rotate trees!
def sort(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      lft = sort(d-1, 0, x)
      rgt = sort(d-1, 1, y)
      # rots takes a single tree argument, so the two halves are passed as a pair
      return rots(d, s, (lft, rgt))

# Rotates sub-trees (Blue/Green Box)
def rots(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      return down(d, s, warp(d-1, s, x, y))

(...)
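The excerpt above elides its helper functions, so it cannot be run on its own. To show the divide-and-conquer shape of bitonic sorting, here is a minimal Python sketch of the textbook list-based formulation. It is not the tree-rotation version from the Bend repository; the function names are assumptions, and the sketch assumes the input length is a power of two.

```python
def bitonic_merge(xs, ascending):
    """Sort a bitonic sequence (length a power of two) in the given direction."""
    if len(xs) <= 1:
        return list(xs)
    xs = list(xs)
    half = len(xs) // 2
    # Compare-and-swap each element with its partner half a length away.
    # These comparisons are independent of one another.
    for i in range(half):
        if (xs[i] > xs[i + half]) == ascending:
            xs[i], xs[i + half] = xs[i + half], xs[i]
    # The two halves can now be merged independently.
    return bitonic_merge(xs[:half], ascending) + bitonic_merge(xs[half:], ascending)

def bitonic_sort(xs, ascending=True):
    """Sort a list whose length is a power of two."""
    if len(xs) <= 1:
        return list(xs)
    half = len(xs) // 2
    # Sorting one half up and the other down yields a bitonic sequence.
    first = bitonic_sort(xs[:half], True)
    second = bitonic_sort(xs[half:], False)
    return bitonic_merge(first + second, ascending)

data = [7, 3, 6, 0, 5, 1, 4, 2]
print(bitonic_sort(data))
```

At every level the two recursive calls work on disjoint halves and share no state, which is the property that lets a runtime execute them concurrently without locks.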

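This file implements a bitonic sorter using immutable tree rotations. It is not the kind of algorithm most people would expect to run fast on a GPU. However, because it uses a divide-and-conquer approach, which is inherently parallel, Bend runs it multi-threaded. Some benchmarks:

  •  CPU, Apple M3 Max, 1 thread: 12.15 seconds
  •  CPU, Apple M3 Max, 16 threads: 0.96 seconds
  •  GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds

That is a 57x speedup for doing nothing. No thread spawning, no explicit management of locks or mutexes. We simply asked Bend to run our program on the RTX, and it did. It is that simple.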
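The speedup figure follows directly from the three timings quoted above. The short Python sketch below reproduces the arithmetic; the dictionary keys and variable names are illustrative assumptions, while the numbers are the ones from the benchmark list.

```python
# Timings quoted in the benchmark list above, in seconds.
timings = {
    "M3 Max, 1 thread": 12.15,
    "M3 Max, 16 threads": 0.96,
    "RTX 4090, 16k threads": 0.21,
}

baseline = timings["M3 Max, 1 thread"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")

# Speedup per thread on the 16-thread CPU run (1.0 would be perfectly linear).
cpu_efficiency = speedups["M3 Max, 16 threads"] / 16
print(f"16-thread efficiency: {cpu_efficiency:.2f}")
```

The GPU run comes out at just under 58x relative to the single-threaded CPU run, which the text rounds down to 57x, and the 16-thread CPU run reaches roughly 12.7x.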

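Bend is not limited to a specific paradigm such as tensors or matrices. Any concurrent system, from shaders to Erlang-like actor models, can be emulated on Bend. For example, to render images in real time, we could simply allocate an immutable tree on each frame: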

# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)
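In the snippet above, `bend` starts a recursive construction and each `fork` continues it with updated state, so the image is built as a binary tree whose leaves are pixels. A rough Python analogue, written as ordinary recursion, is sketched below. It is not Bend code: the helper `leaves` and the stand-in shader are hypothetical, and the sketch assumes a square image with 2**(depth // 2) pixels per side, which is a choice of this sketch and not taken from the original snippet.

```python
def render(depth, shader, d=0, i=0):
    """Build a perfect binary tree of pixel colors as nested tuples.

    Each level doubles the index i, so the 2**depth leaves receive the
    indices 0 .. 2**depth - 1. The two recursive calls share no state.
    """
    if d < depth:
        return (render(depth, shader, d + 1, i * 2),
                render(depth, shader, d + 1, i * 2 + 1))
    # Assumption of this sketch: a square image with 2**(depth // 2) pixels per side.
    width = 2 ** (depth // 2)
    return shader(i % width, i // width)

def leaves(tree, depth):
    """Flatten the tree back into a list of leaf values, left to right."""
    if depth == 0:
        return [tree]
    left, right = tree
    return leaves(left, depth - 1) + leaves(right, depth - 1)

# A trivial stand-in shader that returns its own coordinates.
image = render(4, lambda x, y: (x, y))
pixels = leaves(image, 4)
print(len(pixels), pixels[:5])
```

With depth 4 the tree has 16 leaves that cover a 4x4 grid row by row. Because every pixel depends only on its own index, the two branches at each level are independent of each other.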

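And it would actually work. Even involved algorithms parallelize well on Bend. Long-distance communication is performed by global beta-reduction (as per the Interaction Calculus), and is synchronized correctly and efficiently by HVM2's atomic linker.

Finally, the author notes that Bend is only at its first version and that little effort has gone into a proper compiler so far. Raw performance can be expected to improve substantially with every release. For now, the interpreters already offer a glimpse of what massively parallel programming looks like from the perspective of a high-level, Python-like language.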

