假设您有一个在 Bun 中运行的 JavaScript 应用程序,并且您已经确定了一些想要优化的瓶颈。
用性能更高的语言重写它可能正是您需要的解决方案。
作为现代 JS 运行时,Bun 支持外部函数接口 (FFI) 来调用用其他支持公开 C ABI 的语言编写的库,例如 C、C、Rust 和 Zig。
在这篇文章中,我们将讨论如何使用它,并得出结论是否可以从中受益。
这个例子使用 Rust。使用 C 绑定创建共享库在其他语言中看起来有所不同,但想法保持不变。
Bun 通过 Bun:ffi 模块公开其 FFI API。
入口点是一个 dlopen 函数。它采用绝对路径或相对于当前工作目录的路径到库文件(对于Linux,扩展名为.so,对于macOS,扩展名为.dylib,对于Windows,扩展名为.dll)和一个对象您要导入的函数的签名。
它返回一个带有 close 方法的对象,当不再需要库时,您可以使用该方法关闭该库;它还返回一个包含您选择的函数的对象符号属性。
import { dlopen, FFIType, read, suffix, toArrayBuffer, type Pointer, } from "bun:ffi"; // Both your script and your library don't typically change their locations // Use `import.meta.dirname` to make your script independent from the cwd const DLL_PATH = import.meta.dirname + `/../../rust-lib/target/release/library.${suffix}`; function main() { // Deconstruct object to get functions // but collect `close` method into object // to avoid using `this` in a wrong scope const { symbols: { do_work }, ...dll } = dlopen(DLL_PATH, { do_work: { args: [FFIType.ptr, FFIType.ptr, "usize", "usize"], returns: FFIType.void, }, }); /* ... */ // It is unclear whether it is required or recommended to call `close` // an example says `JSCallback` instances specifically need to be closed // Note that using `symbols` after calling `close` is undefined behaviour dll.close(); } main();
正如您可能注意到的,bun 通过 FFI 接受的支持类型仅限于数字,包括指针。
值得注意的是,支持的类型列表中缺少 size_t 或 usize,尽管它的代码自 Bun 版本 1.1.34 起就已存在。
Bun 在传递比 C 字符串更复杂的数据时不提供任何帮助。这意味着您必须自己使用指针。
让我们看看如何将指针从 JavaScript 传递到 Rust ...
{ reconstruct_slice: { args: [FFIType.ptr, "usize"], returns: FFIType.void, }, } const array = new BigInt64Array([0, 1, 3]); // Bun automatically converts `TypedArray`s into pointers reconstruct_slice(array, array.length);
/// Reconstruct a `slice` that was initialized in JavaScript unsafe fn reconstruct_slice( array_ptr: *const i64, length: libc::size_t, ) -> &[i64] { // Even though here it's not null, it's good practice to check assert!(!array_ptr.is_null()); // Unaligned pointer can lead to undefined behaviour assert!(array_ptr.is_aligned()); // Check that the array doesn't "wrap around" the address space assert!(length < usize::MAX / 4); let _: &[i64] = unsafe { slice::from_raw_parts(array_ptr, length) }; }
...以及如何将指针从 Rust 返回到 JavaScript。
{ allocate_buffer: { args: [], returns: FFIType.ptr, }, as_pointer: { args: ["usize"], returns: FFIType.ptr, }, } // Hardcoding this value for 64-bit systems const BYTES_IN_PTR = 8; const box: Pointer = allocate_buffer()!; const ptr: number = read.ptr(box); // Reading the value next to `ptr` const length: number = read.ptr(box, BYTES_IN_PTR); // Hardcoding `byteOffset` to be 0 because Rust guarantees that // Buffer holds `i32` values which take 4 bytes // Note how we need to call a no-op function `as_pointer` because // `toArrayBuffer` takes a `Pointer` but `read.ptr` returns a `number` const _buffer = toArrayBuffer(as_pointer(ptr)!, 0, length * 4);
#[no_mangle] pub extern "C" fn allocate_buffer() -> Box<[usize; 2]> { let buffer: Vec<i32> = vec![0; 10]; let memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(buffer); let ptr: *const i32 = memory.as_ptr(); let length: usize = memory.len(); // Unlike a `Vec`, `Box` is FFI compatible and will not drop // its data when crossing the FFI // Additionally, a `Box<T>` where `T` is `Sized` will be a thin pointer Box::new([ptr as usize, length]) } #[no_mangle] pub const extern "C" fn as_pointer(ptr: usize) -> usize { ptr }
Rust 不知道 JS 正在获取另一端数据的所有权,因此您必须明确告诉它不要使用 ManuallyDrop 释放堆上的数据。其他管理内存的语言也必须做类似的事情。
正如我们所看到的,在 JS 和 Rust 中都可以分配内存,并且都不能安全地管理其他内存。
让我们选择应该在何处分配内存以及如何分配内存。
有 3 种方法可以将内存清理从 JS 委托给 Rust,每种方法都有其优点和缺点。
通过跟踪 JavaScript 中的对象,使用 FinalizationRegistry 在垃圾回收期间请求清理回调。
import { dlopen, FFIType, read, suffix, toArrayBuffer, type Pointer, } from "bun:ffi"; // Both your script and your library don't typically change their locations // Use `import.meta.dirname` to make your script independent from the cwd const DLL_PATH = import.meta.dirname + `/../../rust-lib/target/release/library.${suffix}`; function main() { // Deconstruct object to get functions // but collect `close` method into object // to avoid using `this` in a wrong scope const { symbols: { do_work }, ...dll } = dlopen(DLL_PATH, { do_work: { args: [FFIType.ptr, FFIType.ptr, "usize", "usize"], returns: FFIType.void, }, }); /* ... */ // It is unclear whether it is required or recommended to call `close` // an example says `JSCallback` instances specifically need to be closed // Note that using `symbols` after calling `close` is undefined behaviour dll.close(); } main();
{ reconstruct_slice: { args: [FFIType.ptr, "usize"], returns: FFIType.void, }, } const array = new BigInt64Array([0, 1, 3]); // Bun automatically converts `TypedArray`s into pointers reconstruct_slice(array, array.length);
将垃圾收集跟踪委托给bun以调用清理回调。
当向 toArrayBuffer 传递 4 个参数时,第 4 个参数必须是要在清理时调用的 C 函数。
但是,当传递 5 个参数时,第 5 个参数是函数,第 4 个参数必须是传递它的上下文指针。
/// Reconstruct a `slice` that was initialized in JavaScript unsafe fn reconstruct_slice( array_ptr: *const i64, length: libc::size_t, ) -> &[i64] { // Even though here it's not null, it's good practice to check assert!(!array_ptr.is_null()); // Unaligned pointer can lead to undefined behaviour assert!(array_ptr.is_aligned()); // Check that the array doesn't "wrap around" the address space assert!(length < usize::MAX / 4); let _: &[i64] = unsafe { slice::from_raw_parts(array_ptr, length) }; }
{ allocate_buffer: { args: [], returns: FFIType.ptr, }, as_pointer: { args: ["usize"], returns: FFIType.ptr, }, } // Hardcoding this value for 64-bit systems const BYTES_IN_PTR = 8; const box: Pointer = allocate_buffer()!; const ptr: number = read.ptr(box); // Reading the value next to `ptr` const length: number = read.ptr(box, BYTES_IN_PTR); // Hardcoding `byteOffset` to be 0 because Rust guarantees that // Buffer holds `i32` values which take 4 bytes // Note how we need to call a no-op function `as_pointer` because // `toArrayBuffer` takes a `Pointer` but `read.ptr` returns a `number` const _buffer = toArrayBuffer(as_pointer(ptr)!, 0, length * 4);
当你不再需要内存时,自己删除它即可。
幸运的是,TypeScript 有一个非常有用的 Disposable 接口和 using 关键字。
它相当于 Python 的 with 或 C# 的 using 关键字。
查看文档
#[no_mangle] pub extern "C" fn allocate_buffer() -> Box<[usize; 2]> { let buffer: Vec<i32> = vec![0; 10]; let memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(buffer); let ptr: *const i32 = memory.as_ptr(); let length: usize = memory.len(); // Unlike a `Vec`, `Box` is FFI compatible and will not drop // its data when crossing the FFI // Additionally, a `Box<T>` where `T` is `Sized` will be a thin pointer Box::new([ptr as usize, length]) } #[no_mangle] pub const extern "C" fn as_pointer(ptr: usize) -> usize { ptr }
{ drop_buffer: { args: [FFIType.ptr], returns: FFIType.void, }, } const registry = new FinalizationRegistry((box: Pointer): void => { drop_buffer(box); }); registry.register(buffer, box);
这更简单、更安全,因为系统会为您处理取消分配。
但是,有一个很大的缺点。
由于您无法在 Rust 中管理 JavaScript 的内存,因此您无法超过缓冲区的容量,因为这会导致释放。这意味着在将缓冲区大小传递给 Rust 之前,您必须知道缓冲区大小。
事先不知道需要多少缓冲区也会产生大量开销,因为您将通过 FFI 来回进行分配。
/// # Safety /// /// This call assumes neither the box nor the buffer have been mutated in JS #[no_mangle] pub unsafe extern "C" fn drop_buffer(raw: *mut [usize; 2]) { let box_: Box<[usize; 2]> = unsafe { Box::from_raw(raw) }; let ptr: *mut i32 = box_[0] as *mut i32; let length: usize = box_[1]; let buffer: Vec<i32> = unsafe { Vec::from_raw_parts(ptr, length, length) }; drop(buffer); }
{ box_value: { args: ["usize"], returns: FFIType.ptr, }, drop_box: { args: [FFIType.ptr], returns: FFIType.void, }, drop_buffer: { args: [FFIType.ptr, FFIType.ptr], returns: FFIType.void, }, } // Bun expects the context to specifically be a pointer const finalizationCtx: Pointer = box_value(length)!; // Note that despite the presence of these extra parameters in the docs, // they're absent from `@types/bun` //@ts-expect-error see above const buffer = toArrayBuffer( as_pointer(ptr)!, 0, length * 4, //@ts-expect-error see above finalizationCtx, drop_buffer, ); // Don't leak the box used to pass buffer through FFI drop_box(box);
如果您期望库的输出是一个字符串,您可能已经考虑过返回 u16 向量而不是字符串的微优化,因为通常 JavaScript 引擎在底层使用 UTF-16。
但是,这将是一个错误,因为将字符串转换为 C 字符串并使用 Bun 的 cstring 类型会稍微快一些。
这是使用一个不错的基准测试库 mitata 完成的基准测试
import { dlopen, FFIType, read, suffix, toArrayBuffer, type Pointer, } from "bun:ffi"; // Both your script and your library don't typically change their locations // Use `import.meta.dirname` to make your script independent from the cwd const DLL_PATH = import.meta.dirname + `/../../rust-lib/target/release/library.${suffix}`; function main() { // Deconstruct object to get functions // but collect `close` method into object // to avoid using `this` in a wrong scope const { symbols: { do_work }, ...dll } = dlopen(DLL_PATH, { do_work: { args: [FFIType.ptr, FFIType.ptr, "usize", "usize"], returns: FFIType.void, }, }); /* ... */ // It is unclear whether it is required or recommended to call `close` // an example says `JSCallback` instances specifically need to be closed // Note that using `symbols` after calling `close` is undefined behaviour dll.close(); } main();
{ reconstruct_slice: { args: [FFIType.ptr, "usize"], returns: FFIType.void, }, } const array = new BigInt64Array([0, 1, 3]); // Bun automatically converts `TypedArray`s into pointers reconstruct_slice(array, array.length);
/// Reconstruct a `slice` that was initialized in JavaScript unsafe fn reconstruct_slice( array_ptr: *const i64, length: libc::size_t, ) -> &[i64] { // Even though here it's not null, it's good practice to check assert!(!array_ptr.is_null()); // Unaligned pointer can lead to undefined behaviour assert!(array_ptr.is_aligned()); // Check that the array doesn't "wrap around" the address space assert!(length < usize::MAX / 4); let _: &[i64] = unsafe { slice::from_raw_parts(array_ptr, length) }; }
是时候解决WebAssembly这个房间里的大象了。
您是否应该选择现有的 WASM 绑定而不是处理 C ABI?
答案是可能都不是。
将另一种语言引入到您的代码库中需要的不仅仅是一个瓶颈,在 DX 方面和性能方面都是值得的。
这是 JS、WASM 和 Rust 中简单范围函数的基准。
{ allocate_buffer: { args: [], returns: FFIType.ptr, }, as_pointer: { args: ["usize"], returns: FFIType.ptr, }, } // Hardcoding this value for 64-bit systems const BYTES_IN_PTR = 8; const box: Pointer = allocate_buffer()!; const ptr: number = read.ptr(box); // Reading the value next to `ptr` const length: number = read.ptr(box, BYTES_IN_PTR); // Hardcoding `byteOffset` to be 0 because Rust guarantees that // Buffer holds `i32` values which take 4 bytes // Note how we need to call a no-op function `as_pointer` because // `toArrayBuffer` takes a `Pointer` but `read.ptr` returns a `number` const _buffer = toArrayBuffer(as_pointer(ptr)!, 0, length * 4);
#[no_mangle] pub extern "C" fn allocate_buffer() -> Box<[usize; 2]> { let buffer: Vec<i32> = vec![0; 10]; let memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(buffer); let ptr: *const i32 = memory.as_ptr(); let length: usize = memory.len(); // Unlike a `Vec`, `Box` is FFI compatible and will not drop // its data when crossing the FFI // Additionally, a `Box<T>` where `T` is `Sized` will be a thin pointer Box::new([ptr as usize, length]) } #[no_mangle] pub const extern "C" fn as_pointer(ptr: usize) -> usize { ptr }
{ drop_buffer: { args: [FFIType.ptr], returns: FFIType.void, }, } const registry = new FinalizationRegistry((box: Pointer): void => { drop_buffer(box); }); registry.register(buffer, box);
原生库勉强击败了 WASM,并且一直输给纯 TypeScript 实现。
这就是本关于 Bun:ffi 模块的教程/探索。希望我们都已经摆脱了这个问题,并受到了更多的教育。
欢迎在评论中分享想法和问题
以上是如何以及应该使用 Bun FFI的详细内容。更多信息请关注PHP中文网其他相关文章!