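Suppose you have a JavaScript application running in Bun and you've identified some bottlenecks you'd like to optimize.
Rewriting them in a more performant language may be just the solution you need.
As a modern JS runtime, Bun supports a Foreign Function Interface (FFI) for calling libraries written in other languages that can expose a C ABI, such as C, C++, Rust, and Zig.
In this post, we'll go over how to use it and conclude whether you can benefit from it.
This example uses Rust. Creating a shared library with C bindings looks different in other languages, but the idea stays the same.
Bun exposes its FFI API through the bun:ffi module.
The entry point is the dlopen function. It takes the path to the library file, either absolute or relative to the current working directory (the extension is .so on Linux, .dylib on macOS, and .dll on Windows), and an object with the signatures of the functions you want to import.
It returns an object with a close method, which you can use to close the library once it's no longer needed, and a symbols property, an object containing the functions you chose.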
```typescript
import {
  dlopen,
  FFIType,
  read,
  suffix,
  toArrayBuffer,
  type Pointer,
} from "bun:ffi";

// Both your script and your library don't typically change their locations
// Use `import.meta.dirname` to make your script independent from the cwd
const DLL_PATH =
  import.meta.dirname + `/../../rust-lib/target/release/library.${suffix}`;

function main() {
  // Destructure the object to get the functions
  // but collect the `close` method into an object
  // to avoid using `this` in a wrong scope
  const {
    symbols: { do_work },
    ...dll
  } = dlopen(DLL_PATH, {
    do_work: {
      args: [FFIType.ptr, FFIType.ptr, "usize", "usize"],
      returns: FFIType.void,
    },
  });

  /* ... */

  // It is unclear whether it is required or recommended to call `close`;
  // an example says `JSCallback` instances specifically need to be closed
  // Note that using `symbols` after calling `close` is undefined behaviour
  dll.close();
}

main();
```
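The suffix import resolves the platform-specific extension for you. As a rough illustration of the mapping described above, here is a minimal sketch; the Platform type and both helper names are hypothetical and not part of bun:ffi, and the platform strings follow the Node-style process.platform values.

```typescript
// Hypothetical type: the three platforms named in the text,
// spelled the way `process.platform` reports them.
type Platform = "linux" | "darwin" | "win32";

// Shared-library extension per platform, as listed above.
const EXTENSIONS: Record<Platform, string> = {
  linux: "so",
  darwin: "dylib",
  win32: "dll",
};

// Hypothetical helper: returns the extension for a platform.
function sharedLibraryExtension(platform: Platform): string {
  return EXTENSIONS[platform];
}

// Hypothetical helper: builds a file name such as `library.so`.
function libraryFileName(name: string, platform: Platform): string {
  return `${name}.${sharedLibraryExtension(platform)}`;
}
```

In real code, prefer the suffix export, which already reflects the platform Bun is running on.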
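As you may have noticed, the types Bun accepts over FFI are limited to numbers, including pointers.
Notably, size_t (usize) is missing from the documented list of supported types, even though the code for it exists as of Bun version 1.1.34.
Bun doesn't offer any help when passing data more complex than a C string. This means you have to work with pointers yourself.
Let's see how to pass a pointer from JavaScript to Rust...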
```typescript
{
  reconstruct_slice: {
    args: [FFIType.ptr, "usize"],
    returns: FFIType.void,
  },
}

// `BigInt64Array` only accepts `bigint` values, hence the `n` suffix
const array = new BigInt64Array([0n, 1n, 3n]);
// Bun automatically converts `TypedArray`s into pointers
reconstruct_slice(array, array.length);
```

```rust
use std::{mem::size_of, slice};

/// Reconstruct a slice that was initialized in JavaScript.
///
/// # Safety
///
/// `array_ptr` must point to `length` initialized `i64` values that
/// stay alive and unmodified for the duration of the call.
#[no_mangle]
pub unsafe extern "C" fn reconstruct_slice(array_ptr: *const i64, length: usize) {
    // Even though here it's not null, it's good practice to check
    assert!(!array_ptr.is_null());
    // Unaligned pointer can lead to undefined behaviour
    assert!(array_ptr.is_aligned());
    // Check that the slice doesn't span more than `isize::MAX` bytes
    assert!(length <= isize::MAX as usize / size_of::<i64>());

    let _slice: &[i64] = unsafe { slice::from_raw_parts(array_ptr, length) };
}
```
...and how to return a pointer from Rust to JavaScript.
```typescript
{
  allocate_buffer: {
    args: [],
    returns: FFIType.ptr,
  },
  as_pointer: {
    args: ["usize"],
    returns: FFIType.ptr,
  },
}

// Hardcoding this value for 64-bit systems
const BYTES_IN_PTR = 8;

const box: Pointer = allocate_buffer()!;
const ptr: number = read.ptr(box);
// Reading the value next to `ptr`
const length: number = read.ptr(box, BYTES_IN_PTR);

// Hardcoding `byteOffset` to be 0 because Rust guarantees that
// Buffer holds `i32` values which take 4 bytes
// Note how we need to call a no-op function `as_pointer` because
// `toArrayBuffer` takes a `Pointer` but `read.ptr` returns a `number`
const _buffer = toArrayBuffer(as_pointer(ptr)!, 0, length * 4);
```

```rust
use std::mem::ManuallyDrop;

#[no_mangle]
pub extern "C" fn allocate_buffer() -> Box<[usize; 2]> {
    let buffer: Vec<i32> = vec![0; 10];
    let memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(buffer);

    let ptr: *const i32 = memory.as_ptr();
    let length: usize = memory.len();

    // Unlike a `Vec`, `Box` is FFI compatible and will not drop
    // its data when crossing the FFI
    // Additionally, a `Box<T>` where `T` is `Sized` will be a thin pointer
    Box::new([ptr as usize, length])
}

#[no_mangle]
pub const extern "C" fn as_pointer(ptr: usize) -> usize {
    ptr
}
```
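Rust doesn't know that JS is taking ownership of the data on the other side, so you have to explicitly tell it not to deallocate the data on the heap by using ManuallyDrop. Other languages that manage memory will have to do something similar.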
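To make the layout of the two-word box returned by allocate_buffer more concrete, the sketch below decodes a pointer and a length from 16 raw bytes using a plain DataView instead of read.ptr. It runs without any native library. The helper names, the fake address, and the little-endian assumption (true on x86-64 and AArch64) are illustrative choices, not part of Bun's API.

```typescript
// Same hardcoded assumption as in the article: a 64-bit system
const BYTES_IN_PTR = 8;

// Hypothetical helper: reads one little-endian 64-bit word as a `number`.
// The result is exact only while the value fits in 53 bits.
function readUsize(view: DataView, byteOffset: number): number {
  const low = view.getUint32(byteOffset, true);
  const high = view.getUint32(byteOffset + 4, true);
  return high * 4294967296 + low;
}

// Hypothetical helper: decodes the `[ptr, length]` pair stored in the box.
function decodeBox(view: DataView): { address: number; elementCount: number } {
  return {
    address: readUsize(view, 0),
    elementCount: readUsize(view, BYTES_IN_PTR),
  };
}

// A fake box: made-up address 0x1_0000_2000 followed by a length of 10
const fakeBox = new DataView(new ArrayBuffer(2 * BYTES_IN_PTR));
fakeBox.setUint32(0, 0x2000, true);
fakeBox.setUint32(4, 0x1, true);
fakeBox.setUint32(BYTES_IN_PTR, 10, true);

const decoded = decodeBox(fakeBox);
// Ten `i32` values take 4 bytes each
const decodedByteLength = decoded.elementCount * 4;
```

The second word sits exactly BYTES_IN_PTR bytes after the first, which is why the article reads the length at that offset.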
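As we can see, memory can be allocated in both JS and Rust, and neither side can safely manage the other's memory.
Let's choose where and how the memory should be allocated.
There are 3 ways to delegate memory cleanup from JS to Rust, each with its pros and cons.
Use FinalizationRegistry to request a cleanup callback during garbage collection by tracking the object in JavaScript.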
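Delegate garbage collection tracking to Bun so that it calls the cleanup callback.
When passing 4 arguments to toArrayBuffer, the 4th one must be the C function to call on cleanup.
However, when passing 5 arguments, the 5th one is the function, and the 4th must be a context pointer that gets passed to it.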
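Simply free the memory yourself once you no longer need it.
Fortunately, TypeScript has the very helpful Disposable interface and the using keyword.
It's the equivalent of Python's with or C#'s using keywords.
See the docs.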
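As a rough sketch of the manual approach, the wrapper below frees its buffer exactly once, and the helper runs the cleanup when the work finishes, even if it throws. This is the try/finally pattern that the using keyword automates: with TypeScript 5.2 or later, naming the method [Symbol.dispose] instead of dispose would let using call it for you. The NativeBuffer class, withBuffer, and the free callback (standing in for an FFI symbol such as drop_buffer) are all hypothetical.

```typescript
// Hypothetical wrapper around a natively allocated buffer.
// `free` stands in for an FFI symbol such as `drop_buffer`.
class NativeBuffer {
  readonly data: Int32Array;
  private readonly free: () => void;
  private freed = false;

  constructor(data: Int32Array, free: () => void) {
    this.data = data;
    this.free = free;
  }

  // Idempotent, so calling it twice cannot double-free
  dispose(): void {
    if (this.freed) return;
    this.freed = true;
    this.free();
  }
}

// What `using` automates: dispose when the scope exits, even on a throw
function withBuffer<T>(
  buffer: NativeBuffer,
  work: (data: Int32Array) => T,
): T {
  try {
    return work(buffer.data);
  } finally {
    buffer.dispose();
  }
}
```

The data must not be touched after dispose runs, since the native side has already released it.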
For the first approach, register the buffer with a FinalizationRegistry and free it from the callback:

```typescript
{
  drop_buffer: {
    args: [FFIType.ptr],
    returns: FFIType.void,
  },
}

const registry = new FinalizationRegistry((box: Pointer): void => {
  drop_buffer(box);
});
registry.register(buffer, box);
```

On the Rust side, drop_buffer rebuilds both the box and the Vec so that they get dropped:

```rust
/// # Safety
///
/// This call assumes neither the box nor the buffer have been mutated in JS
#[no_mangle]
pub unsafe extern "C" fn drop_buffer(raw: *mut [usize; 2]) {
    let box_: Box<[usize; 2]> = unsafe { Box::from_raw(raw) };
    let ptr: *mut i32 = box_[0] as *mut i32;
    let length: usize = box_[1];

    let buffer: Vec<i32> = unsafe { Vec::from_raw_parts(ptr, length, length) };
    drop(buffer);
}
```

For the second approach, pass the context pointer and the cleanup function to toArrayBuffer:

```typescript
{
  box_value: {
    args: ["usize"],
    returns: FFIType.ptr,
  },
  drop_box: {
    args: [FFIType.ptr],
    returns: FFIType.void,
  },
  drop_buffer: {
    args: [FFIType.ptr, FFIType.ptr],
    returns: FFIType.void,
  },
}

// Bun expects the context to specifically be a pointer
const finalizationCtx: Pointer = box_value(length)!;

// Note that despite the presence of these extra parameters in the docs,
// they're absent from `@types/bun`
//@ts-expect-error see above
const buffer = toArrayBuffer(
  as_pointer(ptr)!,
  0,
  length * 4,
  //@ts-expect-error see above
  finalizationCtx,
  drop_buffer,
);

// Don't leak the box used to pass buffer through FFI
drop_box(box);
```

Allocating the memory in JavaScript instead is simpler and safer, because the runtime handles deallocation for you.
However, there is a big drawback.
Since you can't manage JavaScript's memory from Rust, you can't exceed the buffer's capacity, because that would trigger a reallocation. This means you have to know the buffer size before passing the buffer to Rust.
Not knowing beforehand how large a buffer you need also adds a lot of overhead, because you'll be going back and forth through FFI just to allocate.
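To make the cost of allocating in JavaScript without knowing the size up front more concrete, here is a minimal sketch of the back-and-forth it implies. The fillRange function is a stand-in written in TypeScript for a native function reached through FFI, and both names are hypothetical. It never writes past the buffer it is given and reports how many elements it needed, so the caller can retry with a large enough buffer.

```typescript
// Stand-in for a native function (hypothetical): writes
// `start, start + 1, ...` into `out`, never past `out.length`,
// and returns how many elements it wanted to write.
function fillRange(out: Int32Array, start: number, end: number): number {
  const needed = Math.max(0, end - start);
  const writable = Math.min(needed, out.length);
  for (let i = 0; i < writable; i++) {
    out[i] = start + i;
  }
  return needed;
}

// Caller: guess a capacity, then retry once with the exact size
// if the guess turned out to be too small.
function rangeViaNative(start: number, end: number, guess = 4): Int32Array {
  let buffer = new Int32Array(guess);
  const needed = fillRange(buffer, start, end);
  if (needed > buffer.length) {
    // Second trip across the boundary: this is the overhead in question
    buffer = new Int32Array(needed);
    fillRange(buffer, start, end);
  }
  return buffer.subarray(0, needed);
}
```

When the size is known ahead of time, the second call disappears and the JavaScript-side allocation costs a single trip.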
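If you expect the library's output to be a string, you may have considered the micro-optimization of returning a vector of u16 values rather than a string, since JavaScript engines typically use UTF-16 under the hood.
However, that would be a mistake, because converting your string to a C string and using Bun's cstring type is slightly faster.
Here's a benchmark done with a nice benchmarking library, mitata.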
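It's time to address the elephant in the room that is WebAssembly.
Should you choose existing WASM bindings over dealing with the C ABI?
The answer is probably neither.
Introducing another language into your codebase takes more than a single bottleneck to be worth it, in terms of both DX and performance.
Here's a benchmark of a simple range function in JS, WASM, and Rust.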
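For reference, a simple range function in TypeScript might look like the sketch below. It is illustrative only: the name rangeJs and the half-open [start, end) convention are assumptions, and this is not the code that was benchmarked.

```typescript
// Illustrative only: not the benchmarked implementation.
// Returns the integers in the half-open interval [start, end).
function rangeJs(start: number, end: number): number[] {
  const count = Math.max(0, end - start);
  const result = new Array<number>(count);
  for (let i = 0; i < count; i++) {
    result[i] = start + i;
  }
  return result;
}
```

A loop like this involves no copying or pointer handling across a language boundary, which is the kind of work a JIT-compiled runtime tends to handle well.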
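The native library barely beats WASM and consistently loses to the pure TypeScript implementation.
That's it for this tutorial/exploration of the bun:ffi module. Hopefully, we've all walked away from it a little more educated.
Feel free to share your thoughts and questions in the comments.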