How to and Should you use Bun FFI

Linda Hamilton
2024-11-11 10:53:02

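What are we trying to accomplish

Suppose you have a JavaScript application running in Bun, and you have identified a bottleneck that you want to optimize.
Rewriting it in a more performant language may be just the solution you need.

As a modern JS runtime, Bun supports a Foreign Function Interface (FFI) for calling libraries written in other languages that can expose a C ABI, such as C, C++, Rust, and Zig.

In this post, we will go over how to use it and conclude whether you can benefit from it.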

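How to link the library to JavaScript

This example uses Rust. Creating a shared library with C bindings looks different in other languages, but the idea remains the same.

From the JS side

Bun exposes its FFI API through the bun:ffi module.

The entry point is the dlopen function. It takes a path to the library file, either absolute or relative to the current working directory (the extension is .so on Linux, .dylib on macOS, and .dll on Windows), and an object with the signatures of the functions you want to import.
It returns an object with a close method, which you can use to close the library once it is no longer needed, and a symbols property, which is an object containing the functions you chose.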

import {
  dlopen,
  FFIType,
  read,
  suffix,
  toArrayBuffer,
  type Pointer,
} from "bun:ffi";

// Both your script and your library don't typically change their locations
// Use `import.meta.dirname` to make your script independent from the cwd
const DLL_PATH =
  import.meta.dirname + `/../../rust-lib/target/release/library.${suffix}`;

function main() {
  // Deconstruct object to get functions
  // but collect `close` method into object
  // to avoid using `this` in a wrong scope
  const {
    symbols: { do_work },
    ...dll
  } = dlopen(DLL_PATH, {
    do_work: {
      args: [FFIType.ptr, FFIType.ptr, "usize", "usize"],
      returns: FFIType.void,
    },
  });

  /* ... */

  // It is unclear whether it is required or recommended to call `close`
  // an example says `JSCallback` instances specifically need to be closed
  // Note that using `symbols` after calling `close` is undefined behaviour
  dll.close();
}

main();
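
The Rust side of `do_work` is not shown here, so the following is only a sketch of what an exported function matching that signature (two pointers and two `usize` values, no return value) could look like. The parameter names and the doubling logic are assumptions made for illustration. In a real `cdylib` the function would also carry `#[no_mangle]` so that `dlopen` can find it by name.

```rust
use std::slice;

/// Hypothetical body for `do_work`: doubles every input element into the output.
///
/// # Safety
///
/// `input` must point to `input_len` readable `i32` values and `output`
/// to `output_len` writable ones; the two regions must not overlap.
pub unsafe extern "C" fn do_work(
    input: *const i32,
    output: *mut i32,
    input_len: usize,
    output_len: usize,
) {
    // Nothing to do for null pointers; a real library might report an error
    if input.is_null() || output.is_null() {
        return;
    }
    let src: &[i32] = unsafe { slice::from_raw_parts(input, input_len) };
    let dst: &mut [i32] = unsafe { slice::from_raw_parts_mut(output, output_len) };
    // `zip` stops at the shorter slice, so the output is never overrun
    for (out, value) in dst.iter_mut().zip(src) {
        *out = value.wrapping_mul(2);
    }
}
```

Passing both lengths explicitly is what lets the Rust side bound its reads and writes, since a raw pointer carries no size information.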

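Passing data across the FFI boundary

As you may have noticed, the types that Bun accepts through FFI are limited to numbers, including pointers.
Notably, size_t or usize is missing from the list of supported types, even though the code for it exists as of Bun version 1.1.34.

Bun does not offer any help in passing data more complex than a C string. This means you have to work with pointers yourself.

Let's see how to pass a pointer from JavaScript to Rust ...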

{
  reconstruct_slice: {
    args: [FFIType.ptr, "usize"],
    returns: FFIType.void,
  },
}

const array = new BigInt64Array([0, 1, 3]);
// Bun automatically converts `TypedArray`s into pointers
reconstruct_slice(array, array.length);

use std::{mem::size_of, slice};

/// Reconstruct a `slice` that was initialized in JavaScript
///
/// # Safety
///
/// `array_ptr` must point to `length` initialized `i64` values
#[no_mangle]
pub unsafe extern "C" fn reconstruct_slice(array_ptr: *const i64, length: usize) {
    // Even though here it's not null, it's good practice to check
    assert!(!array_ptr.is_null());
    // Unaligned pointer can lead to undefined behaviour
    assert!(array_ptr.is_aligned());
    // Check that the slice doesn't span more than `isize::MAX` bytes
    assert!(length <= isize::MAX as usize / size_of::<i64>());
    let _: &[i64] = unsafe { slice::from_raw_parts(array_ptr, length) };
}

... and how to return a pointer from Rust to JavaScript.

{
  allocate_buffer: {
    args: [],
    returns: FFIType.ptr,
  },
  as_pointer: {
    args: ["usize"],
    returns: FFIType.ptr,
  },
}

// Hardcoding this value for 64-bit systems
const BYTES_IN_PTR = 8;

const box: Pointer = allocate_buffer()!;
const ptr: number = read.ptr(box);
// Reading the value next to `ptr`
const length: number = read.ptr(box, BYTES_IN_PTR);
// Hardcoding `byteOffset` to be 0 because Rust guarantees that
// Buffer holds `i32` values which take 4 bytes
// Note how we need to call a no-op function `as_pointer` because
// `toArrayBuffer` takes a `Pointer` but `read.ptr` returns a `number`
const _buffer = toArrayBuffer(as_pointer(ptr)!, 0, length * 4);

use std::mem::ManuallyDrop;

#[no_mangle]
pub extern "C" fn allocate_buffer() -> Box<[usize; 2]> {
    let buffer: Vec<i32> = vec![0; 10];
    let memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(buffer);
    let ptr: *const i32 = memory.as_ptr();
    let length: usize = memory.len();
    // Unlike a `Vec`, `Box` is FFI compatible and will not drop
    // its data when crossing the FFI
    // Additionally, a `Box<T>` where `T` is `Sized` will be a thin pointer
    Box::new([ptr as usize, length])
}

#[no_mangle]
pub const extern "C" fn as_pointer(ptr: usize) -> usize {
    ptr
}

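Rust does not know that JS is taking ownership of the data on the other side, so you have to explicitly tell it not to deallocate the data on the heap by using ManuallyDrop. Other languages that manage memory have to do something similar.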
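
To make the ownership hand-off concrete, here is a self-contained Rust sketch that uses the hypothetical helper names `leak_buffer` and `reclaim_buffer`. `leak_buffer` suppresses the destructor the same way `allocate_buffer` does, and `reclaim_buffer` shows that the allocation stays valid until Rust rebuilds the `Vec` and drops it.

```rust
use std::mem::ManuallyDrop;

/// Hand a heap buffer out as raw parts without running its destructor.
/// `leak_buffer` is a hypothetical name used only for this sketch.
pub fn leak_buffer(values: Vec<i32>) -> (*mut i32, usize, usize) {
    // `ManuallyDrop` keeps the allocation alive after this function returns
    let mut memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(values);
    (memory.as_mut_ptr(), memory.len(), memory.capacity())
}

/// Rebuild the `Vec` so that Rust frees it when the returned value is dropped.
///
/// # Safety
///
/// The parts must come from `leak_buffer` and must be reclaimed only once.
pub unsafe fn reclaim_buffer(ptr: *mut i32, length: usize, capacity: usize) -> Vec<i32> {
    unsafe { Vec::from_raw_parts(ptr, length, capacity) }
}
```

Returning the capacity along with the length matters because `Vec::from_raw_parts` needs the original capacity to free the allocation correctly.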

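Memory management

As we have seen, memory can be allocated in both JS and Rust, and neither can safely manage the other's memory.

Let's choose where the memory should be allocated and how.

Allocating in Rust

There are 3 ways to delegate memory cleanup from JS to Rust, each with its pros and cons.

Using FinalizationRegistry

Use FinalizationRegistry to request a cleanup callback during garbage collection by tracking the object in JavaScript.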

{
  drop_buffer: {
    args: [FFIType.ptr],
    returns: FFIType.void,
  },
}

const registry = new FinalizationRegistry((box: Pointer): void => {
  drop_buffer(box);
});
registry.register(buffer, box);

/// # Safety
///
/// This call assumes neither the box nor the buffer have been mutated in JS
#[no_mangle]
pub unsafe extern "C" fn drop_buffer(raw: *mut [usize; 2]) {
    let box_: Box<[usize; 2]> = unsafe { Box::from_raw(raw) };
    let ptr: *mut i32 = box_[0] as *mut i32;
    let length: usize = box_[1];
    let buffer: Vec<i32> = unsafe { Vec::from_raw_parts(ptr, length, length) };
    drop(buffer);
}
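Pros
  • It's simple
Cons
  • Garbage collection is engine-specific and non-deterministic
  • The cleanup callback is not guaranteed to be called at all

Using the finalizationCallback parameter of toArrayBuffer

Delegate garbage collection tracking to Bun to call the cleanup callback.
When 4 arguments are passed to toArrayBuffer, the 4th one must be a C function to be called on cleanup.
However, when 5 arguments are passed, the 5th one is the function and the 4th one must be a context pointer that gets passed to it.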

{
  box_value: {
    args: ["usize"],
    returns: FFIType.ptr,
  },
  drop_box: {
    args: [FFIType.ptr],
    returns: FFIType.void,
  },
  drop_buffer: {
    args: [FFIType.ptr, FFIType.ptr],
    returns: FFIType.void,
  },
}

// Bun expects the context to specifically be a pointer
const finalizationCtx: Pointer = box_value(length)!;

// Note that despite the presence of these extra parameters in the docs,
// they're absent from `@types/bun`
//@ts-expect-error see above
const buffer = toArrayBuffer(
  as_pointer(ptr)!,
  0,
  length * 4,
  //@ts-expect-error see above
  finalizationCtx,
  drop_buffer,
);
// Don't leak the box used to pass buffer through FFI
drop_box(box);
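
The Rust side of such a cleanup callback is not shown, so the following is only a sketch under two assumptions: that the callback receives the buffer pointer first and the context pointer second, and that the context is a boxed element count created by a hypothetical `box_value` helper. In a real `cdylib` each function would also be exported with `#[no_mangle]`.

```rust
/// Box a value so that it can travel through FFI as a context pointer.
pub extern "C" fn box_value(value: usize) -> *mut usize {
    Box::into_raw(Box::new(value))
}

/// Cleanup callback: frees both the buffer and the boxed length.
///
/// # Safety
///
/// `buffer` must be a `Vec<i32>` allocation whose length and capacity both
/// equal the value behind `length_ctx`, and `length_ctx` must come from
/// `box_value`. Neither pointer may be used after this call.
pub unsafe extern "C" fn drop_buffer(buffer: *mut i32, length_ctx: *mut usize) {
    // Taking the box back frees the context when this function returns
    let length: Box<usize> = unsafe { Box::from_raw(length_ctx) };
    // Rebuilding the `Vec` hands the allocation back to Rust's allocator
    let values: Vec<i32> = unsafe { Vec::from_raw_parts(buffer, *length, *length) };
    drop(values);
}
```

Carrying the length in the context is one way to give the callback enough information to rebuild the `Vec`, since the buffer pointer alone does not record how large the allocation is.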
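Pros
  • Tracking logic is delegated away from JavaScript
Cons
  • A lot of boilerplate and chances to leak memory
  • Missing type annotations for toArrayBuffer
  • Garbage collection is engine-specific and non-deterministic
  • The cleanup callback is not guaranteed to be called at all

Managing memory manually

Just drop the memory yourself when you no longer need it.
Fortunately, TypeScript has a very helpful Disposable interface and the using keyword for this.
It is the equivalent of Python's with or C#'s using keyword.

See the docs for it

  • TypeScript 5.2 changelog
  • Pull request for using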
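Pros
  • Cleanup is guaranteed to run
  • You have control over when the memory is dropped
Cons
  • Boilerplate object for the Disposable interface
  • Dropping memory manually is slower than using the garbage collector
  • If you want to give away ownership of the buffer, you have to make a copy and drop the original

Allocating in JS

This is much simpler and safer, since deallocation is handled for you.

However, there is a significant drawback.
Since you cannot manage JavaScript's memory in Rust, you cannot exceed the buffer's capacity, because that would cause a deallocation (growing a buffer reallocates it). This means you have to know the buffer size before passing it to Rust.
Not knowing beforehand how many buffers you need also incurs a lot of overhead, because you will be going back and forth through FFI just to allocate.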
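
One common way to live with this constraint is to let Rust report how many elements it needs and write only when the caller's buffer is large enough. The sketch below assumes a hypothetical `fill_range` function and is not part of the library used in this post. In a real `cdylib` it would also be exported with `#[no_mangle]`.

```rust
use std::slice;

/// Write `start..end` into a caller-owned buffer and return the element count
/// that the full range requires. Nothing is written if `capacity` is too small.
///
/// # Safety
///
/// `out` must point to `capacity` writable `i32` values.
pub unsafe extern "C" fn fill_range(
    out: *mut i32,
    capacity: usize,
    start: i32,
    end: i32,
) -> usize {
    // An empty or reversed range needs no space at all
    let needed: usize = if end > start {
        (end as i64 - start as i64) as usize
    } else {
        0
    };
    if out.is_null() || needed > capacity {
        // The caller can allocate `needed` elements and call again
        return needed;
    }
    let buffer: &mut [i32] = unsafe { slice::from_raw_parts_mut(out, needed) };
    for (slot, value) in buffer.iter_mut().zip(start..end) {
        *slot = value;
    }
    needed
}
```

From JS, the caller would pass a typed array together with its length, then compare the returned count with the capacity to decide whether to retry with a larger array. The retry is exactly the extra FFI round trip described above.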


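A side note on strings

If the output you expect from the library is a string, you may have considered the micro-optimization of returning a vector of u16s instead of a string, since JavaScript engines usually use UTF-16 under the hood.

However, that would be a mistake, because converting your string to a C string and using Bun's cstring type is slightly faster.
This was measured with a benchmark using mitata, a nice benchmarking library.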
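
For reference, returning a C string from Rust usually goes through `std::ffi::CString`. The sketch below assumes the hypothetical function names `make_greeting` and `free_string`; on the JS side the first would be declared with `returns: FFIType.cstring`, and in a real `cdylib` both would be exported with `#[no_mangle]`.

```rust
use std::ffi::{c_char, CString};

/// Return a NUL-terminated string whose ownership passes to the caller.
pub extern "C" fn make_greeting() -> *mut c_char {
    // `CString::new` fails only if the text contains an interior NUL byte
    let text = CString::new("hello from Rust").expect("no interior NUL");
    text.into_raw()
}

/// Give the string back to Rust so that it can be freed.
///
/// # Safety
///
/// `raw` must come from `make_greeting` and must not be used afterwards.
pub unsafe extern "C" fn free_string(raw: *mut c_char) {
    if !raw.is_null() {
        drop(unsafe { CString::from_raw(raw) });
    }
}
```

As with buffers, the string is allocated by Rust, so it has to be handed back to Rust to be freed rather than released from the JS side.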


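What about WebAssembly?

It's time to address the elephant in the room that is WebAssembly.
Should you choose the nice existing WASM bindings over dealing with the C ABI?

The answer is probably neither.

Is it really worth it?

Introducing another language into your codebase takes more than a single bottleneck to be worth it, both DX-wise and performance-wise.

We benchmarked a simple range function implemented in JS, WASM, and Rust.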


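The native library barely beats WASM and consistently loses to the pure TypeScript implementation.

That's it for this tutorial/exploration of the bun:ffi module. Hopefully, we have all walked away from this a little more educated.
Feel free to share your thoughts and questions in the comments.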
