How to and Should you use Bun FFI

Linda Hamilton
2024-11-11 10:53:02

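What are we trying to achieve

Say you have a JavaScript application running in Bun and you have identified a bottleneck that you would like to optimize.
Rewriting it in a more performant language may be just the solution you need.

As a modern JS runtime, Bun supports a Foreign Function Interface (FFI) to call libraries written in other languages that can expose a C ABI, such as C, C++, Rust and Zig.

In this post, we'll go over how to use it and conclude whether you can benefit from it.

How to link the library to JavaScript

This example uses Rust. Creating a shared library with C bindings looks different in other languages, but the idea remains the same.

From the JS side

Bun exposes its FFI API through the bun:ffi module.

The entry point is the dlopen function. It takes two arguments: a path to the library file, either absolute or relative to the current working directory (the file has a .so extension on Linux, .dylib on macOS and .dll on Windows), and an object with the signatures of the functions you want to import.
It returns an object with a close method, which you can use to close the library once it is no longer needed, and a symbols property, which is an object containing the functions you chose.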

import {
  dlopen,
  FFIType,
  read,
  suffix,
  toArrayBuffer,
  type Pointer,
} from "bun:ffi";

// Both your script and your library don't typically change their locations
// Use `import.meta.dirname` to make your script independent from the cwd
const DLL_PATH =
  import.meta.dirname + `/../../rust-lib/target/release/library.${suffix}`;

function main() {
  // Destructure the result to get the functions,
  // but collect the `close` method into an object
  // to avoid calling it with the wrong `this`
  const {
    symbols: { do_work },
    ...dll
  } = dlopen(DLL_PATH, {
    do_work: {
      args: [FFIType.ptr, FFIType.ptr, "usize", "usize"],
      returns: FFIType.void,
    },
  });

  /* ... */

  // It is unclear whether it is required or recommended to call `close`
  // an example says `JSCallback` instances specifically need to be closed
  // Note that using `symbols` after calling `close` is undefined behaviour
  dll.close();
}

main();
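
The article does not show the Rust side of do_work, so the following is only a sketch of what a matching export could look like. The meaning of the four arguments (an input buffer, an output buffer and their element counts) and the doubling "work" are assumptions made for illustration; the crate is also assumed to be built as a cdylib.

```rust
use std::slice;

/// Hypothetical Rust counterpart of the `do_work` symbol declared in JS.
/// Assumed argument meaning: input buffer, output buffer,
/// and the number of `i32` elements in each.
///
/// # Safety
///
/// Both pointers must be non-null, aligned and valid for the given lengths,
/// and the two buffers must not overlap.
#[no_mangle]
pub unsafe extern "C" fn do_work(
    input: *const i32,
    output: *mut i32,
    input_len: usize,
    output_len: usize,
) {
    assert!(!input.is_null() && !output.is_null());
    let input: &[i32] = unsafe { slice::from_raw_parts(input, input_len) };
    let output: &mut [i32] = unsafe { slice::from_raw_parts_mut(output, output_len) };
    // Placeholder work: double every input element that fits in the output
    for (dst, src) in output.iter_mut().zip(input) {
        *dst = src.wrapping_mul(2);
    }
}
```

The argument order mirrors the args array passed to dlopen: two FFIType.ptr values followed by two "usize" values, with FFIType.void as the return type.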

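Passing data through the FFI boundary

As you may have noticed, the types that Bun accepts through FFI are limited to numbers, including pointers.
Notably, size_t or usize is missing from the list of supported types, even though the code for it exists as of Bun version 1.1.34.

Bun doesn't offer any help with passing data more complex than a C string. That means you'll have to work with pointers yourself.

Let's see how to pass a pointer from JavaScript to Rust ...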

{
  reconstruct_slice: {
    args: [FFIType.ptr, "usize"],
    returns: FFIType.void,
  },
}

// `BigInt64Array` only accepts `bigint` values, hence the `n` suffix
const array = new BigInt64Array([0n, 1n, 3n]);
// Bun automatically converts `TypedArray`s into pointers
reconstruct_slice(array, array.length);

use std::{mem::size_of, slice};

/// Reconstruct a `slice` that was initialized in JavaScript
#[no_mangle]
pub unsafe extern "C" fn reconstruct_slice(array_ptr: *const i64, length: usize) {
    // Even though here it's not null, it's good practice to check
    assert!(!array_ptr.is_null());
    // Unaligned pointer can lead to undefined behaviour
    assert!(array_ptr.is_aligned());
    // The slice must not span more than `isize::MAX` bytes,
    // otherwise it would "wrap around" the address space
    assert!(length <= isize::MAX as usize / size_of::<i64>());
    let _slice: &[i64] = unsafe { slice::from_raw_parts(array_ptr, length) };
}

... and how to return a pointer from Rust to JavaScript.

{
  allocate_buffer: {
    args: [],
    returns: FFIType.ptr,
  },
  as_pointer: {
    args: ["usize"],
    returns: FFIType.ptr,
  },
}

// Hardcoding this value for 64-bit systems
const BYTES_IN_PTR = 8;

const box: Pointer = allocate_buffer()!;
const ptr: number = read.ptr(box);
// Reading the value next to `ptr`
const length: number = read.ptr(box, BYTES_IN_PTR);
// Hardcoding `byteOffset` to be 0 because Rust guarantees that
// Buffer holds `i32` values which take 4 bytes
// Note how we need to call a no-op function `as_pointer` because
// `toArrayBuffer` takes a `Pointer` but `read.ptr` returns a `number`
const _buffer = toArrayBuffer(as_pointer(ptr)!, 0, length * 4);

use std::mem::ManuallyDrop;

#[no_mangle]
pub extern "C" fn allocate_buffer() -> Box<[usize; 2]> {
    let buffer: Vec<i32> = vec![0; 10];
    let memory: ManuallyDrop<Vec<i32>> = ManuallyDrop::new(buffer);
    let ptr: *const i32 = memory.as_ptr();
    let length: usize = memory.len();
    // Unlike a `Vec`, `Box` is FFI compatible and will not drop
    // its data when crossing the FFI
    // Additionally, a `Box<T>` where `T` is `Sized` will be a thin pointer
    Box::new([ptr as usize, length])
}

#[no_mangle]
pub const extern "C" fn as_pointer(ptr: usize) -> usize {
    ptr
}

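Rust doesn't know that JS is taking ownership of the data on the other side, so you have to use ManuallyDrop to tell it explicitly not to deallocate the data on the heap. Other languages that manage memory will have to do something similar.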
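
To make the effect of ManuallyDrop concrete, here is a small standalone sketch that counts destructor calls. The Tracked type, the DROPS counter and observe_drops are hypothetical names made up for this illustration; nothing here is Bun-specific.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Tracked` values have been dropped so far
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for data that is handed over to JavaScript
pub struct Tracked(pub u8);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns the number of drops observed after a plain `Vec` leaves scope,
/// after a `ManuallyDrop<Vec>` leaves scope, and after the latter is
/// rebuilt from its raw parts and dropped explicitly, in that order
pub fn observe_drops() -> [usize; 3] {
    let start = DROPS.load(Ordering::SeqCst);
    let dropped = || DROPS.load(Ordering::SeqCst) - start;

    {
        let _plain = vec![Tracked(0)];
    } // `_plain` is freed here, as usual
    let after_plain = dropped();

    let (ptr, len, cap) = {
        let mut kept = ManuallyDrop::new(vec![Tracked(1)]);
        (kept.as_mut_ptr(), kept.len(), kept.capacity())
    }; // nothing is freed here: the destructor is suppressed
    let after_scope = dropped();

    // SAFETY: the parts come from a `Vec` that was never dropped
    drop(unsafe { Vec::from_raw_parts(ptr, len, cap) });
    let after_release = dropped();

    [after_plain, after_scope, after_release]
}
```

The second step is the situation allocate_buffer creates: the Vec goes out of scope but its heap data stays alive, and it stays alive until something rebuilds the Vec and drops it.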

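Memory management

As we have seen, it is possible to allocate memory in both JS and Rust, and neither side can safely manage the other's memory.

Let's decide where you should allocate your memory, and how.

Allocate in Rust

There are 3 ways of delegating memory cleanup from JS to Rust, and each has its pros and cons.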

Use FinalizationRegistry

Use FinalizationRegistry to request a cleanup callback during garbage collection by tracking the object in JavaScript.

{
  drop_buffer: {
    args: [FFIType.ptr],
    returns: FFIType.void,
  },
}

const registry = new FinalizationRegistry((box: Pointer): void => {
  drop_buffer(box);
});
registry.register(buffer, box);

/// # Safety
///
/// This call assumes neither the box nor the buffer have been mutated in JS
#[no_mangle]
pub unsafe extern "C" fn drop_buffer(raw: *mut [usize; 2]) {
    let box_: Box<[usize; 2]> = unsafe { Box::from_raw(raw) };
    let ptr: *mut i32 = box_[0] as *mut i32;
    let length: usize = box_[1];
    let buffer: Vec<i32> = unsafe { Vec::from_raw_parts(ptr, length, length) };
    drop(buffer);
}

Pros
  • It's simple
Cons
  • Garbage collection is engine-specific and non-deterministic
  • The cleanup callback isn't guaranteed to be called at all

Use the finalizationCallback parameter of toArrayBuffer

Delegate garbage collection tracking to Bun, which calls the cleanup callback.
When passing 4 parameters to toArrayBuffer, the 4th one must be a C function to be called on cleanup.
However, when passing 5 parameters, the 5th parameter is the function and the 4th one must be a context pointer that gets passed to it.

{
  box_value: {
    args: ["usize"],
    returns: FFIType.ptr,
  },
  drop_box: {
    args: [FFIType.ptr],
    returns: FFIType.void,
  },
  drop_buffer: {
    args: [FFIType.ptr, FFIType.ptr],
    returns: FFIType.void,
  },
}

// Bun expects the context to specifically be a pointer
const finalizationCtx: Pointer = box_value(length)!;

// Note that despite the presence of these extra parameters in the docs,
// they're absent from `@types/bun`
const buffer = toArrayBuffer(
  as_pointer(ptr)!,
  0,
  length * 4,
  //@ts-expect-error see above
  finalizationCtx,
  drop_buffer,
);
// Don't leak the box used to pass buffer through FFI
drop_box(box);

Pros
  • Tracking logic is delegated out of JavaScript
Cons
  • Lots of boilerplate and chances to leak memory
  • Missing type annotations for toArrayBuffer
  • Garbage collection is engine-specific and non-deterministic
  • The cleanup callback isn't guaranteed to be called at all

Manage memory manually

Just drop the memory yourself once you no longer need it.
Luckily, TypeScript has a very helpful Disposable interface for this, along with the using keyword.
It is the equivalent of Python's with or C#'s using keywords.

See the docs for it

  • TypeScript 5.2 changelog
  • Pull request for the using keyword

Pros
  • Cleanup is guaranteed to run
  • You control when the memory is dropped
Cons
  • Boilerplate object for the Disposable interface
  • Dropping memory manually is slower than using the garbage collector
  • If you want to give away ownership of the buffer, you have to make a copy and drop the original

Allocate in JS

This is much simpler and safer, because deallocation is handled for you.

However, there is a significant drawback.
Since you cannot manage JavaScript's memory in Rust, you cannot go over the buffer's capacity, as that would cause a deallocation. That means you have to know the buffer size before passing it to Rust.
Not knowing how many buffers you need beforehand will also incur a lot of overhead, as you'll be going back and forth through FFI just to allocate.
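
As a rough sketch of the JS-allocated approach, the Rust function below only writes into memory it was handed and never resizes it. The fill_range name, its parameters and the convention of returning the required element count are assumptions made for this example; on the JS side it would presumably be declared with args [FFIType.ptr, "usize", FFIType.i32, FFIType.i32] and called with an Int32Array.

```rust
use std::slice;

/// Hypothetical export that fills a buffer allocated in JavaScript with
/// the values of `start..end` and returns how many elements the full
/// range needs.
///
/// # Safety
///
/// `out` must be non-null, aligned and valid for `capacity` writes of `i32`.
#[no_mangle]
pub unsafe extern "C" fn fill_range(
    out: *mut i32,
    capacity: usize,
    start: i32,
    end: i32,
) -> usize {
    assert!(!out.is_null());
    assert!(out.is_aligned());
    let out: &mut [i32] = unsafe { slice::from_raw_parts_mut(out, capacity) };
    // Write only what fits: Rust cannot grow memory it doesn't own
    for (slot, value) in out.iter_mut().zip(start..end) {
        *slot = value;
    }
    // Report the required size so the caller can detect truncation
    (start..end).len()
}
```

If the returned count is larger than the capacity that was passed in, the caller has to allocate a bigger buffer and call again, which is exactly the back-and-forth overhead described above.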

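A side note on strings

If the output you expect from the library is a string, you may have considered the micro-optimization of returning a vector of u16s rather than a string, since JavaScript engines usually use UTF-16 under the hood.

However, that would be a mistake, because converting your string to a C string and using Bun's cstring type is slightly faster.
The comparison was benchmarked with mitata, a nice benchmarking library.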
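
For reference, returning a C string from Rust could look roughly like the sketch below. The make_greeting and free_greeting names are made up for this example; the JS side is assumed to declare the first symbol with returns: FFIType.cstring and to hand the pointer back to the second one when it is done, since Rust has no way of knowing when JS has finished with the string.

```rust
use std::ffi::{c_char, CString};

/// Hypothetical export returning a NUL-terminated UTF-8 string.
/// Ownership of the allocation moves to the caller.
#[no_mangle]
pub extern "C" fn make_greeting() -> *mut c_char {
    let text = String::from("hello from Rust");
    // `CString::new` fails only if the text contains an interior NUL byte
    CString::new(text)
        .expect("text must not contain NUL bytes")
        .into_raw()
}

/// Gives the string back to Rust so that it can be freed.
///
/// # Safety
///
/// `ptr` must come from `make_greeting` and must not be used afterwards.
#[no_mangle]
pub unsafe extern "C" fn free_greeting(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```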

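What about WebAssembly?

It's time to address the elephant in the room that is WebAssembly.
Should you choose nice existing WASM bindings over dealing with the C ABI?

The answer is probably neither.

Is it really worth it?

Introducing another language into your codebase takes more than a single bottleneck to be worth it, both DX-wise and performance-wise.

A simple range function was benchmarked in JS, WASM and Rust.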

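The native library barely beats WASM and consistently loses to the pure TypeScript implementation.

And that's it for this tutorial/exploration of the bun:ffi module. Hopefully we've all walked away from it a little more educated.
Feel free to share your thoughts and questions in the comments.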
