[typed_data] Add Float16List + [vm/ffi] Add Float16 #56319

Open
rainyl opened this issue Jul 25, 2024 · 16 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. contributions-welcome Contributions welcome to help resolve this (the resolution is expected to be clear from the issue) library-ffi triaged Issue has been triaged by sub team type-enhancement A request for a change that isn't a bug

Comments

@rainyl

rainyl commented Jul 25, 2024

Currently, only Float (Float32) and Double (Float64) are available in dart:ffi.
However, Float16 is becoming more and more widespread, especially for AI-related computations, and it is very inconvenient to interact with native libraries that support Float16. Developers have to access fp16 pointers or values as Uint16 and write the conversion methods themselves. Even then, some things, such as Uint8List.view(), are not possible for fp16 when developers want to return a float16 view instead of a copy.

I have read #52250 and #51994, but both of them are about more specific primitive types for the Dart language, whereas this issue is only about dart:ffi.

@dart-github-bot
Collaborator

Summary: This issue proposes adding Float16 support to dart:ffi to enable direct interaction with native libraries that use Float16, improving efficiency and convenience for developers working with AI-related computations.

@dart-github-bot dart-github-bot added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. triage-automation See https://github.com/dart-lang/ecosystem/tree/main/pkgs/sdk_triage_bot. type-enhancement A request for a change that isn't a bug labels Jul 25, 2024
@lrhn lrhn added library-ffi and removed triage-automation See https://github.com/dart-lang/ecosystem/tree/main/pkgs/sdk_triage_bot. labels Jul 25, 2024
@lrhn
Member

lrhn commented Jul 25, 2024

It's also technically possible to support a Float16List in dart:typed_data, but if there are no operations to convert single 16-bit floats to double (a very quick check suggests that to be the case at least for intel/AMD CPUs), reading and writing would likely not be as efficient as expected. (A "clever" implementation may convert a number of values at a time, and cache the results, so consecutive reading can be optimized. Writing is harder.)

@rainyl
Author

rainyl commented Jul 25, 2024

What about just making Float16List an alias of Uint16List? That is, a Float16List would actually be stored as a Uint16List, but would convert to a Dart double when getting values and to Uint16 when setting values.

@dcharkes
Contributor

but if there are no operations to convert single 16-bit floats to double (a very quick check suggests that to be the case at least for intel/AMD CPUs), reading and writing would likely not be as efficient as expected.

I did find this:

I'm not sure if using machine instructions from float16->float32->float64 is slower or faster than going from float16->float64 manually.

If users need this and are going to manually use slow conversions to doubles anyway, then we might as well make their lives easier and add it in dart:ffi and dart:typed_data.

@rainyl do you want to use the float16's as doubles in Dart? or are your use cases only about efficiently shuffling bytes around?

What about just making Float16List an alias of Uint16List? That is, a Float16List would actually be stored as a Uint16List, but would convert to a Dart double when getting values and to Uint16 when setting values.

Any XXXList is stored as bytes and only converted when reading/writing values! 😄

If you just want to shuffle bytes around, you don't want to do it via something that requires conversions when reading writing. So you'd want to use setRange on a TypedData with another TypedData that has the same element type, so that it can be a memcpy.
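As a minimal sketch of that byte-shuffling case (assuming the float16 data is held as raw bit patterns in a Uint16List, as in the current workaround; the helper name `copyHalfBits` is made up for illustration):

```dart
import 'dart:typed_data';

/// Copies raw float16 bit patterns from [source] to the start of [target].
///
/// Both sides have the same element type, so no per-element conversion is
/// needed and the implementation is free to copy the bytes in bulk.
void copyHalfBits(Uint16List target, Uint16List source) {
  target.setRange(0, source.length, source);
}

void main() {
  // Bit patterns of the float16 values 1.0, 2.0 and -0.5.
  final source = Uint16List.fromList([0x3C00, 0x4000, 0xB800]);
  final target = Uint16List(4);
  copyHalfBits(target, source);
  print(target); // [15360, 16384, 47104, 0]
}
```

Nothing here ever turns the values into Dart doubles, which is the point: the conversion cost only shows up when individual elements are read or written as numbers.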

If we add Half as a valid float NativeType, then we also need to implement it in FFI calls. It looks like for not all ABIs this is well-defined:

So it might be tricky to fully add Half everywhere in dart:ffi. (Though I'd definitely be open to someone trying.)

cc @mraleph @mkustermann @rmacnak-google

@dcharkes dcharkes changed the title [Enhancement] Add Float16 to dart:ffi [vm/ffi] Add Float16/Half Jul 25, 2024
@rainyl
Author

rainyl commented Jul 25, 2024

do you want to use the float16's as doubles in Dart? or are your use cases only about efficiently shuffling bytes around?

Both. The ideal use cases are very similar to those of the other native types, but the most important thing for my project right now is creating a Float16List view and providing a proper way to get/set values. For now I can treat a Uint16List as a Float16List, but that is not elegant when users want to get/set values.

// Currently I interact with native float16 using:
final ffi.Pointer<ffi.Uint16> ptr = ...;
final Uint16List view = ptr.asTypedList(length);
// Without Float16List, users have to set/get values via:
final double val = fp16_int_to_double(view[0]);
view[0] = fp16_double_to_int(val);
// `fp16_int_to_double` and `fp16_double_to_int` are implemented referring to https://github.com/opencv/opencv/blob/71d3237a093b60a27601c20e9ee6c3e52154e8b1/modules/core/include/opencv2/core/cvdef.h#L828-L917

// It would be user-friendly if users could set/get values using Dart doubles directly, maybe something like:
final ffi.Pointer<ffi.Float16> ptr = ...;
final Float16List view = ptr.asTypedList(length);
// With Float16List, users can set/get values via:
final double val = view[0];
view[0] = val;

Any XXXList is stored as bytes and only converted when reading/writing values! 😄

It sounds like the operations above would be easy to implement for Float16List. Good news.

If we add Half as a valid float NativeType, then we also need to implement it in FFI calls. It looks like for not all ABIs this is well-defined:

Yes, but maybe OpenCV's implementation can serve as a reference? It defines an hfloat type that uses its own implementation if __fp16 is not defined, and uses __fp16 otherwise: https://github.com/opencv/opencv/blob/71d3237a093b60a27601c20e9ee6c3e52154e8b1/modules/core/include/opencv2/core/cvdef.h#L384-L399

@dcharkes
Contributor

final ffi.Pointer<ffi.Float16> ptr = ...;
final Float16List view = ptr.asTypedList(length);
// With Float16List, users can set/get values via:
final double val = view[0];
view[0] = val;

Happy to receive a PR for this!

Should it be Half or Float16? We call the other thing Double. 😄

A PR for only this should add errors on Halfs in FFI calls and callbacks.

Yes, but maybe the implementation of opencv can be a reference? It defined a hfloat and use it's own implementation if __fp16 is not defined, otherwise use __fp16

ushort

hehe so it's a uint16 if it's not available.

Well, Dart is not compiled at the same time as your library that uses open-cv, so we risk compiling with different flags which will lead to segfaults. On the other hand, we also assume SoftFP on Android arm32 and hard fp on arm32 Linux. Technically there can be Androids out there with hardfp and linuxes with softfp, but we've not run into them.

I'd be fine simply assuming the type is defined (except for risc-v).

I'm also open for getting a PR for adding this. This PR will be much more involved, as it includes getting the calling conventions right.

If you want to work on these PRs I can provide pointers for where to start.

@dcharkes dcharkes added the contributions-welcome Contributions welcome to help resolve this (the resolution is expected to be clear from the issue) label Jul 25, 2024
@Wdestroier

I would like to suggest Float16 instead of Half.

@rainyl
Author

rainyl commented Jul 26, 2024

Should it be Half or Float16? We call the other thing Double. 😄

Same as @Wdestroier , I like Float16 too.

If you want to work on these PRs I can provide pointers for where to start.

Sure, I am willing to work on this when I have some free time, so could you please provide some instructions? That way other developers can work on this too. 😄

@dcharkes
Contributor

@sigmundch @mkustermann can dart:typed_data Float16List be properly supported on dart2js and dart2wasm? (We can of course always fall back to an implementation that does the bit-shuffling in Dart, but that might not be desirable for performance reasons.)

For adding Float16List:

  • Float16List can be added in sdk/lib/typed_data/typed_data.dart
  • runtime/vm/class_id.h needs to get an entry in CLASS_LIST_TYPED_DATA
  • The implementation can be added in sdk/lib/_internal/vm/lib/typed_data_patch.dart
    • We'll need to add the 16 bit variant of external double _getFloat32(int offsetInBytes); and the setter.
    • An optimized implementation that targets machine code instructions for float16 conversions will need to be an external function in Dart that is recognized in the Dart compiler.
      • runtime/vm/compiler/recognized_methods_list.h TypedList_GetFloat32
      • runtime/vm/compiler/frontend/kernel_to_il.cc that's where these recognized methods are implemented via FlowGraphBuilder::BuildTypedListGet
        • This will need to generate IL that will generate machine code in the files such as runtime/vm/compiler/backend/il_x64.cc with EmitNativeCode.
    • A non-optimized implementation will instead of having an external function in the patch file, just load 16 bits via a Uint16List and do the conversion in Dart. This is probably an easier implementation to start with and will perform considerably worse. But it's maybe worth doing that as the first PR before diving in to generating machine code.
  • A benchmark can be added in benchmarks/TypedData/
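To make the non-optimized route from the list above more concrete, here is a rough sketch, not the SDK implementation. `Float16View`, `halfBitsToDouble` and `doubleToHalfBits` are hypothetical names; the sketch assumes IEEE 754 binary16 (1 sign bit, 5 exponent bits, 10 mantissa bits) and round-to-nearest-even on writes, and it does the conversion with plain arithmetic on top of a Uint16List.

```dart
import 'dart:collection';
import 'dart:typed_data';

/// Decodes an IEEE 754 binary16 bit pattern (0..0xFFFF) into a double.
double halfBitsToDouble(int bits) {
  final negative = (bits & 0x8000) != 0;
  final exponent = (bits >> 10) & 0x1F;
  final mantissa = bits & 0x3FF;
  double magnitude;
  if (exponent == 0x1F) {
    magnitude = mantissa == 0 ? double.infinity : double.nan;
  } else {
    // Subnormals (exponent 0) have no implicit leading 1 and use the same
    // scale as exponent 1. The value is significand * 2^shift.
    final significand = exponent == 0 ? mantissa : 1024 + mantissa;
    final shift = (exponent == 0 ? 1 : exponent) - 25;
    magnitude = shift >= 0
        ? (significand << shift).toDouble()
        : significand / (1 << -shift);
  }
  return negative ? -magnitude : magnitude;
}

/// Rounds a non-negative [x] to the nearest integer, ties to even.
int _roundHalfToEven(double x) {
  final lower = x.floor();
  final fraction = x - lower;
  if (fraction > 0.5 || (fraction == 0.5 && lower.isOdd)) return lower + 1;
  return lower;
}

/// Encodes a double as a binary16 bit pattern, rounding to nearest even.
int doubleToHalfBits(double value) {
  if (value.isNaN) return 0x7E00; // A quiet NaN.
  final sign = value.isNegative ? 0x8000 : 0;
  var scaled = value.abs();
  if (scaled >= 65520.0) return sign | 0x7C00; // Rounds to infinity.
  if (scaled < 1.0 / (1 << 14)) {
    // Zero or subnormal: the result counts units of 2^-24.
    return sign | _roundHalfToEven(scaled * (1 << 24));
  }
  // Normalize so that value == scaled * 2^(exponent - 25) with
  // 1024 <= scaled < 2048. Multiplying or dividing by 2 is exact.
  var exponent = 25;
  while (scaled >= 2048) {
    scaled /= 2;
    exponent++;
  }
  while (scaled < 1024) {
    scaled *= 2;
    exponent--;
  }
  // A rounded significand of 2048 carries into the exponent field.
  return sign | ((exponent << 10) + _roundHalfToEven(scaled) - 1024);
}

/// Hypothetical stand-in for `Float16List`: doubles backed by raw bits.
class Float16View extends ListBase<double> {
  final Uint16List bits;
  Float16View(this.bits);

  @override
  int get length => bits.length;

  @override
  set length(int newLength) =>
      throw UnsupportedError('Cannot resize a fixed-length view');

  @override
  double operator [](int index) => halfBitsToDouble(bits[index]);

  @override
  void operator []=(int index, double value) {
    bits[index] = doubleToHalfBits(value);
  }
}

void main() {
  final view = Float16View(Uint16List(3));
  view[0] = 1.0;
  view[1] = 0.1;
  view[2] = -65504.0;
  print(view.bits); // [15360, 11878, 64511]
  print(view); // [1.0, 0.0999755859375, -65504.0]
}
```

The normalization loops are deliberately naive; an optimized version would presumably replace both conversions with the recognized methods described in the list.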

For adding support for Pointer<Float16>, Array<Float16> and Float16s in structs/unions; and error messages on using Float16 in FFI calls and callbacks:

  • The type should be added in sdk/lib/ffi/native_type.dart
  • The public API as extension types in sdk/lib/ffi/ffi.dart
    • Most of the API is generated by runtime/tools/ffi/sdk_lib_ffi_generator.dart
  • The implementation in CFE transforms pkg/vm/lib/modular/transformations/ffi/*.dart
    • pkg/vm/lib/modular/transformations/ffi/common.dart in all the consts
    • pkg/vm/lib/modular/transformations/ffi/native_type_cfe.dart to reason about size (2 bytes) and alignment (?) inside structs
  • Ditto in the backend, to reason about size and alignment: runtime/vm/compiler/ffi/native_type.cc
  • Error messages for the analyzer are added in pkg/analyzer/lib/src/generated/ffi_verifier.dart
  • Error messages for the CFE are added in pkg/vm/lib/modular/transformations/ffi/use_sites.dart
  • Tests
    • For the error messages for analyzer and CFE: Add a file next to tests/ffi/static_checks/vmspecific_static_checks_test.dart in the style of that file
    • For positive tests using Float16 inside structs add a float16 type to tests/ffi/generator/c_types.dart and some tests with structs to tests/ffi/generator/structs_by_value_tests_configuration.dart -> try to especially cover cases where alignment could be unexpected. For example a struct with first a uint8_t and then a float16.
    • For positive tests using asTypedList you can add a new test in tests/ffi/.
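The alignment case mentioned in the test list (a uint8_t followed by a float16) can be previewed today with a stand-in. This sketch assumes a future Float16 would be 2 bytes wide with 2-byte alignment, like Uint16; that is an assumption, not something the FFI currently defines, and `ByteThenHalf` is a made-up struct name.

```dart
import 'dart:ffi';

// `Uint16` stands in for the proposed `Float16`, which does not exist in
// dart:ffi at the time of this discussion.
final class ByteThenHalf extends Struct {
  @Uint8()
  external int tag;

  // With 2-byte alignment this field cannot start at offset 1, so one
  // byte of padding is expected between `tag` and `halfBits`.
  @Uint16()
  external int halfBits;
}

void main() {
  // Prints the struct size in bytes, including any padding.
  print(sizeOf<ByteThenHalf>());
}
```

On ABIs where 16-bit values are 2-byte aligned, the size should come out as 4 rather than 3, which is the kind of surprise the struct tests are meant to cover.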

If the rest of the Dart team is in favor of adding this, my suggestion would be to split this work up in multiple PRs:

  1. Adding an unoptimized Float16List
  2. Optimizing the Float16List get and set with recognized methods that target assembly instructions for float16 conversions
  3. Adding support for Pointer<Float16> and Float16 inside structs (but rejecting Float16 as FFI call/callback arguments and return value)
  4. Adding support for Float16 as FFI call/callback arguments and return value. (I can provide pointers on how to do that later.)

@dcharkes dcharkes changed the title [vm/ffi] Add Float16/Half [typed_data] add Float16List + [vm/ffi] Add Float16 Jul 26, 2024
@dcharkes dcharkes changed the title [typed_data] add Float16List + [vm/ffi] Add Float16 [typed_data] Add Float16List + [vm/ffi] Add Float16 Jul 26, 2024
@lrhn
Member

lrhn commented Jul 26, 2024

I don't think a Float16List can be efficient in JavaScript, probably not in Wasm either if it is not a built-in type.
And even on native, the smallest x64 operation performs four parallel conversions, not just a single one.
That means bit-shuffling in JS and Wasm, possibly on native too.
Reading is fairly simple, it's one sign bit, 5 bit exponent, 10 bit mantissa. A 64 entry lookup table for the exponent + sign will probably work.
Writing worries me more. Bit-fiddling on doubles requires first getting the bits of the double, which Dart doesn't support directly. Then it needs some rounding rules. The input is bigger than for reading, so a table isn't useful.
Native will almost certainly use the SIMD operation for each value. Everybody else will have to do something more expensive.
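The lookup-table idea for reading could look roughly like the following. This is only a sketch under the stated layout (1 sign bit, 5 exponent bits, 10 mantissa bits); `halfBitsToDoubleViaTable` and the table builder are hypothetical names, and whether a table actually beats straight bit arithmetic would need measuring.

```dart
import 'dart:typed_data';

/// 64 entries indexed by the top 6 bits (sign bit + 5 exponent bits).
/// Each entry is the signed power of two that scales the significand.
final Float64List _scaleTable = _buildScaleTable();

Float64List _buildScaleTable() {
  final table = Float64List(64);
  var scale = 1.0 / (1 << 24); // 2^-24, shared by exponents 0 and 1.
  for (var exponent = 0; exponent < 31; exponent++) {
    table[exponent] = scale;
    table[32 + exponent] = -scale;
    // Exponent 0 (subnormals) uses the same scale as exponent 1.
    if (exponent > 0) scale *= 2;
  }
  table[31] = double.infinity;
  table[63] = double.negativeInfinity;
  return table;
}

/// Decodes a binary16 bit pattern using the sign+exponent lookup table.
double halfBitsToDoubleViaTable(int bits) {
  final index = (bits >> 10) & 0x3F;
  final exponent = index & 0x1F;
  final mantissa = bits & 0x3FF;
  // NaN is the one case the table cannot express with a multiplication.
  if (exponent == 0x1F && mantissa != 0) return double.nan;
  // Normal numbers carry an implicit leading 1 (worth 1024 here).
  final significand = exponent == 0 ? mantissa : 1024 + mantissa;
  return _scaleTable[index] * significand;
}

void main() {
  print(halfBitsToDoubleViaTable(0x3C00)); // 1.0
  print(halfBitsToDoubleViaTable(0xC000)); // -2.0
  print(halfBitsToDoubleViaTable(0x7BFF)); // 65504.0
}
```

Reading then costs one table load and one multiplication per element, plus the NaN check. Writing has no equally small table, which matches the concern raised above.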

I'm not sure bad support is better than no support.

(We'll probably also want a Float16x8 type and list of those.)

@dcharkes
Contributor

(We'll probably also want a Float16x8 type and list of those.)

➕ I was thinking about that too.

@mkustermann
Member

AI/ML models can use different 16-bit floating point number formats, most commonly IEEE float16 and bfloat16 (which has more exponent bits).

So if the reason is AI/ML it would make sense to extend the discussion to be
=> dart:ffi: Pointer<Float16> & Pointer<BFloat16>
=> dart:typed_data: Float16List & BFloat16List

Those two are somewhat separate and can be discussed separately (e.g. we support Pointer<Bool> in dart:ffi without having an equivalent list type in dart:typed_data).

For dart:typed_data it may be tricky as JavaScript doesn't have equivalent typed arrays and dart2js would dynamically need to keep track of the type (which may be problematic, see e.g. recent deprecation & removal of UnmodifiableUint8List/... classes). @rakudrama wdyt?

For dart:ffi we'd need to think about to what extent we want to support it: allowing it indirectly via Pointer with appropriate double operator [](int index) and void operator []=(int index, double value) is probably the most common use and uncontroversial. Allowing them as Struct members or primitives is trickier, as we'd need ABI support, it's not part of standard C, and it seems some ABIs may not support it.
=> The only real use may be via Pointer<>, so we could restrict its usage to that
=> Our compiler would then generate very efficient code for the conversion to/from double

But if the only use is via Pointer, we have to think whether it's actually needed to have this support as part of dart:ffi. Let's say we model this as extension types in a helper package (e.g. in package:ffi/bfloat16.dart):

import 'dart:ffi';

extension type BFloat16P(Pointer<Uint16> pointer) {
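  double operator [](int index) {
    // Read the raw 16 bits at `index`, not just the first element.
    final int value = pointer[index];
    // ... code to bfloat16->double ... (XXX)
    return convertedValue;
  }

  void operator []=(int index, double value) {
    // ... code to double->bfloat16 ... (XXX)
    // Store the converted 16 bits; a `void` setter cannot return a value.
    pointer[index] = convertedValue;
  }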

  BFloat16List asTypedList(int length) => BFloat16List(this, length);
}

class BFloat16List implements List<double> {
  final BFloat16P pointer;
  final int length;

  BFloat16List(this.pointer, this.length);

  double operator [](int index) => pointer[index];

  void setRange(...) {
    // Would e.g. delegate to already optimized `pointer.asTypedList().setRange()`
  }
}

And then users can use it via

@Native<Pointer<Uint16> Function()>()
external Pointer<Uint16> getTensor();

main() {
  final BFloat16List tensor = BFloat16P(getTensor()).asTypedList(64);
  for (int i = 0; i < tensor.length; ++i) {
    print(tensor[i]);
  }
}

or if we allow convenience usage of extension types in FFI:

@Native<BFloat16P Function()>()
external BFloat16P getTensor();

main() {
  final BFloat16List tensor = getTensor().asTypedList(64);
  for (int i = 0; i < tensor.length; ++i) {
    print(tensor[i]);
  }
}

We could ensure the conversion code in (XXX) is written in a way that allows our compilers to generate very efficient code for it (possibly even recognizing the specific conversion pattern & optimizing via built-in HW support).
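For illustration, the two (XXX) placeholders could be filled in along these lines. This is a sketch, not a proposed package API: `bfloat16BitsToDouble` and `doubleToBFloat16Bits` are hypothetical names, and it relies on bfloat16 being the upper 16 bits of an IEEE float32. The write path rounds to float32 first and then to bfloat16 (nearest even), so in rare tie cases it may differ from rounding the double directly.

```dart
import 'dart:typed_data';

/// Scratch buffer used to reinterpret float32 bits (big-endian by default).
final ByteData _scratch = ByteData(4);

/// bfloat16 -> double: widen to float32 by appending 16 zero bits.
double bfloat16BitsToDouble(int bits) {
  _scratch.setUint32(0, (bits & 0xFFFF) << 16);
  return _scratch.getFloat32(0);
}

/// double -> bfloat16: round to float32, then keep the upper 16 bits,
/// rounding the dropped lower 16 bits to nearest even.
int doubleToBFloat16Bits(double value) {
  // Handle NaN up front so rounding cannot turn it into infinity.
  if (value.isNaN) return 0x7FC0;
  _scratch.setFloat32(0, value);
  final bits32 = _scratch.getUint32(0);
  final rounded = bits32 + 0x7FFF + ((bits32 >> 16) & 1);
  return (rounded >> 16) & 0xFFFF;
}

void main() {
  final bits = doubleToBFloat16Bits(3.141592653589793);
  print(bits.toRadixString(16)); // 4049
  print(bfloat16BitsToDouble(bits)); // 3.140625
}
```

Because both directions go through `ByteData` reads and writes, this shape may be simple enough for a compiler to pattern-match, though that is speculation on top of the comment above.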

@rainyl Would your use case be solved by this?

@dcharkes
Contributor

or if we allow convenience usage of extension types in FFI:

👍 Tracked in:

=> Our compiler would then generate very efficient code for the conversion to/from double

It would be even more efficient with Float16x8. But so far we've been only doing that for Float32x4 via typed_data. So if we wanted to allow that and not add it to typed data we should maybe consider having such Float16x8 in dart:ffi? (But I guess no support for BFloat16x8, I haven't seen any assembly instructions tailored to that yet.)

I'd be cautious adding Float16Pointer as an extension type in for example package:ffi if we would consider adding Float16x8 later in the Dart SDK. Moving types between a package and dart: libs is next to impossible.

@rainyl
Author

rainyl commented Jul 29, 2024

Would your use case be solved by this?

Yes. I am working on OpenCV bindings for Dart, so I need a view of the pixel values at (x, y) to read and change them. I believe your method will work.

@rmacnak-google
Contributor

@dcharkes RISC-V has a ratified extension, Zfh, but the major Linux distributions don't include it in their baseline. AFAIK, Android and Fuchsia haven't chosen their baseline yet, but Zfhmin is part of the RVA22 profile, so I expect they will include it.

@a-siva a-siva added the triaged Issue has been triaged by sub team label Jul 31, 2024
@mkustermann
Member

Chrome intends to add Float16Array

8 participants