[Idea]: add support for working with arrays backed by memory-mapped files #101

Open
4 tasks done
kgryte opened this issue Feb 15, 2025 · 0 comments
Labels
difficulty: 5 Likely to be difficult to implement with several unknowns. idea Potential GSoC project idea. priority: normal Normal priority. tech: c Involves programming in C. tech: javascript Involves programming in JavaScript. tech: native addons Involves developing Node.js native add-ons. tech: nodejs Requires developing with Node.js.

Comments

@kgryte
Member

kgryte commented Feb 15, 2025

Idea

Memory-mapped files allow accessing small segments of large files stored on disk, without reading the entire file into memory. Not only can this be advantageous for memory performance, but it also facilitates shared memory between processes (e.g., operating on the same array in both Node.js and Python running in two separate processes).

The goal of this project is to add support for working with typed arrays backed by memory-mapped files. Such typed arrays should support all the APIs of built-in typed arrays, with two exceptions: the constructors will need to support mmap-related arguments (e.g., filename, mode, offset), and indexing will require accessors, not square bracket syntax. The project is well-prepared to support accessors (see array/bool, array/complex128, etc.), such that, provided a memory-mapped typed array supports the accessor protocol, passing to downstream utilities should just work.
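To make the constructor arguments and accessor-based indexing concrete, here is a minimal sketch. It is not an mmap implementation: `FileBackedFloat64Array` is a hypothetical name, and it uses positional file reads and writes from Node.js `fs` as a stand-in for a mapped region. The argument order (filename, mode, offset, length) and the `get( i )` / `set( value, i )` method shapes are assumptions chosen for illustration.

```javascript
'use strict';

var fs = require( 'fs' );
var os = require( 'os' );
var path = require( 'path' );

var BYTES_PER_ELEMENT = 8;

// Hypothetical stand-in for a memory-mapped Float64 array.
// NOTE: uses positional file I/O, NOT mmap(2); only the API shape is the point.
function FileBackedFloat64Array( filename, mode, offset, length ) {
	this._fd = fs.openSync( filename, mode ); // e.g., 'r' or 'r+'
	this._offset = offset; // byte offset of the first element
	this._scratch = Buffer.alloc( BYTES_PER_ELEMENT );
	this.length = length;
}

// Accessor: read a single element without loading the whole file.
FileBackedFloat64Array.prototype.get = function get( i ) {
	var pos;
	if ( i < 0 || i >= this.length ) {
		return void 0;
	}
	pos = this._offset + ( i*BYTES_PER_ELEMENT );
	fs.readSync( this._fd, this._scratch, 0, BYTES_PER_ELEMENT, pos );
	return this._scratch.readDoubleLE( 0 );
};

// Accessor: write a single element at index `i`.
FileBackedFloat64Array.prototype.set = function set( value, i ) {
	var pos;
	if ( i < 0 || i >= this.length ) {
		throw new RangeError( 'invalid argument. Index is out-of-bounds.' );
	}
	pos = this._offset + ( i*BYTES_PER_ELEMENT );
	this._scratch.writeDoubleLE( value, 0 );
	fs.writeSync( this._fd, this._scratch, 0, BYTES_PER_ELEMENT, pos );
};

FileBackedFloat64Array.prototype.close = function close() {
	fs.closeSync( this._fd );
};

// Usage: create a zero-filled file holding four doubles, then index via accessors.
var file = path.join( os.tmpdir(), 'mmap-sketch-' + process.pid + '.bin' );
fs.writeFileSync( file, Buffer.alloc( 4*BYTES_PER_ELEMENT ) );

var arr = new FileBackedFloat64Array( file, 'r+', 0, 4 );
arr.set( 3.14, 2 );
var v = arr.get( 2 );
var untouched = arr.get( 0 );

arr.close();
fs.unlinkSync( file );
```

A real implementation would presumably replace the `fs.readSync`/`fs.writeSync` calls with reads and writes against a mapped region exposed by a native add-on, while keeping the same accessor surface.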

Similar to how we've approached fixed-endian typed arrays (see array/fixed-endian-factory), we can likely create a package exposing a constructor factory and then create lightweight wrappers for type-specific constructors (e.g., array/little-endian-float64).
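A rough sketch of what a constructor factory with lightweight type-specific wrappers might look like follows. The names `factory` and `DTYPES`, the constructor signature, and the use of a plain Node.js `Buffer` as the "mapped region" are all assumptions for illustration; this does not reproduce the actual API of array/fixed-endian-factory.

```javascript
'use strict';

// Per-dtype element size and Buffer method names (little-endian assumed):
var DTYPES = {
	'float64': { 'nbytes': 8, 'read': 'readDoubleLE', 'write': 'writeDoubleLE' },
	'float32': { 'nbytes': 4, 'read': 'readFloatLE', 'write': 'writeFloatLE' },
	'int32': { 'nbytes': 4, 'read': 'readInt32LE', 'write': 'writeInt32LE' },
	'uint8': { 'nbytes': 1, 'read': 'readUInt8', 'write': 'writeUInt8' }
};

// Hypothetical factory: returns a constructor specialized for `dtype`.
function factory( dtype ) {
	var spec = DTYPES[ dtype ];
	if ( spec === void 0 ) {
		throw new TypeError( 'invalid argument. Unsupported data type. Value: `' + dtype + '`.' );
	}
	// `region` stands in for a mapped memory region (here, a Buffer).
	function TypedArrayMMap( region, offset, length ) {
		if ( !( this instanceof TypedArrayMMap ) ) {
			return new TypedArrayMMap( region, offset, length );
		}
		this._region = region;
		this._offset = offset;
		this.length = length;
		this.BYTES_PER_ELEMENT = spec.nbytes;
	}
	TypedArrayMMap.prototype.get = function get( i ) {
		return this._region[ spec.read ]( this._offset + ( i*spec.nbytes ) );
	};
	TypedArrayMMap.prototype.set = function set( value, i ) {
		this._region[ spec.write ]( value, this._offset + ( i*spec.nbytes ) );
	};
	return TypedArrayMMap;
}

// Lightweight type-specific wrappers are then one-liners:
var Float64ArrayMMap = factory( 'float64' );
var Int32ArrayMMap = factory( 'int32' );

// Two typed views over disjoint parts of the same region:
var region = Buffer.alloc( 64 );
var x = new Float64ArrayMMap( region, 0, 4 ); // bytes 0-31
var y = new Int32ArrayMMap( region, 32, 8 ); // bytes 32-63

x.set( 1.5, 0 );
y.set( -7, 3 );
```

Keeping the dtype-specific details in a table means each type-specific package would only need to call the factory with a fixed dtype string.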

This project may require figuring out a strategy for C-JS interop which can be used across constructors.

Expected outcomes

Ideally, we would have the following constructors:

  • Float64ArrayMMap
  • Float32ArrayMMap
  • Int32ArrayMMap
  • Int16ArrayMMap
  • Int8ArrayMMap
  • Uint32ArrayMMap
  • Uint16ArrayMMap
  • Uint8ArrayMMap
  • Uint8ClampedArrayMMap
  • BooleanArrayMMap
  • Complex128ArrayMMap
  • Complex64ArrayMMap

Additionally, the following constructor would also be useful:

  • DataViewMMap

Status

None.

Involved software

C compiler such as GCC or Clang.

Technology

C, JavaScript, Node.js, native add-ons

Other technology

None

Difficulty

5

Difficulty justification

Figuring out an effective bridge between JavaScript and C for working with memory-mapped files will likely require some R&D. It is not clear whether we'd need to first develop separate dedicated mmap(2)-like functionality in JavaScript or whether we can interface directly with C. Once the lower-level details are determined, the next step will be implementing all the user-facing APIs expected from typed arrays. This should be straightforward; however, there may be some unexpected challenges and constraints surrounding read-only access, etc.
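One way the read-only constraint could surface at the accessor level is sketched below. This is only an assumption about a possible design: `ModeAwareView` is a hypothetical name, the `'r'` / `'r+'` mode strings mirror Node.js file flags, and a `Uint8Array` stands in for the mapped region.

```javascript
'use strict';

// Hypothetical view which enforces the access mode in its `set` accessor.
function ModeAwareView( region, mode ) {
	this._region = region; // stand-in for a mapped region
	this._writable = ( mode === 'r+' );
	this.length = region.length;
}

ModeAwareView.prototype.get = function get( i ) {
	return this._region[ i ];
};

ModeAwareView.prototype.set = function set( value, i ) {
	if ( !this._writable ) {
		throw new Error( 'invalid operation. Cannot write to a read-only array.' );
	}
	this._region[ i ] = value;
};

var region = new Uint8Array( [ 1, 2, 3, 4 ] );

var ro = new ModeAwareView( region, 'r' );
var rw = new ModeAwareView( region, 'r+' );

// Writes through the read-write view are visible through the read-only view:
rw.set( 9, 1 );
var seen = ro.get( 1 );

var threw = false;
try {
	ro.set( 5, 0 );
} catch ( err ) {
	threw = true;
}
```

Because indexing already goes through accessors, mode checks like this can live in `set` without any special handling of square bracket assignment.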

Prerequisite knowledge

C, JavaScript, and Node.js experience will be useful.

Project length

350

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • The issue name begins with [Idea]: and succinctly describes your idea.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@kgryte kgryte added difficulty: 5 Likely to be difficult to implement with several unknowns. idea Potential GSoC project idea. priority: normal Normal priority. tech: c Involves programming in C. tech: javascript Involves programming in JavaScript. tech: native addons Involves developing Node.js native add-ons. tech: nodejs Requires developing with Node.js. labels Feb 15, 2025
@stdlib-js stdlib-js locked and limited conversation to collaborators Feb 15, 2025