Add a raw static memory mechanism #1233


Merged: 10 commits, Apr 22, 2020

Conversation

@dcodeIO (Member) commented Apr 18, 2020

An implementation of the mechanism proposed in #1232.

Usage:

  • memory.data(size: i32, align?: i32): usize
    Gets a pointer to a zeroed static chunk of memory of the given size. Alignment defaults to 16. Arguments must be compile-time constants.
  • memory.data<T>(values: T[], align?: i32): usize
    Gets a pointer to a pre-initialized static chunk of memory. Alignment defaults to the size of T. Arguments must be compile-time constants.

fixes #1232

@dcodeIO (Member, Author) commented Apr 18, 2020

Alright, it turned out this conflicts with exported memory if one has export { memory }. D'oh.

@dcodeIO (Member, Author) commented Apr 18, 2020

This is potentially breaking because I had to make the compiler refuse to export builtin functions, so it now also no longer exports memory.copy if it has been polyfilled.

@jtenner (Contributor) commented Apr 21, 2020

I would love to start using this. Any status on when this will be ready for nightly?

@dcodeIO (Member, Author) commented Apr 21, 2020

The last commit updates all occurrences of StaticArray in the stdlib (except one, SPECIALS_UPPER, where .length is read in String#toUpperCase) to use memory.data instead, which appears to work just fine and not only reduces the size of static memory but also eliminates lots of changetypes.

There's one subtle difference now, though: these memory segments are not guaranteed to start at a 16-byte-aligned offset, potentially leading to inefficient loads where the data is expected to be aligned more strictly than the natural alignment of its element type. Not sure if that's the case, @MaxGraey?

@MaxGraey (Member) commented

16-byte alignment could be a performance-sensitive case, so it perhaps makes sense to add alignment for such cases.

@dcodeIO (Member, Author) commented Apr 22, 2020

Are there specific blobs of data that would benefit from this, or are you suggesting to align all of them to 16 byte boundaries?

@MaxGraey (Member) commented Apr 22, 2020

Try to align all data to 16 bytes. It will be zero padding, right? In that case Binaryen should optimize it and reduce the size anyway in its memory-packing pass.
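
The suggestion amounts to rounding each segment's start offset up to the next 16-byte boundary; the gap this introduces is zero bytes, which a memory-packing pass can later shrink. A small TypeScript sketch of that arithmetic (alignUp is a hypothetical helper, not the compiler's actual code):

```typescript
// Round `offset` up to the next multiple of `align` (align must be a power of two).
// The bytes between the old and new offset are zero padding, which a
// memory-packing pass can later elide.
function alignUp(offset: number, align: number): number {
  return (offset + align - 1) & ~(align - 1);
}

// Example: a segment that would start at byte 40 moves to byte 48 under
// 16-byte alignment, introducing 8 bytes of zero padding.
const aligned = alignUp(40, 16);
console.log(aligned, aligned - 40); // 48 8
```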

@dcodeIO (Member, Author) commented Apr 22, 2020

To give an example of what this means:

const PIO2_TABLE = memory.data<u64>([...]); // aligns to 8 bytes
const TAB = memory.data<u8>([...]); // does not align

and the change would be

const PIO2_TABLE = memory.data<u64>([...], 16); // explicitly align to 16 bytes
const TAB = memory.data<u8>([...], 16); // explicitly align to 16 bytes

for either of these reasons:

  • The algorithm operating on the data makes the assumption of a specific alignment, even though the data is smaller.
  • There might be an advantage when it comes to CPU caches, where aligning to 16 bytes makes it more likely that the data fits into the same page. How likely is that?

@MaxGraey (Member) commented

I see. Hmm, I guess we need a benchmark.

@dcodeIO (Member, Author) commented Apr 22, 2020

Perhaps what we can do is look at the algorithms using the data and check whether they may do unaligned accesses, considering the alignment indicated by the T given to memory.data. Did you notice something like a 2-byte alignment being assumed even though the data is given as 1-byte elements? One example would be a table of u8s that is read via load<u16>.
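
To illustrate the pattern to look for, here is a hypothetical TypeScript simulation of a u8 table read via a 2-byte load (loadU16 stands in for load<u16> on little-endian linear memory; none of this is actual stdlib code):

```typescript
// Simulate linear memory as a byte array.
const mem = new Uint8Array(16);
// Place a u8 table at offset 5, i.e. only 1-byte aligned.
mem.set([0x11, 0x22, 0x33, 0x44], 5);

// A 2-byte little-endian load; a real load<u16> assumes 2-byte alignment.
function loadU16(ptr: number): number {
  return mem[ptr] | (mem[ptr + 1] << 8);
}

// ptr = 5 is not a multiple of 2, so this access would be unaligned.
console.log(loadU16(5).toString(16)); // "2211"
```

The load still yields the right value; the concern is only that an unaligned access can be slower on some hardware.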

@dcodeIO (Member, Author) commented Apr 22, 2020

  • std/math/PIO2_TABLE is u64
  • std/math/PIO2F_TABLE is u64
  • std/util/casemap/TAB is u8 and only accessed via load<u8>
  • std/util/casemap/RULES is i32 and only accessed via load<i32>(X + (Y << alignof<i32>()))
  • std/util/casemap/RULE_BASES is u8 and only accessed via load<u8>
  • std/util/casemap/EXCEPTIONS is u8 and only accessed via load<u8>
  • std/util/casemap/MT is i32 and only accessed via load<i32>(X + (Y << alignof<i32>()))
  • std/util/math/EXP2F_DATA_TAB is u64
  • std/util/math/LOG2F_DATA_TAB is f64
  • std/util/math/LOGF_DATA_TAB is f64
  • std/util/math/EXP_DATA_TAB is u64
  • std/util/math/LOG2_DATA_TAB1 is f64
  • std/util/math/LOG2_DATA_TAB2 is f64
  • std/util/math/LOG_DATA_TAB1 is f64
  • std/util/math/LOG_DATA_TAB2 is f64
  • std/util/math/POW_LOG_DATA_TAB is f64
  • std/util/number/POWERS10 is u32 and only accessed via load<u32>(X + (Y << alignof<u32>()))
  • std/util/number/DIGITS is u32 and only accessed via load<u32>(X + (Y << alignof<u32>()))
  • std/util/number/EXP_POWERS is i16 and only accessed via load<i16>(X + (Y << alignof<i16>()))
  • std/util/number/FRC_POWERS is u64
  • std/util/string/ALPHA_TABLE is u8 and only accessed via load<u8> in stagedBinaryLookup
  • std/util/string/CASED is u8 and only accessed via load<u8> in stagedBinaryLookup
  • std/util/string/CASE_IGNORABLES is u8 and only accessed via load<u8> in stagedBinaryLookup
  • std/util/string/LOWER127 is u8 and only accessed via load<u8>
  • std/util/string/UPPER127 is u8 and only accessed via load<u8>
  • std/util/string/POWERS10 is f64

According to this, and unless I missed something, all of them should do properly aligned accesses.
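
The check behind this list can be sketched as follows: if a table's base pointer is aligned to its element size and indices are scaled by that size, every access is naturally aligned (TypeScript sketch; isNaturallyAligned is a hypothetical helper, not compiler code):

```typescript
// If `base` is a multiple of the element size, then base + i * size is too,
// so a load of that element type at that address is naturally aligned.
function isNaturallyAligned(base: number, index: number, size: number): boolean {
  return (base + index * size) % size === 0;
}

// A u64 table placed at an 8-byte-aligned base: every element access is aligned.
const allAligned = [0, 1, 2, 3].every(i => isNaturallyAligned(1024, i, 8));
console.log(allAligned); // true
```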

@MaxGraey (Member) commented

I see. Thanks for the explanation. In that case everything should be fine. But we should keep alignment in mind if something changes in code relying on alignment larger than the natural one.


Successfully merging this pull request may close these issues: Static memory segment idea (#1232).
3 participants