Skip to content

Cleanup deps_info.py. NFC. #12824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Cleanup deps_info.py. NFC. #12824

wants to merge 1 commit into from

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Nov 19, 2020

  • SDL_getenv: was removed from JS in Add a compiler check for JS library aliases and signatures #12319
  • getenv: was removed from JS in Remove some WASM_BACKEND usage in src/. See #11860 #11885
  • getlogin: depends on getenv so used to require JS?
  • emscripten_log: does not use strlen
  • __ctype_b_loc: implemented in pure C
  • __ctype_tolower_loc: implemented in pure C
  • __ctype_toupper_loc: implemented in pure C
  • calloc/realloc: implemented in pure C
  • tmpnam: was removed from JS
  • freelocale/freeaddrinfo: implemented in pure C
  • newlocale: implemented in pure C
  • realpath: implemented in pure C
  • strerr: implemented in pure C
  • ttyname: implemented in pure C

For these two symbols I can see no evidence as to why they were ever added here.

  • freopen
  • syslog

@sbc100 sbc100 requested a review from kripken November 20, 2020 18:56
@kripken
Copy link
Member

kripken commented Nov 20, 2020

I don't see an explanation for ttyname?

When you write can't tell why this was ever in here, what did you do to check? I ask because it is sadly pretty hard to check these things, so I'd like to make sure what process was used.

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 20, 2020

I don't see an explanation for ttyname?

When you write can't tell why this was ever in here, what did you do to check? I ask because it is sadly pretty hard to check these things, so I'd like to make sure what process was used.

Yeah it is sad that is so hard to check if/why these things are needed. I guess for each entry here we would ideally have a test that fails without it... but I don't know that we do right now.

One thing we could/shoudl probably do is add comments here (even thought its json .. so crap).. so that we know why each entry is needed.

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 20, 2020

ttyname used to be implement in JS and like many others here was move to pure C and removed in d92efe0. When it was written it JS it used to depend on malloc.

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 20, 2020

My process for going through these was:

  1. Write a script the find any symbol that is listed in this file but is not implement in JS. This gives me a shortlist of possible candidates for removal. Not all symbols in this file have to be JS symbols but most of them are.. and if a symbol is listed in this file, but is not a JS symbols its likely it was moved from JS to native at some point in history.

  2. For each symbol use git log -S to figure out if was ever JS symbol. It if was, but is not longer, when its very likely it can be removed from this file. If it was never a JS symbol I find the implementation and try to figure out which of its transitive dependency end up calling into JS and why the dependency is there.

For example, I'm going to re-invesigate those three symbols for which I couldn't find any reason for them being here in the first place:

  1. tmpnam - git log -S tmpnam src/ tells me this was removed from JS in d92efe0 and used to call malloc when it was there. (updated the comment)

  2. syslog - This looks like it was never implement in JS, and what is more it not current tests at all emscripten and fails at to link (and at build time due to ntohs being missing from deps_info.json!). It was added to the deps_info.json in Stop including malloc/free by default #12213. Any idea how it ended up there?

  3. freopen was added to deps_info in ac0354b but I can see no evidence as to why

@sbc100 sbc100 force-pushed the cleanup_deps_info branch 2 times, most recently from 73d37e8 to 77eb1c6 Compare November 22, 2020 03:28
@kripken
Copy link
Member

kripken commented Nov 25, 2020

I can take another look here, but I think it needs to be rebased first?

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 26, 2020

Rebased.

@sbc100 sbc100 changed the title Cleanup deps_info.json. NFC. Cleanup deps_info.py. NFC. Nov 26, 2020
@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 2, 2020

Ping.

@kripken
Copy link
Member

kripken commented Dec 4, 2020

I'm unsure about the process above. Why does being or not being in JS suffice?

The list here handles the case of

USER_CODE [=> COMPILED_LIB_CODE] => JS_LIB_CODE => OTHER_COMPILED_LIB_CODE

where COMPILED_LIB_CODE is optional. That is, we need to map the original reference in the user code, through an indirection in the compiled libraries, to JS library code that needs some other compiled lib code. This is really tricky to figure out statically, sadly, unless I'm missing something?

@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 5, 2020

I'm unsure about the process above. Why does being or not being in JS suffice?

The list here handles the case of

USER_CODE [=> COMPILED_LIB_CODE] => JS_LIB_CODE => OTHER_COMPILED_LIB_CODE

where COMPILED_LIB_CODE is optional. That is, we need to map the original reference in the user code, through an indirection in the compiled libraries, to JS library code that needs some other compiled lib code. This is really tricky to figure out statically, sadly, unless I'm missing something?

Yes you are correct. I am not saying that a symbol being in JS or not is sufficient.

I just used the heuristic of "did this function recently move from JS to native code" as a way to produce a short list for investigation. Because often then a symbol moves from JS it can be removed from this file. Not always, but often.

@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 5, 2020

However, if a function is implemented purely in C and never calls out of JS, it never needs to be in this file right?

Lets take an example:

'realpath': ['malloc', 'free'],

Since realpath is implemented in pure C and does not call back out to JS, wasm-ld as full visibility of all its dependencies. If needs mallco or free wasm-ld already knows this. Its only if realpath was than going to call back out to JS that we night have some hidden dependencies right?

@kripken
Copy link
Member

kripken commented Dec 7, 2020

However, if a function is implemented purely in C and never calls out of JS, it never needs to be in this file right?

(I assume of => to?)

I don't think that's true. For example, malloc is purely in C and never calls out to JS (well, except for growth, but in a build without growth), but we still need it here. The problem is that it is called from JS, not that it calls JS.

@kripken
Copy link
Member

kripken commented Dec 7, 2020

Oh, do you mean to appear here as a key, as opposed to a value? Yes, something in pure C and all deps also in pure C should not appear here. Is that what you verified?

@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 8, 2020

Oh, do you mean to appear here as a key, as opposed to a value? Yes, something in pure C and all deps also in pure C should not appear here. Is that what you verified?

Yes, sorry, my process here is trying to purge unneeded keys. I'm mostly ignoring the values.

My hypothesis is that if given key is implemented in native code and doesn't (even transitively) call out to JS then for sure it is no longer needed in this file. I think you are in agreement with that?

If a given key is implemented in native code but does transitively call out to JS than perhaps it does belong in this file.

The subset of keys that I've marked as "implemented in pure C" I believe fall into the first category and obviously don't belong in this file.

@kripken
Copy link
Member

kripken commented Dec 8, 2020

I think that's right.

Going through them, the first I am unsure of is tmpnam. It's implemented in C which does a syscall. How did you verify the syscall doesn't call back into C code to do a malloc or a free?

@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 8, 2020

tmpnam looks like it call the following functions:

  1. __randname (which calls __clock_gettime)
  2. __sys_lstat64
  3. strcpy

I can't see how any of them our call out to JS an then depend on malloc and/or free.

I guess we can prove these are not needed by compiling a test program that contains nothing else:

$ cat test.c
#include <stdio.h>

int main() {
  tmpnam(0);
  return 0;
}
$ ./emcc --profiling -O2 test.c -o out.js
$ grep malloc out.wasm out.js

I will do that with each of these symbols... I think I made a mistake the result would be link error right? Not a runtime error?

@kripken
Copy link
Member

kripken commented Dec 9, 2020

My worry is __sys_lstat64 calls into FS code, which is pretty large, and has a few calls to malloc/free in there.

To be more sure, I'd also grep the output for malloc and free, as I'm not confident we error on such things. One can write _malloc() in a JS function, and that doesn't lead to a compile error. An alternative to grep could be to run --closure 1 instead of profiling, but that is less certain than a grep (as in some modes we emit a stub malloc, I think?).

@sbc100 sbc100 force-pushed the cleanup_deps_info branch 5 times, most recently from f144915 to 56bd16e Compare February 12, 2021 00:52
This test create a small test program for each key in deps_info.py.
The it checks the compiled JS output for usage of each of the
symbols declared as dependencies.  If a given symbol is not used then
the dependency deemed not to be invalid.

Adding this test means that as we evolve the code it won't ve possible
to accidentally forget to update this file.

Checking for symbols that are "used" is somewhat tricky.  Each symbol
will be defintition appear in the output JS since we are forcing the
export that those symbols.  We also don't want to match of substrings
or comments.
@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 12, 2021

Closing in favor a more rigorous approach in #13483

@sbc100 sbc100 closed this Feb 12, 2021
@sbc100 sbc100 deleted the cleanup_deps_info branch February 12, 2021 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants