Change only updated files when using watch option. #3113
We already cache the bound ASTs during watch compilation. The difficulty is that a change in one file can change the emit of files that themselves did not change. For example:

```ts
const enum E {
    Hello, World
}
var x = [E.Hello, E.World];
```

When compiled, this emits:

```ts
var x = [0, 1];
```

If we change the enum to:

```ts
const enum E {
    Hello, Beautiful, World
}
```

the emit becomes:

```ts
var x = [0, 2];
```

I could see us caching a hash of the emit result to avoid an IO penalty, but I'd have to see some numbers indicating whether this becomes a real win. Edit: I should clarify - when I say that it's difficult, I mean that it would introduce complexity in the compiler which I don't think is warranted. |
Are you trying to save the cost of compiling the file? Or just the cost of writing the file to disk? We cannot avoid the former. We possibly could avoid the latter. Though we'd need numbers to know what sort of benefit there would be. |
Thanks for your replies. IO seems to be an issue on (my) Windows, though I can't figure out why (adding an exception to Windows Defender makes it better, but it is still slow compared to Mac). http://stackoverflow.com/questions/30046811/why-is-compiling-typescript-on-mac-much-faster-than-on-pc On the other hand, many tools have "watch" functionality to detect file changes and do some re-releasing or restarting. I managed to make the part I can control smoother by tracking compile status, but I can't change the third-party tools we use, which are purely file-change + timeout based. That means for every extra js file the compiler updates, there is one more js file that tool needs to re-release. Also, if two files are emitted far enough apart in time, that tool might run more than once. This kind of thing slows down our debugging process. |
Sounds like a reasonable suggestion. We could definitely attempt to avoid writes to files if we've emitted the exact same bytes as what exists on disk. We'd likely need to still read the files in to verify if they changed or not. But at least we wouldn't be writing (and updating timestamps) for anything unchanged. |
That would definitely help, looking forward to it. |
I noticed this as well recently, and ended up using grunt-ts, which will only pass changed files to the compiler and reduces output significantly |
@scriby FYI, I made some changes to sys.ts and replaced my team's tsc.js file as a temporary solution:

```ts
function compareFileWithData(fileName: string, data: string, writeByteOrderMark?: boolean): boolean {
    if (!_fs.existsSync(fileName)) {
        return false;
    }
    var buffer = _fs.readFileSync(fileName);
    var len = buffer.length;
    if (len >= 2 && buffer[0] === 0xFE && buffer[1] === 0xFF) {
        // Big endian UTF-16 byte order mark detected; treat as changed and rewrite.
        return false;
    }
    if (len >= 2 && buffer[0] === 0xFF && buffer[1] === 0xFE) {
        // Little endian UTF-16 byte order mark detected; treat as changed and rewrite.
        return false;
    }
    if (len >= 3 && buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF) {
        // UTF-8 byte order mark detected; compare the contents after the mark.
        return writeByteOrderMark && buffer.toString("utf8", 3) === data;
    }
    // Default is UTF-8 with no byte order mark.
    return !writeByteOrderMark && buffer.toString("utf8") === data;
}

function writeFile(fileName: string, data: string, writeByteOrderMark?: boolean): void {
    // If the file is exactly the same as what will be written, skip it.
    if (compareFileWithData(fileName, data, writeByteOrderMark)) {
        return;
    }
    // If a BOM is required, emit one.
    if (writeByteOrderMark) {
        data = '\uFEFF' + data;
    }
    _fs.writeFileSync(fileName, data, "utf8");
}
``` |
@vilic can you describe the type of speedup you're getting from this? Is it substantial? What is the size of the codebase you are working with? |
Some numbers:

- Modified: initial compilation.
- Modified: incremental compilation.
- Unmodified: initial compilation.
- Unmodified: incremental compilation.

These are numbers when third-party tools are not watching; keeping them watching seems to dramatically increase I/O write time (up to about 2 seconds, on (my) Windows). And updating only the necessary files speeds up the re-release process quite a lot. |
So IO-write goes from around 0.6s to around 0.1 seconds. A reasonable savings, though not huge. I'm curious what third party tools you're using that are watching this? |
We are using yog, which is an integrated solution for fis on NodeJS. We are moving our project from PHP to NodeJS; before yog we were using another integrated solution for fis named fisp. The integrated solution (yog/fisp) divides a project into several apps, and we need to watch and release the apps we are working on separately. |
We would consider a PR for this. Things to watch for are the size of the change (a non-invasive, non-destabilizing change would be ideal); also, a quick way to verify that the file has not changed would be ideal. |
@mhegazy I would like to give it a try this weekend. But I am not very sure what you mean by "size of the change". Do you want the implementation to be limited to the file system utils, or done with the help of something else? |
@vilic, we will need an implementation proposal outlining the approach for detecting whether a file changed, and what implications, if any, it has on performance. I would not expect this change to be large; if that is not the case, let us know. |
@mhegazy I am thinking about remembering the last modified time and the output after writing every file, and determining whether a file has external changes by querying and comparing its last modified time. If the last modified time or the output changed, the file will get updated. (I guess storing the output in memory should have a minor impact compared to things like the AST?) |
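A minimal sketch of the mtime-plus-output approach described above (function and type names here are hypothetical, not actual compiler API): remember the mtime and contents recorded at the last write, and re-emit only if either has since diverged.

```typescript
import * as fs from "fs";

// Hypothetical sketch: remember what we last emitted and when.
interface EmitRecord { mtimeMs: number; output: string; }
const lastEmit = new Map<string, EmitRecord>();

function needsWrite(fileName: string, output: string): boolean {
    const prev = lastEmit.get(fileName);
    if (!prev || prev.output !== output) return true;   // first emit, or emit changed
    if (!fs.existsSync(fileName)) return true;          // file removed externally
    // File touched externally since our last write?
    return fs.statSync(fileName).mtimeMs !== prev.mtimeMs;
}

function emitFile(fileName: string, output: string): void {
    if (!needsWrite(fileName, output)) return;          // skip: nothing changed
    fs.writeFileSync(fileName, output, "utf8");
    lastEmit.set(fileName, { mtimeMs: fs.statSync(fileName).mtimeMs, output });
}
```

The memory cost is one string per output file, which is the concern raised in the next comment.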
I would be concerned about performance, mainly GC load, if we keep two copies of the project in memory at a time. |
Then what I can think of are: |
Storing a hash of the output would be the most memory efficient way, I would suspect. |
@kitsonk I think so. But as I am not familiar with general GC implementations, I don't know which would actually have more performance impact. I think the output (which is strings) should be considered negligible compared to the AST stored in memory. |
Comparing ASTs is likely to be CPU intensive when you have a low frequency of change: the only way you know that nothing has changed is by walking the whole AST tree. If the change level is high, then you can obviously benefit from short-circuit logic. Whereas a hash of the output has a flat CPU profile, and most engines' string comparison is highly efficient and also benefits from short-circuiting, but on a much smaller data set, with no tree-walking logic. I suspect it would be several orders of magnitude more CPU efficient irrespective of the likelihood of change.

As far as GC/memory utilisation in general, I can't see where it wouldn't be more efficient unless every output file was really small. Less memory used, less GC incurred. A map of output hashes seems ideal. I would even consider abandoning file timestamps, as validating them is likely wasted cycles: if the last output hash is different, output; if not, don't; if there is no output hash in memory, output. |
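The hash-map idea above can be sketched as follows. This is an illustrative sketch only, not the compiler's actual implementation; `writeFileIfChanged` and `hashOf` are hypothetical names, and SHA-1 is an arbitrary choice of digest.

```typescript
import * as crypto from "crypto";
import * as fs from "fs";

// Hypothetical sketch: map each output file name to the hash of its last emit.
const outputHashes = new Map<string, string>();

function hashOf(data: string): string {
    return crypto.createHash("sha1").update(data).digest("hex");
}

function writeFileIfChanged(fileName: string, data: string): void {
    const hash = hashOf(data);
    if (outputHashes.get(fileName) === hash) {
        return; // same emit as last time: skip the write, keep the timestamp
    }
    fs.writeFileSync(fileName, data, "utf8");
    outputHashes.set(fileName, hash);
}
```

Only a short hex digest per file is retained in memory, rather than the full output string, which addresses the GC concern raised earlier.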
@kitsonk Oops, there seems to be some misunderstanding. I meant the string output won't take much memory compared to the AST. But it helps if hashing in the Node implementation is very efficient. (But I guess that also means we would probably give up other hosts on this.) |
Well, even still, big strings are more efficient than ASTs. An AST has a pointer for each node, and those pointers are unlikely to be shared, whereas a big string has one or two pointers (I am not 100% sure) irrespective of size. Also, JVMs take advantage of string interning. While NodeJS comes with some hashing functions, it would be trivial to fall back to a JS hash function on other hosts. |
@kitsonk I see, thanks a lot! |
What we could do is use node hashing functions and if not available fall back to the current behavior. |
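A hedged sketch of that fallback idea (the function names here are hypothetical): try Node's `crypto` hashing, and when the host doesn't provide it, fall back to the current behavior, i.e. report the file as changed so it is always rewritten.

```typescript
import * as crypto from "crypto";

// Hypothetical sketch: hash via Node's crypto module when available.
function tryHash(data: string): string | undefined {
    try {
        return crypto.createHash("sha1").update(data).digest("hex");
    } catch {
        return undefined; // host without crypto support
    }
}

// Returns true only when we can prove the emit is identical to the last one.
function sameAsLastEmit(lastHash: string | undefined, data: string): boolean {
    const hash = tryHash(data);
    if (hash === undefined) return false; // fall back to today's behavior: always write
    return hash === lastHash;
}
```

On hosts without hashing support the skip-write optimization simply never triggers, so behavior is no worse than before.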
thanks @vilic ! |
One thing this would really help with for us is preventing our nodemon apps from all going nuts when re-running watchers. Right now it causes tons of restarts, and CPU goes to 100% for quite a while. So we have to kill all our processes and completely restart all the apps whenever we make a change that watch doesn't pick up, which seems fairly often (sometimes even adding a new file seems not to get picked up). Edit: I see this claims to be fixed, but I still see a lot of restarts. Perhaps there is something else going on... |
There seems to have been a regression. I noticed it while using nodemon, but it's reproducible without it:

```shell
$ tsc --version
Version 3.3.1
$ tsc --init
$ tsc --watch
$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 laptop staff 0 18 "Feb 6 10:42:56 2019" "Feb 6 10:45:59 2019" "Feb 6 10:45:59 2019" "Feb 6 10:42:56 2019" 4096 8 0 a.js
$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 rabbit staff 0 18 "Feb 6 10:42:56 2019" "Feb 6 10:47:21 2019" "Feb 6 10:47:21 2019" "Feb 6 10:42:56 2019" 4096 8 0 a.js
```

Even though a.js is byte-for-byte unchanged, its modification time is updated on every recompile. I've also confirmed the above triggers file change events for nodemon. |
Updating all files slows down debugging. Compile-on-save would also help, but I can't expect it to be available in Visual Studio Code soon, and even if it were, it might not be stable enough at the very beginning. It would be great if `tsc` updated only changed files when doing incremental compilation.