Change only updated files when using watch option. #3113


Closed
vilicvane opened this issue May 11, 2015 · 27 comments
Labels
Fixed (A PR has been merged for this issue) · Help Wanted (You can do this) · Suggestion (An idea for TypeScript)

Comments

@vilicvane

Updating all files slows down debugging.

Compile-on-save would also do, but I can't expect it to land in Visual Studio Code soon, and even when it does, it might not be stable from the very beginning. It would be great if tsc updated only the changed files when doing incremental compilation.

@DanielRosenwasser
Member

We already cache the bound ASTs during --watch to speed things up, but the problem with holding off on emit is that it's difficult to see what files might change in the process. Imagine two files, a.ts and b.ts.

a.ts

const enum E {
    Hello, World
}

b.ts

var x = [E.Hello, E.World]

When compiled, b.js will be

var x = [0, 1]

If we change a.ts to the following

const enum E {
    Hello, Beautiful, World
}

b.js is also affected and it's difficult to track that:

var x = [0, 2]

I could see us caching a hash of the emit result to avoid an IO penalty but I'd have to see some numbers indicating whether this becomes a real win.


Edit: I should clarify - when I say that it's difficult, I mean that it would introduce complexity in the compiler which I don't think is warranted.

@CyrusNajmabadi
Contributor

Are you trying to save the cost of compiling the file? Or just the cost of writing the file to disk?

We cannot avoid the former. We possibly could avoid the latter. Though we'd need numbers to know what sort of benefit there would be.

@vilicvane
Author

Thanks for your replies.

IO seems to be an issue on (my) Windows, though I can't figure out why (adding an exception to Windows Defender makes it better, but it's still slow compared to Mac). http://stackoverflow.com/questions/30046811/why-is-compiling-typescript-on-mac-much-faster-than-on-pc

On the other hand, many tools have "watch" functionality to detect file changes and then re-release or restart. I managed to make the part I can control smoother by tracking compilation status, but I can't change the third-party tools we use, which are purely file-change + timeout based. That means for every extra .js file the compiler updates, there is one more file the tool needs to re-release. Also, if two files are emitted far enough apart in time, the tool might run more than once.

These things slow down our debugging process.

@CyrusNajmabadi
Contributor

Sounds like a reasonable suggestion. We could definitely attempt to avoid writes to files if we've emitted the exact same bytes as what exists on disk. We'd likely need to still read the files in to verify if they changed or not. But at least we wouldn't be writing (and updating timestamps) for anything unchanged.

@vilicvane
Author

That would definitely help, looking forward to it.

@danquirk danquirk added the Suggestion An idea for TypeScript label May 12, 2015
@scriby

scriby commented May 13, 2015

I noticed this as well recently and ended up using grunt-ts, which only passes changed files to the compiler and reduces output significantly.

@vilicvane
Author

@scriby FYI, I made some changes to sys.ts and replaced my team's tsc.js file as a temporary solution.

function compareFileWithData(fileName: string, data: string, writeByteOrderMark: boolean): boolean {
    if (!_fs.existsSync(fileName)) {
        return false;
    }
    var buffer = _fs.readFileSync(fileName);
    var len = buffer.length;
    if (len >= 2 && buffer[0] === 0xFE && buffer[1] === 0xFF) {
        // Big endian UTF-16 byte order mark detected; not compared here, treat as changed.
        return false;
    }
    if (len >= 2 && buffer[0] === 0xFF && buffer[1] === 0xFE) {
        // Little endian UTF-16 byte order mark detected; not compared here, treat as changed.
        return false;
    }
    if (len >= 3 && buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF) {
        // UTF-8 byte order mark detected; compare the rest of the file.
        return !!writeByteOrderMark && buffer.toString("utf8", 3) === data;
    }
    // Default is UTF-8 with no byte order mark.
    return !writeByteOrderMark && buffer.toString("utf8") === data;
}

function writeFile(fileName: string, data: string, writeByteOrderMark?: boolean): void {
    // If the file is exactly the same as what will be written, skip it.
    if (compareFileWithData(fileName, data, writeByteOrderMark)) {
        return;
    }

    // If a BOM is required, emit one
    if (writeByteOrderMark) {
        data = '\uFEFF' + data;
    }

    _fs.writeFileSync(fileName, data, "utf8");
}

@DanielRosenwasser
Member

@vilic can you describe the type of speedup you're getting from this? Is it substantial? What is the size of the codebase you are --watching?

@vilicvane
Author

Some numbers:

Modified

Initial compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   124673k
i/o read:        0.10s
i/o write:       0.12s
parse time:      0.67s
bind time:       0.36s
check time:      1.26s
emit time:       0.54s

Incremental compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   166245k
i/o read:        0.02s
i/o write:       0.14s
parse time:      0.06s
bind time:       0.01s
check time:      1.65s
emit time:       0.47s
total time:      2.19s

Unmodified

Initial compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   120182k
i/o read:        0.08s
i/o write:       0.49s
parse time:      0.64s
bind time:       0.35s
check time:      1.22s
emit time:       0.97s

Incremental compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   154028k
i/o read:        0.01s
i/o write:       0.64s
parse time:      0.05s
bind time:       0.00s
check time:      1.82s
emit time:       1.19s

These are the numbers when the third-party tools are not watching; keeping them watching dramatically increases i/o write time (up to about 2 seconds, on (my) Windows).

And updating only the necessary files speeds up the re-release process quite a lot.

@CyrusNajmabadi
Contributor

So i/o write goes from around 0.6s to around 0.1s. A reasonable saving, though not huge.

I'm curious what third party tools you're using that are watching this?

@vilicvane
Author

We are using yog, an integrated solution for fis on Node.js. We are moving our project from PHP to Node.js; before yog we were using another integrated solution for fis named fisp.

The integrated solution (yog/fisp) divides a project into several apps, and we need to watch and release the apps we are working on separately.

@mhegazy
Contributor

mhegazy commented Dec 9, 2015

We would consider a PR for this. Things to watch for are the size of the change (a non-invasive, non-destabilizing change would be ideal); a quick way to verify that a file has not changed would also be ideal.

@mhegazy mhegazy added the Help Wanted You can do this label Dec 9, 2015
@vilicvane
Author

@mhegazy I would like to give it a try this weekend. But I'm not quite sure what you mean by "size of the change" — do you want the implementation to be limited to the file system utils, or done with the help of something else?

@mhegazy
Contributor

mhegazy commented Jan 4, 2016

@vilic, we will need an implementation proposal outlining the approach for detecting whether a file changed, and what implications, if any, it has on performance. I would not expect this change to be large; if that is not the case, let us know.

@vilicvane
Author

@mhegazy I am thinking about remembering the last modified time and the output after writing every file, and determining whether a file has external changes by querying and comparing its last modified time. If the last modified time changed or the output changed, the file gets updated. (I guess storing the output in memory should have a minor impact compared to things like the AST?)

@mhegazy
Contributor

mhegazy commented Jan 29, 2016

I would be concerned about performance, mainly GC load, if we keep two copies of the project in memory at a time.

@vilicvane
Author

Then what I can think of is:

  1. Read the previous output from the file system every time, just like the very first implementation (but maybe compare by buffers in a stream).
  2. Store hashes of the previous output files and their last modified times, and compare these to determine whether actual output is needed.
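The second option could be sketched roughly as follows — a minimal illustration, not the actual compiler change; the `hashCache`, `hashOf`, and `writeFileIfChanged` names are mine, and it assumes Node's built-in `crypto` and `fs` modules are available:

```typescript
import * as crypto from "crypto";
import * as fs from "fs";

// Hypothetical cache: output file name -> hash of the contents last emitted.
const hashCache = new Map<string, string>();

function hashOf(data: string): string {
    return crypto.createHash("md5").update(data).digest("hex");
}

// Write only when the emitted text differs from what we last wrote;
// returns whether the file was actually (re)written.
function writeFileIfChanged(fileName: string, data: string): boolean {
    const hash = hashOf(data);
    if (hashCache.get(fileName) === hash && fs.existsSync(fileName)) {
        return false; // unchanged: skip the write, leave the timestamp alone
    }
    fs.writeFileSync(fileName, data, "utf8");
    hashCache.set(fileName, hash);
    return true;
}
```

The first emit of a file always writes; subsequent identical emits are skipped, so watchers that key off timestamps never fire for unchanged output.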

@kitsonk
Contributor

kitsonk commented Jan 31, 2016

Storing a hash of the output would be the most memory efficient way, I would suspect.

@vilicvane
Author

@kitsonk I think so. But as I am not familiar with general GC implementations, I don't know which would actually have more performance impact. I think the output (which is just strings) should be ignorable compared to the AST stored in memory.

@kitsonk
Contributor

kitsonk commented Jan 31, 2016

Comparing ASTs is likely to be CPU intensive when the frequency of change is low: the only way to know that nothing has changed is by walking the whole tree. If the change level is high, you can obviously benefit from short-circuit logic. Whereas a hash of the output has the same CPU profile either way — most engines' string comparison is highly efficient and also benefits from short-circuiting, but on a much smaller data set, with no tree-walking logic. I suspect it would be several orders of magnitude more CPU efficient irrespective of the likelihood of change.

As far as GC/memory utilisation goes, I can't see where it wouldn't be more efficient unless every output file were really small. Less memory used, less GC incurred. A map of output hashes seems ideal. I would even consider abandoning the files' timestamps, as validating them is likely wasted cycles: if the last output hash is different, output; if not, don't; if there is no output hash in memory, output.

@vilicvane
Author

@kitsonk Oops, there seems to be some misunderstanding. I meant that the string output won't take much memory compared to the AST. But it helps if hashing in the Node implementation is very efficient. (I guess that also means we would probably give up on other hosts for this.)

@kitsonk
Contributor

kitsonk commented Feb 1, 2016

Well, even so, big strings are more efficient than an AST. An AST has a pointer for each node, and those pointers are unlikely to be shared, while a big string has one or two pointers (I am not 100% sure) irrespective of size. Also, JS VMs take advantage of string interning (e.g. a = 'abc' and b = 'abc' share the same memory space), and strings are immutable (mutate a string and it is reallocated, with the previous instance potentially freed for GC), so they tend to be more memory efficient than any other sort of complex multi-nodal representation, where the analysis for GC alone could be quite a lot of overhead. I guess comparing output files without hashing their values would have a similar CPU profile, though with obviously more working memory that is less likely to get GC'ed.

While Node.js comes with some hashing functions, it would be trivial to fall back to a JS hash function on other hosts.
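Such a dependency-free fallback could look like 32-bit FNV-1a — a sketch (the function name is mine; note a hash collision would wrongly skip a needed write, though at 32 bits that risk is tiny for this use, and one could always fall back to a full string compare):

```typescript
// FNV-1a, 32-bit: a small, dependency-free string hash usable on any host.
function fnv1a(text: string): number {
    let hash = 0x811c9dc5; // FNV offset basis
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        // Multiply by the FNV prime 16777619 (0x01000193), kept in the
        // 32-bit range via shift-and-add instead of a 64-bit multiply.
        hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) +
                (hash << 8) + (hash << 24)) >>> 0;
    }
    return hash;
}
```

Two identical emit strings always hash equal, so the hash map comparison described above works without any host-specific API.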

@vilicvane
Author

@kitsonk I see, thanks a lot!

@mhegazy
Contributor

mhegazy commented Feb 6, 2016

What we could do is use Node's hashing functions and, if they are not available, fall back to the current behavior.

@mhegazy mhegazy added this to the TypeScript 2.0 milestone Apr 4, 2016
@mhegazy mhegazy added the Fixed A PR has been merged for this issue label Apr 4, 2016
@mhegazy
Contributor

mhegazy commented Apr 4, 2016

thanks @vilic !

@natew

natew commented Jun 13, 2018

One thing this would really help with for us is preventing our nodemon apps from all going nuts when watchers re-run.

Right now it causes tons of restarts and CPU goes to 100% for quite a while, so we have to kill all our processes and completely restart all the apps whenever we make a change that watch doesn't pick up, which seems fairly often (sometimes even adding a new file doesn't get picked up).

Edit: I see this claims to be fixed, but I still see a lot of restarts. Perhaps there is something else going on...

@gilbert

gilbert commented Feb 6, 2019

There seems to have been a regression. I noticed it while using nodemon, but it's reproducible without it:

$ tsc --version
Version 3.3.1
$ tsc --init
$ tsc --watch

$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 laptop staff 0 18 "Feb  6 10:42:56 2019" "Feb  6 10:45:59 2019" "Feb  6 10:45:59 2019" "Feb  6 10:42:56 2019" 4096 8 0 a.js

$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 rabbit staff 0 18 "Feb  6 10:42:56 2019" "Feb  6 10:47:21 2019" "Feb  6 10:47:21 2019" "Feb  6 10:42:56 2019" 4096 8 0 a.js

I've also confirmed the above triggers file change events for nodemon.


10 participants