Change only updated files when using watch option. #3113


Closed
vilicvane opened this issue May 11, 2015 · 27 comments
Labels
Fixed (A PR has been merged for this issue) · Help Wanted (You can do this) · Suggestion (An idea for TypeScript)

Comments

@vilicvane

Updating all files slows down debugging.

Compile-on-save would also do, but I can't expect it to land in Visual Studio Code soon, and even when it does, it might not be stable from the very beginning. It would be great if tsc updated only the changed files when doing incremental compilation.

@DanielRosenwasser
Member

We already cache the bound ASTs during --watch to speed things up, but the problem with holding off on emit is that it's difficult to see what files might change in the process. Imagine two files, a.ts and b.ts.

a.ts

const enum E {
    Hello, World
}

b.ts

var x = [E.Hello, E.World]

When compiled, b.js will be

var x = [0, 1]

If we change a.ts to the following

const enum E {
    Hello, Beautiful, World
}

b.js is also affected and it's difficult to track that:

var x = [0, 2]

I could see us caching a hash of the emit result to avoid an IO penalty but I'd have to see some numbers indicating whether this becomes a real win.


Edit: I should clarify - when I say that it's difficult, I mean that it would introduce complexity in the compiler which I don't think is warranted.

@CyrusNajmabadi
Contributor

Are you trying to save the cost of compiling the file? Or just the cost of writing the file to disk?

We cannot avoid the former. We possibly could avoid the latter. Though we'd need numbers to know what sort of benefit there would be.

@vilicvane
Author

Thanks for your replies.

IO seems to be an issue on (my) Windows, though I can't figure out why (adding an exception to Windows Defender makes it better, but it's still slow compared to Mac). http://stackoverflow.com/questions/30046811/why-is-compiling-typescript-on-mac-much-faster-than-on-pc

On the other hand, many tools have "watch" functionality to detect file changes and then re-release or restart. I managed to make the part I can control smoother by tracking compilation status, but I can't change the third-party tools we use, which are purely file-change + timeout based. That means for every extra .js file the compiler updates, there is one more file the tool needs to re-release. Also, if two files are emitted far enough apart in time, the tool might run more than once.

These things slow down our debugging process.

@CyrusNajmabadi
Contributor

Sounds like a reasonable suggestion. We could definitely attempt to avoid writes to files if we've emitted the exact same bytes as what exists on disk. We'd likely need to still read the files in to verify if they changed or not. But at least we wouldn't be writing (and updating timestamps) for anything unchanged.

@vilicvane
Author

That would definitely help, looking forward to it.

@danquirk danquirk added the Suggestion An idea for TypeScript label May 12, 2015
@scriby

scriby commented May 13, 2015

I noticed this as well recently and ended up using grunt-ts, which only passes changed files to the compiler and reduces output significantly.

@vilicvane
Author

@scriby FYI, I made some changes to sys.ts and replaced my team's tsc.js file as a temporary solution.

function compareFileWithData(fileName: string, data: string, writeByteOrderMark: boolean): boolean {
    if (!_fs.existsSync(fileName)) {
        return false;
    }
    var buffer = _fs.readFileSync(fileName);
    var len = buffer.length;
    if (len >= 2 && buffer[0] === 0xFE && buffer[1] === 0xFF) {
        // Big endian UTF-16 byte order mark detected; not compared here, treat as changed.
        return false;
    }
    if (len >= 2 && buffer[0] === 0xFF && buffer[1] === 0xFE) {
        // Little endian UTF-16 byte order mark detected; not compared here, treat as changed.
        return false;
    }
    if (len >= 3 && buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF) {
        // UTF-8 byte order mark detected; compare the rest of the file.
        return !!writeByteOrderMark && buffer.toString("utf8", 3) === data;
    }
    // Default is UTF-8 with no byte order mark.
    return !writeByteOrderMark && buffer.toString("utf8") === data;
}

function writeFile(fileName: string, data: string, writeByteOrderMark?: boolean): void {
    // If the file is exactly the same as what will be written, skip it.
    if (compareFileWithData(fileName, data, writeByteOrderMark)) {
        return;
    }

    // If a BOM is required, emit one
    if (writeByteOrderMark) {
        data = '\uFEFF' + data;
    }

    _fs.writeFileSync(fileName, data, "utf8");
}

@DanielRosenwasser
Member

@vilic can you describe the type of speedup you're getting from this? Is it substantial? What is the size of the codebase you are --watching?

@vilicvane
Author

Some numbers:

Modified

Initial compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   124673k
i/o read:        0.10s
i/o write:       0.12s
parse time:      0.67s
bind time:       0.36s
check time:      1.26s
emit time:       0.54s

Incremental compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   166245k
i/o read:        0.02s
i/o write:       0.14s
parse time:      0.06s
bind time:       0.01s
check time:      1.65s
emit time:       0.47s
total time:      2.19s

Unmodified

Initial compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   120182k
i/o read:        0.08s
i/o write:       0.49s
parse time:      0.64s
bind time:       0.35s
check time:      1.22s
emit time:       0.97s

Incremental compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   154028k
i/o read:        0.01s
i/o write:       0.64s
parse time:      0.05s
bind time:       0.00s
check time:      1.82s
emit time:       1.19s

These are the numbers when the third-party tools are not watching; keeping them watching dramatically increases i/o write time (up to about 2 seconds, on (my) Windows).

And updating only the necessary files speeds up the re-release process quite a lot.

@CyrusNajmabadi
Contributor

So i/o write goes from around 0.6s to around 0.1s. A reasonable saving, though not huge.

I'm curious what third party tools you're using that are watching this?

@vilicvane
Author

We are using yog, an integrated solution for fis on Node.js. We are moving our project from PHP to Node.js; before yog we were using another integrated solution for fis named fisp.

The integrated solution (yog/fisp) divides a project into several apps, and we need to watch and release the apps we are working on separately.

@mhegazy
Contributor

mhegazy commented Dec 9, 2015

We would consider a PR for this. Things to watch for are the size of the change (a non-invasive, non-destabilizing change would be ideal); a quick way to verify that a file has not changed would also be ideal.

@mhegazy mhegazy added the Help Wanted You can do this label Dec 9, 2015
@vilicvane
Author

@mhegazy I would like to give it a try this weekend. But I'm not quite sure what you mean by "size of the change" — do you want the implementation to be limited to the file system utils, or done with the help of something else?

@mhegazy
Contributor

mhegazy commented Jan 4, 2016

@vilic, we will need an implementation proposal outlining the approach for detecting whether a file changed, and what implications, if any, it has on performance. I would not expect this change to be large; if that is not the case, let us know.

@vilicvane
Author

@mhegazy I am thinking about remembering the last modified time and the output after writing every file, and determining whether a file has external changes by querying and comparing its last modified time. If the last modified time changed or the output changed, the file gets updated. (I guess storing the output in memory should have a minor impact compared to things like the AST?)

@mhegazy
Contributor

mhegazy commented Jan 29, 2016

I would be concerned about performance, mainly GC load, if we keep two copies of the project in memory at a time.

@vilicvane
Author

Then what I can think of is:

  1. Read the previous output from the file system every time, just like the very first implementation (but maybe compare by buffers in a stream).
  2. Store hashes of the previous output files and their last modified times, and compare these to determine whether actual output is needed.
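The second option could be sketched roughly as follows — a minimal illustration, not the actual compiler change; the `hashCache`, `hashOf`, and `writeFileIfChanged` names are mine, and it assumes Node's built-in `crypto` and `fs` modules are available:

```typescript
import * as crypto from "crypto";
import * as fs from "fs";

// Hypothetical cache: output file name -> hash of the contents last emitted.
const hashCache = new Map<string, string>();

function hashOf(data: string): string {
    return crypto.createHash("md5").update(data).digest("hex");
}

// Write only when the emitted text differs from what we last wrote;
// returns whether the file was actually (re)written.
function writeFileIfChanged(fileName: string, data: string): boolean {
    const hash = hashOf(data);
    if (hashCache.get(fileName) === hash && fs.existsSync(fileName)) {
        return false; // unchanged: skip the write, leave the timestamp alone
    }
    fs.writeFileSync(fileName, data, "utf8");
    hashCache.set(fileName, hash);
    return true;
}
```

The first emit of a file always writes; subsequent identical emits are skipped, so watchers that key off timestamps never fire for unchanged output.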

@kitsonk
Contributor

kitsonk commented Jan 31, 2016

Storing a hash of the output would be the most memory efficient way, I would suspect.

@vilicvane
Author

@kitsonk I think so. But as I am not familiar with general GC implementations, I don't know which would actually have more performance impact. I think the output (which is just strings) should be ignorable compared to the AST stored in memory.

@kitsonk
Contributor

kitsonk commented Jan 31, 2016

Comparing ASTs is likely to be CPU intensive when the frequency of change is low: the only way to know that nothing has changed is by walking the whole tree. If the change level is high, you can obviously benefit from short-circuit logic. Whereas a hash of the output has the same CPU profile either way — most engines' string comparison is highly efficient and also benefits from short-circuiting, but on a much smaller data set, with no tree-walking logic. I suspect it would be several orders of magnitude more CPU efficient irrespective of the likelihood of change.

As far as GC/memory utilisation goes, I can't see where it wouldn't be more efficient unless every output file were really small. Less memory used, less GC incurred. A map of output hashes seems ideal. I would even consider abandoning the files' timestamps, as validating them is likely wasted cycles: if the last output hash is different, output; if not, don't; if there is no output hash in memory, output.

@vilicvane
Author

@kitsonk Oops, there seems to be some misunderstanding. I meant that the string output won't take much memory compared to the AST. But it helps if hashing in the Node implementation is very efficient. (I guess that also means we would probably give up on other hosts for this.)

@kitsonk
Contributor

kitsonk commented Feb 1, 2016

Well, even so, big strings are more efficient than an AST. An AST has a pointer for each node, and those pointers are unlikely to be shared, while a big string has one or two pointers (I am not 100% sure) irrespective of size. Also, JS VMs take advantage of string interning (e.g. a = 'abc' and b = 'abc' share the same memory space), and strings are immutable (mutate a string and it is reallocated, with the previous instance potentially freed for GC), so they tend to be more memory efficient than any other sort of complex multi-nodal representation, where the analysis for GC alone could be quite a lot of overhead. I guess comparing output files without hashing their values would have a similar CPU profile, though with obviously more working memory that is less likely to get GC'ed.

While Node.js comes with some hashing functions, it would be trivial to fall back to a JS hash function on other hosts.
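Such a dependency-free fallback could look like 32-bit FNV-1a — a sketch (the function name is mine; note a hash collision would wrongly skip a needed write, though at 32 bits that risk is tiny for this use, and one could always fall back to a full string compare):

```typescript
// FNV-1a, 32-bit: a small, dependency-free string hash usable on any host.
function fnv1a(text: string): number {
    let hash = 0x811c9dc5; // FNV offset basis
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        // Multiply by the FNV prime 16777619 (0x01000193), kept in the
        // 32-bit range via shift-and-add instead of a 64-bit multiply.
        hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) +
                (hash << 8) + (hash << 24)) >>> 0;
    }
    return hash;
}
```

Two identical emit strings always hash equal, so the hash map comparison described above works without any host-specific API.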

@vilicvane
Author

@kitsonk I see, thanks a lot!

@mhegazy
Contributor

mhegazy commented Feb 6, 2016

What we could do is use Node's hashing functions and, if they are not available, fall back to the current behavior.

@mhegazy mhegazy added this to the TypeScript 2.0 milestone Apr 4, 2016
@mhegazy mhegazy added the Fixed A PR has been merged for this issue label Apr 4, 2016
@mhegazy
Contributor

mhegazy commented Apr 4, 2016

thanks @vilic !

@natew

natew commented Jun 13, 2018

One thing this would really help with for us is preventing our nodemon apps from all going nuts when watchers re-run.

Right now it causes tons of restarts and CPU goes to 100% for quite a while, so we have to kill all our processes and completely restart all the apps whenever we make a change that watch doesn't pick up, which seems fairly often (sometimes even adding a new file doesn't get picked up).

Edit: I see this claims to be fixed, but I still see a lot of restarts. Perhaps there is something else going on...

@gilbert

gilbert commented Feb 6, 2019

There seems to have been a regression. I noticed it while using nodemon, but it's reproducible without it:

$ tsc --version
Version 3.3.1
$ tsc --init
$ tsc --watch

$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 laptop staff 0 18 "Feb  6 10:42:56 2019" "Feb  6 10:45:59 2019" "Feb  6 10:45:59 2019" "Feb  6 10:42:56 2019" 4096 8 0 a.js

$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 rabbit staff 0 18 "Feb  6 10:42:56 2019" "Feb  6 10:47:21 2019" "Feb  6 10:47:21 2019" "Feb  6 10:42:56 2019" 4096 8 0 a.js

I've also confirmed the above triggers file change events for nodemon.


10 participants