[native-image] Native image takes more time than regular java application #974


Closed · moorsu opened this issue Feb 12, 2019 · 10 comments

@moorsu
moorsu commented Feb 12, 2019

The following code, when compiled and run as a regular Java application, completes in 34 seconds, whereas the native image built from it takes 1 minute 30 seconds. Am I missing something here?

I used ./native-image -cp . writer.Test to create the image.

$ time java -cp bin writer.Test

real 0m34.762s
user 0m25.064s
sys 0m10.100s

$ time ./writer.test

real 1m30.609s
user 1m9.447s
sys 0m20.056s

package writer;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class Test {

    public static void main(String[] args) {

        BufferedWriter bw = null;
        FileWriter fw = null;

        int classes = 2500;
        int filesCount = 14;
        int records = 5000;
        int columns = 10;

        try {
            // Writes classes * filesCount (= 35,000) files, each containing
            // records * columns (= 50,000) comma-separated values.
            for (int k = 0; k < classes; k++) {
                for (int j = 0; j < filesCount; j++) {
                    StringBuilder builder = new StringBuilder();
                    for (int i = 0; i < records; i++) {
                        for (int c = 0; c < columns; c++) {
                            builder.append(c);
                            builder.append(",");
                        }
                    }

                    // A fresh writer pair per file, closed immediately after the write.
                    fw = new FileWriter("/tmp/" + k + "_" + j + ".txt");
                    bw = new BufferedWriter(fw);
                    bw.write(builder.toString());
                    bw.close();
                    fw.close();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bw != null)
                    bw.close();
                if (fw != null)
                    fw.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
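An aside on the reproducer: each file body is records × columns × 2 = 100,000 characters, so a default-capacity StringBuilder is regrown, and its backing array copied, several times per file. A variant that sidesteps most of those copies by presizing the builder (the capacity expression is derived from the loop bounds above, not taken from the issue):

// Presized builder: one backing array up front, no growth copies while appending.
// records * columns appends of one digit plus one comma = records * columns * 2 chars.
StringBuilder builder = new StringBuilder(records * columns * 2);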

@vjovanov
Member

This is most likely caused by: (1) the slow implementation of array copy that we use at the moment, and (2) native-image probably not inlining builder.append completely.

We are currently working on a faster array copy. Let's see how well we perform after that fix is in.
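To test that hypothesis in isolation, here is a minimal sketch (an editor's illustration, not code from the issue) that runs the same append loop with the file I/O stripped out; if a native image of this is still far slower than the JVM, the gap lives in append/arraycopy rather than in I/O:

package writer;

// Same loop structure and iteration counts as the reproducer, minus the file I/O.
public class AppendOnly {
    public static void main(String[] args) {
        long totalLength = 0;
        for (int k = 0; k < 2500; k++) {
            for (int j = 0; j < 14; j++) {
                StringBuilder builder = new StringBuilder();
                for (int i = 0; i < 5000; i++) {
                    for (int c = 0; c < 10; c++) {
                        builder.append(c);   // int append: digit conversion + copy
                        builder.append(","); // String append -> String.getChars -> System.arraycopy
                    }
                }
                totalLength += builder.length(); // keep the work observable
            }
        }
        System.out.println(totalLength);
    }
}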

@tarsa

tarsa commented Feb 12, 2019

Native image is slower in many more benchmarks, see here: https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/java-substratevm.html (they call Graal's native-image "SubstrateVM" there).

@SergejIsbrecht

System:
Linux sergej-P50 4.18.0-13-generic #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Ubuntu-3ubuntu3.18.10.1)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Ubuntu-3ubuntu3.18.10.1, mixed mode, sharing)

sergej@sergej-P50:~/Development/GitHub/GraalVMTest/oracleIssue/build/classes/java/main$ /usr/bin/time -v java -cp . GraalIssue974
        Command being timed: "java -cp . GraalIssue974"
        User time (seconds): 14.24
        System time (seconds): 2.96
        Percent of CPU this job got: 106%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.21
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 375772
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 89396
        Voluntary context switches: 3197
        Involuntary context switches: 1929
        Swaps: 0
        File system inputs: 8
        File system outputs: 7000064
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

GraalVM RC12 CE (native-image)

sergej@sergej-P50:~/Development/GitHub/GraalVMTest/oracleIssue/build/graal$ /usr/bin/time -v ./hello-world 
        Command being timed: "./hello-world"
        User time (seconds): 61.32
        System time (seconds): 8.68
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10.07
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 271604
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 5579856
        Voluntary context switches: 19
        Involuntary context switches: 576
        Swaps: 0
        File system inputs: 24
        File system outputs: 7000000
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Tomorrow I will provide some perf data, which should show why the native image is that much slower.
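A quick cross-check worked from the two outputs above: both runs report about 7,000,000 file-system output blocks (512-byte units), which matches 35,000 files × 25 four-KiB blocks each, roughly 3.5 GB written in either case. The native image therefore does the same I/O; the extra wall-clock time is CPU-bound.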

@olpaw
Member

olpaw commented Feb 13, 2019

Command being timed: "java -cp . GraalIssue974"
Minor (reclaiming a frame) page faults: 89396

vs.

Command being timed: "./hello-world"
Minor (reclaiming a frame) page faults: 5579856

Yikes!

@SergejIsbrecht

SergejIsbrecht commented Feb 13, 2019

sudo perf record -F 99 --call-graph fp ~/Development/GitHub/GraalVMTest/oracleIssue/build/graal/hello-world

FlameGraph from recorded native-image data: https://gist.github.com/SergejIsbrecht/83c89c66caa9b06cc3e457db82546b17

It looks like most of the CPU cycles are spent in:
String_getChars_c1abef6cde0009d1f55ec852da6e07b19eb1cade -> ArraycopySnippets_doArraycopy_043961c4b8a78c521bb6ac0bcc8243e2da5fd903 (28%)

Page faults make up 5.3 percent of the captured stacks.

I will also provide a FlameGraph for the JDK, but I need some time to set it up.

Note: active frame-pointer walking burns a lot of CPU cycles, see #916. I could possibly use lbr as the call-graph parameter to reduce the FP overhead.
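For reference, the lbr variant of the command above (a sketch; requires a CPU with last-branch-record support):

sudo perf record -F 99 --call-graph lbr ~/Development/GitHub/GraalVMTest/oracleIssue/build/graal/hello-world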

@SergejIsbrecht

@vjovanov, do you have a branch on GitHub for this change? I would like to look into it if possible.

@vjovanov
Member

No branch yet. For now, I have moved all the ArrayCopy snippets from HotSpot into common code. Next, this needs to be used in native-image, and all of the array-copy calls need to be implemented as foreign calls for native-image.

For inlining StringBuilder.append you can try increasing the inlining thresholds: -H:MaxNodesInTrivialMethod, -H:MaxNodesInTrivialLeafMethod, -H:MaxInvokesInTrivialMethod.
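A hypothetical invocation passing those thresholds when rebuilding the image from this issue (the numeric values are illustrative only, not recommended settings):

$ native-image -cp . -H:MaxNodesInTrivialMethod=40 -H:MaxNodesInTrivialLeafMethod=40 -H:MaxInvokesInTrivialMethod=3 writer.Test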

@turing85

turing85 commented Feb 14, 2019

I have observed the same behaviour in multiple benchmarks (GraalVM-EE 1.0.0-RC12). You can find the benchmarks I wrote here: https://github.com/turing85/graal-playground

If you compare the execution time of the JMH benchmarks for the Fibonacci, PrimeNumber, and stream examples with the execution time of their native counterparts, you will see that the native image is up to 5x slower.
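For context, a minimal sketch of the kind of JMH benchmark being compared (a hypothetical example; the actual benchmarks live in the linked repository):

import org.openjdk.jmh.annotations.Benchmark;

public class FibonacciBenchmark {

    @Benchmark
    public long fib30() {
        return fib(30);
    }

    // Naive recursion: CPU-bound and allocation-free, a clean JIT-vs-AOT probe.
    private static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }
}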

@SergejIsbrecht

@turing85, you should probably create a new ticket for your case, because it may not be related to array copy, as it is here. Also, Java on the JVM is faster by design once C2 gets really hot. Native image makes sense in embedded environments or for function-as-a-service workloads, where quick startup is really important.

@vjovanov
Member

Latest measurements with PGO:

$ time java Test

________________________________________________________
Executed in   17.60 secs    fish           external
   usr time   14.05 secs  202.00 micros   14.05 secs
   sys time    3.92 secs  104.00 micros    3.92 secs

$ time ./test

________________________________________________________
Executed in   48.40 secs    fish           external
   usr time   42.91 secs  222.00 micros   42.91 secs
   sys time    5.45 secs  116.00 micros    5.45 secs


$ time ./test-pgo-native-architecture

________________________________________________________
Executed in   20.97 secs    fish           external
   usr time   15.83 secs  275.00 micros   15.83 secs
   sys time    5.11 secs  146.00 micros    5.11 secs

Closing the ticket, as the gap is now closed.
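For reference, the PGO workflow behind the test-pgo measurement looks roughly like this (a sketch of GraalVM EE's profile-guided optimization; default.iprof is the assumed default profile file name): build an instrumented image, run it on the workload to collect a profile, then rebuild using that profile.

$ native-image --pgo-instrument -cp . writer.Test
$ ./writer.test
$ native-image --pgo=default.iprof -cp . writer.Test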
