Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters #2985

jkronegg · 2025-02-06T13:01:17Z

👓 What did you see?

For Long parameters, the codegen generates Java code which looks like this:

private final Long seconds;
private final Long nanos;

public Duration(
    Long seconds,
    Long nanos
) {
    this.seconds = requireNonNull(seconds, "Duration.seconds cannot be null");
    this.nanos = requireNonNull(nanos, "Duration.nanos cannot be null");
}

The parameters of type Long are most of the time required to be non-null, but still we are storing them into a object, not a primitive type.

This occurs for the following generated types:

Duration (both seconds and nanos fields)
Location (line field but not column field)
TestCaseStarted (attempt field)
Timestamp (both seconds and nanos fields)

Note that Group contains a non-mandatory field start.

The Boolean type is also impacted by the same issue, i.e. for the generated classes :

ParameterType (preferForRegularExpressionMatch and useForSnippets fields)
TestCaseFinished (willBeRetried field)
TestRunFinished (success field)

✅ What did you expect to see?

If the java.java.erb code generator see a Long which is mandatory, it should generate a long primitive type, so that no conversion is done.
This will avoid back and forth Long <-> long conversions, so will lead to faster code.

The same apply to Boolean type.

This would probably cause a breaking change because all interfaces to the impacted classes will change.

📦 Which tool/library version are you using?

gherkin 31.0.0 (which uses messages-27.2.0)

🔬 How could we reproduce it?

N/A (Look at the generated code).

📚 Any additional context?

The performance impact has been detected while working on cucumber/gherkin#361, using IntelliJ Profiler (it's hard to see the real impact because there is a lot of generated classes which suffer from this syndrome).

The text was updated successfully, but these errors were encountered:

mpkorstanje · 2025-02-06T13:33:23Z

I've always understood null checks to be quite efficient and we're not boxing/unboxing any primitives.

How much time is lost here in the context of parsing Gherkin?

jkronegg · 2025-02-06T13:58:45Z

It's hard to tell the full performance impact, but at least creating the Location object takes about 6% of io.cucumber.gherkin.Parser.parse(). That's relatively low and because of the breaking change, so I'm not sure it's the top priority to correct this issue.

mpkorstanje · 2025-02-06T19:52:16Z

That doesn't sound too terrible, nearly every line has a location, multiple if there table cells on the line.

jkronegg · 2025-03-11T23:44:01Z

I found the issue. There a lot of Location objects that are created (at least one for every feature file). This means there is a int to long to Long conversion. Converting int to long is cheap. But converting long to Long is cheap only for numbers lower than 128 (the Long value is cached). The issue comes from the fact that feature files are usually much longer than 128 lines.

By caching the about 4000 Long values (a reasonable feature file length), the Location creation becomes much cheaper, see cucumber/gherkin#372.

Separating out the elements from #372 that involve Location * Replaced `io.cucumber.gherkin.Location` with `io.cucumber.messages.types.Location` to reduce creation of temporary objects. * Ensure `Token.location` is non-null. * `Location` instances are created through `Locations` utility allowing for reuse of `Long` objects. * Pre-create `Long` objects for 0 through 4000 to avoid churn while parsing. Fixes: #384

mpkorstanje · 2025-04-05T11:04:56Z

Moved to Cucumber-JVM for tracking, changing this in messages will be a major change for Cucumber-JVM.

mpkorstanje · 2025-04-05T11:08:41Z

It would be worth while to use long for required values.

While OptionalLong would work for optional values, the problem comes with signaling the absence/presence of a value in the constructor (for example when used by Jackson) and all solutions there lead to boxing primitives.

Same for other required primitives.

https://docs.oracle.com/javase/8/docs/api/java/util/OptionalLong.html

jkronegg mentioned this issue Feb 11, 2025

java: Optimize GherkinLine performance cucumber/gherkin#372

Merged

7 tasks

mpkorstanje transferred this issue from cucumber/messages Mar 30, 2025

mpkorstanje mentioned this issue Mar 30, 2025

java: Optimize Location cucumber/gherkin#385

Merged

3 tasks

mpkorstanje closed this as completed in cucumber/gherkin#385 Mar 30, 2025

mpkorstanje closed this as completed in cucumber/gherkin@bdbe821 Mar 30, 2025

mpkorstanje reopened this Apr 4, 2025

mpkorstanje transferred this issue from cucumber/gherkin Apr 4, 2025

mpkorstanje added this to the v8.0.0 milestone Apr 5, 2025

mpkorstanje transferred this issue from cucumber/messages Apr 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters #2985

Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters #2985

jkronegg commented Feb 6, 2025

mpkorstanje commented Feb 6, 2025

jkronegg commented Feb 6, 2025

mpkorstanje commented Feb 6, 2025

jkronegg commented Mar 11, 2025

mpkorstanje commented Apr 5, 2025

mpkorstanje commented Apr 5, 2025 •

edited

Loading

Codegen generates inefficient Java code for Long and Boolean mandatory parameters #2985

Codegen generates inefficient Java code for Long and Boolean mandatory parameters #2985

Comments

jkronegg commented Feb 6, 2025

👓 What did you see?

✅ What did you expect to see?

📦 Which tool/library version are you using?

🔬 How could we reproduce it?

📚 Any additional context?

mpkorstanje commented Feb 6, 2025

jkronegg commented Feb 6, 2025

mpkorstanje commented Feb 6, 2025

jkronegg commented Mar 11, 2025

mpkorstanje commented Apr 5, 2025

mpkorstanje commented Apr 5, 2025 • edited Loading

Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters #2985

Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters #2985

mpkorstanje commented Apr 5, 2025 •

edited

Loading