Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen generates inefficient Java code for Long and Boolean mandatory parameters #2985

Open
jkronegg opened this issue Feb 6, 2025 · 6 comments · Fixed by cucumber/gherkin#385
Milestone

Comments

@jkronegg
Copy link
Contributor

jkronegg commented Feb 6, 2025

👓 What did you see?

For Long parameters, the codegen generates Java code which looks like this:

private final Long seconds;
private final Long nanos;

public Duration(
    Long seconds,
    Long nanos
) {
    this.seconds = requireNonNull(seconds, "Duration.seconds cannot be null");
    this.nanos = requireNonNull(nanos, "Duration.nanos cannot be null");
}

The parameters of type Long are most of the time required to be non-null, but still we are storing them into a object, not a primitive type.

This occurs for the following generated types:

  • Duration (both seconds and nanos fields)
  • Location (line field but not column field)
  • TestCaseStarted (attempt field)
  • Timestamp (both seconds and nanos fields)

Note that Group contains a non-mandatory field start.

The Boolean type is also impacted by the same issue, i.e. for the generated classes :

  • ParameterType (preferForRegularExpressionMatch and useForSnippets fields)
  • TestCaseFinished (willBeRetried field)
  • TestRunFinished (success field)

✅ What did you expect to see?

If the java.java.erb code generator see a Long which is mandatory, it should generate a long primitive type, so that no conversion is done.
This will avoid back and forth Long <-> long conversions, so will lead to faster code.

The same apply to Boolean type.

This would probably cause a breaking change because all interfaces to the impacted classes will change.

📦 Which tool/library version are you using?

gherkin 31.0.0 (which uses messages-27.2.0)

🔬 How could we reproduce it?

N/A (Look at the generated code).

📚 Any additional context?

The performance impact has been detected while working on cucumber/gherkin#361, using IntelliJ Profiler (it's hard to see the real impact because there is a lot of generated classes which suffer from this syndrome).

@mpkorstanje
Copy link
Contributor

I've always understood null checks to be quite efficient and we're not boxing/unboxing any primitives.

How much time is lost here in the context of parsing Gherkin?

@jkronegg
Copy link
Contributor Author

jkronegg commented Feb 6, 2025

It's hard to tell the full performance impact, but at least creating the Location object takes about 6% of io.cucumber.gherkin.Parser.parse(). That's relatively low and because of the breaking change, so I'm not sure it's the top priority to correct this issue.

@mpkorstanje
Copy link
Contributor

That doesn't sound too terrible, nearly every line has a location, multiple if there table cells on the line.

@jkronegg
Copy link
Contributor Author

I found the issue. There a lot of Location objects that are created (at least one for every feature file). This means there is a int to long to Long conversion. Converting int to long is cheap. But converting long to Long is cheap only for numbers lower than 128 (the Long value is cached). The issue comes from the fact that feature files are usually much longer than 128 lines.

By caching the about 4000 Long values (a reasonable feature file length), the Location creation becomes much cheaper, see cucumber/gherkin#372.

@mpkorstanje mpkorstanje transferred this issue from cucumber/messages Mar 30, 2025
mpkorstanje referenced this issue in cucumber/gherkin Mar 30, 2025
Separating out the elements from #372 that involve Location

* Replaced `io.cucumber.gherkin.Location` with
  `io.cucumber.messages.types.Location` to reduce creation of temporary
  objects.

* Ensure `Token.location` is non-null.

* `Location` instances are created through `Locations` utility allowing
  for reuse of `Long` objects.

* Pre-create `Long` objects for 0 through 4000 to avoid churn while
  parsing.

Fixes: #384
@mpkorstanje mpkorstanje reopened this Apr 4, 2025
@mpkorstanje mpkorstanje transferred this issue from cucumber/gherkin Apr 4, 2025
@mpkorstanje mpkorstanje added this to the v8.0.0 milestone Apr 5, 2025
@mpkorstanje mpkorstanje transferred this issue from cucumber/messages Apr 5, 2025
@mpkorstanje
Copy link
Contributor

Moved to Cucumber-JVM for tracking, changing this in messages will be a major change for Cucumber-JVM.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Apr 5, 2025

It would be worth while to use long for required values.

While OptionalLong would work for optional values, the problem comes with signaling the absence/presence of a value in the constructor (for example when used by Jackson) and all solutions there lead to boxing primitives.

Same for other required primitives.

https://docs.oracle.com/javase/8/docs/api/java/util/OptionalLong.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants