--trace-compile ignores inferred const-return Methods #58482

Open
topolarity opened this issue May 21, 2025 · 13 comments · May be fixed by #58535
Labels: latency, observability

@topolarity (Member) commented May 21, 2025

If you write a method that ends up being inferred to a Const(...) return value:

julia> const k = rand(1:100)
61
julia> foo() = *(k, 15, 12)
julia> @code_typed foo()
CodeInfo(
1 ─     return 10980
) => Int64

it will not be included in the --trace-compile output:

$ julia --trace-compile=stderr -e "const k = rand(1:100); (foo() = *(k, 15)); println(foo())"
precompile(Tuple{typeof(Base.rand), Base.UnitRange{Int64}})
precompile(Tuple{typeof(Base.println), Int64})
precompile(Tuple{typeof(Base.println), Base.TTY, Int64})
705

Even though we ran our compilation pipeline on foo() for the first time (generating a new CodeInstance), it ended up not needing to compile via LLVM, so it is excluded from the output.

A user may hope to use the output of --trace-compile to determine whether they should add precompile(foo, ()) to the pre-compilation workload. If we had reported this MethodInstance, it would have saved them that inference time.
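
To make that concrete, here is a minimal sketch of the workflow (the package name MyPkg and the workload are hypothetical; only foo and precompile come from the example above):

# Hypothetical package whose precompile workload should cover foo():
module MyPkg

const k = rand(1:100)
foo() = *(k, 15, 12)

# Had --trace-compile reported foo(), the user would know to add this line,
# caching the inference result so it doesn't have to be redone at load time:
precompile(foo, ())

end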

@topolarity added the latency label May 21, 2025
@Keno (Member) commented May 21, 2025

I think it makes sense for --trace-compile to include these. Possibly we may want additional filtering options later.

@vtjnash (Member) commented May 21, 2025

I believe we deprecated --trace-compile for this purpose a while ago, for various reasons like this, replacing it with --trace-dispatch for users who need more correctness.

@topolarity (Member, Author)

I'm not sure --trace-dispatch is a very good replacement for this use case.

It doesn't help you distinguish whether you are dispatching to existing code or compiling brand-new code, which is very useful to know, e.g., for hunting down invalidations and caching bugs, as sketched below.
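
For example, a rough sketch of that use case (not from the thread; it assumes the registered Example.jl package is installed and uses a temporary log file):

# After a package has been precompiled, re-running its workload should ideally
# leave the --trace-compile log empty; anything still in it was compiled fresh
# at runtime, hinting at an invalidation or caching problem. --trace-dispatch
# would print entries in both the warm and the cold case.
log = tempname()
run(`$(Base.julia_cmd()) --trace-compile=$log -e 'using Example; Example.hello("world")'`)
stale = readlines(log)
isempty(stale) || @warn "signatures compiled at runtime despite caching" stale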

@topolarity (Member, Author)

There's actually an additional wrinkle here, which is that these signatures may not be "compileable".

These work fine:

julia> precompile(Tuple{typeof(Base.tail), Tuple{Int}})
true
julia> precompile(Tuple{typeof(Base.isempty), Tuple{Base.SubString{String}}})
true
julia> precompile(Tuple{typeof(Base.haslength), Base.Dict{Int, Nothing}})
true

but these do not:

julia> precompile(Tuple{typeof(Base.tail), Tuple{Any}})
false
julia> precompile(Tuple{typeof(Base.isempty), Tuple{Vararg{Base.SubString{String}}}})
false
julia> precompile(Tuple{typeof(Base.haslength), Base.Dict{_A, Nothing} where _A})
false

The false return value means that we refused to pre-compile these, in particular because the JIT does not expect to be able to compile such abstract signatures. Its default specialization behavior says that they need to be more specific.

The irony is that the signatures are, in fact, easily compiled to the most precise inference / performant codegen that we have (a side-effect-free function w/ a Const(...) result), but precompile(...) has no way to know that until after it runs inference.

That said, even if precompile(...) did run on these, I don't think these are actually useful to cache right now. Inference can use the results, but dispatch / JIT won't look for code (or a const-return) attached to a widened signature like this. We can probably leave these non-"compileable" signatures out of the --trace-compile output for now, but I wanted to mention it for awareness.
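
To illustrate the "inference succeeds, precompile refuses" mismatch, a small sketch (the reflection calls here are my own illustration, not from the thread):

# Inference on the widened signature still produces a precise result...
ci, rettype = only(code_typed(Base.tail, (Tuple{Any},)))
@show rettype   # concrete return type despite the abstract Tuple{Any} argument

# ...but precompile refuses it, because the JIT would never look up code
# (or a const-return) under such an abstract signature:
@show precompile(Base.tail, (Tuple{Any},))   # false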

@nsajko added the observability label May 24, 2025
@NHDaly (Member) commented May 27, 2025

Yes, leaving those out makes sense to me, since presumably they will be compiled (and cached?) as part of compiling whatever other functions reached them, during those other functions' compilations?

@topolarity (Member, Author)

Yes, leaving those out makes sense to me, since presumably they will be compiled (and cached?) as part of compiling whatever other functions reached them, during those other functions' compilations?

Yeah, that sounds right to me. In principle, we only need to log the compiled functions that were also the target of dispatches (I think the most compact log would be an intersection of --trace-dispatch and --trace-compile, so that you only see 'roots' of compilation work and you only consider compilation that was done 'freshly' this session). A rough sketch of that intersection is below.
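
A hypothetical post-processing sketch of that intersection, assuming both flags were pointed at files (the file names and workload script are illustrative):

# Record both logs for the same workload, e.g.:
#   julia --trace-compile=trace_compile.jl my_workload.jl
#   julia --trace-dispatch=trace_dispatch.jl my_workload.jl
# then keep only the signatures that appear in both:
compiled   = Set(eachline("trace_compile.jl"))
dispatched = Set(eachline("trace_dispatch.jl"))
roots = sort!(collect(intersect(compiled, dispatched)))
foreach(println, roots)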

@NHDaly (Member) commented May 27, 2025

Agreed, except that I think --trace-compile should already be only a strict subset of --trace-dispatch, since I think we only log it during dispatch (except for the cases that you're identifying in this ticket, where we don't do it).

@vtjnash (Member) commented May 27, 2025

Right, I think this PR just makes --trace-compile a much worse version (unreliable, unstable, inaccurate) of --trace-dispatch. I say we just delete this option and call it done.

@KristofferC (Member) commented May 27, 2025

I say we just delete this option and call it done.

For some history: the printing of precompile signatures used to be done by defining the TRACE_COMPILE variable, and it was used to get some basic precompile signatures for the sysimage. In a series of PRs intended to improve Julia latency:

#28075
#28118
#28371
#28419

this evolved to the --trace-compile workflow we all know and love ;).

Now, the point of that is that the only reason --trace-compile is used is to generate signatures to improve latency. If more things should be included in it, then include more things in it. If it should alias --trace-dispatch for best behavior, then do so. Arguing over what semantically counts as compilation and what does not is not really useful; anything that helps reduce latency for a given workload should be in it. But there is no need to "delete" it (that would be breaking), just improve it (or alias it).
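
For context, a rough sketch of the latency workflow the flag serves (assuming each emitted line is a plain precompile(Tuple{...}) statement; file and script names are illustrative):

# 1. Record signatures while running a representative workload:
#      julia --trace-compile=precompiles.jl my_workload.jl
# 2. Replay them when building a sysimage or package image, skipping stale ones:
for stmt in eachline("precompiles.jl")
    try
        eval(Meta.parse(stmt))   # each line is a precompile(Tuple{...}) call
    catch err
        @debug "skipping stale precompile statement" stmt err
    end
end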

@vtjnash (Member) commented May 27, 2025

I think at this point, all parts of all those PRs related to trace-compile have been removed due to the problems with trace-compile

@topolarity (Member, Author) commented May 27, 2025

Agreed, except that I think --trace-compile should already be only a strict subset of --trace-dispatch, since I think we only log it during dispatch (except for the cases that you're identifying in this ticket, where we don't do it).

I don't think that's the case - this ticket is making me realize that I can't precisely say what the definition of --trace-dispatch is either, but it sometimes reports elements that have already been compiled:

$ julia +nightly --trace-dispatch=stderr -e "Core.invoke(Base.getindex, Tuple{Vector{Int}, Int}, Int[1,2,3], 1)"
precompile(Tuple{typeof(Base.getindex), Array{Int64, 1}, Int64})
$ julia +nightly --trace-compile=stderr -e "Core.invoke(Base.getindex, Tuple{Vector{Int}, Int}, Int[1,2,3], 1)"
# no output

It also does not report some dynamic dispatches, depending on where they are found in the cache:

$ julia +nightly --trace-dispatch=stderr -e "v = Int[1,2,3]; (foo() = v[1]); foo()"
precompile(Tuple{typeof(Main.foo)})
# no entry for: precompile(Tuple{typeof(getindex), Vector{Int}, Int})

despite having a dynamic dispatch:

$ julia +nightly -e "using InteractiveUtils; v = Int[1,2,3]; (foo() = v[1]); display(@code_typed foo())"
CodeInfo(
1 ─ %1 =   Main.v::Any
│   %2 =   dynamic Base.getindex(%1, 1)::Any
└──        return %2
) => Any

That's not an issue for pre-compilation lists / reducing compilation latency, but it could be considered a bug if you are using --trace-dispatch to look for dynamic dispatches (do we care about that use case?). Anyway, I'll see what can be done to align these.

@KristofferC (Member)

I think at this point, all parts of all those PRs related to trace-compile have been removed

I wouldn't say "all" :P

hardcoded_precompile_statements = """
precompile(Base.unsafe_string, (Ptr{UInt8},))
precompile(Base.unsafe_string, (Ptr{Int8},))
# loading.jl - without these each precompile worker would precompile these because they're hit before pkgimages are loaded
precompile(Base.__require, (Module, Symbol))
precompile(Base.__require, (Base.PkgId,))
precompile(Base.indexed_iterate, (Pair{Symbol, Union{Nothing, String}}, Int))
precompile(Base.indexed_iterate, (Pair{Symbol, Union{Nothing, String}}, Int, Int))
precompile(Tuple{typeof(Base.Threads.atomic_add!), Base.Threads.Atomic{Int}, Int})
precompile(Tuple{typeof(Base.Threads.atomic_sub!), Base.Threads.Atomic{Int}, Int})
precompile(Tuple{Type{Pair{A, B} where B where A}, Base.PkgId, UInt128})
precompile(Tuple{typeof(Base.in!), Tuple{Module, String, UInt64, UInt32, Float64}, Base.Set{Any}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:allow_typevars, :volatile_inf_result), Tuple{Bool, Nothing}}, typeof(Base.Compiler.handle_match!), Array{Base.Compiler.InliningCase, 1}, Core.MethodMatch, Array{Any, 1}, Base.Compiler.CallInfo, UInt32, Base.Compiler.InliningState{Base.Compiler.NativeInterpreter}})
precompile(Tuple{typeof(Base.Compiler.ir_to_codeinf!), Base.Compiler.OptimizationState{Base.Compiler.NativeInterpreter}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:allow_typevars, :volatile_inf_result), Tuple{Bool, Base.Compiler.VolatileInferenceResult}}, typeof(Base.Compiler.handle_match!), Array{Base.Compiler.InliningCase, 1}, Core.MethodMatch, Array{Any, 1}, Base.Compiler.CallInfo, UInt32, Base.Compiler.InliningState{Base.Compiler.NativeInterpreter}})
precompile(Tuple{typeof(Base.getindex), Type{Pair{Base.PkgId, UInt128}}, Pair{Base.PkgId, UInt128}, Pair{Base.PkgId, UInt128}, Pair{Base.PkgId, UInt128}, Vararg{Pair{Base.PkgId, UInt128}}})
precompile(Tuple{typeof(Base.Compiler.ir_to_codeinf!), Base.Compiler.OptimizationState{Base.Compiler.NativeInterpreter}, Core.SimpleVector})
precompile(Tuple{typeof(Base.Compiler.ir_to_codeinf!), Base.Compiler.OptimizationState{Base.Compiler.NativeInterpreter}})
# LazyArtifacts (but more generally helpful)
precompile(Tuple{Type{Base.Val{x} where x}, Module})
precompile(Tuple{Type{NamedTuple{(:honor_overrides,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{typeof(Base.unique!), Array{String, 1}})
precompile(Tuple{typeof(Base.vcat), Array{String, 1}, Array{String, 1}})
# Pkg loading
precompile(Tuple{typeof(Base.Filesystem.normpath), String, String, Vararg{String}})
precompile(Tuple{typeof(Base.append!), Array{String, 1}, Array{String, 1}})
precompile(Tuple{typeof(Base.join), Array{String, 1}, Char})
precompile(Tuple{typeof(Base.getindex), Base.Dict{Any, Any}, Char})
precompile(Tuple{typeof(Base.delete!), Base.Set{Any}, Char})
# REPL
precompile(isequal, (String, String))
precompile(Base.check_open, (Base.TTY,))
precompile(Base.getproperty, (Base.TTY, Symbol))
precompile(write, (Base.TTY, String))
precompile(Tuple{typeof(Base.get), Base.TTY, Symbol, Bool})
precompile(Tuple{typeof(Base.hashindex), String, Int})
precompile(Tuple{typeof(Base.write), Base.GenericIOBuffer{Array{UInt8, 1}}, String})
precompile(Tuple{typeof(Base.indexed_iterate), Tuple{Nothing, Int}, Int})
precompile(Tuple{typeof(Base.indexed_iterate), Tuple{Nothing, Int}, Int, Int})
precompile(Tuple{typeof(Base._typeddict), Base.Dict{String, Any}, Base.Dict{String, Any}, Vararg{Base.Dict{String, Any}}})
precompile(Tuple{typeof(Base.promoteK), Type, Base.Dict{String, Any}, Base.Dict{String, Any}})
precompile(Tuple{typeof(Base.promoteK), Type, Base.Dict{String, Any}})
precompile(Tuple{typeof(Base.promoteV), Type, Base.Dict{String, Any}, Base.Dict{String, Any}})
precompile(Tuple{typeof(Base.eval_user_input), Base.PipeEndpoint, Any, Bool})
precompile(Tuple{typeof(Base.get), Base.PipeEndpoint, Symbol, Bool})
# used by Revise.jl
precompile(Tuple{typeof(Base.parse_cache_header), String})
precompile(Base.read_dependency_src, (String, String))
# used by Requires.jl
precompile(Tuple{typeof(get!), Type{Vector{Function}}, Dict{Base.PkgId,Vector{Function}}, Base.PkgId})
precompile(Tuple{typeof(haskey), Dict{Base.PkgId,Vector{Function}}, Base.PkgId})
precompile(Tuple{typeof(delete!), Dict{Base.PkgId,Vector{Function}}, Base.PkgId})
precompile(Tuple{typeof(push!), Vector{Function}, Function})
# preferences
precompile(Base.get_preferences, (Base.UUID,))
precompile(Base.record_compiletime_preference, (Base.UUID, String))
# miscellaneous
precompile(Tuple{typeof(Base.exit)})
precompile(Tuple{typeof(Base.require), Base.PkgId})
precompile(Tuple{typeof(Base.recursive_prefs_merge), Base.Dict{String, Any}})
precompile(Tuple{typeof(Base.recursive_prefs_merge), Base.Dict{String, Any}, Base.Dict{String, Any}, Vararg{Base.Dict{String, Any}}})
precompile(Tuple{typeof(Base.hashindex), Tuple{Base.PkgId, Nothing}, Int})
precompile(Tuple{typeof(Base.hashindex), Tuple{Base.PkgId, String}, Int})
precompile(Tuple{typeof(isassigned), Core.SimpleVector, Int})
precompile(Tuple{typeof(getindex), Core.SimpleVector, Int})
precompile(Tuple{typeof(Base.Experimental.register_error_hint), Any, Type})
precompile(Tuple{typeof(Base.display_error), Base.ExceptionStack})
precompile(Tuple{Core.kwftype(typeof(Type)), NamedTuple{(:sizehint,), Tuple{Int}}, Type{IOBuffer}})
precompile(Base.CoreLogging.current_logger_for_env, (Base.CoreLogging.LogLevel, String, Module))
precompile(Base.CoreLogging.current_logger_for_env, (Base.CoreLogging.LogLevel, Symbol, Module))
precompile(Base.CoreLogging.env_override_minlevel, (Symbol, Module))
precompile(Base.StackTraces.lookup, (Ptr{Nothing},))
precompile(Tuple{typeof(Base.run_module_init), Module, Int})
precompile(Tuple{Type{Base.VersionNumber}, Int32, Int32, Int32})
# Presence tested in the tests
precompile(Tuple{typeof(Base.print), Base.IOStream, String})
# precompilepkgs
precompile(Tuple{typeof(Base.get), Type{Array{String, 1}}, Base.Dict{String, Any}, String})
precompile(Tuple{typeof(Base.get), Type{Base.Dict{String, Any}}, Base.Dict{String, Any}, String})
precompile(Tuple{typeof(Base.haskey), Base.Dict{String, Any}, String})
precompile(Tuple{typeof(Base.indexed_iterate), Tuple{Base.TTY, Bool}, Int, Int})
precompile(Tuple{typeof(Base.indexed_iterate), Tuple{Base.TTY, Bool}, Int})
precompile(Tuple{typeof(Base.open), Base.CmdRedirect, String, Base.TTY})
precompile(Tuple{typeof(Base.Precompilation.precompilepkgs)})
precompile(Tuple{typeof(Base.Precompilation.printpkgstyle), Base.TTY, Symbol, String})
precompile(Tuple{typeof(Base.rawhandle), Base.TTY})
precompile(Tuple{typeof(Base.setindex!), Base.Dict{String, Array{String, 1}}, Array{String, 1}, String})
precompile(Tuple{typeof(Base.setindex!), GenericMemory{:not_atomic, Union{Base.Libc.RawFD, Base.SyncCloseFD, IO}, Core.AddrSpace{Core}(0x00)}, Base.TTY, Int})
precompile(Tuple{typeof(Base.setup_stdio), Base.TTY, Bool})
precompile(Tuple{typeof(Base.spawn_opts_inherit), Base.DevNull, Base.TTY, Base.TTY})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:context,), Tuple{Base.TTY}}, typeof(Base.sprint), Function})
precompile(Tuple{Type{Base.UUID}, Base.UUID})
"""
for T in (Float16, Float32, Float64), IO in (IOBuffer, IOContext{IOBuffer}, Base.TTY, IOContext{Base.TTY})
global hardcoded_precompile_statements
hardcoded_precompile_statements *= "precompile(Tuple{typeof(show), $IO, $T})\n"
end
# Precompiles for Revise and other packages
precompile_script = """
for match = Base._methods(+, (Int, Int), -1, Base.get_world_counter())
m = match.method
delete!(push!(Set{Method}(), m), m)
copy(Core.Compiler.retrieve_code_info(Core.Compiler.specialize_method(match), typemax(UInt)))
break # only actually need to do this once
end
empty!(Set())
push!(push!(Set{Union{GlobalRef,Symbol}}(), :two), GlobalRef(Base, :two))
get!(ENV, "___DUMMY", "")
ENV["___DUMMY"]
delete!(ENV, "___DUMMY")
(setindex!(Dict{String,Base.PkgId}(), Base.PkgId(Base), "file.jl"))["file.jl"]
(setindex!(Dict{Symbol,Vector{Int}}(), [1], :two))[:two]
(setindex!(Dict{Base.PkgId,String}(), "file.jl", Base.PkgId(Base)))[Base.PkgId(Base)]
(setindex!(Dict{Union{GlobalRef,Symbol}, Vector{Int}}(), [1], :two))[:two]
(setindex!(IdDict{Type, Union{Missing, Vector{Tuple{LineNumberNode, Expr}}}}(), missing, Int))[Int]
Dict{Symbol, Union{Nothing, Bool, Symbol}}(:one => false)[:one]
Dict(Base => [:(1+1)])[Base]
Dict(:one => [1])[:one]
Dict("abc" => Set())["abc"]
pushfirst!([], sum)
get(Base.pkgorigins, Base.PkgId(Base), nothing)
sort!([1,2,3])
unique!([1,2,3])
cumsum([1,2,3])
append!(Int[], BitSet())
isempty(BitSet())
delete!(BitSet([1,2]), 3)
deleteat!(Int32[1,2,3], [1,3])
deleteat!(Any[1,2,3], [1,3])
Core.svec(1, 2) == Core.svec(3, 4)
any(t->t[1].line > 1, [(LineNumberNode(2,:none), :(1+1))])
# Code loading uses this
sortperm(mtime.(readdir(".")), rev=true)
# JLLWrappers uses these
Dict{Base.UUID,Set{String}}()[Base.UUID("692b3bcd-3c85-4b1f-b108-f13ce0eb3210")] = Set{String}()
get!(Set{String}, Dict{Base.UUID,Set{String}}(), Base.UUID("692b3bcd-3c85-4b1f-b108-f13ce0eb3210"))
eachindex(IndexLinear(), Expr[])
push!(Expr[], Expr(:return, false))
vcat(String[], String[])
k, v = (:hello => nothing)
Base.print_time_imports_report(Base)
Base.print_time_imports_report_init(Base)
# Preferences uses these
get(Dict{String,Any}(), "missing", nothing)
delete!(Dict{String,Any}(), "missing")
for (k, v) in Dict{String,Any}()
println(k)
end
# interactive startup uses this
write(IOBuffer(), "")
# precompile @time report generation and printing
@time @eval Base.Experimental.@force_compile
"""
julia_exepath() = joinpath(Sys.BINDIR, Base.julia_exename())
Artifacts = get(Base.loaded_modules,
Base.PkgId(Base.UUID("56f22d72-fd6d-98f1-02f0-08ddc0907c33"), "Artifacts"),
nothing)
if Artifacts !== nothing
precompile_script *= """
using Artifacts, Base.BinaryPlatforms, Libdl
artifacts_toml = abspath(joinpath(Sys.STDLIB, "Artifacts", "test", "Artifacts.toml"))
artifact_hash("HelloWorldC", artifacts_toml)
artifacts = Artifacts.load_artifacts_toml(artifacts_toml)
platforms = [Artifacts.unpack_platform(e, "HelloWorldC", artifacts_toml) for e in artifacts["HelloWorldC"]]
best_platform = select_platform(Dict(p => triplet(p) for p in platforms))
if best_platform !== nothing
# @artifact errors for unsupported platforms
oldpwd = pwd(); cd(dirname(artifacts_toml))
macroexpand(Main, :(@artifact_str("HelloWorldC")))
cd(oldpwd)
end
dlopen("libjulia$(Base.isdebugbuild() ? "-debug" : "")", RTLD_LAZY | RTLD_DEEPBIND)
"""
hardcoded_precompile_statements *= """
precompile(Tuple{typeof(Artifacts._artifact_str), Module, String, Base.SubString{String}, String, Base.Dict{String, Any}, Base.SHA1, Base.BinaryPlatforms.Platform, Base.Val{Artifacts}})
precompile(Tuple{typeof(Base.tryparse), Type{Base.BinaryPlatforms.Platform}, String})
"""
end
FileWatching = get(Base.loaded_modules,
Base.PkgId(Base.UUID("7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee"), "FileWatching"),
nothing)
if FileWatching !== nothing
hardcoded_precompile_statements *= """
precompile(Tuple{typeof(FileWatching.watch_file), String, Float64})
precompile(Tuple{typeof(FileWatching.watch_file), String, Int})
precompile(Tuple{typeof(FileWatching._uv_hook_close), FileWatching.FileMonitor})
"""

@NHDaly (Member) commented Jun 2, 2025

Huh, thanks Cody. Yeah, I think with --trace-dispatch I would expect to find all the dynamic dispatches 👍
