-
Notifications
You must be signed in to change notification settings - Fork 199
[Incremental] Optimize incremental build logic by interning strings #800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please test macos platform |
d09932f
to
5a33277
Compare
@swift-ci please test |
rdar://75386359 |
Relative savings on some test: |
@swift-ci please test |
let index: Int | ||
let table: InternedStringTable | ||
|
||
fileprivate init(_ s: String, host: ModuleDependencyGraph) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of the module dependency graph here is confusing.
Why should a generic interned string have any connection to an incremental-build-related data structure?
I think it should be the String cache entity itself that we rely on here, not where that cache is meant to live (ModuleDependencyGraph
).
@@ -55,12 +67,15 @@ extension DependencySource { | |||
/// Throws if a read error | |||
/// Returns nil if no dependency info there. | |||
public func read( | |||
info: IncrementalCompilationState.IncrementalDependencyAndInputSetup | |||
info: IncrementalCompilationState.IncrementalDependencyAndInputSetup, | |||
host: ModuleDependencyGraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now often have the use of host
throughout that I can't wrap my head around by locally looking at the code. From the rest of this PR I gather that we are passing around the dependency graph because that's where the interned string cache lives, but the name host
doesn't tell me that. Perhaps we can pass around a reference to the actual string cache to where it is needed?
} | ||
|
||
/// Hardcode empty as 0 | ||
public class InternedStringTable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels very close to our existing PathCache
implementation (https://github.com/apple/swift-driver/blob/main/Sources/SwiftDriver/Utilities/VirtualPath.swift#L325).
- Is this functionality we can common-out or maybe model this approach after the already-existing one?
- Why do we not require synchronization in this case? If this table lives in the module-dependency-graph, can't we be accessing that concurrently after compilation jobs finish?
- With filepaths, we have one global cache with synchronized accesses, whereas here we are relying on the specific instance of the string cache/table to be specified whenever we are dealing with interned strings. I wonder why we can't rely on a global cache here as well, much like
VirtualPath
? That way we won't have to tie this to the dependency graph and plumb the graph to all the use-sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not require synchronization in this case?
In theory, all usages of this class should be as thread safe as the calling context. Integration is currently a serial process in all of the mutating paths, but that's a somewhat ad-hoc thing at the moment in that it's only really being enforced manually by IncrementalCompilationState
. In practice, there will be interners on both the main queue and on the private queue maintained by IncrementalCompilationState
which could lead to data races on the cache.
What this suggests to me is that we need to make it structurally impossible for anything but serialization and the module dependency graph to mutate these things, which is currently not the case with this API. Where that is not possible, we should be asserting that accesses are occurring on the expected queue (dispatchPrecondition
can help).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the refactor I did before set up IncrementalCompilationState
as the serialization point for the graph. Maybe the table should live there instead of the graph, and be protected there?
I worry that dispatchPrecondition
on every access could damage performance. How does its overhead compare to a dictionary-keyed-by-string lookup?
public var isEmpty: Bool { index == 0 } | ||
|
||
// PRIVATE? | ||
init(index: Int, table: InternedStringTable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think we should allow this to be used publicly.
444b6e1
to
ba887d1
Compare
Driver CPU times measured by Instruments, on a 1000+ source-file project, 3 runs each clean build baseline: 20.65, 20.67, 20.08 incremental build baseline: 3.63, 3.46, 3.63 |
]) | ||
self.abbreviate(.moduleDepGraphNode, [ | ||
.literal(RecordID.moduleDepGraphNode.rawValue), | ||
let dependencyKeyOperands: [Bitstream.Abbreviation.Operand] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this factorization relevant to the patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code duplicated this information, and yes, I could have done the patch by keeping the duplication. Since the patch involved changing the content of the serialized key, the factorization contributed to the quality of the patch by centralizing the invariant around what was needed in the file.
Someday, I'd love to further centralize so that the structure, the code to read, and the code to write each type of node were factored together.
In the meantime, the round-trip test you wrote provides a way to catch many of the potential bugs.
let s = try internedString(field: i) | ||
return s.isEmpty ? nil : s | ||
} | ||
func dependencyKey(fields: (kindCode: Int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The use of a tuple makes the call sites opaque. Perhaps this should be unpacked back out into separate operands in the signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easy enough to fix!
|
||
extension InternedString { | ||
public static func < (lhs: InternedString, rhs: InternedString) -> Bool { | ||
lhs.index < rhs.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem correct. And if it is, it's a very surprising implementation of comparison for this type. I would expect that we would force callers to go through isInIncreasingOrder
et al.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a holdover. It's gone.
return r | ||
} | ||
|
||
public func lookup(`in` holder: InternedStringTableHolder) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: in
does not need to be escaped here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done!
|
||
public init() {} | ||
|
||
fileprivate func index(_ s: String) -> Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is a mutating operation with a name that does not convey as much. I would expect this to be named intern
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no prob.
Some general comments:
|
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
import Foundation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I don't believe we need this import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Gone!
var strings = [""] | ||
fileprivate var indices = ["": 0] | ||
|
||
public init() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also benchmark the high water mark for this table and see if reserveCapacity
on both collections helps here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean to save the table size in the priors, and reserve it when reading the priors. Great idea! I'll do that.
@swift-ci please test |
d285529
to
e4047fb
Compare
@swift-ci please test |
/// Any instance that can find the confinmentQueue for the incremental compilation state. | ||
/// By confirming, a type gains the use of the shorthand methods in the extension. | ||
public protocol IncrementalCompilationSynchronizer { | ||
var incrementalCompilationQueue: DispatchQueue {get} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just have the table own a confinement queue, or be accessible only from behind a class (say the module dependency graph) that protects it with a confinement queue? By introducing this extra axis of customizability, the it's harder to reason about the concurrency guarantees for the table since the queue it is accessed from could be (but, in this patch, is not after manual inspection) distinct from the one it is initialized with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, an earlier version of this PR required a module dependency graph to be passed in to every intern / un-intern operation. IIRC, @artemcm correctly thought that code was a bit unclear. I can do away with the protocol if that would help. Again, for the sake of realizing the performance gains, I would love to find an architecture that would be above threshold for you. What can you suggest? I can try going back to the string-table hidden in the module dependency graph, but then operations such as reading a source file dependency graph for testing, or reading the priors will have to start from an empty module dependency graph. Not the worst thing in the world, but also a source of awkwardness, passing in a whole graph just to gain the ability to intern strings.
public var isEmpty: Bool { index == 0 } | ||
|
||
// PRIVATE? | ||
init(index: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this must be private, or the handle needs to become an opaque type a la VirtualPath.Handle. Callers should never be able to construct these things without going through the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading a serialized ModuleDependencyGraph
, this PR saves work by using the indices of identifiers in the serialized graph as the indices into the InternedStringTable
. So, there must be a bridge from the method in the ModuleDependencyGraph.swift code into this method, because the reader must take an index and turn it into an InternedString
.
Can you suggest something better? Something like a C++ friend
would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR saves work by using the indices of identifiers in the serialized graph as the indices into the InternedStringTable
I'm not sure it is worth tying these concepts together? It is not much work to have a table that just increments an index for every added string. And tying creation of interned strings to the table has a benefit of this mechanism always being explicit so that clients only have one way to deal with this, as Robert suggests.
It would also simplify this patch a bit, this doesn't seem like the kind of change that calls for modifying the serialization format in addition to adding string interning and changing the synchronization mechanisms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that ModuleDependencyGraph/deserialize/Visitor
already has a table it is building up, so we would be able to make an interned string's index to be private
and rid of this init
.
} | ||
|
||
/// Block any threads from mutating `ProtectedState` | ||
public func blockingConcurrentMutation<R>( | ||
/// Allow concurrent access to while preventing mutation of ``IncrementalCompilationState/protectedState`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: sentence fragment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the in-source link here isn't valid. DocC doesn't support (at least, ergonomically) linking to instance variables like this. A plain protectedState
will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done!
|
||
/// State needed for incremental compilation that can change during a run and must be protected from | ||
/// concurrent mutation and access. Concurrent accesses are OK. | ||
private var protectedState: ProtectedState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The protected state should not have to include info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
fileprivate func checkAccess() { | ||
dispatchPrecondition(condition: .onQueue(confinmentQueue)) | ||
} | ||
} | ||
|
||
// MARK: - 2nd wave | ||
extension IncrementalCompilationState.ProtectedState { | ||
mutating func collectBatchedJobsDiscoveredToBeNeededAfterFinishing( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mark this fileprivate so it cannot be accessed without going through the confinement queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Do you have numbers here? The confinement queues shouldn't add that much overhead, but it should still appear in traces. |
Fair question: Just to be sure remeasured, using Instruments and filtering the tree on symbols matching "dispatchPrecondition", inverting the call tree.
|
6c2001a
to
b674cdf
Compare
@swift-ci please test |
@swift-ci please test |
|
||
/// A filename from another module | ||
/*@_spi(Testing)*/ final public class ExternalDependency: Hashable, Comparable, CustomStringConvertible { | ||
|
||
|
||
/// Delay computing the path as an optimization. | ||
let fileName: String | ||
let fileName: InternedString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we don't really need the InternedString
here if we must have its raw value stored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I interpret Xcode's callers facility correctly, the fileName
is used in lots of places. Storing a String
here would add some amount of overhead, as well as needing the string table at the use point.
default: break | ||
} | ||
|
||
/// Preserves the ordering that obtained before interned strings were introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reader of this code probably shouldn't care too much about the context behind introducing this versus why is it that we need this ordering in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to address the concern here.
/// Preserves the ordering that obtained before interned strings were introduced. | ||
func kindOrdering(_ d: DependencyKey.Designator) -> Int { | ||
switch d { | ||
case .topLevel: return 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you can make Designator
conform to Comparable
and implement this there to avoid exposing this kind of detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is needing the string table as a parameter to the less-than function.
@@ -606,6 +657,7 @@ extension ModuleDependencyGraph { | |||
case malformedIdentifierRecord | |||
case malformedModuleDepGraphNodeRecord | |||
case malformedDependsOnRecord | |||
case malforedUseIDRecord |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case malforedUseIDRecord | |
case malformedUseIDRecord |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
case .dependsOnNode: | ||
self = .malformedDependsOnRecord | ||
case .useIDNode: | ||
self = .malforedUseIDRecord |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self = .malforedUseIDRecord | |
self = .malformedUseIDRecord |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed or rendered moot.
public var isEmpty: Bool { index == 0 } | ||
|
||
// PRIVATE? | ||
init(index: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR saves work by using the indices of identifiers in the serialized graph as the indices into the InternedStringTable
I'm not sure it is worth tying these concepts together? It is not much work to have a table that just increments an index for every added string. And tying creation of interned strings to the table has a benefit of this mechanism always being explicit so that clients only have one way to deal with this, as Robert suggests.
It would also simplify this patch a bit, this doesn't seem like the kind of change that calls for modifying the serialization format in addition to adding string interning and changing the synchronization mechanisms.
public var isEmpty: Bool { index == 0 } | ||
|
||
// PRIVATE? | ||
init(index: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that ModuleDependencyGraph/deserialize/Visitor
already has a table it is building up, so we would be able to make an interned string's index to be private
and rid of this init
.
After an offline discussion, we agreed it would be robust enough to make the |
@swift-ci please test |
Also helps rdar://79192750 |
No description provided.