Skip to content

Commit 2f2ea35

Browse files
authored
[Serialization] No transitive identifier change (llvm#92085)
Following of llvm#92083 The motivation is still cutting of the unnecessary change in the dependency chain. See the above link (recursively) for details. After this patch, (and the above patch), we can already do something pretty interesting. For example, #### Motivation example ``` //--- m-partA.cppm export module m:partA; export inline int getA() { return 43; } export class A { public: int getMem(); }; export template <typename T> class ATempl { public: T getT(); }; //--- m-partA.v1.cppm export module m:partA; export inline int getA() { return 43; } // Now we add a new declaration without introducing a new type. // The consuming module which didn't use m:partA completely is expected to be // not changed. export inline int getA2() { return 88; } export class A { public: int getMem(); // Now we add a new declaration without introducing a new type. // The consuming module which didn't use m:partA completely is expected to be // not changed. int getMem2(); }; export template <typename T> class ATempl { public: T getT(); // Add a new declaration without introducing a new type. T getT2(); }; //--- m-partB.cppm export module m:partB; export inline int getB() { return 430; } //--- m.cppm export module m; export import :partA; export import :partB; //--- useBOnly.cppm export module useBOnly; import m; export inline int get() { return getB(); } ``` In this example, module `m` exports two partitions `:partA` and `:partB`. And a consumer `useBOnly` only consumes the entities from `:partB`. So we don't hope the BMI of `useBOnly` changes if only `:partA` changes. After this patch, we can make it if the change of `:partA` doesn't introduce new types. (And we can get rid of this if we make no-transitive-type-change). As the example shows, when we change the implementation of `:partA` from `m-partA.cppm` to `m-partA.v1.cppm`, we add new function declaration `getA2()` at the global namespace, add a new member function `getMem2()` to class `A` and add a new member function to `getT2()` to class template `ATempl`. And since `:partA` is not used by `useBOnly` completely, the BMI of `useBOnly` won't change after we made above changes. #### Design details Method used in this patch is similar with llvm#92083 and llvm#86912. It extends the 32 bit IdentifierID to 64 bits and use the higher 32 bits to store the module file index. So that the encoding of the identifier won't get affected by other modules. #### Overhead Similar with llvm#92083 and llvm#86912. The change is only expected to increase the size of the on-disk .pcm files and not affect the compile-time performances. And from my experiment, the size of the on-disk change only increase 1%+ and observe no compile-time impacts. #### Future Plans I'll try to do the same thing for type ids. IIRC, it won't change the dependency graph if we add a new type in an unused units. I do think this is a significant win. And this will be a pretty good answer to "why modules are better than headers."
1 parent 5ef02d9 commit 2f2ea35

File tree

9 files changed

+210
-82
lines changed

9 files changed

+210
-82
lines changed

clang/include/clang/Lex/ExternalPreprocessorSource.h

+1-1
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ class ExternalPreprocessorSource {
3636
/// Return the identifier associated with the given ID number.
3737
///
3838
/// The ID 0 is associated with the NULL identifier.
39-
virtual IdentifierInfo *GetIdentifier(unsigned ID) = 0;
39+
virtual IdentifierInfo *GetIdentifier(uint64_t ID) = 0;
4040

4141
/// Map a module ID to a module.
4242
virtual Module *getModule(unsigned ModuleID) = 0;

clang/include/clang/Serialization/ASTBitCodes.h

+1-1
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ const unsigned VERSION_MINOR = 1;
5959
///
6060
/// The ID numbers of identifiers are consecutive (in order of discovery)
6161
/// and start at 1. 0 is reserved for NULL.
62-
using IdentifierID = uint32_t;
62+
using IdentifierID = uint64_t;
6363

6464
/// The number of predefined identifier IDs.
6565
const unsigned int NUM_PREDEF_IDENT_IDS = 1;

clang/include/clang/Serialization/ASTReader.h

+8-11
Original file line numberDiff line numberDiff line change
@@ -661,14 +661,6 @@ class ASTReader
661661
/// been loaded.
662662
std::vector<IdentifierInfo *> IdentifiersLoaded;
663663

664-
using GlobalIdentifierMapType =
665-
ContinuousRangeMap<serialization::IdentifierID, ModuleFile *, 4>;
666-
667-
/// Mapping from global identifier IDs to the module in which the
668-
/// identifier resides along with the offset that should be added to the
669-
/// global identifier ID to produce a local ID.
670-
GlobalIdentifierMapType GlobalIdentifierMap;
671-
672664
/// A vector containing macros that have already been
673665
/// loaded.
674666
///
@@ -1547,6 +1539,11 @@ class ASTReader
15471539
/// Translate a \param GlobalDeclID to the index of DeclsLoaded array.
15481540
unsigned translateGlobalDeclIDToIndex(GlobalDeclID ID) const;
15491541

1542+
/// Translate an \param IdentifierID ID to the index of IdentifiersLoaded
1543+
/// array and the corresponding module file.
1544+
std::pair<ModuleFile *, unsigned>
1545+
translateIdentifierIDToIndex(serialization::IdentifierID ID) const;
1546+
15501547
public:
15511548
/// Load the AST file and validate its contents against the given
15521549
/// Preprocessor.
@@ -2131,7 +2128,7 @@ class ASTReader
21312128
/// Load a selector from disk, registering its ID if it exists.
21322129
void LoadSelector(Selector Sel);
21332130

2134-
void SetIdentifierInfo(unsigned ID, IdentifierInfo *II);
2131+
void SetIdentifierInfo(serialization::IdentifierID ID, IdentifierInfo *II);
21352132
void SetGloballyVisibleDecls(IdentifierInfo *II,
21362133
const SmallVectorImpl<GlobalDeclID> &DeclIDs,
21372134
SmallVectorImpl<Decl *> *Decls = nullptr);
@@ -2158,10 +2155,10 @@ class ASTReader
21582155
return DecodeIdentifierInfo(ID);
21592156
}
21602157

2161-
IdentifierInfo *getLocalIdentifier(ModuleFile &M, unsigned LocalID);
2158+
IdentifierInfo *getLocalIdentifier(ModuleFile &M, uint64_t LocalID);
21622159

21632160
serialization::IdentifierID getGlobalIdentifierID(ModuleFile &M,
2164-
unsigned LocalID);
2161+
uint64_t LocalID);
21652162

21662163
void resolvePendingMacro(IdentifierInfo *II, const PendingMacroInfo &PMInfo);
21672164

clang/include/clang/Serialization/ModuleFile.h

-3
Original file line numberDiff line numberDiff line change
@@ -310,9 +310,6 @@ class ModuleFile {
310310
/// Base identifier ID for identifiers local to this module.
311311
serialization::IdentifierID BaseIdentifierID = 0;
312312

313-
/// Remapping table for identifier IDs in this module.
314-
ContinuousRangeMap<uint32_t, int, 2> IdentifierRemap;
315-
316313
/// Actual data for the on-disk hash table of identifiers.
317314
///
318315
/// This pointer points into a memory buffer, where the on-disk hash

clang/lib/Serialization/ASTReader.cpp

+51-45
Original file line numberDiff line numberDiff line change
@@ -954,7 +954,7 @@ ASTSelectorLookupTrait::ReadKey(const unsigned char* d, unsigned) {
954954
SelectorTable &SelTable = Reader.getContext().Selectors;
955955
unsigned N = endian::readNext<uint16_t, llvm::endianness::little>(d);
956956
const IdentifierInfo *FirstII = Reader.getLocalIdentifier(
957-
F, endian::readNext<uint32_t, llvm::endianness::little>(d));
957+
F, endian::readNext<IdentifierID, llvm::endianness::little>(d));
958958
if (N == 0)
959959
return SelTable.getNullarySelector(FirstII);
960960
else if (N == 1)
@@ -964,7 +964,7 @@ ASTSelectorLookupTrait::ReadKey(const unsigned char* d, unsigned) {
964964
Args.push_back(FirstII);
965965
for (unsigned I = 1; I != N; ++I)
966966
Args.push_back(Reader.getLocalIdentifier(
967-
F, endian::readNext<uint32_t, llvm::endianness::little>(d)));
967+
F, endian::readNext<IdentifierID, llvm::endianness::little>(d)));
968968

969969
return SelTable.getSelector(N, Args.data());
970970
}
@@ -1047,7 +1047,8 @@ static bool readBit(unsigned &Bits) {
10471047
IdentifierID ASTIdentifierLookupTrait::ReadIdentifierID(const unsigned char *d) {
10481048
using namespace llvm::support;
10491049

1050-
unsigned RawID = endian::readNext<uint32_t, llvm::endianness::little>(d);
1050+
IdentifierID RawID =
1051+
endian::readNext<IdentifierID, llvm::endianness::little>(d);
10511052
return Reader.getGlobalIdentifierID(F, RawID >> 1);
10521053
}
10531054

@@ -1065,9 +1066,12 @@ IdentifierInfo *ASTIdentifierLookupTrait::ReadData(const internal_key_type& k,
10651066
unsigned DataLen) {
10661067
using namespace llvm::support;
10671068

1068-
unsigned RawID = endian::readNext<uint32_t, llvm::endianness::little>(d);
1069+
IdentifierID RawID =
1070+
endian::readNext<IdentifierID, llvm::endianness::little>(d);
10691071
bool IsInteresting = RawID & 0x01;
10701072

1073+
DataLen -= sizeof(IdentifierID);
1074+
10711075
// Wipe out the "is interesting" bit.
10721076
RawID = RawID >> 1;
10731077

@@ -1098,7 +1102,7 @@ IdentifierInfo *ASTIdentifierLookupTrait::ReadData(const internal_key_type& k,
10981102
bool HadMacroDefinition = readBit(Bits);
10991103

11001104
assert(Bits == 0 && "Extra bits in the identifier?");
1101-
DataLen -= 8;
1105+
DataLen -= sizeof(uint16_t) * 2;
11021106

11031107
// Set or check the various bits in the IdentifierInfo structure.
11041108
// Token IDs are read-only.
@@ -1225,7 +1229,7 @@ ASTDeclContextNameLookupTrait::ReadKey(const unsigned char *d, unsigned) {
12251229
case DeclarationName::CXXLiteralOperatorName:
12261230
case DeclarationName::CXXDeductionGuideName:
12271231
Data = (uint64_t)Reader.getLocalIdentifier(
1228-
F, endian::readNext<uint32_t, llvm::endianness::little>(d));
1232+
F, endian::readNext<IdentifierID, llvm::endianness::little>(d));
12291233
break;
12301234
case DeclarationName::ObjCZeroArgSelector:
12311235
case DeclarationName::ObjCOneArgSelector:
@@ -2093,7 +2097,7 @@ HeaderFileInfoTrait::ReadData(internal_key_ref key, const unsigned char *d,
20932097
HFI.DirInfo = (Flags >> 1) & 0x07;
20942098
HFI.IndexHeaderMapHeader = Flags & 0x01;
20952099
HFI.LazyControllingMacro = Reader.getGlobalIdentifierID(
2096-
M, endian::readNext<uint32_t, llvm::endianness::little>(d));
2100+
M, endian::readNext<IdentifierID, llvm::endianness::little>(d));
20972101
if (unsigned FrameworkOffset =
20982102
endian::readNext<uint32_t, llvm::endianness::little>(d)) {
20992103
// The framework offset is 1 greater than the actual offset,
@@ -3467,24 +3471,11 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
34673471
"duplicate IDENTIFIER_OFFSET record in AST file");
34683472
F.IdentifierOffsets = (const uint32_t *)Blob.data();
34693473
F.LocalNumIdentifiers = Record[0];
3470-
unsigned LocalBaseIdentifierID = Record[1];
34713474
F.BaseIdentifierID = getTotalNumIdentifiers();
34723475

3473-
if (F.LocalNumIdentifiers > 0) {
3474-
// Introduce the global -> local mapping for identifiers within this
3475-
// module.
3476-
GlobalIdentifierMap.insert(std::make_pair(getTotalNumIdentifiers() + 1,
3477-
&F));
3478-
3479-
// Introduce the local -> global mapping for identifiers within this
3480-
// module.
3481-
F.IdentifierRemap.insertOrReplace(
3482-
std::make_pair(LocalBaseIdentifierID,
3483-
F.BaseIdentifierID - LocalBaseIdentifierID));
3484-
3476+
if (F.LocalNumIdentifiers > 0)
34853477
IdentifiersLoaded.resize(IdentifiersLoaded.size()
34863478
+ F.LocalNumIdentifiers);
3487-
}
34883479
break;
34893480
}
34903481

@@ -4089,7 +4080,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
40894080
F.ModuleOffsetMap = StringRef();
40904081

40914082
using RemapBuilder = ContinuousRangeMap<uint32_t, int, 2>::Builder;
4092-
RemapBuilder IdentifierRemap(F.IdentifierRemap);
40934083
RemapBuilder MacroRemap(F.MacroRemap);
40944084
RemapBuilder PreprocessedEntityRemap(F.PreprocessedEntityRemap);
40954085
RemapBuilder SubmoduleRemap(F.SubmoduleRemap);
@@ -4122,8 +4112,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
41224112

41234113
ImportedModuleVector.push_back(OM);
41244114

4125-
uint32_t IdentifierIDOffset =
4126-
endian::readNext<uint32_t, llvm::endianness::little>(Data);
41274115
uint32_t MacroIDOffset =
41284116
endian::readNext<uint32_t, llvm::endianness::little>(Data);
41294117
uint32_t PreprocessedEntityIDOffset =
@@ -4143,7 +4131,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
41434131
static_cast<int>(BaseOffset - Offset)));
41444132
};
41454133

4146-
mapOffset(IdentifierIDOffset, OM->BaseIdentifierID, IdentifierRemap);
41474134
mapOffset(MacroIDOffset, OM->BaseMacroID, MacroRemap);
41484135
mapOffset(PreprocessedEntityIDOffset, OM->BasePreprocessedEntityID,
41494136
PreprocessedEntityRemap);
@@ -8238,7 +8225,6 @@ LLVM_DUMP_METHOD void ASTReader::dump() {
82388225
dumpModuleIDMap("Global bit offset map", GlobalBitOffsetsMap);
82398226
dumpModuleIDMap("Global source location entry map", GlobalSLocEntryMap);
82408227
dumpModuleIDMap("Global type map", GlobalTypeMap);
8241-
dumpModuleIDMap("Global identifier map", GlobalIdentifierMap);
82428228
dumpModuleIDMap("Global macro map", GlobalMacroMap);
82438229
dumpModuleIDMap("Global submodule map", GlobalSubmoduleMap);
82448230
dumpModuleIDMap("Global selector map", GlobalSelectorMap);
@@ -8878,8 +8864,9 @@ void ASTReader::LoadSelector(Selector Sel) {
88788864

88798865
void ASTReader::SetIdentifierInfo(IdentifierID ID, IdentifierInfo *II) {
88808866
assert(ID && "Non-zero identifier ID required");
8881-
assert(ID <= IdentifiersLoaded.size() && "identifier ID out of range");
8882-
IdentifiersLoaded[ID - 1] = II;
8867+
unsigned Index = translateIdentifierIDToIndex(ID).second;
8868+
assert(Index < IdentifiersLoaded.size() && "identifier ID out of range");
8869+
IdentifiersLoaded[Index] = II;
88838870
if (DeserializationListener)
88848871
DeserializationListener->IdentifierRead(ID, II);
88858872
}
@@ -8932,6 +8919,22 @@ void ASTReader::SetGloballyVisibleDecls(
89328919
}
89338920
}
89348921

8922+
std::pair<ModuleFile *, unsigned>
8923+
ASTReader::translateIdentifierIDToIndex(IdentifierID ID) const {
8924+
if (ID == 0)
8925+
return {nullptr, 0};
8926+
8927+
unsigned ModuleFileIndex = ID >> 32;
8928+
unsigned LocalID = ID & llvm::maskTrailingOnes<IdentifierID>(32);
8929+
8930+
assert(ModuleFileIndex && "not translating loaded IdentifierID?");
8931+
assert(getModuleManager().size() > ModuleFileIndex - 1);
8932+
8933+
ModuleFile &MF = getModuleManager()[ModuleFileIndex - 1];
8934+
assert(LocalID < MF.LocalNumIdentifiers);
8935+
return {&MF, MF.BaseIdentifierID + LocalID};
8936+
}
8937+
89358938
IdentifierInfo *ASTReader::DecodeIdentifierInfo(IdentifierID ID) {
89368939
if (ID == 0)
89378940
return nullptr;
@@ -8941,45 +8944,48 @@ IdentifierInfo *ASTReader::DecodeIdentifierInfo(IdentifierID ID) {
89418944
return nullptr;
89428945
}
89438946

8944-
ID -= 1;
8945-
if (!IdentifiersLoaded[ID]) {
8946-
GlobalIdentifierMapType::iterator I = GlobalIdentifierMap.find(ID + 1);
8947-
assert(I != GlobalIdentifierMap.end() && "Corrupted global identifier map");
8948-
ModuleFile *M = I->second;
8949-
unsigned Index = ID - M->BaseIdentifierID;
8947+
auto [M, Index] = translateIdentifierIDToIndex(ID);
8948+
if (!IdentifiersLoaded[Index]) {
8949+
assert(M != nullptr && "Untranslated Identifier ID?");
8950+
assert(Index >= M->BaseIdentifierID);
8951+
unsigned LocalIndex = Index - M->BaseIdentifierID;
89508952
const unsigned char *Data =
8951-
M->IdentifierTableData + M->IdentifierOffsets[Index];
8953+
M->IdentifierTableData + M->IdentifierOffsets[LocalIndex];
89528954

89538955
ASTIdentifierLookupTrait Trait(*this, *M);
89548956
auto KeyDataLen = Trait.ReadKeyDataLength(Data);
89558957
auto Key = Trait.ReadKey(Data, KeyDataLen.first);
89568958
auto &II = PP.getIdentifierTable().get(Key);
8957-
IdentifiersLoaded[ID] = &II;
8959+
IdentifiersLoaded[Index] = &II;
89588960
markIdentifierFromAST(*this, II);
89598961
if (DeserializationListener)
8960-
DeserializationListener->IdentifierRead(ID + 1, &II);
8962+
DeserializationListener->IdentifierRead(ID, &II);
89618963
}
89628964

8963-
return IdentifiersLoaded[ID];
8965+
return IdentifiersLoaded[Index];
89648966
}
89658967

8966-
IdentifierInfo *ASTReader::getLocalIdentifier(ModuleFile &M, unsigned LocalID) {
8968+
IdentifierInfo *ASTReader::getLocalIdentifier(ModuleFile &M, uint64_t LocalID) {
89678969
return DecodeIdentifierInfo(getGlobalIdentifierID(M, LocalID));
89688970
}
89698971

8970-
IdentifierID ASTReader::getGlobalIdentifierID(ModuleFile &M, unsigned LocalID) {
8972+
IdentifierID ASTReader::getGlobalIdentifierID(ModuleFile &M, uint64_t LocalID) {
89718973
if (LocalID < NUM_PREDEF_IDENT_IDS)
89728974
return LocalID;
89738975

89748976
if (!M.ModuleOffsetMap.empty())
89758977
ReadModuleOffsetMap(M);
89768978

8977-
ContinuousRangeMap<uint32_t, int, 2>::iterator I
8978-
= M.IdentifierRemap.find(LocalID - NUM_PREDEF_IDENT_IDS);
8979-
assert(I != M.IdentifierRemap.end()
8980-
&& "Invalid index into identifier index remap");
8979+
unsigned ModuleFileIndex = LocalID >> 32;
8980+
LocalID &= llvm::maskTrailingOnes<IdentifierID>(32);
8981+
ModuleFile *MF =
8982+
ModuleFileIndex ? M.TransitiveImports[ModuleFileIndex - 1] : &M;
8983+
assert(MF && "malformed identifier ID encoding?");
89818984

8982-
return LocalID + I->second;
8985+
if (!ModuleFileIndex)
8986+
LocalID -= NUM_PREDEF_IDENT_IDS;
8987+
8988+
return ((IdentifierID)(MF->Index + 1) << 32) | LocalID;
89838989
}
89848990

89858991
MacroInfo *ASTReader::getMacro(MacroID ID) {

0 commit comments

Comments
 (0)