Skip to content

[libclang] Add API to generate a reproducer for the explicitly-built modules. #10577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: next
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions clang/include/clang-c/Dependencies.h
Original file line number Diff line number Diff line change
Expand Up @@ -325,6 +325,38 @@ CINDEX_LINKAGE void
clang_experimental_DependencyScannerWorkerScanSettings_dispose(
CXDependencyScannerWorkerScanSettings);

/**
* Generates a reproducer to compile a requested file with required modules.
*
* Here the reproducer means the required input data with the commands to
* compile intermediate modules and a requested file. Required intermediate
* modules and the order of their compilation are determined by the function
* and don't need to be provided.
*
* \param argc the number of compiler invocation arguments (including argv[0]).
* \param argv the compiler driver invocation arguments (including argv[0]).
* \param ModuleName If non-NULL, reproduce building the named module and all
* the intermediate modules. Otherwise, reproduce building
* the whole translation unit.
* \param WorkingDirectory the directory in which the invocation runs.
* \param ReproducerLocation the directory where to store the reproducer files.
* If NULL, use a temporary location.
* \param UseUniqueReproducerName if reproducer files should have unique names
* to avoid collisions with existing files.
* \param [out] MessageOut A pointer to store the human-readable message
* describing the result of the operation. If non-NULL,
* owned and should be disposed by the caller.
*
* \returns \c CXError_Success on success; otherwise a non-zero \c CXErrorCode
* indicating the kind of error. \p MessageOut is guaranteed to be populated
* for a success case but is allowed to be empty when encountered an error.
*/
CINDEX_LINKAGE enum CXErrorCode
clang_experimental_DependencyScanner_generateReproducer(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should use CXDependencyScannerWorkerScanSettings instead of int argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory for the basic compilation setup.

How sure are we that const char *ReproducerLocation, bool UseUniqueReproducerName, are the only two reproducer-specific options we will ever want here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should use CXDependencyScannerWorkerScanSettings instead of int argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory for the basic compilation setup.

I was going back and forth about it and can be persuaded either way. My [weak] justification is that we don't pass the settings to a dependency service/worker/tool. So it's not a self-sufficient entity but more a convenience container that happens to be the same between different functions. So I avoid treating it as an independent entity.

How sure are we that const char *ReproducerLocation, bool UseUniqueReproducerName, are the only two reproducer-specific options we will ever want here?

For the potential enhancements I expect only something to help capture the scanner state. Unfortunately, I don't know what that would be. Maybe .pcm cache, maybe dumping stat cache, maybe something else. So far it is too vague and not the most important direction for the next steps. That's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My [weak] justification is that we don't pass the settings to a dependency service/worker/tool.

Isn't that exactly what we do? The difference is that you're creating a new one instead of using an existing one.

hat's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.

If you want to make this future proof the C API should probably be much more opaque about the configuration, similar to what we do with the worker scan settings where the settings become their own opaque type and we can add new C functions over time to add new settings to that configuration object without breaking the existing API.

int argc, const char *const *argv, const char *ModuleName,
const char *WorkingDirectory, const char *ReproducerLocation,
bool UseUniqueReproducerName, CXString *MessageOut);

/**
* Produces the dependency graph for a particular compiler invocation.
*
Expand Down
42 changes: 42 additions & 0 deletions clang/test/Modules/reproducer-with-module-dependencies.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
// Test generating a reproducer for a modular build where required modules are
// built explicitly as separate steps.

// RUN: rm -rf %t
// RUN: split-file %s %t
//
// RUN: c-index-test core -gen-deps-reproducer -working-dir %t \
// RUN: -- clang-executable -c %t/reproducer.c -o %t/reproducer.o \
// RUN: -fmodules -fmodules-cache-path=%t | FileCheck %t/reproducer.c

// Test a failed attempt at generating a reproducer.
// RUN: not c-index-test core -gen-deps-reproducer -working-dir %t \
// RUN: -- clang-executable -c %t/failed-reproducer.c -o %t/reproducer.o \
// RUN: -fmodules -fmodules-cache-path=%t 2>&1 | FileCheck %t/failed-reproducer.c

// Test the content of a reproducer script.
// RUN: c-index-test core -gen-deps-reproducer -working-dir %t -o %t/repro-content \
// RUN: -- clang-executable -c %t/reproducer.c -o %t/reproducer.o \
// RUN: -fmodules -fmodules-cache-path=%t
// RUN: FileCheck %t/script-expectations.txt --input-file %t/repro-content/reproducer.sh

//--- modular-header.h
void fn_in_modular_header(void);

//--- module.modulemap
module Test { header "modular-header.h" export * }

//--- reproducer.c
// CHECK: Sources and associated run script(s) are located at:
#include "modular-header.h"

void test(void) {
fn_in_modular_header();
}

//--- failed-reproducer.c
// CHECK: fatal error: 'non-existing-header.h' file not found
#include "non-existing-header.h"

//--- script-expectations.txt
CHECK: clang-executable
CHECK: -fmodule-file=Test=reproducer.cache/explicitly-built-modules/Test-{{.*}}.pcm
33 changes: 33 additions & 0 deletions clang/tools/c-index-test/core_main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ enum class ActionType {
AggregateAsJSON,
ScanDeps,
ScanDepsByModuleName,
GenerateDepsReproducer,
UploadCachedJob,
MaterializeCachedJob,
ReplayCachedJob,
Expand Down Expand Up @@ -87,6 +88,8 @@ Action(cl::desc("Action:"), cl::init(ActionType::None),
"Get file dependencies"),
clEnumValN(ActionType::ScanDepsByModuleName, "scan-deps-by-mod-name",
"Get file dependencies by module name alone"),
clEnumValN(ActionType::GenerateDepsReproducer, "gen-deps-reproducer",
"Generate a reproducer for the file"),
clEnumValN(ActionType::UploadCachedJob, "upload-cached-job",
"Upload cached compilation data to upstream CAS"),
clEnumValN(ActionType::MaterializeCachedJob, "materialize-cached-job",
Expand Down Expand Up @@ -921,6 +924,27 @@ static int scanDeps(ArrayRef<const char *> Args, std::string WorkingDirectory,
return 1;
}

static int generateDepsReproducer(ArrayRef<const char *> Args,
std::string WorkingDirectory,
std::string ReproLocation) {
CXString MessageString;
auto DisposeMessageString = llvm::make_scope_exit([&]() {
clang_disposeString(MessageString);
});
CXErrorCode ExitCode =
clang_experimental_DependencyScanner_generateReproducer(
Args.size(), Args.data(), /*ModuleName=*/nullptr,
WorkingDirectory.c_str(),
ReproLocation.empty() ? nullptr : ReproLocation.c_str(),
/*UseUniqueReproducerName=*/ReproLocation.empty(), &MessageString);
if (ExitCode == CXError_Success) {
llvm::outs() << clang_getCString(MessageString) << "\n";
} else {
llvm::errs() << "error: " << clang_getCString(MessageString) << "\n";
}
return (ExitCode == CXError_Success) ? 0 : 1;
}

static int uploadCachedJob(std::string CacheKey, CXCASDatabases DBs) {
CXError Err = nullptr;
CXCASCachedCompilation CComp = clang_experimental_cas_getCachedCompilation(
Expand Down Expand Up @@ -1546,6 +1570,15 @@ int indextest_core_main(int argc, const char **argv) {
options::OutputDir, DBs, options::ModuleName);
}

if (options::Action == ActionType::GenerateDepsReproducer) {
if (options::WorkingDir.empty()) {
errs() << "error: missing -working-dir\n";
return 1;
}
return generateDepsReproducer(CompArgs, options::WorkingDir,
options::OutputFile);
}

if (options::Action == ActionType::UploadCachedJob) {
if (options::InputFiles.empty()) {
errs() << "error: missing cache key\n";
Expand Down
144 changes: 144 additions & 0 deletions clang/tools/libclang/CDependencies.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -317,6 +317,150 @@ void clang_experimental_DependencyScannerWorkerScanSettings_dispose(
delete unwrap(Settings);
}

namespace {
// Helper class to capture a returnable error code and to return a formatted
// message in a provided CXString pointer.
class MessageEmitter {
const CXErrorCode ErrorCode;
CXString *OutputString;
std::string Buffer;
llvm::raw_string_ostream Stream;

public:
MessageEmitter(CXErrorCode Code, CXString *Output)
: ErrorCode(Code), OutputString(Output), Stream(Buffer) {}
~MessageEmitter() {
if (OutputString)
*OutputString = clang::cxstring::createDup(Buffer.c_str());
}

operator CXErrorCode() const { return ErrorCode; }

template <typename T> MessageEmitter &operator<<(const T &t) {
Stream << t;
return *this;
}
};
} // end anonymous namespace

enum CXErrorCode clang_experimental_DependencyScanner_generateReproducer(
int argc, const char *const *argv, const char *ModuleName,
const char *WorkingDirectory, const char *ReproducerLocation,
bool UseUniqueReproducerName, CXString *MessageOut) {
auto Report = [MessageOut](CXErrorCode ErrorCode) -> MessageEmitter {
return MessageEmitter(ErrorCode, MessageOut);
};
auto ReportFailure = [&Report]() -> MessageEmitter {
return Report(CXError_Failure);
};

if (argc < 2 || !argv)
return Report(CXError_InvalidArguments) << "missing compilation command";
if (!WorkingDirectory)
return Report(CXError_InvalidArguments) << "missing working directory";
if (!UseUniqueReproducerName && !ReproducerLocation)
return Report(CXError_InvalidArguments)
<< "non-unique reproducer is allowed only in a custom location";

CASOptions CASOpts;
IntrusiveRefCntPtr<llvm::cas::CachingOnDiskFileSystem> FS;
DependencyScanningService DepsService(
ScanningMode::DependencyDirectivesScan, ScanningOutputFormat::Full,
CASOpts, /*CAS=*/nullptr, /*ActionCache=*/nullptr, FS);
DependencyScanningTool DepsTool(DepsService);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this API create its own service/worker/tool instead of taking a CXDependencyScannerWorker?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this API create its own service/worker/tool instead of taking a CXDependencyScannerWorker?

I wanted the reproducer scan to be independent from previous scans, that's why don't use an existing worker. It is a trade-off, not a uniquely correct approach. Reproducers should be useful in general, not only for debugging issues with EBM, that's why I'm leaning towards from-scratch scanning. And I hope it is easier to reason about than to take into account an invisible worker's state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

independent from previous scans

Can you clarify what specifically you want to be independent from? Is it the file system cache?


llvm::SmallString<128> ReproScriptPath;
int ScriptFD;
if (ReproducerLocation) {
if (auto EC = llvm::sys::fs::create_directories(ReproducerLocation))
return ReportFailure() << "failed to create a reproducer location '"
<< ReproducerLocation << "'\n"
<< EC.message();
SmallString<128> Path(ReproducerLocation);
llvm::sys::path::append(Path, "reproducer");
const char *UniqueSuffix = UseUniqueReproducerName ? "-%%%%%%" : "";
if (auto EC = llvm::sys::fs::createUniqueFile(Path + UniqueSuffix + ".sh",
ScriptFD, ReproScriptPath))
return ReportFailure() << "failed to create a reproducer script file\n"
<< EC.message();
} else {
if (auto EC = llvm::sys::fs::createTemporaryFile(
"reproducer", "sh", ScriptFD, ReproScriptPath)) {
return ReportFailure() << "failed to create a reproducer script file\n"
<< EC.message();
}
}
SmallString<128> FileCachePath = ReproScriptPath;
llvm::sys::path::replace_extension(FileCachePath, ".cache");

std::string FileCacheName = llvm::sys::path::filename(FileCachePath).str();
auto LookupOutput = [&FileCacheName](const ModuleDeps &MD,
ModuleOutputKind MOK) -> std::string {
if (MOK != ModuleOutputKind::ModuleFile)
return "";
return FileCacheName + "/explicitly-built-modules/" +
MD.ID.ModuleName + "-" + MD.ID.ContextHash + ".pcm";
};

std::vector<std::string> Compilation{argv, argv + argc};
llvm::DenseSet<ModuleID> AlreadySeen;
auto TUDepsOrErr = DepsTool.getTranslationUnitDependencies(
Compilation, WorkingDirectory, AlreadySeen, std::move(LookupOutput));
if (!TUDepsOrErr)
return ReportFailure() << "failed to generate a reproducer\n"
<< toString(TUDepsOrErr.takeError());

TranslationUnitDeps TU = *TUDepsOrErr;
llvm::raw_fd_ostream ScriptOS(ScriptFD, /*shouldClose=*/true);
ScriptOS << "# Original command:\n#";
for (StringRef Arg : Compilation)
ScriptOS << ' ' << Arg;
ScriptOS << "\n\n";

ScriptOS << "# Dependencies:\n";
std::string ReproExecutable = std::string(argv[0]);
auto PrintArguments = [&ReproExecutable,
&FileCacheName](llvm::raw_fd_ostream &OS,
ArrayRef<std::string> Arguments) {
OS << ReproExecutable;
for (int I = 0, E = Arguments.size(); I < E; ++I)
OS << ' ' << Arguments[I];
OS << " -ivfsoverlay \"" << FileCacheName << "/vfs/vfs.yaml\"";
OS << '\n';
};
for (ModuleDeps &Dep : TU.ModuleGraph)
PrintArguments(ScriptOS, Dep.getBuildArguments());
ScriptOS << "\n# Translation unit:\n";
for (const Command &BuildCommand : TU.Commands)
PrintArguments(ScriptOS, BuildCommand.Arguments);

SmallString<128> VFSCachePath = FileCachePath;
llvm::sys::path::append(VFSCachePath, "vfs");
std::string VFSCachePathStr = VFSCachePath.str().str();
llvm::FileCollector FileCollector(VFSCachePathStr,
/*OverlayRoot=*/VFSCachePathStr);
for (const auto &FileDep : TU.FileDeps) {
FileCollector.addFile(FileDep);
}
for (ModuleDeps &ModuleDep : TU.ModuleGraph) {
ModuleDep.forEachFileDep([&FileCollector](StringRef FileDep) {
FileCollector.addFile(FileDep);
});
}
if (FileCollector.copyFiles(/*StopOnError=*/true))
return ReportFailure()
<< "failed to copy the files used for the compilation";
SmallString<128> VFSOverlayPath = VFSCachePath;
llvm::sys::path::append(VFSOverlayPath, "vfs.yaml");
if (FileCollector.writeMapping(VFSOverlayPath))
return ReportFailure() << "failed to write a VFS overlay mapping";

return Report(CXError_Success)
<< "Created a reproducer. Sources and associated run script(s) are "
"located at:\n "
<< FileCachePath << "\n " << ReproScriptPath;
}

enum CXErrorCode clang_experimental_DependencyScannerWorker_getDepGraph(
CXDependencyScannerWorker W,
CXDependencyScannerWorkerScanSettings CXSettings, CXDepGraph *Out) {
Expand Down
1 change: 1 addition & 0 deletions clang/tools/libclang/libclang.map
Original file line number Diff line number Diff line change
Expand Up @@ -496,6 +496,7 @@ LLVM_16 {
clang_experimental_cas_replayCompilation;
clang_experimental_cas_ReplayResult_dispose;
clang_experimental_cas_ReplayResult_getStderr;
clang_experimental_DependencyScanner_generateReproducer;
clang_experimental_DependencyScannerService_create_v1;
clang_experimental_DependencyScannerServiceOptions_create;
clang_experimental_DependencyScannerServiceOptions_dispose;
Expand Down