Skip to content

[libclang] Add API to generate a reproducer for the explicitly-built modules. #10577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: next
Choose a base branch
from

Conversation

vsapsai
Copy link

@vsapsai vsapsai commented Apr 29, 2025

Capturing a single -cc1 command to reproduce a bug with the explicitly-built modules isn't sufficient, we need to know what .pcm files were built and how.

rdar://59743925

@vsapsai vsapsai requested a review from jansvoboda11 April 29, 2025 04:06
@vsapsai
Copy link
Author

vsapsai commented Apr 29, 2025

Right now I'd like to get a high-level opinion if such an approach for reproducers is reasonable. This feature will have more commits, to get an idea of how it is supposed to look like you can check users/vsapsai/reproducer-explicitly-built-modules branch.

@vsapsai
Copy link
Author

vsapsai commented May 7, 2025

An example of API usage is in swiftlang/swift-build#455

@vsapsai
Copy link
Author

vsapsai commented May 7, 2025

By the way, this doesn't work as a reproducer for dependency scanning, I'm just capturing the scanning results. My hope that with including the original command [used for scanning] and with the captured files you can still do something like

c-index-test core --scan-deps -- original_command_from_reproducer

After landing the proposed level of reproducing we can think how to support scan-deps better.

I was considering an alternative of invoking clang-scan-deps from the shell script and building modules based on the scanning results. But I've decided that embedding a little build system in a shell script is too much complexity and not justified at this stage. Maybe we'll see a need for something like that, then we can bite the bullet and add the required complexity.

@vsapsai
Copy link
Author

vsapsai commented May 14, 2025

I had an idea of adding DependencyScanningWorker to the reproducer generation API so that we can capture some internal worker state to help with the debugging. But after additional consideration I think it is too vague and unclear as I don't have specific use cases when it should help. And such functionality we can always add in a new API version.

@vsapsai vsapsai force-pushed the users/vsapsai/reproducer-explicitly-built-modules_01 branch from 5185b66 to dd4cc19 Compare May 14, 2025 21:58
@vsapsai vsapsai marked this pull request as ready for review May 14, 2025 22:00
@vsapsai
Copy link
Author

vsapsai commented May 22, 2025

Ping.

@jansvoboda11
Copy link

I'd like the API changes from users/vsapsai/reproducer-explicitly-built-modules (and the addition of const char *ModuleName to align with the scan API that we talked about offline) to be included in this PR. C APIs cannot change between Clang versions, so getting this right in the first commit makes the most sense to me.

vsapsai added a commit that referenced this pull request May 22, 2025
…modules. (#10577)

Capturing a single `-cc1` command to reproduce a bug with the explicitly-built
modules isn't sufficient, we need to know what .pcm files were built and how.

rdar://59743925
@vsapsai vsapsai force-pushed the users/vsapsai/reproducer-explicitly-built-modules_01 branch from dd4cc19 to 9977985 Compare May 22, 2025 22:07
vsapsai added a commit that referenced this pull request May 22, 2025
Tools using the reproducers can specify a custom location to support
scenarios similar to those covered by `-fcrash-diagnostics-dir` and
`CLANG_CRASH_DIAGNOSTICS_DIR`.
…modules. (#10577)

Capturing a single `-cc1` command to reproduce a bug with the explicitly-built
modules isn't sufficient, we need to know what .pcm files were built and how.

rdar://59743925
vsapsai added a commit that referenced this pull request May 22, 2025
Tools using the reproducers can specify a custom location to support
scenarios similar to those covered by `-fcrash-diagnostics-dir` and
`CLANG_CRASH_DIAGNOSTICS_DIR`.
@vsapsai vsapsai force-pushed the users/vsapsai/reproducer-explicitly-built-modules_01 branch from 9977985 to d440bf1 Compare May 22, 2025 23:28
@vsapsai
Copy link
Author

vsapsai commented May 22, 2025

Added ModuleName [that's not used right now]. And added another commit with ReproducerLocation to this branch.

Copy link

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but let's see if @Bigcheese or @benlangmuir have any objections.

Copy link

@benlangmuir benlangmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have an entry point in clang-scan-deps? It's unusual to have something in CDependencies that doesn't have an equivalent clang-scan-deps command-line interface, though it's not strictly necessary

* for a success case but is allowed to be empty when encountered an error.
*/
CINDEX_LINKAGE enum CXErrorCode
clang_experimental_DependencyScanner_generateReproducer(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should use CXDependencyScannerWorkerScanSettings instead of int argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory for the basic compilation setup.

How sure are we that const char *ReproducerLocation, bool UseUniqueReproducerName, are the only two reproducer-specific options we will ever want here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should use CXDependencyScannerWorkerScanSettings instead of int argc, const char *const *argv, const char *ModuleName, const char *WorkingDirectory for the basic compilation setup.

I was going back and forth about it and can be persuaded either way. My [weak] justification is that we don't pass the settings to a dependency service/worker/tool. So it's not a self-sufficient entity but more a convenience container that happens to be the same between different functions. So I avoid treating it as an independent entity.

How sure are we that const char *ReproducerLocation, bool UseUniqueReproducerName, are the only two reproducer-specific options we will ever want here?

For the potential enhancements I expect only something to help capture the scanner state. Unfortunately, I don't know what that would be. Maybe .pcm cache, maybe dumping stat cache, maybe something else. So far it is too vague and not the most important direction for the next steps. That's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My [weak] justification is that we don't pass the settings to a dependency service/worker/tool.

Isn't that exactly what we do? The difference is that you're creating a new one instead of using an existing one.

hat's why I'd like to land something reasonably future-proof and iterate on it to see what else we might need.

If you want to make this future proof the C API should probably be much more opaque about the configuration, similar to what we do with the worker scan settings where the settings become their own opaque type and we can add new C functions over time to add new settings to that configuration object without breaking the existing API.

DependencyScanningService DepsService(
ScanningMode::DependencyDirectivesScan, ScanningOutputFormat::Full,
CASOpts, /*CAS=*/nullptr, /*ActionCache=*/nullptr, FS);
DependencyScanningTool DepsTool(DepsService);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this API create its own service/worker/tool instead of taking a CXDependencyScannerWorker?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this API create its own service/worker/tool instead of taking a CXDependencyScannerWorker?

I wanted the reproducer scan to be independent from previous scans, that's why don't use an existing worker. It is a trade-off, not a uniquely correct approach. Reproducers should be useful in general, not only for debugging issues with EBM, that's why I'm leaning towards from-scratch scanning. And I hope it is easier to reason about than to take into account an invisible worker's state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

independent from previous scans

Can you clarify what specifically you want to be independent from? Is it the file system cache?

llvm::SmallString<128> ReproScriptPath;
int ScriptFD;
if (ReproducerLocation) {
if (!llvm::sys::fs::exists(ReproducerLocation)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_directories has bool IgnoreExisting = true

@vsapsai
Copy link
Author

vsapsai commented May 23, 2025

Should this have an entry point in clang-scan-deps? It's unusual to have something in CDependencies that doesn't have an equivalent clang-scan-deps command-line interface, though it's not strictly necessary

I agree it is a good idea. But when I've started working on it, I've realized that for the proper API I need a build system ready to use it. And I don't have such a build system. I'm not strongly against it but I'd rather avoid developing functionality nobody uses.

Tools using the reproducers can specify a custom location to support
scenarios similar to those covered by `-fcrash-diagnostics-dir` and
`CLANG_CRASH_DIAGNOSTICS_DIR`.
@vsapsai vsapsai force-pushed the users/vsapsai/reproducer-explicitly-built-modules_01 branch from d440bf1 to fc7ae9f Compare May 23, 2025 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants