Skip to content

Bootstrap should better handle cases where the lock is hold by nobody #135972

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
WaffleLapkin opened this issue Jan 24, 2025 · 5 comments · Fixed by #136009
Closed

Bootstrap should better handle cases where the lock is hold by nobody #135972

WaffleLapkin opened this issue Jan 24, 2025 · 5 comments · Fixed by #136009
Assignees
Labels
C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@WaffleLapkin
Copy link
Member

Summary

I just saw this:

WARNING: build directory locked by process , waiting for lock

Which made me quite confused confused. is , the process? (not unheard of, I have a binary called ,) Is the process empty string?

The actual result is (most likely) that because rust-analyzer OOM'ed after using >44GiB of RAM today, the lock was not released and so is held by no process (thus, empty string).

Command used

x b proc-macro-srv-cli rustfmt --stage 0 --build-dir build/rust-analyzer

(although it does not matter)

Expected behaviour

Bootstrap explicitly saying that no process is holding the lock & possibly even suggesting removing the lockfile.

Operating system

NixOS

HEAD

99768c8

@WaffleLapkin WaffleLapkin added C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Jan 24, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 24, 2025
@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 24, 2025
@jieyouxu
Copy link
Member

jieyouxu commented Jan 24, 2025

I thought we changed this in #132737... That can happen if the PID is somehow not available
cc @clubby789 maybe you know a bit more about the dir lock?

@bjorn3
Copy link
Member

bjorn3 commented Jan 24, 2025

The actual result is (most likely) that because rust-analyzer OOM'ed after using >44GiB of RAM today, the lock was not released and so is held by no process (thus, empty string).

We don't use the existence of a lockfile to indicate that another process is running. Rather we take an advisory lock on the lockfile. Advisory locks are automatically unlocked by the OS when the process that holds them exits for whatever reason.

Which made me quite confused confused. is , the process? (not unheard of, I have a binary called ,) Is the process empty string?

It prints the content of the lockfile which should contain the pid of the process that last locked it (even after said process exited).

@clubby789
Copy link
Contributor

That's very strange. Maybe a race between two bootstrap processes;

P1: Creates lockfile and takes lock
P2: Reads (currently empty) lockfile
P1: Writes PID to file
P2: Fails to take lock, prints the empty file contents previously read

@clubby789
Copy link
Contributor

We could probably fix this by being smarter and writing more atomically but since the PID is just for diagnostics, maybe we can just check .is_empty() before printing the first message

@clubby789 clubby789 self-assigned this Jan 24, 2025
@WaffleLapkin
Copy link
Member Author

We don't use the existence of a lockfile to indicate that another process is running. Rather we take an advisory lock on the lockfile. Advisory locks are automatically unlocked by the OS when the process that holds them exits for whatever reason.

That's weird, cause I'm sure there were no other processes running at the same time 🤔

@bors bors closed this as completed in 6cb2820 Jan 25, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jan 25, 2025
Rollup merge of rust-lang#136009 - clubby789:pidfile-race, r=jieyouxu

bootstrap: Handle bootstrap lockfile race condition better

Fixes rust-lang#135972

Tested by:
- Starting one build
- In another terminal, `echo -n '' > build/lock`
- Attempt to invoke bootstrap a second time
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants