
fs.writeFileSync intermittently does not create file on disk when called in parallel #1002

Closed
cratter49 opened this issue Nov 29, 2017 · 13 comments

@cratter49

  • Node.js Version: v8.8.0
  • OS: Linux 729a00fd3f64 4.4.74-boot2docker #1 SMP Mon Jun 26 18:01:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Scope (install, code, runtime, meta, other?): runtime
  • Module (and version) (if relevant): fs

I'm currently running a custom Jupyter notebook server within a Docker container. The server contains code that executes a NodeJS script one or more times. Each run of the script launches an instance of headless chrome, extracts some information from the DOM of the page that instance points to, and then writes that information to a file in a temporary directory using fs.writeFileSync from the fs module.

When this script is executed multiple times in parallel, fs.writeFileSync sometimes does not seem to actually write the file to disk, even though the script throws no errors. For example, when the script is executed six times in parallel on my Docker system, one of the six NodeJS processes I launched will fail to write the file to disk. Each execution of the script writes to a unique path, and using the asynchronous version, writeFile, does not work either.

Also, it's important to note that the file does not exist before this call is made.

Any ideas why the file write may not occur?

@ORESoftware

ORESoftware commented Nov 29, 2017

Can you share your code?

fs.XSync calls won't actually run in parallel; one call should block another. Any sync call will block.

@cratter49
Author

@ORESoftware

There isn't a whole lot of code I can share, but I start the NodeJS script using the nvm (node version manager) package like so in Python:

generateReportCommand = ("%s use 8.8.1 %s '%s'" % (nBinary, pipes.quote(reportScriptPath), reporting_url))
subprocess.check_output(generateReportCommand, stderr=subprocess.STDOUT, shell=True).decode('utf-8')

where nBinary is n, reportScriptPath is the path of the NodeJS script, and reporting_url is the URL I point my headless chrome browser to.

Then, in the NodeJS script at reportScriptPath, I write the data to the file:

fileSystem.writeFileSync("jsonObjForLatex.json", rectanglesAndInfoTuple.jsonObjForLatex);

where fileSystem contains the fs module and rectanglesAndInfoTuple.jsonObjForLatex contains a string with JSON formatting.

All of this code is running multiple times in parallel in the same Docker container.

If fs.writeFileSync calls are blocking each other as you describe, wouldn't the writes still eventually happen once each call has gotten its turn?

I'm currently trying to flush the buffer cache using fsyncSync. That successfully creates the files, but their contents seem to be completely erased, so the files are blank.
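Editor's note: the thread never shows how fsyncSync was called, so the cause of the blank files is unknown. One possible explanation, offered here only as an assumption, is that the file was reopened with a truncating flag such as 'w' just to obtain a descriptor for fsyncSync, which would erase data written earlier. A minimal sketch that opens the file once, writes, flushes, and closes on the same descriptor might look like this. The helper name writeAndSync and the temporary directory prefix are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `data` to `filePath` and flush it before returning.
function writeAndSync(filePath, data) {
  // 'w' creates the file or truncates an existing one, so open it exactly once.
  const fd = fs.openSync(filePath, 'w');
  try {
    fs.writeFileSync(fd, data); // writeFileSync also accepts a file descriptor
    fs.fsyncSync(fd);           // ask the OS to flush this descriptor's data to disk
  } finally {
    fs.closeSync(fd);           // always release the descriptor
  }
}

// Usage: write into a fresh, unique temporary directory and read the result back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'report-'));
const target = path.join(dir, 'jsonObjForLatex.json');
writeAndSync(target, JSON.stringify({ ok: true }));
console.log(fs.readFileSync(target, 'utf8'));
```

Keeping the write and the fsync on one descriptor avoids a second open call, which is where an accidental truncation could occur.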

@ORESoftware

ORESoftware commented Nov 30, 2017

How many Node.js processes are running? Any fs.xSync call will block within a given Node.js process, but if there is more than one process, then the fs calls will run independently, in parallel.

@ORESoftware

By the way, it's probably not nvm that launches the node executable. nvm is only there to switch node.js versions in the shell.

@ORESoftware

ORESoftware commented Nov 30, 2017

Can you dump the actual string that generateReportCommand represents? Paste the actual command into this conversation. That variable seems to hold the actual node command that you are using.

@cratter49
Author

cratter49 commented Nov 30, 2017

@ORESoftware

There's always more than one Node.js process running. In my testing it usually ranges from 3-7 processes. Each Node.js process will call fs.writeFileSync 1 to X times.

The command is n use 8.8.1 /home/ubuntu/report.js https://localhost:8443/notebooks/prebuilt%20notebooks/0.%20Getting%20Started.ipynb

@ORESoftware

ORESoftware commented Nov 30, 2017 via email

@cratter49
Author

@ORESoftware

Actually, each process initially starts in the same directory, but os.chdir then changes the working directory to a specific temporary directory that is unique to each process. This means that the jsonObjForLatex.json file being written to is unique for each process.

@ORESoftware

ORESoftware commented Nov 30, 2017

Can you verify that it's actually changing directories properly? What exactly is the issue where the files are "overwriting" each other, if they are actually writing to different directories?
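Editor's note: one way to run the check asked for above is to resolve the relative file name against the working directory inside the Node.js script itself, then stat the file right after the write. The sketch below is only an illustration. The helper name writeAndVerify is hypothetical, and the process.chdir call stands in for the os.chdir done on the Python side in the original setup.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write a file by relative name, then report where it landed.
function writeAndVerify(relativeName, data) {
  // A relative path is resolved against the process's current working directory.
  const absolutePath = path.resolve(process.cwd(), relativeName);
  fs.writeFileSync(absolutePath, data);
  // statSync throws if the file does not exist, so a silent miss becomes an error.
  const stats = fs.statSync(absolutePath);
  return { absolutePath, size: stats.size };
}

// Usage: simulate a per-process temporary working directory.
const workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'report-'));
process.chdir(workDir);
const result = writeAndVerify('jsonObjForLatex.json', '{"a":1}');
console.log(process.pid, result.absolutePath, result.size);
```

Logging the process ID next to the absolute path makes it easier to spot two processes that unexpectedly share a target file.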

@cratter49
Author

cratter49 commented Dec 1, 2017

@ORESoftware

I printed out the current working directory using a utility function, and every process was operating within its own unique path. The overwriting issue revolves around the use of fsync, which flushes the buffered data for the file descriptor passed in as a parameter to disk. Using fsync makes sure that all of the files are created, but it does not succeed in writing data to each file.

@gireeshpunathil
Member

@cratter49 - is this still outstanding?

@gireeshpunathil
Member

Closing due to inactivity.

@douglasg14b

douglasg14b commented Jul 12, 2024

I'm also noticing this behavior in unit tests. However, it seems to be a timing issue.

The file is created, but writeFileSync returns control BEFORE the actual write completes. This only happens in parallel scenarios. If I add a "sleep" (await new Promise((resolve) => setTimeout(resolve, 100));) after writeFileSync, then all tests pass and access to the written file succeeds. But as I lower that sleep time, the rate of failures gradually increases until it fails on every test run.
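Editor's note: the thread does not establish the cause of this timing behavior. A pattern sometimes used when other processes or parallel tests read a file while it is being produced is to write to a temporary sibling path and then rename it into place, since a rename within one filesystem is atomic on POSIX systems. The sketch below assumes that setup. The helper name writeFileAtomicSync and the file names are hypothetical, and this is not presented as a confirmed fix for the report above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: readers see either no file or the complete file,
// never a partially written one.
function writeFileAtomicSync(filePath, data) {
  // Include the PID so parallel processes never share a temporary name.
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, data);
  // The temporary file sits in the same directory, so the rename stays
  // on one filesystem.
  fs.renameSync(tmpPath, filePath);
}

// Usage: write, then read back immediately in the same process.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-'));
const finalPath = path.join(dir, 'result.json');
writeFileAtomicSync(finalPath, '{"done":true}');
const readBack = fs.readFileSync(finalPath, 'utf8');
console.log(readBack);
```

If an immediate read-back like this succeeds but a parallel test still fails, the difference may lie in which path each test touches or in cleanup done by another test, which would need to be checked separately.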

This is weird behavior.
