This repository was archived by the owner on Aug 18, 2020. It is now read-only.

[CDEC-470] Catch IOExceptions in productionReporter #3365

Merged
merged 1 commit into develop from mhuesch/CDEC-470-develop
Aug 17, 2018

Conversation

Contributor

@mhuesch mhuesch commented Aug 7, 2018

Description

Reporting is currently complex and brittle. One failure mode occurs
when the network connection goes down: reporting tries to send a report
and dies with an IOException.

Installing a simple catchIO handler around reportNode should catch
such exceptions and report their occurrence in the log.
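
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the idea, not the code from this PR. It uses `catchIOError` from `base` as a stand-in for `catchIO`; `sendReport` and `reportSafely` are hypothetical names invented for illustration, and `sendReport` simply simulates a network failure.

```haskell
import Control.Exception (IOException)
import System.IO (hPutStrLn, stderr)
import System.IO.Error (catchIOError, ioError, userError)

-- | Hypothetical stand-in for the real reporting call. It always fails
-- with an IOException, as a network call might when the link is down.
sendReport :: String -> IO ()
sendReport _ = ioError (userError "network is down")

-- | Run a reporting action. If it throws an IOException, log the
-- exception to stderr and return normally instead of propagating it.
-- Exceptions of other types are not caught here.
reportSafely :: IO () -> IO ()
reportSafely action = action `catchIOError` handler
  where
    handler :: IOException -> IO ()
    handler e = hPutStrLn stderr ("reporting failed: " ++ show e)

main :: IO ()
main = do
    -- The simulated failure is logged rather than killing the caller.
    reportSafely (sendReport "node status")
    putStrLn "worker still running"
```

Because the handler only matches `IOException`, unrelated failures still surface, which keeps the change narrow.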

Linked issue

https://iohk.myjetbrains.com/youtrack/issue/CDEC-470

Type of change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • [~] 🛠 New feature (non-breaking change which adds functionality)
  • [~] ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • [~] 🏭 Refactoring that does not change existing functionality but does improve things like code readability, structure etc
  • [~] 🔨 New or improved tests for existing code
  • [~] ⛑ git-flow chore (backport, hotfix, etc)

Developer checklist

  • I have read the style guide document, and my code follows the code style of this project.
  • If my code deals with exceptions, it follows the guidelines.
  • I have updated any documentation accordingly, if needed. Documentation changes can be reflected by opening a PR on cardanodocs.com, amending the inline Haddock comments, any relevant README file or one of the documents listed in the docs directory.
  • CHANGELOG entry has been added and is linked to the correct PR on GitHub.

Testing checklist

  • [~] I have added tests to cover my changes.
  • All new and existing tests passed.
    ^ this PR modifies IO/orchestration layer code, which is not currently covered by tests.

QA Steps

$ git checkout develop

# start a cardano node using nix scripts
$ nix-shell
...
$ (in nix-shell) stack --nix build --ghc-options=-optl-Wl,-dead_strip_dylibs
$ (in reg shell) nix-build -A connectScripts.mainnet.wallet -o connect-to-mainnet
$ (in nix-shell) ./scripts/launch/connect-to-cluster/mainnet-staging.sh --nix

# look at log output - see blocks streaming by

# disable wifi. observe that blocks no longer stream by, and that error messages appear complaining that the network is down.

"""
[diffusion:ERROR:ThreadId 317] [2018-08-07 19:01:40.44 UTC] exception while resolving domains [NodeAddrDNS "relays.awstest.iohkdev.io" Nothing]: /etc/resolv.conf: openFile: does not exist (No such file or directory)
"""

# wait 3-5 minutes. enable wifi. observe (on develop) that block streaming does not resume.

$ <ctrl-c> to kill process

***


$ git checkout mhuesch/CDEC-470-develop

# repeat above process to build & prepare for deployment

...

# on re-enabling wifi, observe that block streaming resumes

Screenshots (if available)

https://asciinema.org/a/DlqCqrcv01NgUzNEiZhIKVP9w
~2:10 to ~8:30 are boring (just waiting until the point at which the worker would die if exceptions weren't handled properly)

@mhuesch mhuesch requested a review from erikd as a code owner August 7, 2018 19:12
@mhuesch mhuesch force-pushed the mhuesch/CDEC-470-develop branch from ceb2e71 to 6bea9c3 Compare August 7, 2018 19:19
@mhuesch mhuesch requested a review from coot August 7, 2018 21:44
died when it tried to report over the down network. (CDEC-470 / [PR 3365])

[PR 3365]: https://github.com/input-output-hk/cardano-sl/pull/3365

Member

👍

@@ -31,10 +33,17 @@ productionReporter
     -> Reporter IO
 productionReporter params diffusion = Reporter $ \rt -> withWlogTempFile logConfig $ \mfp -> do
     rt' <- extendWithNodeInfo diffusion rt
-    reportNode logTrace protocolMagic compileTimeInfo servers mfp rt'
+    (reportNode logTrace protocolMagic compileTimeInfo servers mfp rt'
Member

Are parens really needed here?

Contributor Author

I don't think so - I had them there to make the initial lack-of-indentation less confusing. But with your suggested indent of the bottom 2 lines, they seem unnecessary. Just force-pushed a fix.
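
To illustrate the point about parentheses, here is a small sketch (not the PR's code) showing that a `do` statement followed by an infix handler parses the same with or without the wrapping parens, provided the continuation lines are indented further than the statement. `failing`, `handler`, `withParens` and `withoutParens` are hypothetical names, and `catchIOError` from `base` stands in for `catchIO`.

```haskell
import System.IO.Error (catchIOError, ioError, userError)

-- | Hypothetical action that always throws an IOException.
failing :: IO String
failing = ioError (userError "network is down")

-- | Hypothetical handler that turns the failure into a plain result.
handler :: String -> IOError -> IO String
handler label _ = pure (label ++ ": failed")

-- | Statement and handler wrapped in explicit parentheses.
withParens :: IO String
withParens = do
    let label = "report"
    (failing
        `catchIOError` handler label)

-- | Same statement without parentheses. The deeper indentation of the
-- last two lines makes them a continuation of the same do statement.
withoutParens :: IO String
withoutParens = do
    let label = "report"
    failing
        `catchIOError`
        handler label

main :: IO ()
main = do
    a <- withParens
    b <- withoutParens
    print (a == b)  -- prints True
```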

@mhuesch mhuesch force-pushed the mhuesch/CDEC-470-develop branch from 6bea9c3 to 8a5922e Compare August 8, 2018 14:34
Reporting is currently complex and brittle, and one failure mode occurs
when the network connection goes down, and reporting tries to report,
but dies due to an IOException because the network is down.

Installing a simple `catchIO` handler around `reportNode` should catch
such exceptions and report their occurrence in the log.
@mhuesch mhuesch force-pushed the mhuesch/CDEC-470-develop branch from 8a5922e to 30f3621 Compare August 8, 2018 15:31
Contributor

@CodiePP CodiePP left a comment

Great!
If this gets merged, we can then adapt it to the new logging (PR #3401).

@CodiePP CodiePP merged commit 298773b into develop Aug 17, 2018
Contributor Author

mhuesch commented Aug 17, 2018

@CodiePP I’m not sure if we want this merged yet, although I don’t think it will cause problems. When last we tried to reproduce the problem and check that this PR fixed it, we were unable to reproduce the problem. So it wasn’t clear that this PR actually implemented a fix.

/cc @erikd

Member

erikd commented Aug 17, 2018

This at least seems correct. So YOLO!

Contributor Author

mhuesch commented Aug 18, 2018

🤘

@mhuesch mhuesch deleted the mhuesch/CDEC-470-develop branch October 24, 2018 11:48