Improved status reporting #273

istalker2 · 2017-05-31T10:26:24Z

This commit rewrites status reporting used to control deployment
progress and get-status command

Old reporter machinery was removed
Resource classes were simplified:
- they no longer wrap resources in SimpleReporter
- they do not care about their metadata
- Status method was replaced with GetProgress which reports
  deployment progress in float number 0..1. Most resources have
  either 0 or 1, but replicated resources can have all the variety
- resources no longer dial with reporting
- handling of success factor is now done in scheduler
  As a result resource classes are now much simpler

On top of that new status methods were added that collect overall
deployment statistics and detailed per-resource status table.
Both can be retrieved using get-status CLI command

This change is

jellonek · 2017-05-31T11:23:28Z

I like the idea for moving test for success factor to scheduler, gaining in same time a possibility to test it e.x. for daemonsets, when someone would be so crazy that he would want to depend on smaller number of replicas from a number set as desired :)

Reviewed 53 of 53 files at r1.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.

docs/flows.md, line 245 at r1 (raw file):

One possible solution is to utilize technique shown above: make parameter value be part of resource name and
them duplicate the dependency that leads to this resource and pass different parameter value along each of

and then duplicate...
Also white space at the end of line.

docs/flows.md, line 251 at r1 (raw file):

Luckily, the dependencies can be automatically replicated. This is done through the `generateFor` field of the
`Dependency` object. `generateFor` is a map where keys are argument names and values are list expressions. Each list

Each expression list...

docs/flows.md, line 256 at r1 (raw file):

Then the dependency is going to be replicated automatically with each replica getting on of the list values as an
additional argument. There can be several `generateFor` arguments. In this case there is going to be one dependency
for each combination of the list values. For example

Missing : at the end of line.

docs/flows.md, line 270 at r1 (raw file):

has the same effect as

Replace space at end of line with :.

e2e/flows_test.go, line 37 at r1 (raw file):

		Eventually(func() int {
			return framework.countJobs("a-job-", false)
		}, 300*time.Second, 5*time.Second).Should(Equal(1), "1 a-job* should have been created")

Hmmm... Quite long times... It has to be full 5min there? Also time between checks - there is so many cases, each having 5sec for it. This can lead to quite big change in tests longevity.

Btw. I know that I should ask for this previously, but these tests just screams for putting them into table, with fields: func, string argument, bool argument, integer value to equal, failure description text.

e2e/utils/appcmanager.go, line 57 at r1 (raw file):

// RunWithOptions runs dependency graph deployment with given settings
func (a *AppControllerManager) RunWithOptions(runAsync bool, options interfaces.DependencyGraphOptions) {

I would prefer to introduce there second func - RunAsynchronouslyWithOptions, which would call that func at the start.
Adding runAsync parameter without changing func name there is a bit misleading - at first look few lines above I didn't know for what first parameter was introduced, needed to look forward on func metric.

pkg/client/flows.go, line 170 at r1 (raw file):

}

func (r *replicas) Update(replica *Replica) error {

Missing docstring.

Comments from Reviewable

istalker2 · 2017-06-01T02:00:27Z

Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.

docs/flows.md, line 251 at r1 (raw file):

Previously, jellonek (Piotr Skamruk) wrote…

Each expression list...

expression list = list of expressions. expression list is an expression which expresses the list. 1..3 is a list expression

docs/flows.md, line 256 at r1 (raw file):

Previously, jellonek (Piotr Skamruk) wrote…

Missing : at the end of line.

comma, I'd say because the sentence continues after example

docs/flows.md, line 270 at r1 (raw file):

Previously, jellonek (Piotr Skamruk) wrote…

Replace space at end of line with :.

same as above, the example is part of the sentence, "For example, X has the same effect as Y", just X and Y are yamls

e2e/flows_test.go, line 37 at r1 (raw file):

Previously, jellonek (Piotr Skamruk) wrote…

Hmmm... Quite long times... It has to be full 5min there? Also time between checks - there is so many cases, each having 5sec for it. This can lead to quite big change in tests longevity.

Btw. I know that I should ask for this previously, but these tests just screams for putting them into table, with fields: func, string argument, bool argument, integer value to equal, failure description text.

It doesn't increases running time since this is the maximum allowed time and in reality, all the checks pass instantly after deployment finishes. It doesn't matter if it is 5min or 5 hours. But if CI goes slower at least it won't fail

I'll try to change it to a table

Comments from Reviewable

istalker2 · 2017-06-05T02:48:21Z

Comments were addressed either here or in previous PRs where the issue was introduced

This change adds ability to replicate dependency with index parameters iterated over arbitrary number of lists. For each dependency it is now possible to specify map of indexVariableName -> listExpression listExpression := range|item + [, listExpression] range := number '..' number item := STRING for example, if for "i: 1..3" the dependency will be replicated into 3 clones, each one of them having argument i set to value in range [1, 3] This also allows to consume N flow replicas by replicating the dependency that leads to the consumed flow

For sequential flow, each next replica is attached to the leafs of previous one so that they will be deployed sequentially

* stopChan is now passed to the graph finalizers so that deployment can be canceled on the final stages * never write to stopChan. The only correct way to cancel deployment is to close the channel * pass nil instead of real chanel for unit tests that do not cancel deployment

In some cases deployment could hang forever: * if graph vertex depends on vertex which will never be created (because of timeout or permanent error) * if graph vertex was set to be created only if parent fails, but it didn't Because deployment algorithm waits for all vertexes to be created if any one of them remained blocked, Deploy() is going to run forever blocking AC process from handling other deployment tasks. Also on-error processing could be triggered by intermediate resource status. For example it could happen, if resource status was obtained prior to resource.Create() call. Another case if resource was set to have several deployment attempts. If the first attempt fails on-error dependency becomes activated, but on the second attempt the deployment may succeed. This commit reworks error handling: * Resources which cannot be created or time out, marked with error. * Resources, that depend on failed resources also fail * Thus all graph vertexes eventually become unblocked and deployment finishes * on-error handling is done based on the final resource status/error * Deploy() now returns true if deployment succeeded. Deployment fails if any of resources (and their dependents) went into failed state except for cases, where they were skipped because all dependencies had on-error meta and parent resource didn't fail Also: * e2e tests were updated so that most of them wait for deployment to finish rather than just waiting for resource status. Thus now they also test that deployment doesn't hang * Graph vertex type (ScheduledResource) and it fields are not exported anymore. The same goes for some of dependency graph methods. * wait() method doesn't create unnecessary goroutines and channels

* Removes one second delay before scheduler starts to check if dependent resources can be created. This drastically speeds up unit tests and subsequent deployments when resources already exist and thus resource is created instantly * Caches dependencies and definitions during deployment. When one flow is consumed from another, they were all fetched from k8s for each replica of the inner flow + once for the outer flow. Now it happens only once. It also makes deployment more consistent in case if definitions were modified in k8s in the middle of deployment * Check dependency graph for cycles only once per deployment

This commit rewrites status reporting used to control deployment progress and get-status command * Old reporter machinery was removed * Resource classes were simplified: - they no longer wrap resources in SimpleReporter - they do not care about their metadata - Status method was replaced with GetProgress which reports deployment progress in float number 0..1. Most resources have either 0 or 1, but replicated resources can have all the variety - resources no longer dial with reporting - handling of success factor is now done in scheduler As a result resource classes are now much simpler On top of that new status methods were added that collect overall deployment statistics and detailed per-resource status table. Both can be retrieved using get-status CLI command

pigmej added the in progress label May 31, 2017

istalker2 force-pushed the new-status branch from f2c2ce8 to 8d2225c Compare June 1, 2017 07:55

istalker2 force-pushed the new-status branch from 8d2225c to 251468a Compare June 5, 2017 06:43

Stan Lagun added 6 commits June 13, 2017 16:10

Sequential flow replication

4a8d7b0

For sequential flow, each next replica is attached to the leafs of previous one so that they will be deployed sequentially

istalker2 force-pushed the new-status branch from 251468a to ab95c24 Compare June 13, 2017 23:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improved status reporting #273

Improved status reporting #273

Uh oh!

istalker2 commented May 31, 2017 •

edited by pigmej

Loading

Uh oh!

jellonek commented May 31, 2017

Uh oh!

istalker2 commented Jun 1, 2017

Uh oh!

istalker2 commented Jun 5, 2017

Uh oh!

Uh oh!

Improved status reporting #273

Are you sure you want to change the base?

Improved status reporting #273

Uh oh!

Conversation

istalker2 commented May 31, 2017 • edited by pigmej Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jellonek commented May 31, 2017

Uh oh!

istalker2 commented Jun 1, 2017

Uh oh!

istalker2 commented Jun 5, 2017

Uh oh!

Uh oh!

istalker2 commented May 31, 2017 •

edited by pigmej

Loading