I0831 06:35:37.548730 1 queuejob_controller_ex.go:323] [Controller] Agent mode
E0831 06:35:37.548776 1 queuejobagent.go:64] [agentEventQueue] Invalid agent configuration: . Agent cluster will not be instantiated.
I0831 06:35:37.548895 1 reflector.go:221] Starting reflector *v1beta1.AppWrapper (0s) from go/pkg/mod/k8s.io/client-go@v0.26.2/tools/cache/reflector.go:169
I0831 06:35:37.548913 1 reflector.go:257] Listing and watching *v1beta1.AppWrapper from go/pkg/mod/k8s.io/client-go@v0.26.2/tools/cache/reflector.go:169
I0831 06:35:37.648919 1 shared_informer.go:303] caches populated
I0831 06:38:47.113514 1 queuejob_controller_ex.go:1405] [updateStatusInEtcdWithMergeFunction] trying to update 'test-ns-grslq/mnist' version '29833' called by 'worker - setQueueing'
I0831 06:38:47.146276 1 queuejob_controller_ex.go:1432] [updateStatusInEtcdWithMergeFunction] update success 'test-ns-grslq/mnist' version '29839' called by 'worker - setQueueing'
I0831 06:38:47.146734 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.033238 seconds RemainingLength=0 &qj=0xc0002b5000 Version=29839 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Queueing ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before worker - setQueueing Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}
I0831 06:38:47.146789 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL'
I0831 06:38:47.168484 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL'
I0831 06:38:47.168513 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29841 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. }] [] 0 0 0 0 0}
I0831 06:38:47.168550 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. }] [] 0 0 0 0 0}
I0831 06:38:47.168566 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`.
I0831 06:38:47.168596 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.250230 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 77.289737ms I0831 06:38:47.250265 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.250282 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.250289 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.250299 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.250314 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0002b5000 Version=29841 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.250394 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.250403 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.136913 seconds RemainingLength=0 &qj=0xc0002b5c00 Version=29841 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.250488 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.293236 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.293382 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0002b5000 Version=29841 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:47.294375 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 
UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. I0831 06:38:47.294449 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.180963 seconds RemainingLength=0 &qj=0xc00044a400 Version=29851 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.294495 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.318350 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.318374 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29853 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.318406 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.318422 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.318431 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.379467 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 36.434432ms I0831 06:38:47.379493 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.379511 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.379517 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.379528 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.379543 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00044a400 Version=29853 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.379668 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.379790 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.266297 seconds RemainingLength=0 &qj=0xc000a30c00 Version=29853 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.380205 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.398230 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.398276 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00044a400 Version=29853 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:47.398747 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:47.398805 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.285319 seconds RemainingLength=0 &qj=0xc000a7a000 Version=29856 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.398833 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.422204 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.422489 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29857 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.422623 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.422949 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.423009 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.465823 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 37.89143ms I0831 06:38:47.465850 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.465868 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.465874 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.465886 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.465902 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000a7a000 Version=29857 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.466013 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.352521 seconds RemainingLength=0 &qj=0xc000a7ac00 Version=29857 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.466059 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.466058 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.471393 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.471414 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29857 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.471448 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.471464 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.471475 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.495229 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.495278 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000a7a000 Version=29857 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.535155 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 38.823045ms I0831 06:38:47.535186 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.535204 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:47.535211 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.535222 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.535238 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000a7ac00 Version=29857 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.535378 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.535355 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.421868 seconds RemainingLength=0 &qj=0xc0008aa000 Version=29859 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.535402 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.540863 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.540894 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000a7ac00 Version=29857 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.556855 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.556882 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29860 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.556929 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.556958 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.556977 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.600925 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 41.081081ms I0831 06:38:47.600968 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.600990 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.600997 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.601008 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.601027 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008aa000 Version=29860 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.601101 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.487615 seconds RemainingLength=0 &qj=0xc0007ae000 Version=29860 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.601133 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.601133 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.627402 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.627429 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0008aa000 Version=29860 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:47.628289 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:47.628349 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.514862 seconds RemainingLength=0 &qj=0xc000762000 Version=29861 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.628376 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.657237 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.657265 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29863 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.657295 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.657313 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.657323 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.699896 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 39.676007ms I0831 06:38:47.699926 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.699943 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.699958 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.699968 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.699984 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000762000 Version=29863 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.700080 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.586594 seconds RemainingLength=0 &qj=0xc0007e1400 Version=29863 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.700120 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.700133 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.706037 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.706053 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29863 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.706097 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.706126 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.706143 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.722334 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.722406 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000762000 Version=29863 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.761348 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 38.806402ms I0831 06:38:47.761376 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.761395 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:47.761402 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.761413 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.761432 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007e1400 Version=29863 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.761547 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.761525 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.648035 seconds RemainingLength=0 &qj=0xc00080c800 Version=29864 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.761947 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.774859 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.774919 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29865 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.774954 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.774984 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.775001 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:47.775175 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:47.809089 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.294119ms I0831 06:38:47.809120 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.809141 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.809148 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.809159 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.809184 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080c800 Version=29865 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.809282 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.809321 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.695825 seconds RemainingLength=0 &qj=0xc000730c00 Version=29865 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.809402 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.814037 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.814052 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29865 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.814074 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.814090 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.814100 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.849463 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.849481 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00080c800 Version=29865 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.876458 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.881083 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.881108 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0007e1400 Version=29863 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.882833 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.473366ms I0831 06:38:47.882889 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.882913 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:47.882927 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.882944 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.882964 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000730c00 Version=29865 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.883075 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.883072 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.769586 seconds RemainingLength=0 &qj=0xc0007fc800 Version=29866 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.883136 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.887201 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.887220 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000730c00 Version=29865 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.907521 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.907610 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29867 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.907659 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.907693 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.907717 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:47.951030 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 40.887608ms I0831 06:38:47.951055 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.951070 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.951077 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:47.951094 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:47.951108 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007fc800 Version=29867 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.951218 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.951215 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.837725 seconds RemainingLength=0 &qj=0xc0007bb000 Version=29867 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.951301 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.976327 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:47.976353 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007fc800 Version=29867 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:47.976908 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:47.977020 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.863532 seconds RemainingLength=0 &qj=0xc0000a1400 Version=29869 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:47.977073 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.998998 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:47.999016 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29872 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.999042 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:47.999057 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:47.999066 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.050650 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 46.773704ms I0831 06:38:48.050673 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.050689 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.050695 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.050705 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.050729 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0000a1400 Version=29872 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.050825 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.050855 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=0.937364 seconds RemainingLength=0 &qj=0xc0008aa400 Version=29872 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.050908 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.055836 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.055880 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29872 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.055909 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.055934 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.055949 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.082278 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.082370 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0000a1400 Version=29872 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.114660 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.103462ms I0831 06:38:48.114685 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.114702 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.114708 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.114726 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.114739 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008aa400 Version=29872 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.114848 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.115117 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.001630 seconds RemainingLength=0 &qj=0xc0008a8800 Version=29875 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.115164 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:48.133485 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:48.133666 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.133678 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29876 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.133704 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.133719 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.133733 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.167971 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.516987ms I0831 06:38:48.167996 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.168015 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.168022 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.168032 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.168047 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008a8800 Version=29876 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.168148 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.168230 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.054741 seconds RemainingLength=0 &qj=0xc000754000 Version=29876 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.168344 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.172542 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.172557 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29876 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.172592 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.172605 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.172613 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.198648 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.198661 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0008a8800 Version=29876 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.227854 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.025703ms I0831 06:38:48.227875 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.227904 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.227910 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.227919 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.227933 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000754000 Version=29876 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.228007 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.228024 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.114535 seconds RemainingLength=0 &qj=0xc000754800 Version=29877 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.228070 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.233807 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:48.247668 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. W0831 06:38:48.247827 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:48.247964 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.247976 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29878 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.248001 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.248018 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.248026 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.286865 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 37.169214ms I0831 06:38:48.286891 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.286908 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.286914 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.286923 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.286939 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000754800 Version=29878 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.287089 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.173602 seconds RemainingLength=0 &qj=0xc000a7a800 Version=29878 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.287149 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.288074 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.292217 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.292235 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29878 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.292282 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.292299 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.292307 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.312848 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.313148 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000754800 Version=29878 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.344297 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 31.470002ms I0831 06:38:48.344322 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.344344 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.344350 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.344364 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.344382 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000a7a800 Version=29878 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.344505 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.344780 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.231292 seconds RemainingLength=0 &qj=0xc000752800 Version=29879 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.344830 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.347884 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.349107 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.349121 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000a7a800 Version=29878 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.369212 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.369232 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29880 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.369263 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.369278 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.369287 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:48.369560 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:48.405535 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.51083ms I0831 06:38:48.405568 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.405592 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.405603 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.405612 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.405632 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000752800 Version=29880 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.405739 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.405720 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.292235 seconds RemainingLength=0 &qj=0xc0008a9400 Version=29880 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.405826 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:48.436891 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. I0831 06:38:48.436972 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.437007 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000752800 Version=29880 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.437011 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.323523 seconds RemainingLength=0 &qj=0xc000754400 Version=29881 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.437053 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.447898 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.453977 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.454031 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29882 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.454093 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.454121 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.454137 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:48.454001 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:48.483432 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 27.490732ms I0831 06:38:48.483458 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.483472 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.483478 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.483487 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.483504 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000754400 Version=29882 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.483571 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.370086 seconds RemainingLength=0 &qj=0xc000755c00 Version=29882 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.483615 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.483642 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.487735 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.487774 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29882 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.487797 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.487810 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.487821 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.495639 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.495653 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000754400 Version=29882 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.525129 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.045617ms I0831 06:38:48.525152 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.525168 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.525174 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.525183 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.525225 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000755c00 Version=29882 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.525340 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.411854 seconds RemainingLength=0 &qj=0xc000079800 Version=29883 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.525369 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.525371 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.549829 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.549846 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29884 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.549870 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.549885 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.549899 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:48.550195 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:48.570562 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.587483 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.59004ms I0831 06:38:48.587505 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.587519 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.587525 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.587547 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.587561 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000079800 Version=29884 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.587636 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.474151 seconds RemainingLength=0 &qj=0xc00044b400 Version=29884 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.587662 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.587675 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.600491 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.600528 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000754000 Version=29876 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:48.600812 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:48.600858 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.487372 seconds RemainingLength=0 &qj=0xc000734000 Version=29885 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.600884 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:48.613370 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:48.619413 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.619433 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29886 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.619478 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.619496 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.619512 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.651326 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.653304 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 31.826538ms I0831 06:38:48.653325 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.653339 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.653345 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.653354 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.653368 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000734000 Version=29886 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 
06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.653433 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.653444 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.539956 seconds RemainingLength=0 &qj=0xc000734c00 Version=29886 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.653477 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.669638 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.669654 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000755c00 Version=29882 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:48.682597 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. W0831 06:38:48.682593 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:48.682847 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.569359 seconds RemainingLength=0 &qj=0xc0007e0800 Version=29887 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.682922 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.700091 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.700170 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29888 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.700202 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.700224 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.700240 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.713602 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.732696 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.732714 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000079800 Version=29884 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.737221 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.178589ms I0831 06:38:48.737280 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.737311 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.737324 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.737339 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.737367 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007e0800 Version=29888 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.737476 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.737557 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.624068 seconds RemainingLength=0 &qj=0xc000730000 Version=29889 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.737610 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.741665 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.741679 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007e0800 Version=29888 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.748957 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.748975 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29890 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.748999 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.749022 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.749032 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.782817 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.784239 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.788362ms I0831 06:38:48.784266 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.784284 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.784290 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.784300 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.784314 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000730000 Version=29890 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.784394 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.784426 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.670936 seconds RemainingLength=0 &qj=0xc000730400 Version=29890 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.784482 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.802491 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.802561 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000734000 Version=29886 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:48.814972 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. 
Retrying. W0831 06:38:48.814969 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:48.815192 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.701705 seconds RemainingLength=0 &qj=0xc0007fd400 Version=29891 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.815271 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.833236 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.833322 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29892 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.833368 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.833400 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.833416 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.854376 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.865729 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.865795 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0008aa400 Version=29872 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.868768 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.355337ms I0831 06:38:48.868787 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.868801 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.868806 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.868816 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.868829 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007fd400 Version=29892 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.868907 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.868903 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.755418 seconds RemainingLength=0 &qj=0xc0007f2000 Version=29893 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.869015 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.894950 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.894969 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29894 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.894994 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.895015 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:48.895032 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:48.895036 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:48.916876 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.933099 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 36.203396ms I0831 06:38:48.933125 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.933141 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.933147 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:48.933162 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:48.933181 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007f2000 Version=29894 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.933298 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.933659 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.820171 seconds RemainingLength=0 &qj=0xc000734c00 Version=29894 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.933691 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.938169 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:48.938193 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000730000 Version=29890 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:48.951649 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:48.951739 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.838247 seconds RemainingLength=0 &qj=0xc0007e0c00 Version=29895 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:48.951803 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:48.952112 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:48.963621 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:48.963711 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29896 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.963752 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:48.963794 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:48.963811 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:48.995498 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.004639 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 38.676926ms I0831 06:38:49.004663 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.004679 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.004685 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.004694 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.004715 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007e0c00 Version=29896 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 
06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.004837 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.004880 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.891390 seconds RemainingLength=0 &qj=0xc0008ab000 Version=29896 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.004949 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:49.022226 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:49.022346 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.022361 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007fd400 Version=29892 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.022416 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.908929 seconds RemainingLength=0 &qj=0xc000762400 Version=29897 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.022461 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:49.042995 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:49.053063 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.056926 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.056948 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29898 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.056970 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.056985 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.056993 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:49.057630 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:49.095679 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.741061ms I0831 06:38:49.095706 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.095720 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.095726 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.095736 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.095749 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000762400 Version=29898 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.095846 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.095883 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=1.982363 seconds RemainingLength=0 &qj=0xc000731800 Version=29898 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.095953 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.132199 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.132307 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000762400 Version=29898 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:49.132470 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:49.132594 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.019086 seconds RemainingLength=0 &qj=0xc0007fd000 Version=29901 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.132666 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.143685 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.153065 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.153091 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29902 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.153130 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.153153 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.153169 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:49.153285 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:49.188651 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.113661ms I0831 06:38:49.188738 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.188764 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.188778 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.188795 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.188817 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007fd000 Version=29902 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.188905 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.075418 seconds RemainingLength=0 &qj=0xc00044b400 Version=29902 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.188968 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.188927 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.194587 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.194610 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29902 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.194634 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.194650 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.194663 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.208078 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.208095 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0007fd000 Version=29902 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.239956 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 31.723623ms I0831 06:38:49.239983 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.239999 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.240005 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.240025 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.240040 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00044b400 Version=29902 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.240142 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.241666 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.128177 seconds RemainingLength=0 &qj=0xc0007f3000 Version=29903 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.241722 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.244961 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.244971 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00044b400 Version=29902 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.253202 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.253218 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29904 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.253239 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.253266 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.253277 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.258686 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.282213 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.282229 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007f2000 Version=29894 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.290451 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.970653ms I0831 06:38:49.290470 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.290484 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.290490 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.290499 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.290521 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007f3000 Version=29904 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.290613 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.290649 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.177156 seconds RemainingLength=0 &qj=0xc000a7a400 Version=29905 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.290681 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.294318 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.294330 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007f3000 Version=29904 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.310142 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.310157 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29906 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.310176 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.310192 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.310201 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.343077 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.701417ms I0831 06:38:49.343116 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.343132 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.343138 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.343158 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.343172 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000a7a400 Version=29906 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.343299 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.343306 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.229816 seconds RemainingLength=0 &qj=0xc000753c00 Version=29906 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.343365 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.353014 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.353031 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29906 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.353055 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.353069 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.353078 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.354980 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.367358 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.367378 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000a7a400 Version=29906 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:49.381709 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:49.433015 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 65.493593ms I0831 06:38:49.433043 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.433060 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.433066 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.433106 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.433122 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000753c00 Version=29906 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.433216 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.433232 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.319735 seconds RemainingLength=0 &qj=0xc0002b4400 Version=29907 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.433308 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.437887 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.437900 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000753c00 Version=29906 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.466966 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.467058 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29909 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.467104 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.467127 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.467151 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.500068 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.889084ms I0831 06:38:49.500095 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.500111 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.500117 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.500136 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.500150 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0002b4400 Version=29909 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.500257 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.500278 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.386780 seconds RemainingLength=0 &qj=0xc0007fc000 Version=29909 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.500328 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.520814 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.520889 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0002b4400 Version=29909 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:49.520903 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:49.520981 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.407494 seconds RemainingLength=0 &qj=0xc000731000 Version=29910 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.521029 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.532562 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.532607 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29911 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.532636 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.532650 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.532663 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.569669 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.755053ms I0831 06:38:49.569698 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.569725 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.569731 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.569752 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.569767 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000731000 Version=29911 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.569900 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.569928 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.456437 seconds RemainingLength=0 &qj=0xc000755400 Version=29911 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.569986 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.596191 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.596208 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000731000 Version=29911 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:49.596291 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:49.596348 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.482863 seconds RemainingLength=0 &qj=0xc00080cc00 Version=29912 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.596377 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.614679 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.614698 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29913 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.614726 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.614742 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.614752 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.651999 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.680876ms I0831 06:38:49.652207 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.652236 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.652251 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.652268 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.652291 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080cc00 Version=29913 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.652398 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.538912 seconds RemainingLength=0 &qj=0xc0007f3800 Version=29913 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.652437 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.652445 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.658095 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.658121 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29913 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.658151 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.658170 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.658183 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.673579 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.673594 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00080cc00 Version=29913 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.708203 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.073267ms I0831 06:38:49.708234 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.708261 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.708269 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.708282 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.708300 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007f3800 Version=29913 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.708413 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.594925 seconds RemainingLength=0 &qj=0xc000752c00 Version=29915 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.708453 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.708453 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.728604 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.728636 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29916 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.728670 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.728685 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.728702 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:49.729267 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:49.765516 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.307025ms I0831 06:38:49.765542 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.765561 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.765567 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.765583 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.765598 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000752c00 Version=29916 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.765676 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.652189 seconds RemainingLength=0 &qj=0xc00071e000 Version=29916 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.765701 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.765704 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.771997 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.772021 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29916 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.772050 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.772078 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.772087 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.780918 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.780933 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000752c00 Version=29916 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.782020 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.788642 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.788741 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0007e0c00 Version=29896 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.815176 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.197395ms I0831 06:38:49.815272 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.815299 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.815313 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.815337 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.815359 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00071e000 Version=29916 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.815458 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.701972 seconds RemainingLength=0 &qj=0xc00044a000 Version=29917 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.815510 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.815485 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.829664 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:49.842372 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. W0831 06:38:49.842438 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:49.842767 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.842788 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29918 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.842843 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.842883 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.842899 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.880619 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.240296ms I0831 06:38:49.880720 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.880754 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.880769 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.880786 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.880822 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00044a000 Version=29918 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.880933 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.881001 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.767514 seconds RemainingLength=0 &qj=0xc000a7b400 Version=29918 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.881050 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.886153 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.886174 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29918 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.886203 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.886218 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.886227 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.900943 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.900972 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00044a000 Version=29918 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.936925 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.4147ms I0831 06:38:49.936959 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.936976 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.936983 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.936994 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.937009 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000a7b400 Version=29918 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.937092 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.823606 seconds RemainingLength=0 &qj=0xc000731400 Version=29919 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.937137 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.937135 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.942797 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.943173 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.943219 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000a7b400 Version=29918 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:49.958442 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:49.958471 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:49.958484 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29920 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.958513 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:49.958533 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:49.958545 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:49.998331 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 37.377102ms I0831 06:38:49.998374 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.998393 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:49.998399 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:49.998410 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:49.998434 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000731400 Version=29920 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.998523 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:49.998777 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.885289 seconds RemainingLength=0 &qj=0xc000754c00 Version=29920 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:49.998809 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.024853 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.025018 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000731400 Version=29920 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:50.025557 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:50.025679 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.912193 seconds RemainingLength=0 &qj=0xc0007f2400 Version=29923 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.025717 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.042760 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.057312 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.057336 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29924 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.057372 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} W0831 06:38:50.057318 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:50.057396 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.057434 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.094942 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.221002ms I0831 06:38:50.094968 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.094984 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:50.094990 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.095010 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.095025 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007f2400 Version=29924 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.095119 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.095130 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=2.981640 seconds RemainingLength=0 &qj=0xc000735c00 Version=29924 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.095203 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.100174 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.100188 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29924 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.100209 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.100223 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.100231 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.108183 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.108206 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007f2400 Version=29924 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.142811 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.714122ms I0831 06:38:50.142908 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.142941 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.142957 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.142974 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.142996 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000735c00 Version=29924 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.143108 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.143385 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.029898 seconds RemainingLength=0 &qj=0xc00071e400 Version=29925 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.143450 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.148358 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.148372 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000735c00 Version=29924 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.159367 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:50.175299 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:50.175556 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.175606 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29926 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.175647 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. 
} {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.175785 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.175818 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.210272 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.417214ms I0831 06:38:50.210411 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.210443 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.210461 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.210480 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.210502 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00071e400 Version=29926 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.210619 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.210915 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.097421 seconds RemainingLength=0 &qj=0xc000808000 Version=29926 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.210994 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:50.230139 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:50.230205 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.230219 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00071e400 Version=29926 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.230552 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.117064 seconds RemainingLength=0 &qj=0xc000755800 Version=29927 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.230635 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.250488 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.250591 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29928 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.250669 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.250697 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.250814 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.292964 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.431493ms I0831 06:38:50.293041 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.293079 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.293100 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.293126 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.293147 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000755800 Version=29928 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.293276 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.293286 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.179796 seconds RemainingLength=0 &qj=0xc0007e0400 Version=29928 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.293341 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.311998 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:50.312184 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. I0831 06:38:50.312236 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.198750 seconds RemainingLength=0 &qj=0xc00080c400 Version=29930 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.312273 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.312014 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000755800 Version=29928 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.323523 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.323540 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29931 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.323568 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.323594 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.323606 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.360949 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.100669ms I0831 06:38:50.361022 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.361047 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.361061 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.361078 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.361099 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080c400 Version=29931 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.361201 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.361211 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.247721 seconds RemainingLength=0 &qj=0xc000898800 Version=29931 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.361292 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.392010 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.392033 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00080c400 Version=29931 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:50.392120 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:50.392377 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.278891 seconds RemainingLength=0 &qj=0xc000762800 Version=29932 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.392406 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.498166 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.498642 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.498659 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29933 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.498686 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.498705 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.498716 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:50.502874 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:50.536379 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.343791ms I0831 06:38:50.536406 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.536422 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:50.536428 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.536438 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.536453 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000762800 Version=29933 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.536539 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.423053 seconds RemainingLength=0 &qj=0xc0007fc000 Version=29933 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.536566 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.536567 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:50.556245 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:50.556455 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.442968 seconds RemainingLength=0 &qj=0xc00075e000 Version=29934 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.556520 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.556909 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.556948 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000762800 Version=29933 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.567478 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.567559 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29935 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.567675 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.567700 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.567716 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.575360 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.594139 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.594157 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00071e000 Version=29916 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.600881 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.282835ms I0831 06:38:50.600898 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.600913 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.600923 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.600933 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.600951 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00075e000 Version=29935 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.601032 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.601026 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.487537 seconds RemainingLength=0 &qj=0xc00083a000 Version=29935 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.601070 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.606077 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.606132 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00075e000 Version=29935 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.620544 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.620613 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29937 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.620654 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.620682 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.620698 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.653038 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.480545ms I0831 06:38:50.653063 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.653079 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.653090 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.653105 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.653119 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00083a000 Version=29937 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.653208 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.653283 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.539794 seconds RemainingLength=0 &qj=0xc0007e0000 Version=29937 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.653340 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.657600 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.657614 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29937 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.657633 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.657646 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.657667 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.672569 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.672597 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00083a000 Version=29937 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.700942 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.03925ms I0831 06:38:50.700964 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.700979 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:50.700985 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.701006 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.701019 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007e0000 Version=29937 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.701112 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.701134 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.587644 seconds RemainingLength=0 &qj=0xc00080d400 Version=29938 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.701197 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:50.719663 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:50.719688 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.719701 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29939 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.719723 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.719740 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:50.719751 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.754986 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.440235ms I0831 06:38:50.755024 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.755044 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.755050 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.755060 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.755073 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080d400 Version=29939 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.755161 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.641676 seconds RemainingLength=0 &qj=0xc0008ab000 Version=29939 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.755198 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.755223 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.760624 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.760638 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29939 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.760661 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.760690 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.760699 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.768296 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.768341 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00080d400 Version=29939 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.798129 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.717178ms I0831 06:38:50.798153 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.798168 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:50.798174 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.798193 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.798207 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008ab000 Version=29939 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.798281 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.684796 seconds RemainingLength=0 &qj=0xc000730400 Version=29940 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.798313 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.798314 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.806979 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.809553 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0008ab000 Version=29939 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.819951 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.828685 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.828731 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29941 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.828761 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.828783 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.828800 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:50.828933 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:50.865332 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.582299ms I0831 06:38:50.865361 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.865378 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.865384 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.865403 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.865416 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000730400 Version=29941 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.865522 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.865819 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.752332 seconds RemainingLength=0 &qj=0xc000763c00 Version=29941 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.865858 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.876500 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.876517 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29941 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.876556 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.876597 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.876613 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.897309 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.897324 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000730400 Version=29941 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.925439 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 28.627416ms I0831 06:38:50.925475 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.925490 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:50.925497 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.925511 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.925525 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000763c00 Version=29941 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.925621 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.925660 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.812161 seconds RemainingLength=0 &qj=0xc00044a800 Version=29943 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.925714 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.930046 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.930071 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000763c00 Version=29941 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.945601 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.945622 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29944 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.945646 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:50.945663 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.945681 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:50.979944 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.403941ms I0831 06:38:50.979980 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.979999 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:50.980007 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:50.980019 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:50.980037 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00044a800 Version=29944 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.980129 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.866643 seconds RemainingLength=0 &qj=0xc000752000 Version=29944 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:50.980151 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.980166 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:50.998011 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:50.998040 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00044a800 Version=29944 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:50.998765 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:51.000190 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.886702 seconds RemainingLength=0 &qj=0xc000763400 Version=29945 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.000233 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.022112 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.022248 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29947 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.022276 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.022291 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.022302 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.029851 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.051210 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.051231 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007e0000 Version=29937 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.056225 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 31.948322ms I0831 06:38:51.056248 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.056263 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.056269 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.056278 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.056291 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000763400 Version=29947 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.056363 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.056635 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.942894 seconds RemainingLength=0 &qj=0xc000734800 Version=29948 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.057187 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.064148 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.064218 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000763400 Version=29947 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.076953 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.076977 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29949 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.077005 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.077025 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.077036 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.112596 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.894592ms I0831 06:38:51.112632 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.112658 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.112665 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.112675 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.112700 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000734800 Version=29949 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.112810 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.112822 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=3.999331 seconds RemainingLength=0 &qj=0xc0007e1000 Version=29949 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.112927 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.131378 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.131393 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000734800 Version=29949 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:51.131661 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:51.131753 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.018267 seconds RemainingLength=0 &qj=0xc00044bc00 Version=29950 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.131805 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.142738 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.142762 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29951 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.142790 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.142811 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.142821 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.182781 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 37.905412ms I0831 06:38:51.182810 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.182829 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.182835 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.182845 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.182861 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00044bc00 Version=29951 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.182965 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.182999 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.069484 seconds RemainingLength=0 &qj=0xc0007f3400 Version=29951 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.183067 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.187552 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.187583 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29951 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.187620 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.187635 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.187645 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.203062 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.203080 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00044bc00 Version=29951 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.254436 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 51.648643ms I0831 06:38:51.254468 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.254491 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:51.254503 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.254513 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.254529 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007f3400 Version=29951 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.254648 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.141156 seconds RemainingLength=0 &qj=0xc00071f000 Version=29953 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.254721 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.255633 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.266759 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.266778 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29954 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.266803 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.266818 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.266828 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:51.267950 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:51.303064 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.303508 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.350515ms I0831 06:38:51.303548 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.303569 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:51.303586 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.303598 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.303618 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00071f000 Version=29954 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.303700 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.303693 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.190207 seconds RemainingLength=0 &qj=0xc000763800 Version=29954 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.303733 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:51.338353 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:51.338553 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.338567 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007f3800 Version=29913 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:51.338708 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:51.338800 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.225313 seconds RemainingLength=0 &qj=0xc0009c8000 Version=29955 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.338849 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.358019 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.358043 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29956 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.358082 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.358097 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.358106 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.368763 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.380175 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.380194 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007f3400 Version=29951 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.406276 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 45.896199ms I0831 06:38:51.406308 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.406327 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.406341 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.406355 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.406371 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0009c8000 Version=29956 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.406502 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.406617 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.293127 seconds RemainingLength=0 &qj=0xc0009c8400 Version=29957 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.406691 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.412013 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.412039 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0009c8000 Version=29956 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.427430 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.427461 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29958 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.427505 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.427531 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.427542 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.438546 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.458164 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.458182 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00071f000 Version=29954 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.465706 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 36.131881ms I0831 06:38:51.465760 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.465784 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.465799 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.465816 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.465843 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0009c8400 Version=29958 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.465919 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.352433 seconds RemainingLength=0 &qj=0xc0006f2800 Version=29959 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.465970 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.465944 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:51.489040 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:51.489195 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.489244 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29960 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.489295 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.489319 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.489349 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.524904 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.387398ms I0831 06:38:51.524931 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.524948 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:51.524954 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.524976 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.524991 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0006f2800 Version=29960 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.525100 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.525117 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.411622 seconds RemainingLength=0 &qj=0xc0009c8c00 Version=29960 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.525176 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.529844 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.529859 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29960 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.529882 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.529899 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.529908 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.551392 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.551411 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0006f2800 Version=29960 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.589369 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.603227 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.603305 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0009c8400 Version=29958 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.607388 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 55.825112ms I0831 06:38:51.607436 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.607451 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.607458 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.607468 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.607481 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0009c8c00 Version=29960 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.607551 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.607568 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.494079 seconds RemainingLength=0 &qj=0xc000754c00 Version=29961 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.607624 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.618532 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.618545 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29962 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.618565 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.618589 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.618596 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:51.618815 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:51.650434 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.487009ms I0831 06:38:51.650458 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.650474 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.650480 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.650500 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.650513 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000754c00 Version=29962 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.650628 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.650654 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.537163 seconds RemainingLength=0 &qj=0xc000763800 Version=29962 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.650748 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.655828 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.655842 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29962 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.655875 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.655892 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.655901 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.671003 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.671054 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000754c00 Version=29962 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.719418 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.720000 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 49.789482ms I0831 06:38:51.720047 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.720069 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:51.720082 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.720115 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.720147 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000763800 Version=29962 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.720241 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.720320 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.606830 seconds RemainingLength=0 &qj=0xc0007fdc00 Version=29963 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.720391 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.724933 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.724973 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0009c8c00 Version=29960 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.725036 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.725049 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000763800 Version=29962 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.741398 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.741418 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29965 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.741443 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.741457 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.741467 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.773729 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.408587ms I0831 06:38:51.773793 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.773817 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.773830 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.773846 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.773867 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007fdc00 Version=29965 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.773948 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.773952 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.660462 seconds RemainingLength=0 &qj=0xc000731800 Version=29965 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.774023 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.791422 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.791440 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007fdc00 Version=29965 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:51.791716 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:51.791842 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.678347 seconds RemainingLength=0 &qj=0xc00083b800 Version=29966 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.791885 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.825171 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.825197 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29967 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.825236 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.825262 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.825273 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.859123 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.986539ms I0831 06:38:51.859216 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.859246 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.859260 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.859280 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.859301 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00083b800 Version=29967 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.859421 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.859433 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.745942 seconds RemainingLength=0 &qj=0xc00080d800 Version=29967 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.859483 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.864042 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.864057 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29967 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.864101 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.864116 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.864124 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.871929 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.871949 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00083b800 Version=29967 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.908520 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 36.031289ms I0831 06:38:51.908683 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.908729 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:51.908766 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.908795 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.908834 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080d800 Version=29967 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.908965 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.795476 seconds RemainingLength=0 &qj=0xc0008aa800 Version=29968 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.909056 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.908998 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.914427 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:51.914460 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00080d800 Version=29967 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.942544 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.942689 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29969 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.942791 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:51.942855 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.942909 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:51.988817 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 42.475734ms I0831 06:38:51.988848 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.988868 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:51.988874 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:51.988884 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:51.988901 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008aa800 Version=29969 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.988995 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.875509 seconds RemainingLength=0 &qj=0xc000734c00 Version=29969 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:51.989045 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:51.989047 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:52.008063 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:52.008138 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.894651 seconds RemainingLength=0 &qj=0xc00071e800 Version=29970 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.008172 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.008266 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.008292 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0008aa800 Version=29969 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.027270 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.027294 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29971 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.027333 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.027351 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.027365 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.069815 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.390556ms I0831 06:38:52.069914 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.069944 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.069963 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.069983 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.070011 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00071e800 Version=29971 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.070131 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.070139 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.956650 seconds RemainingLength=0 &qj=0xc000755400 Version=29971 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.070218 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.089368 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.089446 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00071e800 Version=29971 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:52.089619 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:52.089682 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=4.976196 seconds RemainingLength=0 &qj=0xc0008a9800 Version=29974 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.089713 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.107858 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.107878 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29975 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.107907 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.107923 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.107934 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.143301 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.144768ms I0831 06:38:52.143388 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.143414 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.143429 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.143452 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.143478 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008a9800 Version=29975 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.143601 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.143604 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.030113 seconds RemainingLength=0 &qj=0xc0008a9c00 Version=29975 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.143659 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.148180 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.148193 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29975 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.148212 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.148235 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.148245 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.155977 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.155990 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0008a9800 Version=29975 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.196438 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 40.198449ms I0831 06:38:52.196467 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.196484 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:52.196490 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.196511 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.196533 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008a9c00 Version=29975 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.196651 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.196676 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.083164 seconds RemainingLength=0 &qj=0xc00044b800 Version=29976 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.196750 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.201161 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.201180 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0008a9c00 Version=29975 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.217491 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.217506 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29977 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.217528 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.217560 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.217571 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.248023 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 28.627663ms I0831 06:38:52.248045 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.248060 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.248066 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.248086 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.248099 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00044b800 Version=29977 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.248250 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.248232 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.134729 seconds RemainingLength=0 &qj=0xc0007f2800 Version=29977 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.248311 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.298251 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.298275 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00044b800 Version=29977 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:52.298445 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:52.298527 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.185040 seconds RemainingLength=0 &qj=0xc00075ec00 Version=29978 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.298605 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.316564 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.316636 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29979 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.316691 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.316717 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.316735 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.348401 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 29.494689ms I0831 06:38:52.348423 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.348439 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.348445 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.348455 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.348469 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00075ec00 Version=29979 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.348580 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.349681 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.236181 seconds RemainingLength=0 &qj=0xc0006ea400 Version=29979 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.349753 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.354272 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.354287 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29979 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.354327 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.354357 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.354364 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.363863 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.363882 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00075ec00 Version=29979 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.396243 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 31.734345ms I0831 06:38:52.396268 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.396284 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:52.396290 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.396299 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.396335 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0006ea400 Version=29979 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.396433 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.396671 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.283184 seconds RemainingLength=0 &qj=0xc00071ec00 Version=29980 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.396906 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.426332 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.426353 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29981 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.426382 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.426396 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.426407 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:52.426563 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:52.456946 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 28.899557ms I0831 06:38:52.456967 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.456985 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.456991 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.457001 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.457016 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00071ec00 Version=29981 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.457113 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.457316 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.343830 seconds RemainingLength=0 &qj=0xc0008a9400 Version=29981 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.457359 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:52.475801 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. I0831 06:38:52.476114 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.362626 seconds RemainingLength=0 &qj=0xc000763000 Version=29982 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.476202 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.476524 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.476581 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00071ec00 Version=29981 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.495406 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.495463 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29983 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.495516 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.495540 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.495563 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.526669 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.527649 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.544881ms I0831 06:38:52.527694 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.527717 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:52.527737 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.527753 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.527774 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000763000 Version=29983 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.527869 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.527869 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.414383 seconds RemainingLength=0 &qj=0xc0007fc000 Version=29983 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.527929 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.559741 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.559801 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000763000 Version=29983 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:52.567026 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. 
Retrying. W0831 06:38:52.567681 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:52.567784 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.454297 seconds RemainingLength=0 &qj=0xc00075f400 Version=29984 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.567824 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.592703 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.592723 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29985 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.592748 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.592762 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.592772 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.621562 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 27.036438ms I0831 06:38:52.621673 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.621707 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.621720 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.621882 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.621915 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00075f400 Version=29985 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.621988 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.508503 seconds RemainingLength=0 &qj=0xc00075f800 Version=29985 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.622045 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.622011 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.626206 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.626220 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29985 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.626242 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.626254 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.626263 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.653276 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.653296 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00075f400 Version=29985 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.683480 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 30.381739ms I0831 06:38:52.683508 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.683523 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:52.683529 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.683540 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.683555 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00075f800 Version=29985 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.683658 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.683679 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.570177 seconds RemainingLength=0 &qj=0xc000752400 Version=29986 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.683743 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:52.704408 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:52.704481 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.704502 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29987 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.704533 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.704550 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:52.704560 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.763170 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 47.641044ms I0831 06:38:52.763198 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.763218 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.763226 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.763240 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.763259 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000752400 Version=29987 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.763747 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.650258 seconds RemainingLength=0 &qj=0xc000730800 Version=29987 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.763819 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.764148 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.767170 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.771066 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.771082 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29987 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.771106 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.771123 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.771133 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.787438 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.787458 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000752400 Version=29987 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:52.800470 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:52.804808 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.813818 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.813838 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00075f800 Version=29985 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.847795 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 45.580884ms I0831 06:38:52.847824 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.847843 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:52.847850 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.847860 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.847875 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000730800 Version=29987 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.848047 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.848490 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.735000 seconds RemainingLength=0 &qj=0xc0008ab400 Version=29990 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.848557 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.852911 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.852979 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000730800 Version=29987 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.881378 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.881500 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29992 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.881550 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.881607 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.881625 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:52.923989 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.018523ms I0831 06:38:52.924018 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.924036 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.924043 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:52.924055 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:52.924071 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0008ab400 Version=29992 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.924149 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.810663 seconds RemainingLength=0 &qj=0xc000892000 Version=29992 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.924177 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.924217 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.944074 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:52.944096 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0008ab400 Version=29992 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:52.944669 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:52.945396 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.831907 seconds RemainingLength=0 &qj=0xc0007f2800 Version=29994 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:52.945524 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.957616 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:52.957751 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29995 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.958087 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:52.958155 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:52.958176 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.009937 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 48.979662ms I0831 06:38:53.009979 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.010000 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.010007 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.010018 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.010035 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007f2800 Version=29995 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.010839 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.011531 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.898042 seconds RemainingLength=0 &qj=0xc000a7b000 Version=29995 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.012070 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:53.037764 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:53.037832 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.924343 seconds RemainingLength=0 &qj=0xc00075fc00 Version=29996 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.038022 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.038057 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0007f2800 Version=29995 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.039002 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.057812 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.057839 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=29997 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.057881 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.057898 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.057908 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.107956 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 47.297672ms I0831 06:38:53.108064 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.108093 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.108108 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.108158 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.108255 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00075fc00 Version=29997 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.108661 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.109021 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=5.995533 seconds RemainingLength=0 &qj=0xc00072c400 Version=29997 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.109111 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.127929 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.127952 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00075fc00 Version=29997 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:53.128413 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:53.128474 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.014988 seconds RemainingLength=0 &qj=0xc00080c000 Version=29998 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.128506 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.148313 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.148338 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30000 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.148378 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.148395 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.148404 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.188627 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 37.682515ms I0831 06:38:53.189049 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.189614 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.189650 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.189678 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.189713 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080c000 Version=30000 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.189913 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.076408 seconds RemainingLength=0 &qj=0xc0007e0400 Version=30000 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.189996 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.190324 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.200856 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.210805 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.211647 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00080c000 Version=30000 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:53.229622 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. W0831 06:38:53.229681 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:53.229754 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.116267 seconds RemainingLength=0 &qj=0xc000712800 Version=30001 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.229812 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.252063 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.252767 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30003 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.253115 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.253179 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.253199 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.292084 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.940293ms I0831 06:38:53.292189 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.292237 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.292254 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.292281 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.292306 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000712800 Version=30003 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.292438 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.293568 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.180078 seconds RemainingLength=0 &qj=0xc00085c000 Version=30003 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.293682 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.298676 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.298774 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30003 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.298812 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.298845 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.298870 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.311789 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.311891 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc000712800 Version=30003 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.348022 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 34.568523ms I0831 06:38:53.348052 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.348072 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:53.348079 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.348090 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.348106 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00085c000 Version=30003 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.348208 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.348331 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.234837 seconds RemainingLength=0 &qj=0xc0007fcc00 Version=30005 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.348428 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:53.368240 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:53.369415 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.369434 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30006 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.369472 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.369497 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:53.369518 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.405054 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 33.038927ms I0831 06:38:53.405089 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.405107 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.405113 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.405133 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.405148 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007fcc00 Version=30006 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.405249 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.405273 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.291781 seconds RemainingLength=0 &qj=0xc00083a400 Version=30006 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.405335 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.414044 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.414137 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30006 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.414200 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.414229 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.414251 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.429205 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.429294 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0007fcc00 Version=30006 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.462083 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.355329ms I0831 06:38:53.462112 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.462130 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.462137 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.462147 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.462163 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00083a400 Version=30006 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.462252 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.462324 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.348834 seconds RemainingLength=0 &qj=0xc0000a0c00 Version=30007 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.462380 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.467547 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.467640 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00083a400 Version=30006 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.468482 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:53.483625 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:53.483835 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.483864 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30008 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.483908 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. 
} {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.483975 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.483996 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.519655 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.887775ms I0831 06:38:53.519759 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.519796 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.519812 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.519831 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.519855 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0000a0c00 Version=30008 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.519971 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.520176 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.406684 seconds RemainingLength=0 &qj=0xc0008a8c00 Version=30008 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.521030 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.533304 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.533326 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0000a0c00 Version=30008 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:53.534211 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:53.535175 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.421687 seconds RemainingLength=0 &qj=0xc00071f400 Version=30009 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.535216 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.554800 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.554823 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30010 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.554852 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.554868 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.554878 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.600410 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 42.552097ms I0831 06:38:53.600520 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.600555 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.600571 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.600608 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.600641 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00071f400 Version=30010 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.600737 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.487250 seconds RemainingLength=0 &qj=0xc000a7b000 Version=30010 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.600786 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.600757 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.606430 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.606509 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30010 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.606564 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.607000 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.607027 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.620816 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.620900 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00071f400 Version=30010 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.657666 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 35.269044ms I0831 06:38:53.657803 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.657855 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:53.657884 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.657923 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.657955 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000a7b000 Version=30010 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.658120 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.658499 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.545011 seconds RemainingLength=0 &qj=0xc00080d000 Version=30011 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.660598 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.665254 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.665276 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000a7b000 Version=30010 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.683807 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:53.693217 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:53.693757 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.693898 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30012 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.694141 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. 
} {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.694194 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.694227 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.739488 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 43.227257ms I0831 06:38:53.739698 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.739788 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.739834 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.739868 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.739924 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00080d000 Version=30012 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True 
LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.740189 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.740797 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.627294 seconds RemainingLength=0 &qj=0xc000712c00 Version=30012 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.740877 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.746808 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.746830 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30012 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.746865 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.746880 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.746889 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.761867 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.761969 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc00080d000 Version=30012 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.799485 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 37.948587ms I0831 06:38:53.799525 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.799554 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:53.799564 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.799583 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.799600 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000712c00 Version=30012 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.799709 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.799753 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.686260 seconds RemainingLength=0 &qj=0xc00083b400 Version=30014 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.799824 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.807207 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.807227 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000712c00 Version=30012 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.814346 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.815140 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30015 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.815201 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.815395 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.815471 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.849853 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 32.171376ms I0831 06:38:53.850016 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.850055 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.850079 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.850117 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.850142 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc00083b400 Version=30015 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.850251 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.853715 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.740220 seconds RemainingLength=0 &qj=0xc0006eb800 Version=30015 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.853793 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.870526 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.870558 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00083b400 Version=30015 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:53.871480 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:53.871975 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.758483 seconds RemainingLength=0 &qj=0xc000755000 Version=30016 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.872495 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.892612 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.892633 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30017 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.892670 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.892687 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.892698 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:53.951110 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 55.071766ms I0831 06:38:53.951225 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.951300 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.951564 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:53.952172 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:53.953163 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000755000 Version=30017 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC 
Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.953468 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.954088 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.840599 seconds RemainingLength=0 &qj=0xc000731800 Version=30017 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. 
Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.954276 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.978888 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:53.978983 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000755000 Version=30017 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:53.979626 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:53.979774 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.866287 seconds RemainingLength=0 &qj=0xc0007e1c00 Version=30018 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:53.979840 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.996482 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:53.996608 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30019 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.996699 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:53.998224 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:53.998690 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:54.030161 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.046842 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 43.725663ms I0831 06:38:54.046870 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.046891 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:54.046897 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:54.046912 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.046931 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0007e1c00 Version=30019 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.047040 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.047326 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.933836 seconds RemainingLength=0 &qj=0xc000731c00 Version=30019 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.047372 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.059172 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.060928 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0006ea400 Version=29979 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} W0831 06:38:54.070645 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. W0831 06:38:54.070748 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. 
I0831 06:38:54.070761 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=6.957267 seconds RemainingLength=0 &qj=0xc0006ea800 Version=30020 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.070846 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.090765 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.090784 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30021 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.090810 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.090827 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.090837 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:54.093810 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.121487 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.121518 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc00085c000 Version=30003 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.159692 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 59.056428ms I0831 06:38:54.159720 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.159738 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.159745 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:54.159763 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.159785 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0006ea800 Version=30021 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.159913 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.159897 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=7.046410 seconds RemainingLength=0 &qj=0xc0006eb400 Version=30023 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.159937 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.171060 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' W0831 06:38:54.186083 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:54.187899 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.187927 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30026 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.188609 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.188830 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.188849 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:54.189916 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:54.238043 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 45.120735ms I0831 06:38:54.238072 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.238090 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. 
I0831 06:38:54.238098 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:54.238109 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.238125 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc0006eb400 Version=30026 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.238226 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.238310 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=7.124785 seconds RemainingLength=0 &qj=0xc00075e800 Version=30026 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.238498 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' W0831 06:38:54.272479 1 queuejob_controller_ex.go:1102] [ScheduleNext] Conflict error detected when updating status in etcd for app wrapper 'test-ns-grslq/mnist, status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} this may be due to appwrapper deletion. 
I0831 06:38:54.272746 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=7.159257 seconds RemainingLength=0 &qj=0xc000734400 Version=30027 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.272788 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.272898 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.272909 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. 
activeQ=false Unsched=true &qj=0xc0006eb400 Version=30026 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.296244 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.296272 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30028 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.296301 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.296322 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.296342 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 I0831 06:38:54.301658 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.322597 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.322616 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc0006ea800 Version=30021 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.351729 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 53.165052ms I0831 06:38:54.351758 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.351778 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.351785 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:54.351796 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.351814 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000734400 Version=30028 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC 
Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.351955 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.351953 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=7.238459 seconds RemainingLength=0 &qj=0xc000734c00 Version=30029 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Backoff ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before [backoff] - Rejoining Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.352015 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.358537 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.358559 1 queuejob_controller_ex.go:1487] [backoff] mnist move to unschedulableQ before sleep for 20 seconds. activeQ=false Unsched=true &qj=0xc000734400 Version=30028 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. 
Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.387381 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.387874 1 queuejob_controller_ex.go:1400] [updateStatusInEtcd] update success 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL' I0831 06:38:54.387924 1 queuejob_controller_ex.go:1111] [ScheduleNext] after Pop qjqLength=0 qj mnist Version=30030 activeQ=false Unsched=true Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.387981 1 queuejob_controller_ex.go:1115] [ScheduleNext] [Agent Mode] Attempting to dispatch next appwrapper: 'test-ns-grslq/mnist' Status={0 0 0 0 0 false false Pending 0 HeadOfLine 2023-08-31 06:38:47.113484 +0000 UTC 0001-01-01 00:00:00 +0000 UTC true before ScheduleNext - setHOL false [{Init True 2023-08-31 06:38:47.113484 +0000 UTC 2023-08-31 06:38:47.113484 +0000 UTC } {Queueing True 2023-08-31 06:38:47.113505 +0000 UTC 2023-08-31 06:38:47.113505 +0000 UTC AwaitingHeadOfLine } {HeadOfLine True 2023-08-31 06:38:47.146786 +0000 UTC 2023-08-31 06:38:47.146786 +0000 UTC FrontOfQueue. } {Backoff True 2023-08-31 06:38:47.250391 +0000 UTC 2023-08-31 06:38:47.250391 +0000 UTC AppWrapperNotRunnable. 
Insufficient resources to dispatch AppWrapper.}] [] 0 0 0 0 0} I0831 06:38:54.388011 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.388028 1 queuejob_controller_ex.go:1170] [ScheduleNext] [Agent Mode] Forwarding loop iteration: 1 W0831 06:38:54.392263 1 queuejob_controller_ex.go:1477] [backoff] Conflict when upating AW status in etcd 'test-ns-grslq/mnist'. Retrying. I0831 06:38:54.434117 1 queuejob_controller_ex.go:231] [allocatableCapacity] The avaible capacity to dispatch appwrapper is %v and time took to calculate is %vcpu 1380.00, memory 24521711616.00, GPU 0 42.715908ms I0831 06:38:54.434200 1 queuejob_controller_ex.go:817] [getAggAvaiResPri] Idle cluster resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.434232 1 genericresource.go:549] [GetResources] Requested total allocation resource from custompodresources `cpu 4000.00, memory 1000000000.00, GPU 0`. I0831 06:38:54.434260 1 queuejob_controller_ex.go:909] [getAggAvaiResPri] cpu 1380.00, memory 24521711616.00, GPU 0 available resources to schedule I0831 06:38:54.434273 1 queuejob_controller_ex.go:1192] [ScheduleNext] [Agent Mode] Appwrapper 'test-ns-grslq/mnist' with resources cpu 4000.00, memory 1000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 1380.00, memory 24521711616.00, GPU 0 I0831 06:38:54.434291 1 queuejob_controller_ex.go:1317] [ScheduleNext] [Agent Mode] Failed to dispatch app wrapper 'test-ns-grslq/mnist' due to insuficient resources, activeQ=false Unsched=true &qj=0xc000734c00 Version=30030 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false 
Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.434432 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by '[backoff] - Rejoining' I0831 06:38:54.435067 1 queuejob_controller_ex.go:1020] [ScheduleNext] activeQ.Pop mnist *Delay=7.320958 seconds RemainingLength=0 &qj=0xc0008abc00 Version=30030 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-08-31 06:38:47.113484 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113484 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113484 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-08-31 06:38:47.113505 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.113505 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-08-31 06:38:47.146786 +0000 UTC 
LastTransitionMicroTime:2023-08-31 06:38:47.146786 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-08-31 06:38:47.250391 +0000 UTC LastTransitionMicroTime:2023-08-31 06:38:47.250391 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0} I0831 06:38:54.435331 1 queuejob_controller_ex.go:1389] [updateStatusInEtcd] trying to update 'test-ns-grslq/mnist' called by 'ScheduleNext - setHOL'