Azure Java SDK gets stuck when starting or tagging virtual machines, even when using async #3

Open
nagappan080810 opened this issue Jul 13, 2022 · 18 comments
@nagappan080810

"When we add or remove tags on virtual machines in a bulk request of 500 VMs, a few of the VM requests get stuck or hang and return no response even after waiting more than 10 minutes. We are using the Azure Java SDK and have tried both the sync and async approaches; it hangs in both cases for bulk VMs. Starting and stopping VMs in bulk also hangs for a very long time, more than 45 minutes, without a response. Please suggest an approach for bulk VM operations."

@weidongxu-microsoft
Contributor

weidongxu-microsoft commented Jul 19, 2022

For tagging, you might want to use the tagging API, e.g. https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/resourcemanager/azure-resourcemanager/src/samples/java/com/azure/resourcemanager/resources/generated/TagOperationsCreateOrUpdateAtScopeSamples.java

For start/stop, if you are doing it for 500+ at the same time, it might be more efficient if you call

        azure.virtualMachines().manager().serviceClient().getVirtualMachines()
                .beginStart("rg", "vm");

for all VMs.
Note that it does not poll the result, so after the call you will need to check the status of all these VMs to see whether they have indeed started. This can be done either via a list call or via the Azure Resource Graph API. https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/resourcegraph/azure-resourcemanager-resourcegraph
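
The "send every start request first, then check all states in bulk" idea can be sketched in plain Java. This is only an illustration: VmClient and its two methods are hypothetical stand-ins for the SDK calls (they are not real Azure types), and the "PowerState/running" string is taken from the Resource Graph output format shown later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the SDK client; not a real Azure type. */
interface VmClient {
    /** Sends the start request and returns without waiting for completion. */
    void beginStart(String resourceGroup, String vmName);

    /** Returns vmName -> power state for a resource group in one bulk call. */
    Map<String, String> listPowerStates(String resourceGroup);
}

class BulkStart {
    /** Submits a start request for every VM, then checks all states with one bulk lookup. */
    static List<String> startAllAndFindNotRunning(VmClient client, String rg, List<String> vmNames) {
        for (String vm : vmNames) {
            client.beginStart(rg, vm); // one request per VM, no per-VM polling
        }
        Map<String, String> states = client.listPowerStates(rg); // single bulk status query
        List<String> notRunning = new ArrayList<>();
        for (String vm : vmNames) {
            if (!"PowerState/running".equals(states.get(vm))) {
                notRunning.add(vm);
            }
        }
        return notRunning;
    }
}
```

In a real run the bulk lookup would normally be repeated after a delay, since VMs need time to start; the sketch checks once to keep the shape of the pattern visible.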


At present, we have no idea why it sometimes gets stuck, or why some VMs get stuck; we do not have any logs (or even the SDK name and version) to work with.

add @XiaofeiCao as well.

@weidongxu-microsoft
Contributor

PS: it is recommended to post queries or bugs here. That repository is the main Java repo and is watched by developers.

@nagappan080810
Author

nagappan080810 commented Jul 20, 2022

We could only find the ComputeManager returned by "azure.virtualMachines().manager()", which does not have the serviceClient() method. Can you point us to the library to use? We are already using the most recent version:

com.microsoft.azure:azure:1.41.3

@XiaofeiCao

@XiaofeiCao
Contributor

Hi @nagappan080810, the library you are using is our track1 version. It has been officially deprecated since March.
You can try out our track2 SDK: https://github.com/Azure/azure-sdk-for-java

Entry class should be https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/resourcemanager/azure-resourcemanager/src/main/java/com/azure/resourcemanager/AzureResourceManager.java

@weidongxu-microsoft
Contributor

weidongxu-microsoft commented Jul 20, 2022

1.41.3 can use azure.virtualMachines().inner().beginStart, but as mentioned above, the lib was deprecated in March.

@nagappan080810
Author

But beginStart returns void. How do we poll to check whether the VM has started?

@weidongxu-microsoft
Contributor

weidongxu-microsoft commented Jul 21, 2022

@nagappan080810

E.g. ARG https://docs.microsoft.com/en-us/azure/virtual-machines/resource-graph-samples?tabs=azure-cli#count-of-virtual-machines-by-power-state
Old ARG Java SDK (compatible with 1.41.3) is here (https://mvnrepository.com/artifact/com.microsoft.azure.resourcegraph.v2019_04_01/azure-mgmt-resourcegraph/1.0.0). But it is in the same deprecated state.

The point is that when you have 500 VMs, you might not really want to poll them one by one (as it would be 500 POSTs plus hundreds or thousands of GETs to poll the results).


But if you still prefer simplicity on API calls, just stick with the usual start API.

@nagappan080810
Author

Basically, the requirement is to start a large number of VMs in bulk and report the success count and the failure count, and for failures, which machines failed.

We initially tried the start API, which hangs midway without any response from Azure. Even when we try startAsync, it runs synchronously, so the hang appears again. That is why this thread was started, to find alternatives.

We can report success and failure counts, but the stuck scenario leaves us with no clue about what is happening.

We could not find a way to get the count of virtual machines in the Azure Java SDK. The nearest thing we could find is this, but it has no usage example:
https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.management.compute.virtualmachinestatuscodecount?view=azure-java-stable#definition

Can you please guide us to the right API for this requirement?

@XiaofeiCao

@XiaofeiCao
Contributor

XiaofeiCao commented Aug 2, 2022

Hi @nagappan080810
For querying virtual machine count, you can use the following code:

ResourceGraphManager graphManager = ResourceGraphManager.authenticate();
QueryResponse response = graphManager
    .resourceProviders()
    .resources(
        new QueryRequest()
            .withSubscriptions(Lists.newArrayList("<Your Subscription Id>"))
            .withQuery("Resources | where type == 'microsoft.compute/virtualmachines' | summarize count() by PowerState = tostring(properties.extended.instanceView.powerState.code)"));
System.out.println(response.data());

response.data() will look like:
{columns=[{name=PowerState, type=string}, {name=count_, type=integer}], rows=[[PowerState/running, 1]]}
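
To turn that output into numbers for a status report, the rows can be folded into a map. The sketch below assumes the rows have already been extracted from response.data() as a list of [state, count] lists; how the SDK actually types that object is not shown in this thread, so the extraction step is left out. PowerStateCounts is a hypothetical helper name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PowerStateCounts {
    /** Converts rows such as [[PowerState/running, 1]] into a state -> count map. */
    static Map<String, Long> fromRows(List<List<Object>> rows) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            String state = String.valueOf(row.get(0));
            long count = ((Number) row.get(1)).longValue();
            counts.merge(state, count, Long::sum); // tolerate a state appearing twice
        }
        return counts;
    }

    /** Number of VMs reported in the running state, or 0 if the state is absent. */
    static long running(Map<String, Long> counts) {
        return counts.getOrDefault("PowerState/running", 0L);
    }
}
```

Note that this query only yields counts per state. Reporting which machines failed would need a query that also projects the VM name.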

ResourceGraphManager can be found in https://mvnrepository.com/artifact/com.microsoft.azure.resourcegraph.v2019_04_01/azure-mgmt-resourcegraph/1.0.0

Sample queries on Virtual Machines:
https://docs.microsoft.com/en-us/azure/virtual-machines/resource-graph-samples?tabs=azure-cli#sample-queries

Query syntax:
https://docs.microsoft.com/en-us/azure/governance/resource-graph/concepts/query-language

@nagappan080810
Author

nagappan080810 commented Aug 2, 2022

beginStart blocks the thread until the VM has started, and it fails if the VM is not in the proper state. We need an option to start the VM in a purely fire-and-forget mode so that we do not get stuck while starting it.

Looking at the SDK code, it still blocks the main thread:

public void beginStart(String resourceGroupName, String vmName) {
    ((ServiceResponse)this.beginStartWithServiceResponseAsync(resourceGroupName, vmName).toBlocking().single()).body();
}

So we are looking for a better API for our use case.

@weidongxu-microsoft
Contributor

weidongxu-microsoft commented Aug 3, 2022

@nagappan080810 There is a beginStartAsync just nearby, if you use RxJava (as a rule, the Async method does not block).

@nagappan080810
Author

I tried beginStartAsync for 13 VMs and waited for 2 minutes; none of the machines started. The Azure response was: 2 {columns=[{name=PowerState, type=string}, {name=count_, type=integer}], rows=[[PowerState/deallocated, 1], [PowerState/stopped, 12]]}

@weidongxu-microsoft
Contributor

weidongxu-microsoft commented Aug 4, 2022

What's your code?

RxJava does not run the code until you subscribe.
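
The "nothing runs until subscribe" behaviour can be imitated in plain Java to make it concrete. ColdSource below is a hand-written toy, not RxJava's Observable, and beginStartDeferred is a hypothetical stand-in for a method such as beginStartAsync: building the object only records the work, and the simulated request is sent when subscribe is called.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Toy imitation of a cold source: the work runs only when subscribe is called. */
class ColdSource<T> {
    private final Supplier<T> work;

    ColdSource(Supplier<T> work) {
        this.work = work;
    }

    /** Runs the deferred work now and hands the result to the consumer. */
    void subscribe(Consumer<T> onResult) {
        onResult.accept(work.get());
    }
}

class ColdSourceDemo {
    /** Counts how many simulated requests have actually been sent. */
    static final AtomicInteger requestsSent = new AtomicInteger();

    /** Builds a deferred "start VM" call; building it sends nothing. */
    static ColdSource<String> beginStartDeferred(String vmName) {
        return new ColdSource<>(() -> {
            requestsSent.incrementAndGet(); // stands in for the HTTP request
            return vmName + " accepted";
        });
    }
}
```

Calling ColdSourceDemo.beginStartDeferred("vm1") and discarding the result leaves requestsSent at zero, which matches the symptom of VMs staying stopped when the async method is called without a subscriber.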

@nagappan080810
Author

If we call subscribe, it holds the thread until the machine has started, and only then moves on to start the next machine. This is almost the same as beginStart. Is there an option to bulk start, or to start without blocking the current thread?
Also, is there an option to start VMs by a pattern such as vm*, so that vm1, vm2, and so on all get started?
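
The wildcard question is not answered in this thread. One approach that needs no special API support is to list the VM names first and filter them on the client. A minimal sketch, assuming the names are already available as a list; VmNameFilter is a hypothetical helper and only '*' is treated as a wildcard:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class VmNameFilter {
    /** Converts a simple glob such as "vm*" into a regex; every other character is literal. */
    static Pattern globToPattern(String glob) {
        String regex = Arrays.stream(glob.split("\\*", -1))
                .map(Pattern::quote)                 // escape literal parts
                .collect(Collectors.joining(".*"));  // each '*' becomes ".*"
        return Pattern.compile(regex);
    }

    /** Returns the names that match the glob in full, in their original order. */
    static List<String> matching(List<String> vmNames, String glob) {
        Pattern pattern = globToPattern(glob);
        return vmNames.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

The filtered list can then be fed to whichever start call is in use.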

@weidongxu-microsoft
Contributor

Please make sure you understand RxJava.

You first have to call subscribeOn with your preferred background thread pool; then, when you subscribe, the work runs on that thread pool.
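
The effect of moving the work onto a background pool can be shown with JDK classes alone. This is an analogue of the subscribeOn idea using CompletableFuture, not RxJava code; the startVm function is a hypothetical stand-in for whatever blocking start call is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

class BackgroundSubmit {
    /**
     * Runs startVm for every name on the given pool and returns immediately
     * with one future per VM; the calling thread never waits on a start call.
     */
    static List<CompletableFuture<String>> submitAll(
            List<String> vmNames, Function<String, String> startVm, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String vm : vmNames) {
            // Each call is queued on the pool, so calls overlap up to the pool size.
            futures.add(CompletableFuture.supplyAsync(() -> startVm.apply(vm), pool));
        }
        return futures;
    }
}
```

With a pool from Executors.newFixedThreadPool(n), at most n start calls are in flight at once, which also acts as a simple throttle for a 500-VM batch.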

@nagappan080810
Author

If we need to create a separate background thread for each instance via subscribeOn, what is the maximum time we should wait for an instance to start or for a tag action to complete? We cannot wait long, because we need to report success or failure to the system after 2 or 3 minutes.

@weidongxu-microsoft
Contributor

weidongxu-microsoft commented Aug 22, 2022

Thread or thread pool or whatever (and whose thread pool, yours or RxJava's) is totally up to you.

There is no cancellation API for Azure resources, hence the timeout is also up to you. You can fail and delete the VM if, after say 2 or 3 minutes, you have not seen success. This is what YOUR service/client is going to handle.
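
A client-side timeout of this kind can be sketched with a CompletableFuture that represents one pending operation. TimedOutcome is a hypothetical helper; the future is assumed to complete when the start or tag call returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedOutcome {
    enum Outcome { SUCCEEDED, FAILED, TIMED_OUT }

    /** Waits at most timeoutMillis for the operation and classifies the result. */
    static Outcome await(CompletableFuture<?> operation, long timeoutMillis) {
        try {
            operation.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.SUCCEEDED;
        } catch (TimeoutException e) {
            // Only the local wait ended; the Azure-side operation may still be running.
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Outcome.FAILED;
        }
    }
}
```

Keeping TIMED_OUT separate from FAILED matters here because nothing is cancelled on the service side when the local wait gives up.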

@nagappan080810
Author

If we wait for a period of time and then check, the results come back as false positives. The check says the operation did not complete successfully, but the Azure portal shows that the tag was added successfully. This circuit-breaker approach is not giving us the right results.
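
One way to reduce this kind of mismatch is to treat the elapsed wait as "unknown" rather than "failed", and to decide the final result from a fresh look at the actual resource state. The sketch below assumes two inputs that the caller has already collected: whether each VM's call completed within the wait, and the state read back afterwards (for example from a bulk query). Reconcile is a hypothetical helper name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Reconcile {
    /**
     * Builds the final report: a VM whose local wait ran out is reported as
     * failed only if the freshly read state does not match the desired one.
     * Assumes completedInTime holds a non-null value for every VM.
     */
    static Map<String, Boolean> finalReport(
            Map<String, Boolean> completedInTime,
            Map<String, String> actualStates,
            String desiredState) {
        Map<String, Boolean> report = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> entry : completedInTime.entrySet()) {
            boolean reachedGoal = desiredState.equals(actualStates.get(entry.getKey()));
            report.put(entry.getKey(), entry.getValue() || reachedGoal);
        }
        return report;
    }
}
```

For tagging, the same shape applies with the desired tag value in place of the power state.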


3 participants