cleanup: refactor volume cloning #1521

Merged

Conversation

umagnus
Contributor

@umagnus umagnus commented Jul 31, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

cleanup: refactor volume cloning

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

none

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 31, 2024
@k8s-ci-robot k8s-ci-robot requested review from andyzhangx and cvvz July 31, 2024 08:14
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 31, 2024
@k8s-ci-robot
Contributor

Hi @umagnus. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 31, 2024
Member

@andyzhangx andyzhangx left a comment

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 31, 2024
SubscriptionID: accountOptions.SubscriptionID,
GetLatestAccountKey: accountOptions.GetLatestAccountKey,
}
if srcAccountSasToken, _, err = d.getAzcopyAuth(ctx, srcAccountName, "", storageEndpointSuffix, srcAccountOptions, nil, "", secretNamespace); err != nil {
Contributor

Does getAzcopyAuth work on a source volume for which the controller identity only has Storage Blob Data Reader?

Contributor Author

I think it may fail with an AuthorizationPermissionMismatch error. As the azcopy team mentioned, it needs the Storage Blob Data Contributor role.

Member

We don't know yet. Intuitively, Storage Blob Data Reader should be enough on the source volume, but this needs to be verified. If it fails, we can fall back to using a SAS token.
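
The fallback proposed in this comment could be sketched in Go roughly as follows. This is a minimal illustration, not code from the driver: copyFunc, copyWithFallback, deniedCopy and okCopy are hypothetical names, and detecting the failure by looking for the error code in the error text is an assumption about how an azcopy failure would surface to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// copyFunc is a hypothetical stand-in for whatever launches azcopy with a
// given kind of credential. It is not a function from blob-csi-driver.
type copyFunc func() error

// isPermissionMismatch reports whether err mentions the
// AuthorizationPermissionMismatch error code returned by Azure Storage.
func isPermissionMismatch(err error) bool {
	return err != nil && strings.Contains(err.Error(), "AuthorizationPermissionMismatch")
}

// copyWithFallback tries the identity-based copy first and retries with a SAS
// token only when the first attempt failed on permissions.
func copyWithFallback(withIdentity, withSAS copyFunc) (string, error) {
	err := withIdentity()
	if err == nil {
		return "identity", nil
	}
	if !isPermissionMismatch(err) {
		return "", fmt.Errorf("identity copy failed: %w", err)
	}
	if err := withSAS(); err != nil {
		return "", fmt.Errorf("sas fallback failed: %w", err)
	}
	return "sas", nil
}

// deniedCopy and okCopy are demo stubs that simulate the two outcomes.
func deniedCopy() error { return errors.New("403 AuthorizationPermissionMismatch") }
func okCopy() error     { return nil }

func main() {
	method, err := copyWithFallback(deniedCopy, okCopy)
	fmt.Println(method, err)
}
```

Errors other than a permission mismatch are returned as-is, so the SAS path is not used to mask unrelated failures such as network errors.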

Contributor Author

I tested granting only Storage Blob Data Reader on the source blob, and azcopy still returned an AuthorizationPermissionMismatch error.

2024/08/01 08:46:05 INFO: [P#0-T#0] Starting transfer: Source "https://fuse256643cb981cf47f286.blob.core.windows.net/pvc-454d10fc-47c0-4d75-847c-48b4093c6261/outfile" Destination "https://fuse29375444a7b114aaf91.blob.core.windows.net/pvc-88f8b354-4d77-4097-a65b-0ecb93607e87/outfile". Specified chunk size 8388608
2024/08/01 08:46:05 ==> REQUEST/RESPONSE (Try=1/9.557829ms, OpTime=30.325026ms) -- RESPONSE STATUS CODE ERROR
   PUT https://fuse29375444a7b114aaf91.blob.core.windows.net/pvc-88f8b354-4d77-4097-a65b-0ecb93607e87/outfile
   Accept: application/xml
   Authorization: REDACTED
   Content-Length: 0
   User-Agent: AzCopy/10.25.1 azsdk-go-azblob/v1.3.1 (go1.22.4; linux)
   X-Ms-Client-Request-Id: 79584760-5d17-4747-4048-df4f78b38efe
   x-ms-blob-content-md5: tTzxIJei9Hch9GOVLgQ+WA==
   x-ms-blob-content-type: application/octet-stream
   x-ms-blob-type: BlockBlob
   x-ms-copy-source: https://fuse256643cb981cf47f286.blob.core.windows.net/pvc-454d10fc-47c0-4d75-847c-48b4093c6261/outfile
   x-ms-copy-source-authorization: REDACTED
   x-ms-version: 2023-08-03
   --------------------------------------------------------------------------------
   RESPONSE Status: 403 This request is not authorized to perform this operation using this permission.
   Content-Length: 279
   Content-Type: application/xml
   Date: Thu, 01 Aug 2024 08:46:05 GMT
   Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
   X-Ms-Client-Request-Id: 79584760-5d17-4747-4048-df4f78b38efe
   X-Ms-Error-Code: AuthorizationPermissionMismatch
   X-Ms-Request-Id: aa17dd77-001e-0022-5def-e3ef22000000
   X-Ms-Version: 2023-08-03
Response Details: <Code>AuthorizationPermissionMismatch</Code><Message>This request is not authorized to perform this operation using this permission. </Message>

2024/08/01 08:46:05 ERR: [P#0-T#0] COPYFAILED: https://fuse256643cb981cf47f286.blob.core.windows.net/pvc-454d10fc-47c0-4d75-847c-48b4093c6261/outfile : 403 : 403 This request is not authorized to perform this operation using this permission.. When Put Blob from URL. X-Ms-Request-Id: aa17dd77-001e-0022-5def-e3ef22000000

It seems strange and doesn't match the public docs for AzCopy. I asked the team and they indicated that:

This request appears to be a response from the destination account. It is not a source validation failure, which indicates that it's the actual request to the destination, not the source that's failing. Furthermore, it's AuthorizationPermissionMismatch, meaning that authorization has succeeded, but permissions are lacking.

In short, it sounds like permissions are lacking on the destination.
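
To apply that reasoning to a log like the one quoted earlier in this thread, one can compare the host of the failing request with the hosts of the source and destination URLs. The Go sketch below does only that. rejectingSide is a hypothetical helper, not part of azcopy or the driver, and the two URLs are the ones from the quoted log.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Source and destination blob URLs copied from the azcopy log in this thread.
const (
	srcBlob = "https://fuse256643cb981cf47f286.blob.core.windows.net/pvc-454d10fc-47c0-4d75-847c-48b4093c6261/outfile"
	dstBlob = "https://fuse29375444a7b114aaf91.blob.core.windows.net/pvc-88f8b354-4d77-4097-a65b-0ecb93607e87/outfile"
)

// rejectingSide takes the request line of a failed request, in the form
// "<METHOD> <URL>", and reports whether it was addressed to the source or the
// destination account. When both URLs share a host it reports "destination".
func rejectingSide(requestLine, srcURL, dstURL string) (string, error) {
	fields := strings.Fields(requestLine)
	if len(fields) != 2 {
		return "", fmt.Errorf("unexpected request line %q", requestLine)
	}
	req, err := url.Parse(fields[1])
	if err != nil {
		return "", err
	}
	src, err := url.Parse(srcURL)
	if err != nil {
		return "", err
	}
	dst, err := url.Parse(dstURL)
	if err != nil {
		return "", err
	}
	switch req.Host {
	case dst.Host:
		return "destination", nil
	case src.Host:
		return "source", nil
	}
	return "unknown", nil
}

func main() {
	// The failing request in the log is a PUT against the destination blob.
	side, err := rejectingSide("PUT "+dstBlob, srcBlob, dstBlob)
	fmt.Println(side, err)
}
```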

Contributor

Here's a bicep POC that shows azcopy itself does work with only Blob Data Reader on the source and Blob Data Contributor on the destination.

The blob-csi-driver should support it as well for at least MSI scenarios.

param stem string = uniqueString(resourceGroup().name)
param location string = resourceGroup().location
param azCliVersion string = '2.52.0'
param forceUpdateTag string = utcNow()

var blobOwnerRoleId = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'b7e6dc6d-f1e8-4753-8033-0f276bb0955b')
var blobReaderRoleId = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '2a2b9908-6ea1-4ae2-8e65-a410df84e7d1')
var blobContributorRoleId = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')

resource srcAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: '${stem}src'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
    allowSharedKeyAccess: false
    defaultToOAuthAuthentication: true
    isLocalUserEnabled: false
  }
}

resource dstAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: '${stem}dst'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
    allowSharedKeyAccess: false
    defaultToOAuthAuthentication: true
    isLocalUserEnabled: false
  }
}

resource makeIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: '${stem}-make-id'
  location: location
}

resource copyIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: '${stem}-copy-id'
  location: location
}

resource makeSourceBlobOwner 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(makeIdentity.id, blobOwnerRoleId, srcAccount.id)
  properties: {
    roleDefinitionId: blobOwnerRoleId
    principalId: makeIdentity.properties.principalId
  }
  scope: srcAccount
}

resource copySourceBlobReader 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(copyIdentity.id, blobReaderRoleId, srcAccount.id)
  properties: {
    roleDefinitionId: blobReaderRoleId
    principalId: copyIdentity.properties.principalId
  }
  scope: srcAccount
}


resource copyDestinationBlobContributor 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(copyIdentity.id, blobContributorRoleId, dstAccount.id)
  properties: {
    roleDefinitionId: blobContributorRoleId
    principalId: copyIdentity.properties.principalId
  }
  scope: dstAccount
}

resource makeScript 'Microsoft.Resources/deploymentScripts@2023-08-01' = {
  name: '${stem}-make-script'
  location: location
  kind: 'AzureCLI'
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${makeIdentity.id}': {}
    }
  }
  properties: {
    azCliVersion: azCliVersion
    forceUpdateTag: forceUpdateTag
    environmentVariables: [
      {
        name: 'AZCOPY_AUTO_LOGIN_TYPE'
        value: 'MSI'
      }
      {
        name: 'SRCACCOUNT'
        value: srcAccount.properties.primaryEndpoints.blob
      }
      {
        name: 'SRCCONTAINER'
        value: 'data'
      }
    ]
    scriptContent: '${installScriptContent}\n${makeScriptContent}'
    retentionInterval: 'PT1H'
  }
  dependsOn: [
    makeSourceBlobOwner
  ]
}

resource copyScript 'Microsoft.Resources/deploymentScripts@2023-08-01' = {
  name: '${stem}-copy-script'
  location: location
  kind: 'AzureCLI'
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${copyIdentity.id}': {}
    }
  }
  properties: {
    azCliVersion: azCliVersion
    forceUpdateTag: forceUpdateTag
    environmentVariables: [
      {
        name: 'AZCOPY_AUTO_LOGIN_TYPE'
        value: 'MSI'
      }
      {
        name: 'SRCACCOUNT'
        value: srcAccount.properties.primaryEndpoints.blob
      }
      {
        name: 'SRCCONTAINER'
        value: 'data'
      }
      {
        name: 'DSTACCOUNT'
        value: dstAccount.properties.primaryEndpoints.blob
      }
      {
        name: 'DSTCONTAINER'
        value: 'data'
      }
    ]
    scriptContent: '${installScriptContent}\n${copyScriptContent}'
    retentionInterval: 'PT1H'
  }
  dependsOn: [
    copySourceBlobReader
    copyDestinationBlobContributor
    makeScript
  ]
}

var installScriptContent = '''
cd /tmp
mkdir bin
apk add -q --no-interactive gcompat
wget -q https://aka.ms/downloadazcopy-v10-linux
tar -xf downloadazcopy-v10-linux
cp -vf ./azcopy_linux_amd64_*/azcopy bin/
rm -rf ./azcopy_linux_amd64_* downloadazcopy-v10-linux
'''

var makeScriptContent = '''
mkdir data
date > data/0.txt
date > data/1.txt
date > data/3.txt
date > data/4.txt
./bin/azcopy jobs list
./bin/azcopy list ${SRCACCOUNT}
./bin/azcopy make ${SRCACCOUNT}${SRCCONTAINER}
./bin/azcopy copy data ${SRCACCOUNT}${SRCCONTAINER} --recursive
'''

var copyScriptContent = '''
./bin/azcopy jobs list
./bin/azcopy list ${SRCACCOUNT}
./bin/azcopy list ${DSTACCOUNT}
./bin/azcopy copy ${SRCACCOUNT}${SRCCONTAINER} ${DSTACCOUNT}${DSTCONTAINER} --recursive --check-length=false --s2s-preserve-access-tier=false
'''
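
For comparison, a Go caller could assemble the same service-to-service copy with identity-based login roughly as shown below. This is a sketch under stated assumptions: newS2SCopyCmd is a hypothetical helper, the flags and the AZCOPY_AUTO_LOGIN_TYPE value are taken from the Bicep scripts above, and the example URLs are placeholders. The command is only built here, not executed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newS2SCopyCmd builds (but does not run) an azcopy copy between two blob
// containers. Authentication is left to azcopy's auto-login, which is switched
// to managed identity through the AZCOPY_AUTO_LOGIN_TYPE variable.
func newS2SCopyCmd(azcopyPath, src, dst string) *exec.Cmd {
	cmd := exec.Command(azcopyPath, "copy", src, dst,
		"--recursive",
		"--check-length=false",
		"--s2s-preserve-access-tier=false",
	)
	// Inherit the current environment and add the auto-login setting last.
	cmd.Env = append(os.Environ(), "AZCOPY_AUTO_LOGIN_TYPE=MSI")
	return cmd
}

func main() {
	// Placeholder URLs; no SAS token is appended because the identity is used.
	cmd := newS2SCopyCmd("./bin/azcopy",
		"https://examplesrc.blob.core.windows.net/data",
		"https://exampledst.blob.core.windows.net/data")
	fmt.Println(cmd.Args)
}
```

With this shape the identity needs a data-plane read role on the source and a data-plane write role on the destination, which is the combination the POC exercises.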

Member

Thanks for the verification. We could address this in the docs or in another PR.

@umagnus umagnus force-pushed the refactor_volume_cloning branch from 434b4d8 to d39b8b7 Compare August 1, 2024 09:00
dstAzcopyAuthEnv := srcAzcopyAuthEnv
dstAccountSASToken := srcAccountSASToken
if srcAccountName != accountName {
if dstAccountSASToken, dstAzcopyAuthEnv, err = d.getAzcopyAuth(ctx, accountName, accountKey, storageEndpointSuffix, accountOptions, secrets, secretName, secretNamespace); err != nil {
Contributor

this might have been causing your failures.

getAzcopyAuth() has a few issues:

  • only tries to use MSI auth when len(secrets) == 0 && len(secretName) == 0
  • tries to get the storage access key and SAS token as a fallback even if they are disabled on the account
  • uses the cluster identity to get the fallback key / SAS, which requires even more permissions

All of this account key / sas handling seems like hacks around the root issue that the default permissions for the control plane identity neglected to include Storage Blob Data Contributor along with Contributor.

Changing secrets, secretName, secretNamespace to nil, "", "" would be a quick fix, and preferably getAzcopyAuth would always try the identity first.
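
An "identity first" ordering, as suggested in this comment, might look like the following Go sketch. authInputs and authOrder are hypothetical and do not mirror the real getAzcopyAuth signature. The sketch also assumes that both key-based options are unusable when shared key access is disabled on the account, which is why they are skipped in that case.

```go
package main

import "fmt"

// authInputs is a hypothetical summary of what is known before choosing how
// azcopy should authenticate against one storage account.
type authInputs struct {
	IdentityAvailable bool // a managed identity can be used by azcopy
	HasSecret         bool // the caller supplied an account key via a secret
	SharedKeyAllowed  bool // the account permits shared key (and key-signed SAS)
}

// authOrder returns the methods to try, in order. The identity always comes
// first when present; key-based methods are listed only if the account allows
// shared key access at all.
func authOrder(in authInputs) []string {
	var order []string
	if in.IdentityAvailable {
		order = append(order, "identity")
	}
	if in.SharedKeyAllowed {
		if in.HasSecret {
			order = append(order, "secret-account-key")
		}
		order = append(order, "generated-sas")
	}
	return order
}

func main() {
	// Secrets are present, but the identity is still tried before them.
	fmt.Println(authOrder(authInputs{IdentityAvailable: true, HasSecret: true, SharedKeyAllowed: true}))
	// Shared key disabled on the account: no key or SAS fallback is attempted.
	fmt.Println(authOrder(authInputs{IdentityAvailable: true, HasSecret: true, SharedKeyAllowed: false}))
}
```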

Member

We could address this in another PR; this one is a refactor, not a bug fix.

Contributor Author

I also tried again, and Reader worked this time, although it had failed before. But I don't think this is the cause, since that test already used MSI (len(secrets) == 0 && len(secretName) == 0). I will keep working on this issue to find the reason.

Member

@andyzhangx andyzhangx left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
@andyzhangx
Member

/retest

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, umagnus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit d7e076b into kubernetes-sigs:master Aug 2, 2024
22 checks passed
@andyzhangx
Member

/cherrypick release-1.24

@k8s-infra-cherrypick-robot

@andyzhangx: new pull request created: #1532

In response to this:

/cherrypick release-1.24


@andyzhangx
Member

/cherrypick release-1.23

@k8s-infra-cherrypick-robot

@andyzhangx: #1521 failed to apply on top of branch "release-1.23":

Applying: fix volume cloning and add e2e
Using index info to reconstruct a base tree...
M	pkg/blob/blob.go
M	pkg/blob/controllerserver.go
M	pkg/blob/controllerserver_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/blob/controllerserver_test.go
CONFLICT (content): Merge conflict in pkg/blob/controllerserver_test.go
Auto-merging pkg/blob/controllerserver.go
CONFLICT (content): Merge conflict in pkg/blob/controllerserver.go
Auto-merging pkg/blob/blob.go
CONFLICT (content): Merge conflict in pkg/blob/blob.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 fix volume cloning and add e2e
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-1.23


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants