Azure Blob trigger function AKS trigger issue #1613

Open
anime-shed opened this issue Nov 21, 2024 · 3 comments

@anime-shed

I have a blob-triggered Azure function deployed on AKS, with KEDA scaling based on the blob entries.
I used Azure/azure-functions-host#10624 to make each function instance accept only one blob item. The problem is that all of the created pods read the same file, whereas with queue-based triggers and scaling, different queue elements are read by different function instances. As far as I understand, the blob trigger internally uses queues to do its work, so why does the behaviour differ from a queue trigger?

P.S.: I am moving the files to a different folder after the process is completed.

host.json

{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensions": {
    "blobs": {
      "maxDegreeOfParallelism": 1
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.*, 5.0.0)"
  }
}
Host version: 2.0

App settings:

"FUNCTIONS_WORKER_RUNTIME": "python",
"AzureWebJobsFeatureFlags": "EnableWorkerIndex",
"PYTHON_ISOLATE_WORKER_DEPENDENCIES": "1",

function code
import logging

import azure.functions as func
from blob_helper import initialize_blob_service_client, upload_dataframe_to_blob

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.function_name(name="PythonFunction")
@app.blob_trigger(
    arg_name="myblob",
    path="sheets/input/{name}",  # Blob path for trigger
    connection="DataLakeConnectionString"
)
def PythonFunction(myblob: func.InputStream):
    # Blob-processing logic omitted here.
    logging.info("Processing blob: %s", myblob.name)
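
For comparison, the queue-triggered setup I mentioned above (where different queue elements are picked up by different pods) looks roughly like this. This is only a minimal sketch, and the queue name sheets-input-queue is a hypothetical one used for illustration:

import logging

import azure.functions as func

app = func.FunctionApp()

@app.function_name(name="PythonQueueFunction")
@app.queue_trigger(
    arg_name="msg",
    queue_name="sheets-input-queue",  # hypothetical queue name
    connection="DataLakeConnectionString"
)
def PythonQueueFunction(msg: func.QueueMessage):
    # Each queue message is dequeued and hidden by exactly one instance,
    # so scaled-out pods do not pick up the same item twice.
    blob_name = msg.get_body().decode("utf-8")
    logging.info("Processing queue message for blob: %s", blob_name)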

keda

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: python-fuction-scaler
  namespace: prod
spec:
  scaleTargetRef:
    name: python-fuction 
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: azure-blob
      metadata:
        blobContainerName: "sheets"
        blobPrefix: "input"
        connectionFromEnv: "DataLakeConnectionString"
        targetBlobCount: "1"
      authenticationRef:
        name: secrets
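
For completeness, the queue-based KEDA scaler that would pair with the queue trigger sketched above would look something like this (again only a sketch; the queue name is hypothetical and the target deployment is the same one referenced in my existing ScaledObject):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: python-fuction-queue-scaler
  namespace: prod
spec:
  scaleTargetRef:
    name: python-fuction
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: azure-queue
      metadata:
        queueName: "sheets-input-queue"   # hypothetical queue name
        queueLength: "1"
        connectionFromEnv: "DataLakeConnectionString"
      authenticationRef:
        name: secrets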
@anime-shed
Author

What I was able to find out is that each instance of the blob-triggered function creates a new queue and takes a lock in it. Is it possible to have a common queue for all instances? I think that would solve my issue.

@JAdluri JAdluri self-assigned this Mar 18, 2025

JAdluri commented Apr 9, 2025

Hello @anime-shed, thank you for sharing your findings. As far as I have worked with it, it is not possible to have a common queue in that scenario, but please review this document and let me know if it helps you - https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=python-v2%2Cisolated-process%2Cnodejs-v4%2Cextensionv5&pivots=programming-language-csharp

@anime-shed
Author

@JAdluri, from what I can tell from the documentation you attached, I can provide the

Queue Service URI (required for blob triggers): <CONNECTION_NAME_PREFIX>__queueServiceUri

but that will not help me with the issue I am facing.

I want my blob-triggered function to know that a particular blob is already being read by another pod, so that it can pick up the next blob instead.

E.g.:
Current behaviour:
Blob name: Input_Files
Blob1, Blob2, Blob3, Blob4..

AKS pods:
Pod 1: trigger reads Blob1, creates Input_Files_12342, and locks the element in that queue
Pod 2: trigger reads Blob1, creates Input_Files_34342, and locks the element in that queue

Expected behaviour:
Blob name: Input_Files
Blob1, Blob2, Blob3, Blob4..

AKS pods:
Pod 1: trigger reads Blob1, creates Input_Files_12342, and locks the element in that queue
Pod 2: trigger reads Blob1, finds it is already being read, so it reads Blob2, adds it to Input_Files_34342, and locks the element in that queue
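
The part that can be handled in my own code is the "already being read" check, for example by taking a lease on the blob before processing it. This is only a sketch of that idea (not the built-in trigger behaviour), assuming azure-storage-blob v12 and a hypothetical helper name:

from azure.core.exceptions import HttpResponseError
from azure.storage.blob import BlobClient

def try_claim_blob(connection_string: str, container: str, blob_name: str):
    # Try to take an infinite lease on the blob. Returns the lease if this pod
    # got it, or None if another pod already holds a lease on the same blob.
    blob_client = BlobClient.from_connection_string(
        conn_str=connection_string,
        container_name=container,
        blob_name=blob_name,
    )
    try:
        # lease_duration=-1 means an infinite lease; release it (or move the
        # blob to the processed folder) once processing is done.
        return blob_client.acquire_lease(lease_duration=-1)
    except HttpResponseError as exc:
        if exc.status_code == 409:  # LeaseAlreadyPresent: another pod owns it
            return None
        raise

With something like this, a pod whose trigger fires for a blob that is already leased could simply return without doing any work, but it still would not be handed the next blob, which is the behaviour I am actually after.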
