-
Notifications
You must be signed in to change notification settings - Fork 90
fix: make arns of all task resources aws-partition aware #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add comment to method explanation.
src/stepfunctions/steps/utils.py
Outdated
|
||
# Obtain matching aws partition name based on region | ||
# Retrun "aws" as default if no region detected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oups! Typo here: "Return"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I will make the change in next commit
tests/unit/test_steps.py
Outdated
from stepfunctions.steps import Pass, Succeed, Fail, Wait, Choice, ChoiceRule, Parallel, Map, Task, Retry, Catch, Chain, \ | ||
utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we reorganize the imports so that each is on another line to make is more readable? It would also make it easier to spot the change next time a new import is added from stepfunctions.step.
Something like:
from stepfunctions.steps import (
Catch,
Chain,
Choice,
ChoiceRule,
Fail,
Map,
Parallel,
Pass,
Retry,
Succeed,
Task,
Wait,
utils,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the legacy across this repo and it has many places import like this. As these imports have relatively short name, to me importing same line is a better choice. Make the list longer is definitely clearer but the looking will be clumsy.
But let me know if this is hard to read. If so I can maybe divide them into 2 or 3 lines so that we have a clearer reading as well as keeping the context less bulky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add some linting (in a different PR) to enforce style preferences.
not strongly opinionated on whether we go with PEP8, PyLint, or other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. So far I think there are definitely different style existing in this repo.
src/stepfunctions/steps/utils.py
Outdated
# Retrun "aws" as default if no region detected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in "Retrun". May be worth mentioning this is from the default boto3 session.
# Obtain matching aws partition name based on region | |
# Retrun "aws" as default if no region detected | |
# Obtain matching aws partition name based on region | |
""" | |
Returns the aws partition for the the current boto3 session. | |
Defaults to 'aws' if the region could not be detected. | |
""" |
Python method comments are also usually inside the method blocks as a docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Will make the fix in new commit.
src/stepfunctions/steps/utils.py
Outdated
|
||
if cur_region is None: | ||
logger.warning("No region detected for the session, will use default partition: aws") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it's worth mentioning this is for a boto3 session
logger.warning("No region detected for the session, will use default partition: aws") | |
logger.warning("No region detected for the boto3 session. Using default partition: aws") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will do
@@ -88,10 +95,12 @@ def test_succeed_state_creation(): | |||
'Comment': 'This is a comment' | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add extra new lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra new line came from the IDE. Usually it's a 2 blank line between python methods and it's the case for other files. I kept this auto-reformat to make it meet the convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think* that's the PEP8 standard. we should enable linting on the repo at some point to enforce style preferences
tests/unit/test_steps.py
Outdated
|
||
|
||
# Test if boto3 session can fetch correct aws partition info from test environment | ||
def test_util_get_aws_partition(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to a separate file for just the utils module with separate test case methods for aws
and aws-cn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Will do
tests/unit/test_steps.py
Outdated
aws_partition = "aws" | ||
aws_cn_partition = "aws-cn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not really necessary to declare variables for these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it makes the reading better if we mention what we are testing but they both will work I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pretty obvious to me. But this is just nitpicking
assert utils.get_aws_partition() == 'aws'
assert utils.get_aws_partition() == 'aws-cn'
tests/unit/test_steps.py
Outdated
# Boto3 used either info from ~/.aws/config or AWS_DEFAULT_REGION in environment | ||
# to determine current region. We will replace/create AWS_DEFAULT_REGION with regions in | ||
# different aws partition to test that when regions are changed, correct partition info | ||
# can be retrieved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of mutating environment variables, did you consider mocking boto3 to override region_name
but call through to the remaining functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I didn't. I came up this testing flow during my local setup and think this will satisfy the unit test need as well as straightforward for users.
I can try boto3 mock if this makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, you can do this with patch decorators: https://docs.python.org/3/library/unittest.mock-examples.html#patch-decorators
@patch.object(boto3.session.Session, 'region_name', 'cn-north-1')
def test_get_aws_partition_for_china_region():
# ...
src/stepfunctions/steps/compute.py
Outdated
@@ -38,9 +39,9 @@ def __init__(self, state_id, wait_for_callback=False, **kwargs): | |||
output_path (str, optional): Path applied to the state’s output after the application of `result_path`, producing the effective output which serves as the raw input for the next state. (default: '$') | |||
""" | |||
if wait_for_callback: | |||
kwargs[Field.Resource.value] = 'arn:aws:states:::lambda:invoke.waitForTaskToken' | |||
kwargs[Field.Resource.value] = 'arn:' + get_aws_partition() + ':states:::lambda:invoke.waitForTaskToken' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily blocking: I would like to at least consider class for ARN construction instead of building strings with repeated calls to get_aws_partition()
inline everywhere. The AWS SDK in other languages have utilities for this, but python doesn't.
Something like:
kwards[Field.Resource.value] = ARN(service='states', resource='lambda:invoke')
Or even scoping it to service integration ARNs:
kwargs[Field.Resource.value] = ServiceIntegrationARN(service='lambda', api='invoke', integration_pattern=IntegrationPattern.WAIT_FOR_TASK_TOKEN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will definitely make the code cleaner. I will look into it.
If I can not finish this quickly I will prepare a PR specific for it later.
src/stepfunctions/steps/compute.py
Outdated
@@ -67,9 +68,9 @@ def __init__(self, state_id, wait_for_completion=True, **kwargs): | |||
output_path (str, optional): Path applied to the state’s output after the application of `result_path`, producing the effective output which serves as the raw input for the next state. (default: '$') | |||
""" | |||
if wait_for_completion: | |||
kwargs[Field.Resource.value] = 'arn:aws:states:::glue:startJobRun.sync' | |||
kwargs[Field.Resource.value] = 'arn:' + get_aws_partition() + ':states:::glue:startJobRun.sync' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice if we extracted the logic for generating ARNs into a utility.
additionally, consider string interpolation using f-strings
rather than string concatenation, which is supported in Python 3.6+ (our current pre-requisites)
you can read more about it here
tests/integ/conftest.py
Outdated
@@ -43,11 +44,11 @@ def aws_account_id(): | |||
|
|||
@pytest.fixture(scope="session") | |||
def sfn_role_arn(aws_account_id): | |||
return "arn:aws:iam::{}:role/StepFunctionsMLWorkflowExecutionFullAccess".format(aws_account_id) | |||
return "arn:" + get_aws_partition() + ":iam::{}:role/StepFunctionsMLWorkflowExecutionFullAccess".format(aws_account_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f-strings
would also help here as they're evaluated at runtime and would get you out of using both format
and concatenation on the same string.
tests/unit/test_steps.py
Outdated
from stepfunctions.steps import Pass, Succeed, Fail, Wait, Choice, ChoiceRule, Parallel, Map, Task, Retry, Catch, Chain, \ | ||
utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add some linting (in a different PR) to enforce style preferences.
not strongly opinionated on whether we go with PEP8, PyLint, or other
@@ -88,10 +95,12 @@ def test_succeed_state_creation(): | |||
'Comment': 'This is a comment' | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think* that's the PEP8 standard. we should enable linting on the repo at some point to enforce style preferences
tests/unit/test_steps.py
Outdated
default_region = None | ||
|
||
# Boto3 used either info from ~/.aws/config or AWS_DEFAULT_REGION in environment | ||
# to determine current region. We will replace/create AWS_DEFAULT_REGION with regions in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple questions here:
- What will happen if your assertion fails after
cn-north-1
is set as the default region? will those changes persist for the next boto session because the test will bail or is there a guarantee that environment variables that are being touched will always be restored to a pristine state? - does this work with named profiles and other mechanisms that can be used to establish aws credentials where the
DEFAULT_REGION
is not set as an environment variable?
It's generally not preferable for unit tests to have any dependencies. Extraneous processes could be running or setting the region and this test could interfere and introduce surprising behaviour by modifying them.
i know the stubber
allows you to create a session for a client with a specific region, but i'm not sure what mocking the session looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Good question. From my understanding this will continue to fail as the
cn-north-1
will be set as env var which will keep being effective till next run. To improve this specific issue I can save exceptions as non-fatal and assert at the very end of test so that changes got reversed before assertion. - Yes. I tested with both environment variable and ~/.aws/config file. Can't think of other configuration methods but please let me know if you have some in your mind.
- I do agree the test should be on itself and should be mocked without dependency. So far I tried to mock boto3 client/session but have no luck. Will give stubber a try.
…tegration resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments about structure, naming, and documentation. Almost there!
src/stepfunctions/steps/utils.py
Outdated
ARN builder for task integration | ||
""" | ||
def resource_integration_arn_builder(service, api, integration_pattern=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This belongs in the service integration module
- Since this a method, let's name it with a verb instead. e.g.
get/build_service_integration_arn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(2) still applies
class IntegrationServices(Enum): | ||
Lambda = "lambda" | ||
SageMaker = "sagemaker" | ||
Glue = "glue" | ||
ECS = "ecs" | ||
Batch = "batch" | ||
DynamoDB = "dynamodb" | ||
SNS = "sns" | ||
SQS = "sqs" | ||
ElasticMapReduce = "elasticmapreduce" | ||
|
||
|
||
class LambdaApi(Enum): | ||
Invoke = "invoke" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove these. We don't need to refer to the service or api names outside the Step classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had similar thought before I start coding.
But during the implementation, I found it's important to have the well organized service/api for development. It will not only help with auto-completion which helps accelerate coding speed, but will also help reading/changing/examining the code as we have our resource gathered in a place rather than sparsely distributed among places.
Another reason is: we have the integration pattern passed in as enum, this is limited number hard-coded param exactly same as our integrated services and their apis.
I think this will also help with organizing further expansion when we are introducing more service and apis into the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced. Arguments for types and autocompletion make sense if they are to be used in many places. That is not the case for "lambda" and "invoke", etc.. There's no need to reference them outside the module defining the service integration steps, so don't expose them.
Another reason is: we have the integration pattern passed in as enum, this is limited number hard-coded param exactly same as our integrated services and their apis.
Not sure what you mean here
src/stepfunctions/steps/utils.py
Outdated
ARN builder for task integration | ||
Args: | ||
service(str): name of the task resource service | ||
api(enum): api to be integrated of the task resource service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
api(enum): api to be integrated of the task resource service | |
api(str): api to be integrated of the task resource service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will replace "str" with "Api" as it could provide a clearer explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's more confusing as it references a type that doesn't exist. Keep as str
. The parameter name is already api
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the new commit I replaced it with "Api" which I think it is clearer than only "Api". Sorry forgot to update it here.
Let me know if that's still confusing. I can change it to "str". The reason I think "str" is not the best choice is: I implemented the API as one enum and fetch the value in the get_service_integration_arn method. "str" is not really reflecting the input.
src/stepfunctions/steps/compute.py
Outdated
Lambda = "lambda" | ||
Glue = "glue" | ||
Ecs = "ecs" | ||
Batch = "batch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lambda = "lambda" | |
Glue = "glue" | |
Ecs = "ecs" | |
Batch = "batch" | |
LAMBDA_SERVICE_NAME = "lambda" | |
GLUE_SERVICE_NAME = "glue" | |
ECS_SERVICE_NAME = "ecs" | |
BATCH_SERVICE_NAME = "batch" |
nit: Variable names should written be in snake_case
. Class names are PascalCase
. Constants in CAPITAL_SNAKE_CASE
. Repeats.
Batch = "batch" | ||
|
||
|
||
class LambdaApi(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (not blocking): Not necessary to be a class
src/stepfunctions/steps/compute.py
Outdated
kwargs[Field.Resource.value] = get_service_integration_arn(Lambda, | ||
LambdaApi.Invoke, | ||
IntegrationPattern.WaitForTaskToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation is off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Approval conditional on applying the docstring suggestion for get_service_integration_arn
Co-authored-by: Adam Wong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 changes LGTM! provisional approval on adding doc-strings to classes without them.
|
||
""" | ||
Example resource arn: arn:aws:states:::dynamodb:getItem | ||
""" | ||
|
||
kwargs[Field.Resource.value] = get_service_integration_arn(DYNAMODB_SERVICE_NAME, | ||
DynamoDBApi.GetItem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for adding these through the PR, it definitely helps as a review and will benefit contributors!
Invoke = "invoke" | ||
|
||
|
||
class GlueApi(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not blocking: also agree that these don't need to be classes.
If you do keep them as classes, we should include docstrings that explain what they are and their purpose :)
if integration_pattern == IntegrationPattern.RequestResponse: | ||
arn = f"arn:{get_aws_partition()}:states:::{service}:{api.value}" | ||
else: | ||
arn = f"arn:{get_aws_partition()}:states:::{service}:{api.value}.{integration_pattern.value}" | ||
return arn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really nice to see us leveraging f-strings!
ELASTICMAPREDUCE_SERVICE_NAME = "elasticmapreduce" | ||
|
||
|
||
class DynamoDBApi(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all our classes should include docstrings
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Issue #, if available: #120
Update:
Resolve previous comments.
Use boto3 patch mock for unit tests.
Use f-string for string building.
Create arn builder for providing integration resource arn.
Create tests for arn builder.
Test:
All unit tests and integ tests passed.
Description of changes:
Support different aws partitions when SDK providing resource arns. Partitions will be determined by boto3 client session running region.
Available regions are based on SFN regional availability.
There are formatting changes for test_steps.py which done by IDE. They are valid changes and will not have functionality impact.
Test:
Created new unit under test_steps suite. All units tests passed.
All integration tests passed in both
us-east-1
andcn-north-1
using test account.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.