
Is it Possible to use schema from ExecutionInput into container_arguments of ProcessingStep? #167


Open
MrDataPsycho opened this issue Sep 15, 2021 · 6 comments


MrDataPsycho commented Sep 15, 2021

Hi,
Let's say I have an execution input schema as follows:

execution_input = ExecutionInput(
    schema={
        "PATH_INPUT": str,
        "DESTINATION_OUTPUT": str,
        "study_name": str,
        "ProcessingJobName": str,
        "input_code": str,
        "job_pk": str,
        "job_sk": str,
    }
)

How can I use the execution_input values in the container_arguments part below:

processing_step = steps.ProcessingStep(
    "SageMakerProcessingJob1",
    processor=get_processing_container_config(),
    job_name=execution_input["ProcessingJobName"],
    inputs=input_meta,
    outputs=output_meta,
    container_arguments=[
        "--input_filename", "file.docx", 
        "--study_name", execution_input["study_name"]
    ],
    container_entrypoint=["python3", "/opt/ml/processing/code/main.py"]
)

Here the study name should come from the execution input schema, but when trying to create the workflow graph it throws the following error. Note that in the job_name part it does accept the value from ExecutionInput.

workflow_graph = steps.Chain([<over complicated steps>])
workflow = Workflow(
    name="ProcessingJob3_v1",
    definition=workflow_graph,
    role=workflow_execution_role,
    execution_input=execution_input
)
workflow.render_graph()
workflow_arn = workflow.create()

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-17fe64d66aa4> in <module>()
----> 1 workflow.render_graph()
      2 workflow_arn = workflow.create()

/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/site-packages/stepfunctions/workflow/stepfunctions.py in render_graph(self, portrait)
    374             portrait (bool, optional): Boolean flag set to `True` if the workflow graph should be rendered in portrait orientation. Set to `False`, if the graph should be rendered in landscape orientation. (default: False)
    375         """
--> 376         widget = WorkflowGraphWidget(self.definition.to_json())
    377         return widget.show(portrait=portrait)
    378 

/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/site-packages/stepfunctions/steps/states.py in to_json(self, pretty)
     91             return json.dumps(self.to_dict(), indent=4)
     92 
---> 93         return json.dumps(self.to_dict())
     94 
     95     def __repr__(self):

/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/home/ec2-user/SageMaker/.persisted_conda/dosjobs/lib/python3.6/json/encoder.py in default(self, o)
    178         """
    179         raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180                         o.__class__.__name__)
    181 
    182     def encode(self, o):

TypeError: Object of type 'ExecutionInput' is not JSON serializable
ca-nguyen (Contributor) commented:

Hi @DataPsycho!

Currently, the only way to use Placeholders with container_arguments is to define the container arguments entirely as a Placeholder.

Something like this:

execution_input = ExecutionInput(
    schema={
        "PATH_INPUT": str,
        "DESTINATION_OUTPUT": str,
        "container_arguments": list,
        "ProcessingJobName": str,
        "input_code": str,
        "job_pk": str,
        "job_sk": str,
    }
)

processing_step = steps.ProcessingStep(
    "SageMakerProcessingJob1",
    processor=get_processing_container_config(),
    job_name=execution_input["ProcessingJobName"],
    inputs=input_meta,
    outputs=output_meta,
    container_arguments=execution_input["container_arguments"],
    container_entrypoint=["python3", "/opt/ml/processing/code/main.py"]
)
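
With that schema, the full argument list is then supplied when the execution is started. A minimal sketch (the paths and values below are only placeholders):

# Hypothetical execution-time values; replace with your own paths and names.
execution = workflow.execute(
    inputs={
        "PATH_INPUT": "s3://my-bucket/input/",
        "DESTINATION_OUTPUT": "s3://my-bucket/output/",
        "container_arguments": [
            "--input_filename", "file.docx",
            "--study_name", "study-a",
        ],
        "ProcessingJobName": "sagemaker-processing-job-001",
        "input_code": "s3://my-bucket/code/main.py",
        "job_pk": "job-1",
        "job_sk": "2021-09-16",
    }
)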

Being able to use Placeholder values for the individual arguments within container_arguments would be a great enhancement, and I can imagine many use cases for it. Thank you for bringing this to our attention! Tagging this as an enhancement and putting it on our radar.

Hope this helps!

ca-nguyen added the enhancement (New feature or request) label on Sep 16, 2021

Liks96 commented Sep 28, 2021

Hi! I'm very glad that @DataPsycho raised this issue. I was wondering the same thing, since I found it weird that the individual arguments couldn't be specified, either by referencing the execution input directly or by using the Step Functions JSON path referencing.
I've tried both (the first is similar to the original issue, the second uses the JSON path referencing):
1)
container_arguments=["--metrics-type", execution_input['MetricsType'], "--metrics-name", execution_input['MetricsName'], "--label-name",execution_input['LabelName']]

container_arguments=["--metrics-type", "$$.Execution.Input['MetricsType']", "--metrics-name", "$$.Execution.Input['MetricsName']", "--label-name","$$.Execution.Input['LabelName']"]

It would be really nice to have this enhancement! It would make step creation more flexible.

ca-nguyen (Contributor) commented:

Thank you for showing interest in this feature @Liks96!
We are keeping this on our radar.

MrDataPsycho (Author) commented:

Hi, thanks for considering the feature. Looking forward to using it when it is available. Thanks!


Rhuax commented Dec 23, 2021

I got here after running into the same issue. I can confirm it would be really useful to have this flexibility!

francescocamussoni commented:

I have the same issue; it would be wonderful to have execution inputs as container arguments.
