Feature request: Add support to mask/encrypt/decrypt Pydantic models, Dataclasses, and standard Python classes in the DataMasking utility #3473

Closed
1 of 2 tasks
leandrodamascena opened this issue Dec 8, 2023 · 6 comments · Fixed by #6413
Assignees
Labels
data-masking Sensitive Data Masking feature feature-request feature request good first issue Good for newcomers

Comments

@leandrodamascena
Contributor

Use case

Currently, the DataMasking utility only supports operations on traversable Python types such as list, dict, and str. This is a limitation for customers who want to integrate the DataMasking utility with their existing Pydantic models, dataclasses, or standard Python classes.

Solution/User Experience

Add support for masking, encrypting, and decrypting Pydantic models, dataclasses, and standard Python classes.

Alternative solutions

No response

Acknowledgment

@leandrodamascena leandrodamascena added triage Pending triage from maintainers feature-request feature request labels Dec 8, 2023
@leandrodamascena leandrodamascena added the data-masking Sensitive Data Masking feature label Dec 8, 2023
@heitorlessa heitorlessa moved this from Triage to Backlog in Powertools for AWS Lambda (Python) Dec 11, 2023
@rubenfonseca rubenfonseca removed the triage Pending triage from maintainers label Dec 18, 2023
@rubenfonseca
Contributor

We've added this to our backlog, and we intend to work on this early next year.

@leandrodamascena leandrodamascena changed the title Feature request: Add support to mask/encrypt/decrypt Pydantic models, Dataclasses, and standard Python classes Feature request: Add support to mask/encrypt/decrypt Pydantic models, Dataclasses, and standard Python classes in the DataMasking utility Aug 11, 2024
@leandrodamascena leandrodamascena self-assigned this Aug 12, 2024
@anafalcao anafalcao added the help wanted Could use a second pair of eyes/hands label Jan 24, 2025
@dreamorosi
Contributor

Hey @leandrodamascena, whenever you have time this week, could you please leave a comment with some more details about what needs to be done / implemented as part of this issue?

This will help potential contributors to orient themselves.

@leandrodamascena leandrodamascena added the good first issue Good for newcomers label Feb 14, 2025
@leandrodamascena
Contributor Author

leandrodamascena commented Feb 14, 2025

Hey everyone! Our current implementation supports dict and lists as they can be directly converted. However, we recognize that customers may want to use more complex data structures such as Pydantic Models, DataClasses, or custom Python classes.

To accommodate these cases, we need to implement a pre-processing step to prepare the data before submitting it for erase, encrypt, or decrypt. This solution addresses the input data challenge, but it's important to note that we cannot guarantee the data will remain a valid instance of the original Pydantic model after processing, for example.

The encryption/erase process may alter the data type in ways that break the model's validation rules. For example:

from __future__ import annotations

from aws_lambda_powertools.utilities.data_masking import DataMasking

from pydantic import BaseModel

class MyModel(BaseModel):
    name: str
    age: int


data_masker = DataMasking()

data = MyModel(name="powertools", age=5)

erased = data_masker.erase(data, fields=["age"])  

print(erased)
# output: {'name': 'powertools', 'age': '*****'}

Note that age is now a string and no longer an int, which breaks the Pydantic model's validation.
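To make this failure mode concrete without depending on the utility or on Pydantic, here is a minimal sketch that uses only the standard library. `StrictPerson`, its hand-written type check, and `can_rebuild` are hypothetical stand-ins for a validated model; the `"*****"` mask string is taken from the output above.

```python
from dataclasses import asdict, dataclass


@dataclass
class StrictPerson:
    """Hypothetical stand-in for a validated model (not Pydantic itself)."""

    name: str
    age: int

    def __post_init__(self) -> None:
        # Hand-written check that mimics strict type validation.
        if not isinstance(self.age, int):
            raise TypeError(f"age must be int, got {type(self.age).__name__}")


original = StrictPerson(name="powertools", age=5)

# Simulate what erasing the "age" field does to the dict form of the data.
erased = {**asdict(original), "age": "*****"}


def can_rebuild(payload: dict) -> bool:
    """Return True if the payload still passes the model's type check."""
    try:
        StrictPerson(**payload)
    except TypeError:
        return False
    return True
```

In this sketch, `can_rebuild(asdict(original))` succeeds while `can_rebuild(erased)` does not, which is why returning the original model type from erase is not something we can promise.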

For now, we can support these types only on input, not on output. To do this, I suggest creating a new function called prepare_data in this file that checks the data type and converts the data to a dict. It could look something like this:

from typing import Any


def prepare_data(data: Any) -> Any:
    # Convert from dataclasses
    if hasattr(data, "__dataclass_fields__"):
        import dataclasses

        return dataclasses.asdict(data)

    # Convert from Pydantic model
    if callable(getattr(data, "model_dump", None)):
        return data.model_dump()
    
    # Convert from event source data class
    if callable(getattr(data, "dict", None)):
        return data.dict()

    return data

After that, call this function on the first line of the erase, encrypt, and decrypt methods.
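As a rough sketch of that wiring, the block below calls `prepare_data` on the first line of an `erase` method. `SketchMasker`, `Customer`, and `FakeModel` are hypothetical names made up for illustration; `SketchMasker` only masks top-level keys and is not the real DataMasking implementation. The `"*****"` mask string is assumed from the example output earlier in this comment.

```python
from __future__ import annotations

import dataclasses
from typing import Any

MASK = "*****"  # assumption: same mask string as in the example output


def prepare_data(data: Any) -> Any:
    # Copy of the proposed conversion so this sketch runs on its own.
    if hasattr(data, "__dataclass_fields__"):
        return dataclasses.asdict(data)
    if callable(getattr(data, "model_dump", None)):
        return data.model_dump()
    if callable(getattr(data, "dict", None)):
        return data.dict()
    return data


class SketchMasker:
    """Hypothetical, simplified stand-in for DataMasking (top-level keys only)."""

    def erase(self, data: Any, fields: list[str] | None = None) -> Any:
        data = prepare_data(data)  # first line: normalize the input to a dict
        if not isinstance(data, dict) or fields is None:
            return MASK
        return {key: (MASK if key in fields else value) for key, value in data.items()}


@dataclasses.dataclass
class Customer:
    name: str
    age: int


class FakeModel:
    """Duck-typed object exposing model_dump(), standing in for a Pydantic model."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def model_dump(self) -> dict[str, Any]:
        return {"name": self.name, "age": self.age}


masker = SketchMasker()
from_dataclass = masker.erase(Customer("powertools", 5), fields=["age"])
from_model = masker.erase(FakeModel("powertools", 5), fields=["age"])
```

Both calls return a plain dict rather than the original type, which matches the input-only scope described above.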

We also need to add more tests here: https://github.com/aws-powertools/powertools-lambda-python/tree/develop/tests/functional/data_masking to make sure it works as expected.
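The tests could start from something like the sketch below. It is written as plain pytest-style functions against a local copy of the proposed `prepare_data`, so it runs without the library; the `Address`, `Customer`, and `FakeEventSource` classes and the test names are hypothetical, and the real functional tests would exercise `DataMasking` itself.

```python
import dataclasses
from typing import Any


def prepare_data(data: Any) -> Any:
    # Copy of the proposed conversion so the tests run on their own.
    if hasattr(data, "__dataclass_fields__"):
        return dataclasses.asdict(data)
    if callable(getattr(data, "model_dump", None)):
        return data.model_dump()
    if callable(getattr(data, "dict", None)):
        return data.dict()
    return data


@dataclasses.dataclass
class Address:
    city: str


@dataclasses.dataclass
class Customer:
    name: str
    address: Address


class FakeEventSource:
    """Hypothetical object exposing dict(), exercising the third branch."""

    def dict(self):
        return {"source": "sketch"}


def test_nested_dataclass_becomes_nested_dict() -> None:
    # dataclasses.asdict() recurses into nested dataclass fields.
    result = prepare_data(Customer("powertools", Address("Lisbon")))
    assert result == {"name": "powertools", "address": {"city": "Lisbon"}}


def test_object_with_dict_method_is_converted() -> None:
    assert prepare_data(FakeEventSource()) == {"source": "sketch"}


def test_plain_dict_and_str_pass_through_unchanged() -> None:
    payload = {"name": "powertools"}
    assert prepare_data(payload) is payload
    assert prepare_data("secret") == "secret"
```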

@VatsalGoel3
Contributor

@leandrodamascena, I see this is a pretty old issue. I can pick this up.

@leandrodamascena
Contributor Author

Sure, go ahead @VatsalGoel3.

@leandrodamascena leandrodamascena moved this from Backlog to Working on it in Powertools for AWS Lambda (Python) Apr 4, 2025
@leandrodamascena leandrodamascena removed the help wanted Could use a second pair of eyes/hands label Apr 4, 2025
VatsalGoel3 added a commit to VatsalGoel3/powertools-lambda-python that referenced this issue Apr 6, 2025
VatsalGoel3 added a commit to VatsalGoel3/powertools-lambda-python that referenced this issue Apr 6, 2025
leandrodamascena added a commit that referenced this issue Apr 11, 2025
… standard classes (#6413)

* feat(data-masking): support masking of Pydantic models, dataclasses, and standard classes (#3473)

* feat(data_masking): support complex input types via robust prepare_data() with and updated tests

* docs(data-masking): add support docs for Pydantic, dataclasses, and custom classes and updated test code

* docs(data-masking): update examples to use Lambda function entry points for supported input types and updated codebase

* refactoring prepare_data method

---------

Co-authored-by: Leandro Damascena <[email protected]>
@github-project-automation github-project-automation bot moved this from Working on it to Coming soon in Powertools for AWS Lambda (Python) Apr 11, 2025
Projects
Status: Coming soon
5 participants