fix(s3-deployment): optimize memory usage for large files #34020

Merged: 18 commits, Apr 15, 2025
Original file line number Diff line number Diff line change
@@ -328,19 +328,29 @@ def is_json_content(content):
         return False

 def replace_markers(filename, markers):
+    """Replace markers in a file, with special handling for JSON files."""
+    # if there are no markers, skip
+    if not markers:
+        return
+
     outfile = filename + '.new'
+    # We need to handle a special replacement in the case where the file content is JSON and
+    # if one of the marker's value contain a double quote
+    marker_contains_double_quote = any('"' in v for v in markers.values())
     replace_tokens = dict([(k.encode('utf-8'), v.encode('utf-8')) for k, v in markers.items()])

-    # First try to read as text to check if it's JSON
-    try:
-        with open(filename, 'r', encoding='utf-8') as f:
-            content = f.read()
-        is_json = is_json_content(content)
-    except UnicodeDecodeError:
-        # If we can't read as text, it's definitely not JSON
-        is_json = False
-
-    if is_json:
+    # If no marker values contain a double quote, there is no need to check if the content format is JSON
+    if marker_contains_double_quote:
+        # First try to read as text to check if it's JSON
+        try:
+            with open(filename, 'r', encoding='utf-8') as f:
+                content = f.read()
+            is_json = is_json_content(content)
+        except UnicodeDecodeError:
+            # If we can't read as text, it's definitely not JSON
+            is_json = False
+
+    if marker_contains_double_quote and is_json:
         # Handle JSON content with proper structure parsing
         new_content = replace_markers_in_json(content, replace_tokens)
         # Write JSON in text mode
@@ -355,12 +365,12 @@ def replace_markers(filename, markers):
                     line = line.replace(token, replacement)
                 fo.write(line)

-    # # delete the original file and rename the new one to the original
+    # Delete the original file and rename the new one to the original
     os.remove(filename)
     os.rename(outfile, filename)


 def replace_markers_in_json(content, replace_tokens):
     """Replace markers in JSON content with proper escaping."""
     try:
         # Parse the JSON structure
         data = json.loads(content)
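The control flow this change introduces can be sketched in isolation. The following is a minimal, self-contained approximation and not the handler's actual code: `replace_markers_sketch` and `replace_in_json_text` are hypothetical names, the JSON check is assumed to be a plain `json.loads` attempt, and the JSON branch is a simplified stand-in for `replace_markers_in_json`, whose body is not shown in this diff. The point it illustrates is that the whole file is only read into memory when a marker value contains a double quote; otherwise the file is streamed line by line in binary mode.

```python
import json
import os


def replace_in_json_text(content, markers):
    """Hypothetical stand-in for replace_markers_in_json (body not shown in the diff)."""
    def walk(node):
        if isinstance(node, dict):
            return {walk(k): walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, str):
            for token, value in markers.items():
                node = node.replace(token, value)
        return node

    # json.dumps takes care of escaping double quotes in the substituted values.
    return json.dumps(walk(json.loads(content)))


def replace_markers_sketch(filename, markers):
    """Replace markers in a file, loading it fully only when JSON escaping may matter."""
    if not markers:
        return

    outfile = filename + '.new'
    marker_contains_double_quote = any('"' in v for v in markers.values())

    is_json = False
    content = None
    if marker_contains_double_quote:
        # Only now pay the cost of reading the whole file as text.
        try:
            with open(filename, 'r', encoding='utf-8') as f:
                content = f.read()
            json.loads(content)  # assumed JSON check; raises ValueError if invalid
            is_json = True
        except ValueError:
            # Covers UnicodeDecodeError and json.JSONDecodeError (both ValueError subclasses).
            is_json = False

    if is_json:
        with open(outfile, 'w', encoding='utf-8') as f:
            f.write(replace_in_json_text(content, markers))
    else:
        # Stream line by line in binary mode so memory use stays bounded by line length.
        tokens = {k.encode('utf-8'): v.encode('utf-8') for k, v in markers.items()}
        with open(filename, 'rb') as fi, open(outfile, 'wb') as fo:
            for line in fi:
                for token, replacement in tokens.items():
                    line = line.replace(token, replacement)
                fo.write(line)

    os.remove(filename)
    os.rename(outfile, filename)
```

With markers such as `{'<<m>>': 'world'}` the function never calls `f.read()` on the input, which is the memory saving the PR title refers to; with a value such as `'say "hi"'` and a JSON input, the substitution goes through the parsed structure so the quotes are escaped on output.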
@@ -0,0 +1,16 @@
FROM public.ecr.aws/lambda/python:latest

# add everything to /opt/awscli (this is where `aws` is executed from)
ADD . /opt/awscli

# install boto3, which is available on Lambda
RUN pip3 install boto3 PyYAML

# Set working directory
WORKDIR /opt/awscli

# Make the script executable
RUN chmod +x run_individual_tests.sh

# Run tests individually
ENTRYPOINT ["./run_individual_tests.sh"]
@@ -0,0 +1,36 @@
#!/bin/bash
# Script to run each test individually for more accurate memory measurements

echo "Running tests individually to get accurate memory measurements..."

# Initialize test files first
echo -e "\n\n=== Initializing Test Files ==="
TEST_DIR=$(python3 -m test_large_files --init | grep "TEST_DIR=" | cut -d'=' -f2)
echo -e "\n\nTest directory: $TEST_DIR"

# Run each test individually with the pre-created files
echo -e "\n\n=== Small JSON File Test ==="
python3 -m test_large_files --test test_small_json_file_performance --test-dir "$TEST_DIR"

echo -e "\n\n=== Large JSON File Test ==="
python3 -m test_large_files --test test_large_json_file_performance --test-dir "$TEST_DIR"

echo -e "\n\n=== Complex JSON File Test ==="
python3 -m test_large_files --test test_complex_json_file_performance --test-dir "$TEST_DIR"

echo -e "\n\n=== Small Text File Test ==="
python3 -m test_large_files --test test_small_text_file_performance --test-dir "$TEST_DIR"

echo -e "\n\n=== Large Text File Test ==="
python3 -m test_large_files --test test_large_text_file_performance --test-dir "$TEST_DIR"

echo -e "\n\n=== Complex JSON File with no markers Test ==="
python3 -m test_large_files --test test_complex_json_file_no_marker_performance --test-dir "$TEST_DIR"

echo -e "\n\n=== Complex JSON File with double quote markers Test ==="
python3 -m test_large_files --test test_complex_json_file_double_quote_marker_performance --test-dir "$TEST_DIR"

echo -e "\n\nAll tests completed."

# Clean up test files
rm -rf "$TEST_DIR"
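The `test_large_files` module invoked above is not part of this excerpt, so how it measures memory is unknown. As a rough illustration of why per-test isolation helps, here is a small sketch that uses the standard library's `tracemalloc` to compare the peak Python-level allocation of reading a file whole versus streaming it line by line. All function names here are hypothetical, and `tracemalloc` reports Python allocations rather than process RSS, so the numbers are indicative only.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, *args):
    """Return the peak traced allocation in bytes while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_whole(path):
    """Load the entire file into memory, as the pre-change JSON check did."""
    with open(path, 'rb') as f:
        return len(f.read())


def read_by_line(path):
    """Stream the file one line at a time, as the non-JSON branch does."""
    total = 0
    with open(path, 'rb') as f:
        for line in f:
            total += len(line)
    return total


def make_sample_file(line_count=50_000):
    """Create a throwaway text file containing a marker on every line."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        for i in range(line_count):
            f.write(b'line %d with <<marker>> placeholder\n' % i)
    return path
```

Calling `peak_bytes(read_whole, path)` and `peak_bytes(read_by_line, path)` on the same sample file should show the streaming variant peaking far lower, since it never holds more than a buffer and one line at a time. Starting and stopping the tracer around each call plays the same role as running each test in its own process: one measurement's peak cannot leak into the next.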
@@ -0,0 +1,25 @@
#!/bin/bash
#---------------------------------------------------------------------------------------------------
# executes large file tests
#
# prepares a staging directory with the requirements
set -e
scriptdir=$(cd $(dirname $0) && pwd)

rm -f ${scriptdir}/index.py
rm -fr ${scriptdir}/__pycache__

# prepare staging directory
staging=$(mktemp -d)
mkdir -p ${staging}
cd ${staging}

# copy src and overlay with test
cp -f ${scriptdir}/../../../lib/aws-s3-deployment/bucket-deployment-handler/* $PWD
cp -f ${scriptdir}/* $PWD

# Build the Docker image with --no-cache to force rebuild
DOCKER_CMD=${CDK_DOCKER:-docker}
$DOCKER_CMD build -f Dockerfile_large --no-cache -t s3-deployment-test .

$DOCKER_CMD run --rm s3-deployment-test
@@ -20,4 +20,6 @@ cp -f ${scriptdir}/* $PWD

 # this will run our tests inside the right environment
 DOCKER_CMD=${CDK_DOCKER:-docker}
-$DOCKER_CMD build .
+$DOCKER_CMD build --no-cache -t s3-deployment-test .
+
+$DOCKER_CMD run --rm s3-deployment-test
Contributor

How and where/when does this test get run?

Contributor Author

Developers run them manually when testing changes to the custom resource's code.
