Commit 85e7a48

Merge branch 'main' into 2956-athena-read_sql_query-provides-completely-wrong-results-for-qmark-style-parametrized-queries-with-cache-enabled
2 parents 64c4e13 + 9b2cdd9 commit 85e7a48

37 files changed, with 2132 additions and 1791 deletions

.bumpversion.toml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "3.9.2b1"
+current_version = "3.10.0"
 commit = false
 tag = false
 tag_name = "{new_version}"

README.md

Lines changed: 40 additions & 40 deletions
@@ -94,27 +94,27 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 ## At scale
 AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.
 
-Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.
+Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.
 
 > ⚠️ **Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.**
 
 ## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)
 
-- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/about.html)
-- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html)
-  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#pypi-pip)
-  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#conda)
-  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#aws-lambda-layer)
-  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#aws-glue-python-shell-jobs)
-  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#aws-glue-pyspark-jobs)
-  - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#amazon-sagemaker-notebook)
-  - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#amazon-sagemaker-notebook-lifecycle)
-  - [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#emr)
-  - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#from-source)
-- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html)
-  - [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html#getting-started)
-  - [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html#supported-apis)
-  - [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html#resources)
+- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/about.html)
+- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html)
+  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#pypi-pip)
+  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#conda)
+  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#aws-lambda-layer)
+  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#aws-glue-python-shell-jobs)
+  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#aws-glue-pyspark-jobs)
+  - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#amazon-sagemaker-notebook)
+  - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#amazon-sagemaker-notebook-lifecycle)
+  - [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#emr)
+  - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#from-source)
+- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html)
+  - [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html#getting-started)
+  - [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html#supported-apis)
+  - [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html#resources)
 - [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
   - [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
   - [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -155,30 +155,30 @@ Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html) or
   - [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
   - [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
   - [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
-- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html)
-  - [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-s3)
-  - [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-glue-catalog)
-  - [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-athena)
-  - [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-redshift)
-  - [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#postgresql)
-  - [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#mysql)
-  - [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#sqlserver)
-  - [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#oracle)
-  - [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#data-api-redshift)
-  - [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#data-api-rds)
-  - [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#opensearch)
-  - [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-glue-data-quality)
-  - [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-neptune)
-  - [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#dynamodb)
-  - [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-timestream)
-  - [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-emr)
-  - [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-cloudwatch-logs)
-  - [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-chime)
-  - [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-quicksight)
-  - [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-sts)
-  - [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-secrets-manager)
-  - [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#global-configurations)
-  - [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#distributed-ray)
+- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html)
+  - [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-s3)
+  - [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-glue-catalog)
+  - [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-athena)
+  - [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-redshift)
+  - [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#postgresql)
+  - [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#mysql)
+  - [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#sqlserver)
+  - [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#oracle)
+  - [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#data-api-redshift)
+  - [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#data-api-rds)
+  - [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#opensearch)
+  - [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-glue-data-quality)
+  - [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-neptune)
+  - [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#dynamodb)
+  - [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-timestream)
+  - [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-emr)
+  - [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-cloudwatch-logs)
+  - [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-chime)
+  - [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-quicksight)
+  - [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-sts)
+  - [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-secrets-manager)
+  - [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#global-configurations)
+  - [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#distributed-ray)
 - [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
 - [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)
 

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.9.2b1
+3.10.0

awswrangler/__metadata__.py

Lines changed: 1 addition & 1 deletion
@@ -7,5 +7,5 @@
 
 __title__: str = "awswrangler"
 __description__: str = "Pandas on AWS."
-__version__: str = "3.9.2b1"
+__version__: str = "3.10.0"
 __license__: str = "Apache License 2.0"

awswrangler/_config.py

Lines changed: 3 additions & 0 deletions
@@ -224,6 +224,9 @@ def _apply_type(name: str, value: Any, dtype: type[_ConfigValueType], nullable:
             raise exceptions.InvalidArgumentValue(
                 f"{name} configuration does not accept a null value. Please pass {dtype}."
             )
+        # Handle case where string is empty, "False" or "0". Anything else is True
+        if isinstance(value, str) and dtype is bool:
+            return value.lower() not in ("false", "0", "")
         try:
             return dtype(value) if isinstance(value, dtype) is False else value
         except ValueError as ex:
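
The added branch matters because Python's built-in `bool()` treats every non-empty string as true, so a setting such as `"False"` read from an environment variable would otherwise be coerced to `True`. A minimal sketch of the same rule, assuming a standalone helper named `parse_bool` (a hypothetical name, not part of awswrangler):

```python
def parse_bool(value: object) -> bool:
    """Hypothetical helper mirroring the string branch added in the diff."""
    if isinstance(value, str):
        # Only the empty string, "false" and "0" (case-insensitive) are falsy.
        return value.lower() not in ("false", "0", "")
    return bool(value)


examples = ["False", "false", "0", "", "True", "1", "no"]
results = {text: parse_bool(text) for text in examples}

# Plain bool() would report True for "False", which is the pitfall being avoided.
naive = bool("False")

for text, parsed in results.items():
    print(repr(text), "->", parsed)
```

Under this rule any other string, including `"no"` or `"off"`, still evaluates to `True`, as the comment in the diff states.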

awswrangler/_data_types.py

Lines changed: 2 additions & 0 deletions
@@ -213,6 +213,8 @@ def pyarrow2postgresql(  # noqa: PLR0911
         return pyarrow2postgresql(dtype=dtype.value_type, string_type=string_type)
     if pa.types.is_binary(dtype):
         return "BYTEA"
+    if pa.types.is_list(dtype):
+        return pyarrow2postgresql(dtype=dtype.value_type, string_type=string_type) + "[]"
     raise exceptions.UnsupportedType(f"Unsupported PostgreSQL type: {dtype}")
 
 

awswrangler/_databases.py

Lines changed: 2 additions & 0 deletions
@@ -359,6 +359,8 @@ def generate_placeholder_parameter_pairs(
     """Extract Placeholder and Parameter pairs."""
 
     def convert_value_to_native_python_type(value: Any) -> Any:
+        if isinstance(value, list):
+            return value
         if pd.isna(value):
             return None
         if hasattr(value, "to_pydatetime"):

awswrangler/athena/_read.py

Lines changed: 8 additions & 8 deletions
@@ -793,11 +793,11 @@ def read_sql_query(
 
     **Related tutorial:**
 
-    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
       tutorials/006%20-%20Amazon%20Athena.html>`_
-    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
       tutorials/019%20-%20Athena%20Cache.html>`_
-    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
       tutorials/021%20-%20Global%20Configurations.html>`_
 
     **There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -861,7 +861,7 @@ def read_sql_query(
     /athena.html#Athena.Client.get_query_execution>`_ .
 
     For a practical example check out the
-    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
     tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
 
 
@@ -1138,11 +1138,11 @@ def read_sql_table(
 
     **Related tutorial:**
 
-    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
       tutorials/006%20-%20Amazon%20Athena.html>`_
-    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
       tutorials/019%20-%20Athena%20Cache.html>`_
-    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
       tutorials/021%20-%20Global%20Configurations.html>`_
 
     **There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -1206,7 +1206,7 @@ def read_sql_table(
     /athena.html#Athena.Client.get_query_execution>`_ .
 
     For a practical example check out the
-    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
+    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
     tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
 
 

awswrangler/athena/_write_iceberg.py

Lines changed: 7 additions & 1 deletion
@@ -115,7 +115,13 @@ def _determine_differences(
 
     catalog_column_types = typing.cast(
         Dict[str, str],
-        catalog.get_table_types(database=database, table=table, catalog_id=catalog_id, boto3_session=boto3_session),
+        catalog.get_table_types(
+            database=database,
+            table=table,
+            catalog_id=catalog_id,
+            filter_iceberg_current=True,
+            boto3_session=boto3_session,
+        ),
     )
 
     original_column_names = set(catalog_column_types)

awswrangler/catalog/_create.py

Lines changed: 2 additions & 2 deletions
@@ -1100,7 +1100,7 @@ def create_csv_table(
         If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
         (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
         Related tutorial:
-        https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/014%20-%20Schema%20Evolution.html
+        https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/014%20-%20Schema%20Evolution.html
     sep
         String of length 1. Field delimiter for the output file.
     skip_header_line_count
@@ -1280,7 +1280,7 @@ def create_json_table(
         If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
         (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
         Related tutorial:
-        https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/014%20-%20Schema%20Evolution.html
+        https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/014%20-%20Schema%20Evolution.html
     serde_library
         Specifies the SerDe Serialization library which will be used. You need to provide the Class library name
         as a string.

awswrangler/catalog/_get.py

Lines changed: 8 additions & 1 deletion
@@ -107,6 +107,7 @@ def get_table_types(
     database: str,
     table: str,
     catalog_id: str | None = None,
+    filter_iceberg_current: bool = False,
     boto3_session: boto3.Session | None = None,
 ) -> dict[str, str] | None:
     """Get all columns and types from a table.
@@ -120,6 +121,9 @@ def get_table_types(
     catalog_id
         The ID of the Data Catalog from which to retrieve Databases.
         If ``None`` is provided, the AWS account ID is used by default.
+    filter_iceberg_current
+        If True, returns only current iceberg fields (fields marked with iceberg.field.current: true).
+        Otherwise, returns the all fields. False by default (return all fields).
     boto3_session
         The default boto3 session will be used if **boto3_session** receive ``None``.
 
@@ -139,7 +143,10 @@ def get_table_types(
         response = client_glue.get_table(**_catalog_id(catalog_id=catalog_id, DatabaseName=database, Name=table))
     except client_glue.exceptions.EntityNotFoundException:
         return None
-    return _extract_dtypes_from_table_details(response=response)
+    return _extract_dtypes_from_table_details(
+        response=response,
+        filter_iceberg_current=filter_iceberg_current,
+    )
 
 
 def get_databases(
def get_databases(

awswrangler/catalog/_utils.py

Lines changed: 8 additions & 2 deletions
@@ -31,10 +31,16 @@ def _sanitize_name(name: str) -> str:
     return re.sub("[^A-Za-z0-9_]+", "_", name).lower()  # Replacing non alphanumeric characters by underscore
 
 
-def _extract_dtypes_from_table_details(response: "GetTableResponseTypeDef") -> dict[str, str]:
+def _extract_dtypes_from_table_details(
+    response: "GetTableResponseTypeDef",
+    filter_iceberg_current: bool = False,
+) -> dict[str, str]:
     dtypes: dict[str, str] = {}
     for col in response["Table"]["StorageDescriptor"]["Columns"]:
-        dtypes[col["Name"]] = col["Type"]
+        # Only return current fields if flag is enabled
+        if not filter_iceberg_current or col.get("Parameters", {}).get("iceberg.field.current") == "true":
+            dtypes[col["Name"]] = col["Type"]
+    # Add partition keys as columns
     if "PartitionKeys" in response["Table"]:
         for par in response["Table"]["PartitionKeys"]:
             dtypes[par["Name"]] = par["Type"]
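
To see what the new flag changes, the filtering logic can be exercised without AWS access. The sketch below re-implements the function from the diff under the hypothetical name `extract_dtypes` and feeds it a hand-written dictionary; the sample column names and the shape of `sample_response` are illustrative assumptions, not output captured from Glue.

```python
from __future__ import annotations

from typing import Any


def extract_dtypes(response: dict[str, Any], filter_iceberg_current: bool = False) -> dict[str, str]:
    """Standalone copy of the diff's logic, for illustration only."""
    dtypes: dict[str, str] = {}
    for col in response["Table"]["StorageDescriptor"]["Columns"]:
        is_current = col.get("Parameters", {}).get("iceberg.field.current") == "true"
        # Keep every column unless the flag asks for current Iceberg fields only.
        if not filter_iceberg_current or is_current:
            dtypes[col["Name"]] = col["Type"]
    # Partition keys are always added, regardless of the flag.
    for par in response["Table"].get("PartitionKeys", []):
        dtypes[par["Name"]] = par["Type"]
    return dtypes


# Hand-written sample data (an assumption, not a real Glue response).
sample_response = {
    "Table": {
        "StorageDescriptor": {
            "Columns": [
                {"Name": "id", "Type": "int", "Parameters": {"iceberg.field.current": "true"}},
                {"Name": "old_col", "Type": "string", "Parameters": {"iceberg.field.current": "false"}},
                {"Name": "plain", "Type": "double"},
            ]
        },
        "PartitionKeys": [{"Name": "dt", "Type": "date"}],
    }
}

all_fields = extract_dtypes(sample_response)
current_only = extract_dtypes(sample_response, filter_iceberg_current=True)
print(all_fields)
print(current_only)
```

Note that a column without a `Parameters` entry (`plain` here) is dropped when the flag is set, because the comparison against `"true"` fails for a missing key.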

awswrangler/data_api/rds.py

Lines changed: 3 additions & 0 deletions
@@ -359,6 +359,9 @@ def _create_value_dict(  # noqa: PLR0911
     if isinstance(value, Decimal):
         return {"stringValue": str(value)}, "DECIMAL"
 
+    if isinstance(value, uuid.UUID):
+        return {"stringValue": str(value)}, "UUID"
+
     raise exceptions.InvalidArgumentType(f"Value {value} not supported.")
 
 
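
The new branch serialises a `uuid.UUID` as its string form and tags it with the `"UUID"` type hint, following the pattern already used for `Decimal`. A reduced sketch of just these two branches, assuming a hypothetical standalone function `create_value_dict` and using `TypeError` in place of the library's own exception:

```python
import uuid
from decimal import Decimal
from typing import Any, Dict, Optional, Tuple


def create_value_dict(value: Any) -> Tuple[Dict[str, Any], Optional[str]]:
    """Reduced, illustrative version covering only the Decimal and UUID branches."""
    if isinstance(value, Decimal):
        return {"stringValue": str(value)}, "DECIMAL"
    if isinstance(value, uuid.UUID):
        # UUIDs travel as strings; the type hint tells the service how to cast them.
        return {"stringValue": str(value)}, "UUID"
    raise TypeError(f"Value {value} not supported.")


row_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
field, type_hint = create_value_dict(row_id)
print(field, type_hint)
```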

awswrangler/opensearch/_write.py

Lines changed: 6 additions & 2 deletions
@@ -504,6 +504,7 @@ def index_documents(
     initial_backoff: int | None = None,
     max_backoff: int | None = None,
     use_threads: bool | int = False,
+    enable_refresh_interval: bool = True,
     **kwargs: Any,
 ) -> dict[str, Any]:
     """
@@ -559,6 +560,8 @@ def index_documents(
         True to enable concurrent requests, False to disable multiple threads.
         If enabled os.cpu_count() will be used as the max number of threads.
         If integer is provided, specified number is used.
+    enable_refresh_interval
+        True (default) to enable ``refresh_interval`` modification to ``-1`` (disabled) while indexing documents
     **kwargs
         KEYWORD arguments forwarded to bulk operation
         elasticsearch >= 7.10.2 / opensearch: \
@@ -614,7 +617,7 @@ def index_documents(
             widgets=widgets, max_value=total_documents, prefix="Indexing: "
         ).start()
         for i, bulk_chunk_documents in enumerate(actions):
-            if i == 1:  # second bulk iteration, in case the index didn't exist before
+            if i == 1 and enable_refresh_interval:  # second bulk iteration, in case the index didn't exist before
                 refresh_interval = _get_refresh_interval(client, index)
                 _disable_refresh_interval(client, index)
             _logger.debug("running bulk index of %s documents", len(bulk_chunk_documents))
@@ -655,6 +658,7 @@ def index_documents(
         raise e
 
     finally:
-        _set_refresh_interval(client, index, refresh_interval)
+        if enable_refresh_interval:
+            _set_refresh_interval(client, index, refresh_interval)
 
     return {"success": success, "errors": errors}
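
The control flow introduced here (disable the refresh interval from the second bulk chunk onward, then restore it in `finally`, both gated by the flag) can be sketched with an in-memory stand-in. `FakeIndexClient` and `bulk_index` are hypothetical names invented for this illustration and do not call OpenSearch; the `saved is not None` guard is an extra assumption of the sketch, not something shown in the diff.

```python
from typing import List, Optional


class FakeIndexClient:
    """In-memory stand-in for an index with a refresh_interval setting."""

    def __init__(self) -> None:
        self.refresh_interval = "1s"
        self.seen_intervals: List[str] = []

    def index_chunk(self, chunk: List[dict]) -> None:
        # Record which refresh_interval was active while this chunk was written.
        self.seen_intervals.append(self.refresh_interval)


def bulk_index(client: FakeIndexClient, chunks: List[List[dict]], enable_refresh_interval: bool = True) -> None:
    saved: Optional[str] = None
    try:
        for i, chunk in enumerate(chunks):
            # From the second chunk on, the index is known to exist, so refresh can be disabled.
            if i == 1 and enable_refresh_interval:
                saved = client.refresh_interval
                client.refresh_interval = "-1"
            client.index_chunk(chunk)
    finally:
        # Restore the original setting only if this call changed it.
        if enable_refresh_interval and saved is not None:
            client.refresh_interval = saved


chunks = [[{"doc": 1}], [{"doc": 2}], [{"doc": 3}]]

managed = FakeIndexClient()
bulk_index(managed, chunks)

untouched = FakeIndexClient()
bulk_index(untouched, chunks, enable_refresh_interval=False)

print(managed.seen_intervals, managed.refresh_interval)
print(untouched.seen_intervals, untouched.refresh_interval)
```

Passing `enable_refresh_interval=False` leaves the index setting alone for the whole run, which may be useful when the caller lacks permission to change index settings or manages them separately.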

awswrangler/s3/_read_orc.py

Lines changed: 2 additions & 2 deletions
@@ -225,7 +225,7 @@ def read_orc(
         must return a bool, True to read the partition or False to ignore it.
         Ignored if `dataset=False`.
         E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-        https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+        https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
     columns
         List of columns to read from the file(s).
     validate_schema
@@ -384,7 +384,7 @@ def read_orc_table(
         must return a bool, True to read the partition or False to ignore it.
         Ignored if `dataset=False`.
         E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
-        https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
+        https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
     columns
         List of columns to read from the file(s).
     validate_schema
