
JSON error in snapshots: Duplicate field #28906


Closed
JD557 opened this issue Mar 5, 2018 · 18 comments
Labels
>bug, :Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs)

Comments


JD557 commented Mar 5, 2018

Elasticsearch version (bin/elasticsearch --version): 6.1.0

Plugins installed: [discovery-ec2, repository-s3, ingest-geoip, ingest-user-agent]

JVM version (java -version): same as docker.elastic.co/elasticsearch/elasticsearch-oss:6.1.0

OS version (uname -a if on a Unix-like system): same as docker.elastic.co/elasticsearch/elasticsearch-oss:6.1.0

Description of the problem including expected versus actual behavior:

After leaving the system running with automatic hourly snapshots to S3 for a few days, I get the following error:

[2018-03-05T14:24:35,534][WARN ][o.e.r.s.S3Repository     ] [9FWWGr6] failed to read index file [index-0]
com.fasterxml.jackson.core.JsonParseException: Duplicate field 'hourly-2018-02-28-17'
 at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@5b9ee173; line: -1, column: 5140]
	at com.fasterxml.jackson.core.json.JsonReadContext._checkDup(JsonReadContext.java:204) ~[jackson-core-2.8.10.jar:2.8.10]
	at com.fasterxml.jackson.core.json.JsonReadContext.setCurrentName(JsonReadContext.java:198) ~[jackson-core-2.8.10.jar:2.8.10]
	at com.fasterxml.jackson.dataformat.smile.SmileParser._handleFieldName(SmileParser.java:1552) ~[jackson-dataformat-smile-2.8.10.jar:2.8.10]
	at com.fasterxml.jackson.dataformat.smile.SmileParser.nextToken(SmileParser.java:588) ~[jackson-dataformat-smile-2.8.10.jar:2.8.10]
	at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:52) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshots.fromXContent(BlobStoreIndexShardSnapshots.java:257) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:113) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:114) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:89) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.buildBlobStoreIndexShardSnapshots(BlobStoreRepository.java:1070) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1139) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:811) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:401) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:98) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:355) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.0.jar:6.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

I already created a new snapshot repository and deleted the old data. This fixed the problem for a while, but the error eventually reappeared.

The system in question has ~840 indices. Maybe the large number of indices makes this problem more common? (I haven't seen this on other systems).

javanna added the :Distributed Coordination/Snapshot/Restore label Mar 6, 2018
javanna (Member) commented Mar 6, 2018

Hi @JD557, were those indices created with 6.x or earlier versions? I suspect this has to do with #22073, but docs with duplicated fields should not be accepted in the first place, so it could be that some docs were accepted at index time under 5.x and now cause problems.

cc @elastic/es-distributed

JD557 (Author) commented Mar 6, 2018

Hi @JD557, were those indices created with 6.x or earlier versions?

Some were created with 5.6.4.

Do you think that reindexing them would help, or should I just set -Des.json.strict_duplicate_detection=false until I no longer need those indices?

javanna (Member) commented Mar 6, 2018

I think that reindexing them will raise the error at index time; you'll then have to fix those docs in order to get them indexed. Setting the system property is a workaround, not really a solution.
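
For reference, here is a minimal, self-contained Jackson sketch (plain Jackson, not Elasticsearch code; the class name is made up) of the strict duplicate detection that this system property toggles. With the feature enabled, a repeated key within a single object raises the same JsonParseException seen in this issue:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

public class DuplicateFieldDemo {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        // Elasticsearch 6.x enables this Jackson feature by default;
        // -Des.json.strict_duplicate_detection=false switches it off.
        factory.enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);
        try (JsonParser parser = factory.createParser("{\"a\":1,\"a\":2}")) {
            while (parser.nextToken() != null) {
                // Throws JsonParseException: Duplicate field 'a'
                // when the second "a" key is read.
            }
        }
    }
}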

JD557 (Author) commented Mar 6, 2018

By the way, I think I should mention that the error is
com.fasterxml.jackson.core.JsonParseException: Duplicate field 'hourly-2018-02-28-17'

where hourly-2018-02-28-17 is the name of one of the snapshots.

I don't know anything about the internal snapshot representation, but this seems to be a problem in the snapshot metadata, not in an index.

(The snapshots were all created using 6.1.0)

javanna (Member) commented Mar 6, 2018

That is a very good point @JD557, thanks for noticing. Maybe @tlrx or @imotov know more about this.

ywelsch (Contributor) commented Mar 6, 2018

@JD557 Do you perhaps have multiple clusters that are concurrently writing to this repository?

JD557 (Author) commented Mar 6, 2018

I have a single cluster with 3 nodes writing to that repository and an hourly curator job to create a snapshot.

The curator job accesses the cluster via DNS, so the snapshot requests don't always hit the same machine (but, AFAIK, there are never 2 snapshots running at the same time).

ywelsch (Contributor) commented Mar 8, 2018

Are you experiencing frequent network partitions between the nodes? Anything in the logs on the days leading up to this that could indicate failures on some of the nodes?

Looking at the snapshotting code, we should make it more resilient against duplicates (we check for duplicates at snapshot initialization, but not when writing the BlobStoreIndexShardSnapshot) so that we don't end up in the situation seen here, but fail the current snapshot instead. /cc: @tlrx

I still don't understand though how we can end up in the presented situation, except for concurrent access to the repository.

JD557 (Author) commented Mar 8, 2018

I don't think that there are frequent network partitions.

However, looking at the logs, here are some errors that occurred before the first time I hit this problem (sorry for the large logs):

Snapshot creation failed

[2018-02-15T20:02:13,395][WARN ][o.e.s.SnapshotShardsService] [9FWWGr6] [[logs-2018-03-16][0]] [s3:hourly-2018-02-15-20/OOoxBKs8S9-QPqblJ-bW5Q] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to write file list
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1006) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1238) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:811) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:401) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:98) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:355) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.0.jar:6.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 27FD59F4EEF2D224; S3 Extended Request ID: MhY2a0feNmbKhNr5TOOWX/OcuQGwlWfeYQEMVd4S0YxFFo3Xo5jTFo2BlbKGIZROM+H69UX/DMk=), S3 Extended Request ID: MhY2a0feNmbKhNr5TOOWX/OcuQGwlWfeYQEMVd4S0YxFFo3Xo5jTFo2BlbKGIZROM+H69UX/DMk=
	at org.elasticsearch.repositories.s3.S3BlobContainer.move(S3BlobContainer.java:177) ~[?:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:138) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1004) ~[elasticsearch-6.1.0.jar:6.1.0]
	... 10 more
	Suppressed: java.nio.file.NoSuchFileException: Blob [pending-index-951] does not exist
		at org.elasticsearch.repositories.s3.S3BlobContainer.deleteBlob(S3BlobContainer.java:116) ~[?:?]
		at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:142) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1004) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1238) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:811) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:401) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:98) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:355) [elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.0.jar:6.1.0]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 27FD59F4EEF2D224; S3 Extended Request ID: MhY2a0feNmbKhNr5TOOWX/OcuQGwlWfeYQEMVd4S0YxFFo3Xo5jTFo2BlbKGIZROM+H69UX/DMk=)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1870) ~[?:?]
	at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$move$6(S3BlobContainer.java:172) ~[?:?]
	at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:57) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_161]
	at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:56) ~[?:?]
	at org.elasticsearch.repositories.s3.S3BlobContainer.move(S3BlobContainer.java:171) ~[?:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:138) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1004) ~[elasticsearch-6.1.0.jar:6.1.0]
	... 10 more

Snapshot deletion failed

[2018-02-15T21:22:54,284][WARN ][r.suppressed             ] path: /_snapshot/s3/hourly-2018-02-14-21, params: {repository=s3, snapshot=hourly-2018-02-14-21}
org.elasticsearch.transport.RemoteTransportException: [qMs-_ZD][172.17.0.3:9300][cluster:admin/snapshot/delete]
Caused by: org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to write file list
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1006) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete(BlobStoreRepository.java:945) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete(BlobStoreRepository.java:877) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:390) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$8(SnapshotsService.java:1292) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) ~[elasticsearch-6.1.0.jar:6.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 079F5E4186B33E60; S3 Extended Request ID: vXAMlUjOEdbKjsgK3rBLGfOUbA381Fnn+beCRxQd6bLCCzlMHMrQ+890nQiFH25MIV4YrkBCT1w=), S3 Extended Request ID: vXAMlUjOEdbKjsgK3rBLGfOUbA381Fnn+beCRxQd6bLCCzlMHMrQ+890nQiFH25MIV4YrkBCT1w=
	at org.elasticsearch.repositories.s3.S3BlobContainer.move(S3BlobContainer.java:177) ~[?:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:138) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1004) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete(BlobStoreRepository.java:945) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete(BlobStoreRepository.java:877) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:390) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$8(SnapshotsService.java:1292) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) ~[elasticsearch-6.1.0.jar:6.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]
	Suppressed: java.nio.file.NoSuchFileException: Blob [pending-index-957] does not exist
		at org.elasticsearch.repositories.s3.S3BlobContainer.deleteBlob(S3BlobContainer.java:116) ~[?:?]
		at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:142) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1004) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete(BlobStoreRepository.java:945) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete(BlobStoreRepository.java:877) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:390) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$8(SnapshotsService.java:1292) ~[elasticsearch-6.1.0.jar:6.1.0]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) ~[elasticsearch-6.1.0.jar:6.1.0]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: amazon_s3_exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 079F5E4186B33E60; S3 Extended Request ID: vXAMlUjOEdbKjsgK3rBLGfOUbA381Fnn+beCRxQd6bLCCzlMHMrQ+890nQiFH25MIV4YrkBCT1w=)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[?:?]
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194) ~[?:?]
	at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1870) ~[?:?]
	at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$move$6(S3BlobContainer.java:172) ~[?:?]
	at org.elasticsearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:57) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_161]
	at org.elasticsearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:56) ~[?:?]
	at org.elasticsearch.repositories.s3.S3BlobContainer.move(S3BlobContainer.java:171) ~[?:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.writeAtomic(ChecksumBlobStoreFormat.java:138) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.finalize(BlobStoreRepository.java:1004) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$Context.delete(BlobStoreRepository.java:945) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.delete(BlobStoreRepository.java:877) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:390) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$8(SnapshotsService.java:1292) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) ~[elasticsearch-6.1.0.jar:6.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]

Concurrent snapshot exceptions (maybe a snapshot got "stuck" and took too long to complete?)

[2018-02-16T12:05:55,791][WARN ][r.suppressed             ] path: /_snapshot/s3/hourly-2018-02-16-10, params: {repository=s3, wait_for_completion=false, snapshot=hourly-2018-02-16-10}
org.elasticsearch.transport.RemoteTransportException: [qMs-_ZD][172.17.0.3:9300][cluster:admin/snapshot/create]
Caused by: org.elasticsearch.snapshots.ConcurrentSnapshotExecutionException: [s3:hourly-2018-02-16-10] a snapshot is already running
	at org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:266) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:640) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:195) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:130) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) ~[elasticsearch-6.1.0.jar:6.1.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) ~[elasticsearch-6.1.0.jar:6.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]

And finally, I got a com.fasterxml.jackson.core.JsonParseException: Duplicate field 'hourly-2018-02-16-04' exception.

Maybe this can be triggered by transient failures in S3?
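
The writeAtomic failures above come from the repository's blob "move", which is not atomic on S3. A minimal sketch (illustrative class and method names, not the actual S3BlobContainer code) of the copy-then-delete pattern and why a transient failure between the two steps could matter:

import com.amazonaws.services.s3.AmazonS3;

// Illustrative sketch of the copy-then-delete "move" that writeAtomic relies
// on (not the actual S3BlobContainer code). S3 has no atomic rename, so a
// transient failure or a retried request between the two calls below can
// leave the repository in a partially applied state.
final class NonAtomicMove {
    static void moveBlob(AmazonS3 client, String bucket, String source, String target) {
        client.copyObject(bucket, source, bucket, target); // step 1: server-side copy
        client.deleteObject(bucket, source);               // step 2: delete the source blob
    }
}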

ywelsch (Contributor) commented Mar 27, 2018

Discussed with @tlrx: we are not sure how we can end up in this situation, but we plan on adding a check in BlobstoreRepository.snapshot(...) that looks for duplicate snapshot names before writing out the new index file and fails the snapshot if one is found. This would ensure that you cannot end up in this situation where the index file is broken.
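
A minimal, self-contained sketch of the kind of guard described here (illustrative names, not the actual BlobStoreRepository code): before appending the new entry and writing the next index file generation, fail the snapshot if the name is already present, so the on-disk index file never contains a duplicate:

import java.util.List;

// Illustrative sketch of the duplicate-name check described above; the class
// and method names are made up. Failing the snapshot at this point keeps the
// index file readable instead of writing a generation that repeats a name.
final class SnapshotNameGuard {
    static void ensureUniqueName(List<String> existingSnapshotNames, String newName) {
        if (existingSnapshotNames.contains(newName)) {
            throw new IllegalStateException(
                    "Duplicate snapshot name [" + newName + "] detected, aborting");
        }
    }
}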

JD557 (Author) commented Apr 20, 2018

Is this check implemented in 6.2.4?

ywelsch (Contributor) commented Apr 20, 2018

@JD557 no, but I've just opened a PR for this (#29634).

ywelsch added a commit that referenced this issue Apr 20, 2018
…#29634)

Adds a check in BlobstoreRepository.snapshot(...) that prevents duplicate snapshot names and fails the snapshot before writing out the new index file. This ensures that you cannot end up in this situation where the index file has duplicate names and cannot be read anymore.

Relates to #28906
ywelsch (Contributor) commented Apr 20, 2018

The fix will be in 6.3.

colings86 added the >bug label Apr 24, 2018
tlrx (Member) commented Apr 24, 2018

Talked to @ywelsch today: now that the check for duplicate snapshot names has been added, we think we can't really do much more for now, and we are both in favor of closing this issue.

If it reappears on versions > 6.3 then we'll reopen for investigation.

amcypher commented May 7, 2018

My issue is related, I think...

type":"pcap_file","_id":null,"status":400,"error":{"type":"parse_exception","reason":"Failed to parse content to map","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'ip_ip_addr'\n at

tlrx (Member) commented May 7, 2018

@tjc808 I don't think this is related to the snapshot/restore feature; it happens because you're indexing documents with duplicated fields. You can ask questions like these in the Elastic forums.

DaveCTurner (Contributor) commented:

The field name ip_ip_addr suggests you may be generating your data using an old version of tshark which had a bug in its Elasticsearch output that's now fixed.

numeralnathan commented:

I am using org.elasticsearch.client:elasticsearch-rest-high-level-client version 6.8.23. I cannot use newer versions due to company requirements.

com.fasterxml.jackson.core.JsonParseException: Duplicate field 'id'
at [Source: (org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper); line: 1, column: 4814]
at com.fasterxml.jackson.core.json.JsonReadContext._checkDup(JsonReadContext.java:225)
at com.fasterxml.jackson.core.json.JsonReadContext.setCurrentName(JsonReadContext.java:219)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:799)
at com.fasterxml.jackson.core.JsonGenerator._copyCurrentContents(JsonGenerator.java:2566)
at com.fasterxml.jackson.core.JsonGenerator.copyCurrentStructure(JsonGenerator.java:2547)
at org.elasticsearch.common.xcontent.json.JsonXContentGenerator.copyCurrentStructure(JsonXContentGenerator.java:418)
at org.elasticsearch.common.xcontent.XContentBuilder.copyCurrentStructure(XContentBuilder.java:988)
at org.elasticsearch.client.RequestConverters.bulk(RequestConverters.java:240)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1761)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1735)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1697)
at org.elasticsearch.client.RestHighLevelClient.bulk(RestHighLevelClient.java:473)

Here is a simplified JSON document for the IndexRequest that I hope reproduces the problem.

{
  "id": "...",
  "dataset": {
    "id": "...",
    "datafile": {
      "id": "...",
      "permissions": {
        "id": "..."
      }
    },
    "par": {
      "id": "..."
    }
  },
  "metadata": {
    "id": "pcl.1.metadata.472313f06c1140f8a8f390d0230325ab"
  }
}
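
For comparison, here is a minimal sketch (index name, type, and ids are illustrative) of a bulk request that does trigger this check with the 6.8 high-level client. As the stack trace shows, RequestConverters.bulk(...) copies each source through a parser with duplicate detection enabled, so a source that repeats a key within one object fails client-side; repeating "id" across different nested objects, as in the simplified document above, is legal JSON and should not trip it:

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

// Illustrative reproduction sketch for the 6.8 high-level REST client: the
// bulk converter re-parses each source, so this request fails with
// "Duplicate field 'id'" before anything is sent to the cluster.
public class DuplicateIdBulkDemo {
    public static void main(String[] args) {
        BulkRequest bulk = new BulkRequest();
        bulk.add(new IndexRequest("my-index", "_doc", "1")
                .source("{\"id\":\"a\",\"id\":\"b\"}", XContentType.JSON)); // duplicate key in one object
        // client.bulk(bulk, RequestOptions.DEFAULT) would throw
        // com.fasterxml.jackson.core.JsonParseException: Duplicate field 'id'.
    }
}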
