Change GeoIP downloader policy after 30 days of no updates #74099
Pinging @elastic/es-core-features (Team:Core/Features)
@elasticmachine update branch
Thanks @probakowski! I left a few questions.
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java
@@ -63,13 +67,16 @@
    TimeValue.timeValueDays(3), TimeValue.timeValueDays(1), Property.Dynamic, Property.NodeScope);
public static final Setting<String> ENDPOINT_SETTING = Setting.simpleString("ingest.geoip.downloader.endpoint",
    "https://geoip.elastic.co/v1/database", Property.NodeScope);
public static final Setting<TimeValue> DATABASE_VALIDITY = Setting.timeSetting("ingest.geoip.database.validity",
I see that this constant is currently unused. The setting itself is used in testing, and I assume that this setting's only purpose is for testing? If so, then we need to make sure that this setting can't be set in production, because otherwise the 30-day check can be circumvented. Is this the case?
I removed the constant as it was a leftover, but a similar setting (`ingest.geoip.database_validity`) is used by tests, as you said. It's registered only by the test module (similarly to `ingest.geoip.database_path`), so it shouldn't be available in production.
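The test-only registration idea can be sketched with a minimal, self-contained model (this is not the actual Elasticsearch `Plugin#getSettings()` API; the second production key below is illustrative): a cluster rejects keys no module has registered, and only the test module registers the validity override.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: production registers only its own settings, so a key that is
// registered exclusively by the test module cannot be set on a real cluster.
class SettingRegistrySketch {
    // "ingest.geoip.downloader.endpoint" is from the PR diff; the validity
    // and path keys are the test-only ones discussed above.
    static final Set<String> PRODUCTION_SETTINGS = Set.of(
        "ingest.geoip.downloader.endpoint");

    static Set<String> testSettings() {
        Set<String> settings = new HashSet<>(PRODUCTION_SETTINGS);
        settings.add("ingest.geoip.database_validity"); // test module only
        settings.add("ingest.geoip.database_path");     // test module only
        return settings;
    }

    public static void main(String[] args) {
        // In production the validity override is unknown, so the 30-day
        // check cannot be circumvented by setting it.
        System.out.println(PRODUCTION_SETTINGS.contains("ingest.geoip.database_validity"));
        System.out.println(testSettings().contains("ingest.geoip.database_validity"));
    }
}
```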
👍
state = state.put(name, new Metadata(System.currentTimeMillis(), old.getFirstChunk(), old.getLastChunk(), old.getMd5()));
long timestamp = System.currentTimeMillis();
UpdateByQueryRequest request = new UpdateByQueryRequest(DATABASES_INDEX).setRefresh(true);
request.setScript(new Script("ctx._source.timestamp=" + timestamp + "L"));
Could we record the timestamp elsewhere instead of adding a field to the chunk documents and updating all of them? Maybe store it in the cluster state as mapping metadata or custom data in `IndexMetadata`? Under the hood this update results in reindexing all the chunk documents, so it would be nice if that could be avoided.
Yeah, I've added an additional field in `GeoIpTaskState.Metadata` to use in the validity check instead of updating the timestamp, and it does simplify the code greatly. Thanks for the suggestion!
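The check this enables can be sketched in a few lines (field and method names here are illustrative, not the actual `GeoIpTaskState.Metadata` API): instead of rewriting every chunk document with a fresh timestamp, the task state keeps the time of the last successful check, and validity is a simple comparison against the configured window (30 days by default).

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of a validity check against a stored "last successful check"
// timestamp, as kept in task-state metadata rather than in chunk documents.
class DatabaseValiditySketch {
    static boolean isValid(Instant lastCheck, Instant now, Duration validity) {
        // The database stays valid while lastCheck + validity is still ahead of now.
        return lastCheck.plus(validity).isAfter(now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-06-20T00:00:00Z");
        Duration validity = Duration.ofDays(30); // the 30-day policy
        // Checked 19 days ago: still valid.
        System.out.println(isValid(Instant.parse("2021-06-01T00:00:00Z"), now, validity)); // true
        // Checked 50 days ago: expired.
        System.out.println(isValid(Instant.parse("2021-05-01T00:00:00Z"), now, validity)); // false
    }
}
```

This avoids the update-by-query (and the implicit reindex of every chunk) entirely, since only the small task-state document changes.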
That is great! 👍
Force-pushed from 10dcb7d to 5a449db
Looks great! I left a few more comments.
Also, I think we should add a section to the documentation that describes when a database can become invalid and what the behaviour of the geoip processor is in that case. Doing this in a follow-up PR is fine with me too.
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java
client().admin().cluster()
    .prepareUpdateSettings()
    .setPersistentSettings(Settings.builder()
        .put("ingest.geoip.database_validity", TimeValue.timeValueMillis(1))
Can this test be adjusted (or another version of this test added), so that we first test that ingestion works as expected, then change `ingest.geoip.database_validity` to `1ms`, and expect that the `_geoip_expired_database` tag is added during ingestion?
Additionally can we also verify that each node doesn't have any local database files any more after the databases have become invalid?
Good point about verifying that ingestion works prior to invalidation; I've added it.
The check for no local database files is already there, around line 129.
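The tagging behaviour the test asserts can be sketched as follows (a minimal model, not the actual `GeoIpProcessor` code; the example IP and method name are illustrative): once the database has expired, the processor skips enrichment and appends `_geoip_expired_database` to the document's `tags` field, mirroring Logstash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: tag a document with "_geoip_expired_database" instead of
// enriching it when the local database has passed its validity window.
class ExpiredTagSketch {
    @SuppressWarnings("unchecked")
    static void tagExpired(Map<String, Object> document) {
        // Create the "tags" list if absent, then append the expiry tag once.
        List<String> tags = (List<String>) document.computeIfAbsent("tags", k -> new ArrayList<String>());
        if (!tags.contains("_geoip_expired_database")) {
            tags.add("_geoip_expired_database");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("ip", "89.160.20.128"); // example source field, left unenriched
        tagExpired(doc);
        System.out.println(doc.get("tags"));
    }
}
```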
modules/ingest-geoip/src/test/java/org/elasticsearch/ingest/geoip/GeoIpProcessorTests.java
modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/DatabaseRegistry.java
Thanks for the review @martijnvg
I think we will need to add it in the PR that will enable the geoip downloader again; only then can a database become invalid.
LGTM
(#74099) This PR changes the way GeoIpDownloader and GeoIpProcessor handle the situation when we are unable to update databases for 30 days. In that case:
- GeoIpDownloader will delete all chunks from the .geoip_databases index
- DatabaseRegistry will delete all files on ingest nodes
- GeoIpProcessor will tag documents with a tags: ["_geoip_expired_database"] field (the same way as Logstash)

This change also fixes a bug that breaks DatabaseRegistry when it tries to download databases after only the timestamp was updated (GeoIpDownloader checks if there are new databases and updates the timestamp because the local databases are up to date).
Backport (#74296): Change GeoIP downloader policy after 30 days of no updates (#74099), plus a compilation fix.