-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Adds average document size to DocsStats #27117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change is required in order to support a size based check for the index rollover. The average document size is estimated by sampling only existing segments. We prefer using segments rather than StoreStats because StoreStats is not reliable if indexing or merging operations are in progress. Relates elastic#27004
@jasontedor, Should we also expose this stat to the REST layer (eg. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comment but in general it LGTM.
@@ -57,16 +66,26 @@ public long getDeleted() { | |||
return this.deleted; | |||
} | |||
|
|||
public long getAverageSizeInBytes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add javadocs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@Override | ||
public void readFrom(StreamInput in) throws IOException { | ||
count = in.readVLong(); | ||
deleted = in.readVLong(); | ||
if (in.getVersion().onOrAfter(Version.V_6_1_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to make it Version.V_7_0_0
for now and change it back after this change is backported in order not to cause failures in the multi-version cluster qa tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint. I temporarily made this for v7 only (f38e957)
indexShard = newStartedShard(true); | ||
int smallDocNum = randomIntBetween(5, 100); | ||
for (int i = 0; i < smallDocNum; i++) { | ||
indexDoc(indexShard, "test", "small-" + i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've been trying to use doc
as a type name in new tests whenever possible. Can you rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a suggestion.
@@ -880,8 +880,15 @@ public FlushStats flushStats() { | |||
} | |||
|
|||
public DocsStats docStats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should rather be something like this:
public DocsStats docStats() {
long numDocs = 0;
long numDeletedDocs = 0;
long sizeInByte = 0;
List<Segment> segments = segments(false);
for (Segment segment : segments) {
if (segment.search) {
numDocs += segment.getNumDocs();
numDeletedDocs += segment.getDeletedDocs();
sizeInByte += segment.getSizeInBytes();
}
}
return new DocsStats(numDocs, numDeletedDocs, sizeInByte);
}
that way we maintain a consistent total and we can calculate the average at read time and aggregation of doc stats will be much simpler? I also think we should make sure the size in bytes is based on the currently used reader which is guaranteed by the Segment#search
flag. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this makes the DocsStats simpler and the average value more accurate. I have updated this in 662f062
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two really minor comments, this LGTM regardless of what you choose :)
} | ||
|
||
public void add(DocsStats docsStats) { | ||
if (docsStats == null) { | ||
public void add(DocsStats that) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally that
is a bit too close to this
for typo avoidance, so I'd prefer other
, but it's so minor that it's totally up to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also prefer other
over that
(addressed in 6cca080) but I used that
in order to have the same indention for this expression (removed).
- long totalBytes = this.averageSizeInBytes * (this.count + this.deleted)
- + that.averageSizeInBytes * (that.count + that.deleted);
@Override | ||
public void readFrom(StreamInput in) throws IOException { | ||
count = in.readVLong(); | ||
deleted = in.readVLong(); | ||
if (in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) { | ||
totalSizeInBytes = in.readVLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set totalSizeInBytes
to -1 to indicate that it cannot be read and not that there are 0 bytes in use for 6.x nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 96e9be7
@Override | ||
public void readFrom(StreamInput in) throws IOException { | ||
count = in.readVLong(); | ||
deleted = in.readVLong(); | ||
if (in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this should be V_6_1_0
since you will be backporting this to the 6.x branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz recommended to make this for v7, then change for the backport later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, it should be 7.0.0. Then when you backport set it to 6.1.0 in the 6.x branch and make sure that the BWC tests in master against 6.x pass (you might have to skip some of them). Then push a commit to master flipping the version to 6.1.0 and removing the skips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh okay, I hadn't realized we were doing it the reverse way now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean by the reverse way, these are part of the steps to have green CI on all branches every step of the way.
} | ||
|
||
@Override | ||
public void writeTo(StreamOutput out) throws IOException { | ||
out.writeVLong(count); | ||
out.writeVLong(deleted); | ||
if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for V_6_1_0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This change is required in order to support a size based check for the index rollover. The index size is estimated by sampling the existing segments only. We prefer using segments to StoreStats because StoreStats is not reliable if indexing or merging operations are in progress. Relates #27004
* master: (63 commits) [Docs] Fix note in bucket_selector [Docs] Fix indentation of examples (elastic#27168) [Docs] Clarify `span_not` query behavior for non-overlapping matches (elastic#27150) [Docs] Remove first person "I" from getting started (elastic#27155) [Docs] Correct link target for datatype murmur3 (elastic#27143) Fix division by zero in phrase suggester that causes assertion to fail Enable Docstats with totalSizeInBytes for 6.1.0 Adds average document size to DocsStats (elastic#27117) Upgrade Painless from ANTLR 4.5.1-1 to ANTLR 4.5.3. (elastic#27153) Exists template needs a template name (elastic#25988) [Tests] Fix occasional test failure due to two random values being the same Fix beidermorse phonetic token filter for unspecified `languageset` (elastic#27112) Fix max score tracking with field collapsing (elastic#27122) [Doc] Add Ingest CSV Processor Plugin to plugin as a community plugin (elastic#27105) Removed the beta tag from cross-cluster search fixed typo in ConstructingObjectParse (elastic#27129) Allow for the Painless Definition to have multiple instances (elastic#27096) Apply missing request options to the expand phase (elastic#27118) Only pull SegmentReader once in getSegmentInfo (elastic#27121) Fix BWC for discovery stats ...
This change is required in order to support a size based check for the
index rollover.
The average document size is estimated by sampling only existing
segments. We prefer using segments rather than StoreStats because
StoreStats is not reliable if indexing or merging operations are in
progress.
Relates #27004