-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[FEATURE][ML] Parse results and join them in the data-frame copy index #36382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE][ML] Parse results and join them in the data-frame copy index #36382
Conversation
Pinging @elastic/ml-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I just noted a few minor things
@@ -18,7 +18,7 @@ | |||
* but in the context of the java side it is more descriptive to call this the | |||
* end of data message. | |||
*/ | |||
private static final String END_OF_DATA_MESSAGE_CODE = "r"; | |||
private static final String END_OF_DATA_MESSAGE_CODE = "$"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still 'r'
on the C++ side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This https://github.com/elastic/ml-cpp/pull/334/files changes it to $
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK cool. I missed that.
|
||
LOGGER.debug("[{}] Closing process", jobId); | ||
LOGGER.info("[{}] Waiting for result processor to complete", jobId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll want this to be an INFO
in production. If you want to leave it like this on the feature branch please add a TODO
to downgrade it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this shouldn't be an info message. I left it on purpose as I think it's useful during development. I had in mind that we'd review all logging when this is refactored into persistent tasks. Not sure I'd have a todo for each one of them, but I can add them if you think it's best.
} catch (IOException e) { | ||
LOGGER.error(new ParameterizedMessage("[{}] Error writing data to the process", jobId), e); | ||
} finally { | ||
LOGGER.info("[{}] Closing process", jobId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto either downgrade this now or add a TODO
to do so before merging the feature branch.
} | ||
AnalyticsResult result = currentResults.get(i); | ||
SearchHit hit = row.getHit(); | ||
Map<String, Object> source = new HashMap(hit.getSourceAsMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want the new fields to come at the end of the source document? If so, this should use LinkedHashMap
instead of HashMap
. (Maybe we don't care though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I am not sure what is actually affected by the order. But I think it makes sense to append them at the end as the least invasive annotation of the original rows.
public Iterator<AutodetectResult> parseResults(InputStream in) throws ElasticsearchParseException { | ||
public class ProcessResultsParser<T> { | ||
|
||
private static final Logger LOGGER = LogManager.getLogger(ProcessResultsParser.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tradition is lowercase logger
for static loggers, as the object is not immutable.
@@ -35,7 +35,7 @@ public void testWriteEndOfData() throws IOException { | |||
InOrder inOrder = inOrder(lengthEncodedWriter); | |||
inOrder.verify(lengthEncodedWriter).writeNumFields(4); | |||
inOrder.verify(lengthEncodedWriter, times(3)).writeField(""); | |||
inOrder.verify(lengthEncodedWriter).writeField("r"); | |||
inOrder.verify(lengthEncodedWriter).writeField("$"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still r
in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.