Skip to content

[FEATURE][ML] Parse results and join them in the data-frame copy index #36382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dimitris-athanasiou
Copy link
Contributor

No description provided.

@dimitris-athanasiou dimitris-athanasiou added the :ml Machine learning label Dec 7, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - I just noted a few minor things

@@ -18,7 +18,7 @@
* but in the context of the java side it is more descriptive to call this the
* end of data message.
*/
private static final String END_OF_DATA_MESSAGE_CODE = "r";
private static final String END_OF_DATA_MESSAGE_CODE = "$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still 'r' on the C++ side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK cool. I missed that.


LOGGER.debug("[{}] Closing process", jobId);
LOGGER.info("[{}] Waiting for result processor to complete", jobId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll want this to be an INFO in production. If you want to leave it like this on the feature branch please add a TODO to downgrade it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this shouldn't be an info message. I left it on purpose as I think it's useful during development. I had in mind that we'd review all logging when this is refactored into persistent tasks. Not sure I'd have a todo for each one of them, but I can add them if you think it's best.

} catch (IOException e) {
LOGGER.error(new ParameterizedMessage("[{}] Error writing data to the process", jobId), e);
} finally {
LOGGER.info("[{}] Closing process", jobId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto either downgrade this now or add a TODO to do so before merging the feature branch.

}
AnalyticsResult result = currentResults.get(i);
SearchHit hit = row.getHit();
Map<String, Object> source = new HashMap(hit.getSourceAsMap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the new fields to come at the end of the source document? If so, this should use LinkedHashMap instead of HashMap. (Maybe we don't care though.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I am not sure what is actually affected by the order. But I think it makes sense to append them at the end as the least invasive annotation of the original rows.

public Iterator<AutodetectResult> parseResults(InputStream in) throws ElasticsearchParseException {
public class ProcessResultsParser<T> {

private static final Logger LOGGER = LogManager.getLogger(ProcessResultsParser.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tradition is lowercase logger for static loggers, as the object is not immutable.

@@ -35,7 +35,7 @@ public void testWriteEndOfData() throws IOException {
InOrder inOrder = inOrder(lengthEncodedWriter);
inOrder.verify(lengthEncodedWriter).writeNumFields(4);
inOrder.verify(lengthEncodedWriter, times(3)).writeField("");
inOrder.verify(lengthEncodedWriter).writeField("r");
inOrder.verify(lengthEncodedWriter).writeField("$");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still r in C++.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dimitris-athanasiou dimitris-athanasiou merged commit fd3e6a9 into elastic:feature-ml-data-frame-analytics Dec 11, 2018
@dimitris-athanasiou dimitris-athanasiou deleted the parse-results-and-join-them-in-data-frame-copy-index branch December 11, 2018 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants