
Painless: Add an Ingest Script Processor Example #32302


Merged: 23 commits merged into elastic:master on Aug 9, 2018

Conversation

jdconrad
Contributor

This PR adds two pieces. The first is a small set of documentation providing instructions on how to get set up to run the context examples. Some of the examples require a data download, similar to how Kibana works. The second is an ingest processor example using the downloaded data. More examples will follow, ideally one per PR. This PR also adds a set of tests that exercise each script individually as a unit test.

@jdconrad jdconrad added >docs General docs changes :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v7.0.0 v6.4.0 labels Jul 23, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@jdconrad jdconrad requested a review from debadair July 23, 2018 22:13
@jdconrad jdconrad mentioned this pull request Jul 23, 2018
22 tasks
@jdconrad jdconrad added v6.5.0 and removed v6.4.0 labels Jul 25, 2018
@jdconrad
Contributor Author

@elasticmachine test this please

Contributor

@debadair debadair left a comment


Let me know if you have any questions!

for use in the examples. The data is a fictional set of plays and their
locations and times where each individual document represents a single seat in
a theater for a single play.
+
Contributor


In general, I try to avoid "meta data" like "for use in the examples" unless it's really needed for clarification. If you set the stage in the intro, "for use in the examples" is redundant.

Since you've already presented this as example data, there's really no need to call it out as "fictional". I'd just say, "this data set contains booking information for a collection of plays. Each document represents a single seat for a play at a particular theater on a specific date and time."

Contributor Author


Fixed.


Use the commands below to setup an Elasticsearch cluster to execute the example
scripts for each context where an example is provided. Each example must be
executed in the order of the table in the
Contributor


I'd flip this around and say something like, "To run the examples, you first need to index the sample data into Elasticsearch:"

...and then step through that process.

Contributor Author


Fixed.

a theater for a single play.
+
Each document contains the following values:
+
Contributor


We generally talk about the "fields" in a document, rather than "values".

Contributor Author


Fixed.

. {es_version}/running-elasticsearch.html[Run] an elasticsearch cluster. The
examples assume the cluster is running locally. Modify the examples as necessary
to execute against a remote cluster.

Contributor


I think we normally say, "Start Elasticsearch." and show the command. I'd make the bit about the examples assuming localhost as a note. Maybe, "NOTE: These examples assume Elasticsearch and Kibana are running locally. To use the Console editor in a remote Kibana instance, click the settings icon and enter the Console URL. To submit a cURL request to a remote Elasticsearch instance, you'll need to edit the request."

FYI, We have a "wishlist" item to be able to specify the Elasticsearch URL for copy as curl like we do for the console.

Contributor Author


Fixed. And I hope that "wishlist" item comes true!


. Create {es_version}/mapping.html[mappings] using the following console
command:

Contributor


I'd just say, "Create mappings for the sample data:"

There are actually large debates about that style, but to my mind "using the following console command" is redundant when you immediately show the following console command. My personal style is: Tell them what to do. Show them how to do it.

Contributor Author


Fixed. And agreed, just slowly working on my tech writing skills :)
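For context, a minimal sketch of what such a mapping request could look like in the Console. The field names and types below are illustrative assumptions, not the actual sample-data schema, and depending on the Elasticsearch version a mapping type name may also be required:

```json
PUT /seats
{
  "mappings": {
    "properties": {
      "theatre":  { "type": "keyword" },
      "play":     { "type": "text" },
      "date":     { "type": "keyword" },
      "time":     { "type": "keyword" },
      "datetime": { "type": "date" }
    }
  }
}
```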


. Execute the <<painless-ingest-processor-context, ingest>> example. This is a
requirement to upload the seat data for use in all the examples.

Contributor


In general, we try to use "run" rather than "execute". Here, I'd say, "Run the ingest example to upload the seat data:" And then show them how to do it.

Contributor Author


Fixed.


. Upload the seat data using the following console command:
+
[source,js]
Contributor


This seems...redundant with the previous step? And it feels contradictory to say "console command" and then explicitly limit it to a curl command? If we can show it as a Console command, we should.

Contributor Author


The ingest command doesn't actually upload the data; it just sets up an ingest pipeline so that incoming data can be processed, so this is still a required step. Is that what you meant with the question? And yes, I wasn't thinking much about the consistency between console and curl. Thank you for pointing that out.
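To make that distinction concrete, here is a minimal sketch of setting up an ingest pipeline with a script processor. The pipeline name and the one-line script body are illustrative assumptions; the real example performs proper date/time parsing rather than simple concatenation:

```json
PUT /_ingest/pipeline/seats
{
  "description": "Sketch: derive a datetime field from the seat date and time fields",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.datetime = ctx.date + 'T' + ctx.time;"
      }
    }
  ]
}
```

Documents indexed with `?pipeline=seats` would then pass through this script before being stored.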

To use this example first follow the steps outlined in
<<painless-context-examples, context examples>>.

Contributor


I'd be more explicit... "To run this example, you need to <<painless-context-examples, ingest the sample data>>.

Contributor Author


Fixed.

`Strings` into a useful format for storage into a date field. Upon completion of
the script each document will have its `datetime` field value filled in.

Contributor


Strictly editorial...I'd say:

The seat data contains:

  • A date in the format YYYY-MM-DD where the second digit of both month and day is optional.
  • A time in the format HH:MM* where the second digit of both hours and minutes is optional. The star (*) represents either the String AM or PM.

The following ingest script processes the date and time Strings and stores the result in a `datetime` field.

Contributor Author


That's much better. Fixed.
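The parsing rules described above can be sketched in Python (illustrative only; the actual example is written in Painless, and the function name here is an assumption):

```python
import datetime

def to_datetime(date_str, time_str):
    """Combine a 'YYYY-MM-DD' date (single-digit month/day allowed) and an
    'HH:MM AM'/'HH:MM PM' time (single-digit hour/minute allowed) into a
    normalized ISO 8601 datetime string."""
    year, month, day = (int(part) for part in date_str.split("-"))
    clock, meridiem = time_str.split(" ")           # e.g. "3:05", "PM"
    hour, minute = (int(part) for part in clock.split(":"))
    if meridiem == "PM" and hour != 12:             # 12 PM stays 12
        hour += 12
    elif meridiem == "AM" and hour == 12:           # 12 AM becomes 0
        hour = 0
    return datetime.datetime(year, month, day, hour, minute).isoformat()

print(to_datetime("2018-4-1", "3:05 PM"))  # 2018-04-01T15:05:00
```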


Use the following curl command to create the ingest script processor:

Contributor


As above--conflating console vs cURL. I generally phrase this as, "Submit the following request to xyz:"

Contributor Author


Fixed.

@jdconrad
Contributor Author

jdconrad commented Aug 1, 2018

@debadair Thanks for the review! Will address all of these soon.

@jdconrad
Contributor Author

jdconrad commented Aug 6, 2018

@debadair Ready for the next round of review.

Contributor

@debadair debadair left a comment


Some minor editorial comments, but LGTM.

Each example must be run in the order of the table in the
<<painless-contexts, previous section>> to work correctly. To run each example,
first index the sample data into Elasticsearch:

Contributor


How about, "To run the examples, you need to index the sample seat data into Elasticsearch. The examples must be run sequentially to work correctly."

Contributor Author


Done.

editor in a remote Kibana instance, click the settings icon and enter the
Console URL. Edit a request to submit a cURL request to a remote Elasticsearch
instance.

Contributor


I'd say, "with a remote Kibana instance" and flip the last sentence around, "To submit a cURL request to a remote Elasticsearch instance, edit the request URL."

Contributor Author


Done.
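As a concrete illustration of the wording settled above (the remote host name is an assumption), the same request can be issued from the Console or as a cURL request whose URL is edited to point at the remote instance:

```
# In the Kibana Console (local or remote, per the Console URL setting):
GET /seats/_count

# Equivalent cURL request; edit the URL for a remote Elasticsearch instance:
curl -X GET "http://localhost:9200/seats/_count"
curl -X GET "http://remote-host:9200/seats/_count"
```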


. Run the <<painless-ingest-processor-context, ingest>> example. This is a
requirement to upload the seat data for use in all the examples.

Contributor


I'd just say, "This uploads the seat data that you need to run the examples."

Contributor Author


Changed.

@jdconrad
Contributor Author

jdconrad commented Aug 9, 2018

@debadair Thanks for the reviews! Will commit as soon as CI passes.

@jdconrad jdconrad merged commit 293c8a2 into elastic:master Aug 9, 2018
jdconrad added a commit that referenced this pull request Aug 9, 2018
This commit adds two pieces. The first is a small set of documentation providing
instructions on how to get set up to run context examples. This will require a download
similar to how Kibana works for some of the examples. The second is an ingest processor
example using the downloaded data. More examples will follow, ideally one per PR.
This also adds a set of tests to individually test each script as a unit test.
@jdconrad jdconrad mentioned this pull request Oct 26, 2018
17 tasks
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >docs General docs changes v6.5.0 v7.0.0-beta1