|
| 1 | +--- |
| 2 | +title: How to submit a job on the Data Processing platform using the CLI |
| 3 | +slug: submit-cli |
| 4 | +excerpt: Find out how to run your Apache Spark job with the Data Processing platform using the CLI |
| 5 | +section: How to |
| 6 | +order: 7 |
| 7 | +--- |
| 8 | + |
| 9 | +**Last updated 15<sup>th</sup> May, 2020** |
| 10 | + |
| 11 | +## Objective |
| 12 | +This guide helps you to upload your application code to Object Storage and submit an Apache Spark job using the Data Processing CLI. |
| 13 | + |
| 14 | +For an introduction to the Data Processing service, see the [Data Processing Overview](../overview){.external}.
| 15 | + |
| 16 | +## Requirements |
| 17 | +- An OVHcloud account |
| 18 | +- An activated cloud project in your OVHcloud account (see [How to create a cloud project](../../public-cloud/getting_started_with_public_cloud_logging_in_and_creating_a_project){.external} and [How to activate the Data Processing service for your cloud project](../activation){.external} for details.) |
| 19 | +- An OpenStack user in your cloud project and access to the OpenStack Horizon dashboard (see [How to create an OpenStack user and access Horizon](../../public-cloud/configure_user_access_to_horizon/){.external} for details).
| 20 | +- Application code to run in an Apache Spark environment
| 21 | + |
| 22 | +## Instructions |
| 23 | + |
| 24 | +### Step 1: Download the Data Processing CLI |
| 25 | +Download the latest release of the CLI binary for your system from [GitHub](https://github.com/ovh/data-processing-spark-submit/releases){.external} and save it as ``ovh-spark-submit``, or run the following commands:
| 26 | +```shell-session |
| 27 | +$ export SYSTEM_ARCHITECTURE=darwin_386 # or darwin_amd64 or linux_386 or linux_amd64 or windows_386 or windows_amd64 |
| 28 | +$ export DATA_PROCESSING_CLI_VERSION=$(curl -s https://api.github.com/repos/ovh/data-processing-spark-submit/releases/latest | grep "tag_name" | cut -d : -f 2,3 | tr -d \",\ ) |
| 29 | +$ wget -O ovh-spark-submit https://github.com/ovh/data-processing-spark-submit/releases/download/$DATA_PROCESSING_CLI_VERSION/ovh-spark-submit_$SYSTEM_ARCHITECTURE |
| 30 | +``` |
| 31 | +If you are on Linux or macOS, you may need to run this command to make the binary executable:
| 32 | + |
| 33 | +```shell-session |
| 34 | +$ chmod u+x ovh-spark-submit |
| 35 | +``` |
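
If you would rather not pick the ``SYSTEM_ARCHITECTURE`` value by hand, it can probably be derived from ``uname``. The sketch below is not part of the CLI: ``release_suffix`` is a hypothetical helper, and the mapping only covers the Linux and macOS suffixes listed in the download commands above (Windows users would still set the value manually).

```shell
# Hypothetical helper (not part of the CLI): map `uname -s` / `uname -m` style
# values to the release suffixes listed in the download commands.
release_suffix() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$os" in
    linux|darwin) ;;
    *) echo "unsupported system: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64) arch=amd64 ;;
    i386|i686) arch=386 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  echo "${os}_${arch}"
}

# Usage: derive the value from the current machine, or fall back to a manual choice.
SYSTEM_ARCHITECTURE=$(release_suffix "$(uname -s)" "$(uname -m)") \
  || echo "set SYSTEM_ARCHITECTURE by hand" >&2
```

The helper takes the system and machine names as arguments so that the mapping can be checked independently of the machine it runs on.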
| 36 | + |
| 37 | +### Step 2: Set up the configuration.ini file |
| 38 | +Before you can submit a job with the CLI, you must configure it so that it can authenticate against the OVHcloud API.
| 39 | +To do so, you need an application key, an application secret and a consumer key. You can generate them on the [token creation page](https://eu.api.ovh.com/createToken/){.external}.
| 40 | + |
| 41 | +You need to grant the ``GET/POST/PUT`` rights on the endpoint ``/cloud/project/\*/dataProcessing/\*``.
| 42 | +{.thumbnail} |
| 43 | + |
| 44 | +Once you have your keys, create a new ``configuration.ini`` file in the same directory as the CLI binary and fill it in with your three keys:
| 45 | +``` |
| 46 | +[ovh] |
| 47 | +; configuration specific to 'ovh-eu' endpoint as it's the only one available for now |
| 48 | +endpoint=ovh-eu |
| 49 | +application_key=my_app_key |
| 50 | +application_secret=my_application_secret |
| 51 | +consumer_key=my_consumer_key |
| 52 | +``` |
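
As a minimal sketch, the file can also be created from the terminal. The values below are the placeholders from the example above, ``CONFIG_FILE`` is a variable introduced here for illustration, and the ``chmod 600`` step is a precaution added in this sketch rather than a requirement stated by the CLI.

```shell
# Sketch: create configuration.ini next to the CLI binary.
# The values are the placeholders from the example; replace them with your keys.
CONFIG_FILE=${CONFIG_FILE:-configuration.ini}

cat > "$CONFIG_FILE" <<'EOF'
[ovh]
endpoint=ovh-eu
application_key=my_app_key
application_secret=my_application_secret
consumer_key=my_consumer_key
EOF

# Precaution that is not part of the original steps: the file holds secrets,
# so restrict it to the current user.
chmod 600 "$CONFIG_FILE"
```

Note that this overwrites any existing file with the same name, so run it only when setting up the configuration for the first time.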
| 53 | + |
| 54 | +### Step 3: Upload your application code to Object Storage |
| 55 | +Before running your job on the Data Processing platform, you need to create a container in OVHcloud Object Storage.
| 56 | +You can manage your Object Storage using either the OVHcloud Manager or the OpenStack Horizon dashboard.
| 57 | + |
| 58 | +Please see [Creating Storage Containers in Customer Panel](../../storage/pcs/create-container/){.external} or [Create an object container in Horizon](../../storage/create_an_object_container/){.external} for more details. |
| 59 | + |
| 60 | +You can also manage your Object Storage from the command line with the [OpenStack Swift API](https://docs.ovh.com/gb/en/public-cloud/getting_started_with_the_swift_api/){.external}.
| 61 | + |
| 62 | +Once the container is created, upload your application code to it. If you don't have any application code, you can still try the CLI with the example files provided in the ``testdata`` directory of the [GitHub project](https://github.com/ovh/data-processing-spark-submit){.external}.
| 63 | +If you want to submit a Python job, do not forget to upload your ``environment.yml`` file as well (see [How to generate environment file for Python jobs](../generate-environment){.external}).
| 64 | + |
| 65 | +>[!primary] |
| 66 | +> |
| 67 | +> If you only have your application code to upload, you can use the auto-upload feature of the CLI instead of uploading it manually (see **step 4: Submit a job - Optionally use auto-upload**). |
| 68 | +
|
| 69 | + |
| 70 | +### Step 4: Submit a job |
| 71 | + |
| 72 | +Now that everything is ready, let's submit a job!
| 73 | +
| 74 | +To launch a job with the Data Processing CLI, run the executable file you downloaded in step 1, passing your job configuration as parameters.
| 75 | +
| 76 | +Here is an example of a command you could run to submit a SparkPi job in Java/Scala:
| 77 | + |
| 78 | +```shell-session |
| 79 | +$ ./ovh-spark-submit --projectid yourProjectId --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000 |
| 80 | +``` |
| 81 | + |
| 82 | +In this example, the application code ``spark-examples.jar`` is stored in the ``odp`` container on Object Storage, and the Spark driver and its single executor each have 1 core and 4 gibibytes of memory. Once the Spark cluster is deployed, the job runs with ``1000`` as its argument.
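
Because the command is long, it can help to assemble it from its parts and review it before running it. The sketch below only prints the command; ``build_submit_command`` is a hypothetical helper introduced for illustration, and it reuses the flags and values from the example above without adding any.

```shell
# Hypothetical helper: print the submit command for review instead of running it.
# Arguments: project ID, container name, JAR name, then the job's own arguments.
build_submit_command() {
  project_id=$1
  container=$2
  jar=$3
  shift 3
  echo ./ovh-spark-submit --projectid "$project_id" \
    --class org.apache.spark.examples.SparkPi \
    --driver-cores 1 --driver-memory 4G \
    --executor-cores 1 --executor-memory 4G --num-executors 1 \
    "swift://${container}/${jar}" "$@"
}

build_submit_command yourProjectId odp spark-examples.jar 1000
```

The application code is addressed as ``swift://<container>/<object>``, which is why the container and JAR names are passed separately here.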
| 83 | + |
| 84 | +>[!primary] |
| 85 | +> |
| 86 | +> Some of the parameters can be set as environment variables, such as ``--projectid``.
| 87 | +
|
| 88 | +The ``ovh-spark-submit`` CLI supports a subset of the ``spark-submit`` parameters to configure your job.
| 89 | +To list these parameters, run:
| 90 | + |
| 91 | +```shell-session |
| 92 | +$ ./ovh-spark-submit -h |
| 93 | +``` |
| 94 | + |
| 95 | +If you don't know which values to set for these parameters, please refer to the page [How to fill the job submit form in the Data Processing page from the OVHcloud Manager](../job-submit-form){.external}.
| 96 | + |
| 97 | +While your job is running, you can watch its logs in your terminal or access the Spark UI through this URL:
| 98 | + |
| 99 | +``https://adc.{region}.dataconvergence.ovh.com/{your-job-id}/jobs/`` |
| 100 | + |
| 101 | +Here, **region** is the region in which you submitted your job and **your-job-id** is the ID of that job.
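
As a small illustration, the URL can be built from the two placeholders. ``spark_ui_url`` is a hypothetical helper, and the region and job ID passed to it below are made-up values used only to show the substitution.

```shell
# Hypothetical helper: fill the {region} and {your-job-id} placeholders of the Spark UI URL.
spark_ui_url() {
  printf 'https://adc.%s.dataconvergence.ovh.com/%s/jobs/\n' "$1" "$2"
}

# Example values are assumptions; use your own region and job ID.
spark_ui_url gra my-job-id
```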
| 102 | + |
| 103 | +At any time, you can stop your job by pressing ``Ctrl+C``. If you do so, the CLI will ask you to confirm that you want to cancel the job before killing it. |
| 104 | + |
| 105 | +If you want to check your results after the job has finished, you can download its logs from your Object Storage (see [Checking a job's logs in the Data Processing manager's page](../check-logs)). |
| 106 | + |
| 107 | +#### Optionally use auto-upload |
| 108 | +If you often change your application code, the auto-upload feature of the CLI saves time by uploading your code to Object Storage automatically.
| 109 | +Uploading a file this way overwrites any existing object with the same name in your Object Storage container.
| 110 | +To enable it, update the ``configuration.ini`` file with the settings required by the protocol you want to use.
| 111 | + |
| 112 | +For now, the only supported protocol is OpenStack Swift. Here is how you should update your ``configuration.ini`` file in order to upload your code with this protocol:
| 113 | + |
| 114 | +``` |
| 115 | +[ovh] |
| 116 | +; configuration specific to 'ovh-eu' endpoint as it's the only one available for now |
| 117 | +endpoint=ovh-eu |
| 118 | +application_key=my_app_key |
| 119 | +application_secret=my_application_secret |
| 120 | +consumer_key=my_consumer_key |
| 121 | +
|
| 122 | +; configuration specific for protocol swift (OVHcloud Object Storage with Keystone v3 authentication) |
| 123 | +[swift] |
| 124 | +user_name=openstack_user_name |
| 125 | +password=openstack_password |
| 126 | +auth_url=openstack_auth_url |
| 127 | +domain=openstack_auth_url_domain |
| 128 | +region=openstack_region |
| 129 | +``` |
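
Before relying on auto-upload, it may be worth checking that every key of the ``[swift]`` section is filled in. The sketch below is an illustration, not a feature of the CLI: ``check_swift_section`` is a hypothetical helper, and the list of keys is taken from the example above.

```shell
# Hypothetical helper: verify that the [swift] section of a configuration file
# defines a non-empty value for each key shown in the example.
check_swift_section() {
  config=$1
  missing=0
  for key in user_name password auth_url domain region; do
    # Print only the lines from [swift] up to the next section header, then look for the key.
    if ! sed -n '/^\[swift\]/,/^\[/p' "$config" | grep -q "^${key}=."; then
      echo "missing ${key} in [swift]" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check the file in the current directory, if there is one.
if [ -f configuration.ini ]; then
  check_swift_section configuration.ini || echo "auto-upload is not fully configured" >&2
fi
```

The check only confirms that the keys are present; it does not validate the credentials themselves.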
| 130 | + |
| 131 | +Here is an example of a command that uploads your local code (``spark-examples.jar``) to your ``odp`` Object Storage container with the Swift protocol and then runs the same job:
| 132 | + |
| 133 | +```shell-session |
| 134 | +$ ./ovh-spark-submit --projectid yourProjectId --upload ./spark-examples.jar --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000
| 135 | +``` |
| 136 | + |
| 137 | +## Go further |
| 138 | + |
| 139 | +To learn more about using Data Processing, creating clusters and processing your data, we invite you to look at the [Data Processing documentation page](../).
| 140 | + |
| 141 | +You can send your questions, suggestions or feedback to our community of users on [https://community.ovh.com/en/](https://community.ovh.com/en/){.external} or on our public [Gitter](https://gitter.im/ovh/data-processing){.external}.