How to use with notebook #1
Comments
Hi Eric - please keep in mind this is completely experimental stuff and is subject to change (which includes its complete removal) at any time - caveat emptor. 😄 Yes - that is the Notebook PR on which to base your testing. Given the branch is 2+ years old and completely reworks kernel management, it will certainly have merge issues. In addition, I've had to make changes to get remote kernels working but haven't conveyed those back to @takluyver. I'll try to add further instructions for setting up an environment. It will consist of manual branch pulls, builds, and installs. I've been able to run kernels in YARN Cluster mode so far - looking at Kubernetes now. There are still issues with restarts and configuration that need to be worked out. Not sure what your yarn_kernel_provider question is about. The yarn provider implementation was posted 5 days ago in https://github.com/gateway-experiments/yarn_kernel_provider, which is another thing that might change (i.e., one repo per provider + the base). |
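To illustrate the "one repo per provider" model in concrete terms, here is a rough sketch of how a standalone provider package could register itself with jupyter_kernel_mgmt through a setuptools entry point. Everything here is illustrative: the entry-point group name, module path, and class name are assumptions, not the actual contents of the yarn_kernel_provider repo.

```python
# Illustrative setup.py for a standalone kernel provider package.
# NOTE: the entry-point group name and the class path are assumed for the example.
from setuptools import setup

setup(
    name='yarn_kernel_provider',
    version='0.1',
    packages=['yarn_kernel_provider'],
    entry_points={
        'jupyter_kernel_mgmt.kernel_type_providers': [
            # 'yarn' would become the prefix of the kernel types this provider exposes
            'yarn = yarn_kernel_provider.provider:YarnKernelProvider',
        ],
    },
)
```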
Happy to cook, pull, and merge branches when you provide the recipe. |
Ok - once kubernetes is in a working state, docker should just follow suit. However, before we include more than k8s, yarn (and the remote base), I want to take a step back and make sure this "repo layout" and deployment model is what we really want. Also keep in mind that all the configuration support for k8s and docker (helm charts, compose scripts, etc.) are geared toward EG being the server. Those will require rework to have Notebook be the server and that may prove challenging. |
Happy to test on K8S or YARN with EG as server. |
Thanks. EG will not be usable with kernel providers for some time. I suspect you meant to say with Notebook as a server. |
yes, notebook as a server. sorry. |
Hi Eric - I really apologize for the delay here (been side-tracked lately). I've managed to get back to this stuff this week. I'm currently in the process of getting the k8s kernels off the ground. I'm running into issues with the base notebook branch (that we need to use) - which is preventing getting to the crux of the kubernetes stuff. At any rate, my plan is to get k8s kernels minimally working. This should validate the ContainerKernelLifecycleManager on which the Docker lifecycle managers also depend (we've changed the 'process proxies' to 'kernel lifecycle managers'; after all, what's a few more characters to type 😄). Then, if you're okay with things, it would be great if you could follow the k8s lead to get the docker stuff working! I also owe some instructions for getting things going. I'd prefer to wait until one of the PRs is merged on the jupyter_kernel_mgmt repo that allows us to derive from the KernelSpecProvider rather than implement our own 'find' logic. Once that is merged, we'll make a release of that module - which will simplify deployments. That said, I'd be happy to share which 'branches' are necessary if you're chomping at the bit. There are many issues to work out in this 'provider world' and I'd have to say the jury is still out on this (IMHO). I hope that helps. |
@kevin-bates Thx for the updates and the hard road you are following to lead us to a distributed and managed kernel world. Ping me if I can help to test or solve outstanding issues. |
Hi Eric - thank you for your patience. I was able to get a "minimally viable" kubernetes instance running. I have also created a kubernetes_kernel_provider repository. I also decided that, while we're still checking things out, a "fake" release might be the best way to convey setup instructions. This allows me to attach pre-built files to the release so others don't need to pull branches, wait for PRs to be merged, etc. You can find this 'interim-dev' release here. |
Thx @kevin-bates. I have been through the interim-dev release instructions and tried the on-prem (non-container) setup: notebook, jupyter_kernel_mgmt, remote_kernel_provider, and the docker concrete implementation of remote_kernel_provider. I deployed the kernelspec and tried the Docker kernel. This fails with the following exception.
Comments on the
Then I deployed the K8S Spec on minikube. I get it running. I wonder how to deploy a notebook outside of K8S which would act as an Enterprise Gateway. |
@kevin-bates To complement my previous comment, I wanted first to install the kubernetes_kernel_provider as the concrete implementation in the on-prem scenario, but decided not to do that as the K8S |
Hi Eric - thanks for the update. The on-prem stuff only applies to the YARN support. I've minimally tested with YARN and Kubernetes. For K8s, the notebook server runs in k8s as well. Same will be the case for Docker. We do not support container envs using on-prem kernels or vice versa. Regarding your comments:
That's fine. The startup script was pulled (obviously) from my EG YARN env.
As noted above, I "tested" YARN using on-prem configs and K8s using the full container configs. If you have a YARN env to hit, it might be a good exercise to ensure you see the same YARN behaviors, but that config is not applicable to the container configs. If you choose to check out on-prem YARN, you might want to validate YARN using EG first.
Some might come into play. It depends on where those variables get used in EG. However, some apply to DistributedProcessProxy - which hasn't been (and may never be) implemented in kernel-provider land yet. |
Not sure what you mean. If you're saying you'd like to take a Notebook server and run it in headless mode, where a "front-end" configured with NB2KG hits that server... you won't be able to do that for a couple of reasons: (1) the token management stuff will get in the way, and (2) Kernel Gateway has handlers that look for an 'env' stanza in the kernel startup request. This functionality doesn't exist in Notebook server. The play is to plumb the new jupyter_server with this capability. At that time, we'd likely add support for parameterized kernels, where the kernel parameters also appear in a stanza in the kernel startup request. |
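For concreteness, the 'env' stanza mentioned above is part of the kernel-start request that a front-end such as NB2KG sends to a Kernel/Enterprise Gateway. A minimal sketch follows; the host, token, kernelspec name, and env values are all placeholders. A stock Notebook server's /api/kernels handler simply ignores the env stanza, which is the gap being described.

```python
# Sketch of a kernel-start request carrying an "env" stanza (placeholders throughout).
import requests

GATEWAY_URL = "http://gateway-host:8888"            # placeholder host
HEADERS = {"Authorization": "token <my-token>"}     # placeholder token

resp = requests.post(
    f"{GATEWAY_URL}/api/kernels",
    headers=HEADERS,
    json={
        "name": "spark_python_yarn_cluster",        # example kernelspec name
        "env": {
            # KG/EG forward KERNEL_-prefixed values to the kernel launcher
            "KERNEL_USERNAME": "eric",
            "KERNEL_EXTRA_SPARK_OPTS": "--conf spark.executor.memory=2g",
        },
    },
)
resp.raise_for_status()
print(resp.json()["id"])                            # kernel id assigned by the gateway
```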
|
Good to know. I will work further on on-prem validation with YARN, with preliminary validation of my setup with EG on YARN.
I was thinking to |
I'm still not sure I'm understanding what you're completely driving at. With your original question, I thought you wanted something like EG running OUTSIDE of k8s but launching kernels INSIDE k8s - which we don't support. The entity that launches the kernels must also be within the same "network". If that's still not what you're getting at, I'd be happy to set up a webex or something. Feel free to ping me at [email protected] or on our gitter channel/room. |
For the inside/outside questions, I will take more time to align my thoughts in a picture and share it for discussion. For the on-premise case on YARN, I have a working setup in EG. Trying the setup for these remote kernels, I receive the following exception (issue with the port range):
|
I'm afk at the moment but are you using the yarn kernelspecs tar file? This sounds like it's not using the yarn lifecycle manager that should be defined in a yarn_kernel.json file. Can you provide the json file you're using? |
Right, I was using the wrong spec. I am now trying the spec from https://github.com/gateway-experiments/remote_kernel_provider/releases/download/v0.1-interim-dev/yarn_kernelspecs.tar. My py libs:
|
I have renamed yarn_kernel.json to kernel.json and the kernel shows up in the UI but returns the same exception. The content of my json, which BTW is YARN Cluster mode - There is no YARN Client mode in the interim distribution: Normal? I would feel better with the YARN Client...
|
Correct. Yarn client mode is performed via the distributed process proxy, which is not implemented and may never be. If you want to work with yarn in this new config you must use cluster mode for now.
On Sat, Jul 13, 2019 at 10:58 AM Eric Charles wrote:
I have renamed yarn_kernel.json to kernel.json and the kernel shows up in the UI but returns the same exception. The content of my json, which BTW is YARN Cluster mode - There is no YARN Client mode in the interim distribution: Normal? I would feel better with the YARN Client...
{
"language": "python",
"display_name": "3 Spark - Python (YARN Cluster Mode)",
"metadata": {
"lifecycle_manager": {
"class_name": "yarn_kernel_provider.yarn.YarnKernelLifecycleManager"
}
},
"env": {
"SPARK_HOME": "/opt/spark",
"PYSPARK_PYTHON": "/opt/conda/envs/datalayer/bin/python",
"PYTHONPATH": "${HOME}/.local/lib/python3.7/site-packages:/opt/spark/python/lib/py4j-0.10.7-src.zip",
"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/${KERNEL_USERNAME}/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=${HOME}/.local/lib/python3.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/conda/bin:$PATH ${KERNEL_EXTRA_SPARK_OPTS}",
"LAUNCH_OPTS": ""
},
"argv": [
"/usr/local/share/jupyter/kernels/spark_python_yarn_cluster_3/bin/run.sh",
"--RemoteProcessProxy.kernel-id",
"{kernel_id}",
"--RemoteProcessProxy.response-address",
"{response_address}",
"--RemoteProcessProxy.port-range",
"{port_range}",
"--RemoteProcessProxy.spark-context-initialization-mode",
"lazy"
]
}
|
When working with kernels in a "provider world", files named yarn_kernel.json (rather than plain kernel.json) are what the YARN provider discovers. The RemoteKernelProvider subclasses only get their "search" functionality from the KernelSpecProvider. The launch code is independent. I already see there's going to be confusion about YARN client mode not being related to the YarnKernelProvider. However, in reality, YARN client mode is purely a launch thing. No special lifecycle management is required for client mode. I could modify a kernelspec to launch in client deploy mode if needed. Not sure this is helpful. Looking forward to your diagram. |
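As a rough illustration of the naming point (not the actual provider source), a provider that derives its search behavior from KernelSpecProvider could plausibly just declare its own type prefix and prefixed spec-file name. The import path and attribute names below are assumptions; only the class names and the yarn_kernel.json convention come from this thread.

```python
# Hypothetical sketch of how a YARN provider could declare its prefixed spec file.
# Import path and attribute names are assumed; the idea (search for yarn_kernel.json
# instead of kernel.json) is what the thread describes.
from jupyter_kernel_mgmt.discovery import KernelSpecProvider


class YarnKernelProvider(KernelSpecProvider):
    id = 'yarn'                        # kernel types would surface as 'yarn/<spec-name>'
    kernel_file = 'yarn_kernel.json'   # searched instead of plain 'kernel.json'
```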
Thx @kevin-bates, explanation is helpful here. Final goal is K8S (which works locally with your interim release) but YARN is useful for me also. I have YARN Cluster running fine on EG. For YARN Cluster remote kernel (with jupyter-kernel-mgmt 0.3.0 and notebook 6.0.0.dev0):
PS: This will be a busy week for me; then I will first concentrate on your view of the world (on premise with YARN Cluster, the rest being |
Hi Eric - I suspect your YARN cluster issues are because you're using the version of jupyter_kernel_mgmt from PyPI and not the wheel file attached to the interim-dev release and, as a result, you're missing the code to recognize prefixed kernel.json files of other providers. Renaming yarn_kernel.json to kernel.json just takes the YARN provider out of the picture (see my previous comment). To confirm you're using the correct jupyter_kernel_mgmt file, its source contents should resemble those of this PR, namely the use of provider-prefixed kernel.json file names. |
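A quick way to confirm which jupyter_kernel_mgmt is active, and whether provider-prefixed specs are actually being discovered, might look like the sketch below. The discovery API names are my reading of the jupyter_kernel_mgmt code of that era and should be treated as assumptions.

```python
# Sketch: confirm the installed jupyter_kernel_mgmt and list discovered kernel types.
# API names (KernelFinder, from_entrypoints, find_kernels) are assumed; verify them
# against the copy you actually have installed.
import jupyter_kernel_mgmt
from jupyter_kernel_mgmt.discovery import KernelFinder

print(jupyter_kernel_mgmt.__file__)   # should point at the interim wheel, not the PyPI release

finder = KernelFinder.from_entrypoints()
for name, attributes in finder.find_kernels():
    # With the interim build, YARN specs should appear as 'yarn/<spec-name>';
    # seeing only 'spec/...' entries suggests prefixed files are not recognized.
    print(name)
```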
Thx for the hint Kevin. I was suspecting that, but I double-checked and I do use the interim wheel (I can see it). A version 0.0.4, different from the already-published pip release, may remove my confusion. My user feedback so far: more doc (I can help). Impressed with the K8S case, where the server spins up distributed kernels directly without an additional Gateway. |
Sorry for the difficulty Eric. I will create a fresh conda env and try following the directions from the interim-dev release to see if I encounter similar behaviors. You should also be sure you're using the correct notebook build since using the released version would also exhibit the behaviors you're seeing. Regarding the |
Progressing a bit: I have found in the yarn provider code that the yarn_endpoint configuration entry was not being picked up and was returning the default value localhost (where you have a TODO BTW). I have forced the value in source. Enough for today for me. It would be good if you installed the interim release on a fresh env and confirmed it is working on your side with YARN cluster. |
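For context, the pattern being described is roughly the one sketched below: a configurable endpoint whose localhost default wins whenever no configuration is actually wired through. The class name comes from the kernelspec earlier in the thread; the trait name and default are assumptions based on this discussion and may not match the real yarn_kernel_provider code.

```python
# Hypothetical sketch of a configurable YARN endpoint (names and default assumed).
from traitlets import Unicode
from traitlets.config import LoggingConfigurable


class YarnKernelLifecycleManager(LoggingConfigurable):
    # Default used when nothing overrides it - the "localhost" behavior observed above.
    yarn_endpoint = Unicode(
        'http://localhost:8088/ws/v1/cluster',
        config=True,
        help="YARN ResourceManager endpoint used to submit and track kernels.",
    )
```

If that trait were wired into the server's config loading, one could presumably set it from jupyter_notebook_config.py with c.YarnKernelLifecycleManager.yarn_endpoint = 'http://<resource-manager-host>:8088/ws/v1/cluster' instead of patching the source.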
Ah - yeah, I was experimenting with how to access config entries and I also run on the YARN master - so localhost works. I'm sorry for the hassle here, but it may be the case that this is just not ready for others to play with. There's just WAY TOO MUCH to look into at this point and I just don't have the bandwidth to perform a lot of support right now. I really appreciate your comments and will likely move to a single repo, but, as you noted, that's going to take time and I feel like I need to understand all of the possible show stoppers first. |
To add: I had to pin tornado to 5.1.1 like you said + force yarn_endpoint to http://localhost:8088/ws/v1/cluster in the code |
FWIW, I updated the yarn provider to include updates for the yarn-api-client changes that occurred a few weeks ago. Apparently, the copy of yarn.py used in the kernel providers was fairly stale. I've updated it to resemble the master branch from EG. See gateway-experiments/yarn_kernel_provider#4. Interim-dev assets have been updated (yarn wheel only). |
I'm closing this since the INTERIM-DEV release is serving this purpose and we can ensure the proper documentation is in place prior to its removal (or sooner at this point). |
Hi, great to see this!
How can I test it with the notebook? Should I first apply jupyter/notebook#4170 (which btw has merge issues)?
PS: Any plan to create the yarn_kernel_provider implementation?