Skip to content

Allow extraFiles to be injected to hub / singleuser pods and automatically load config in /usr/local/etc/jupyterhub_config.d #2006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Feb 1, 2021

Conversation

consideRatio
Copy link
Member

@consideRatio consideRatio commented Jan 22, 2021

Summary

Motivation

Injecting files to the hub / singleuser pods have a wide range of applications:

Usage documentation

See the documentation preview, or if its already merged the latest documentation.

@consideRatio consideRatio marked this pull request as draft January 22, 2021 00:20
@consideRatio consideRatio force-pushed the pr/file-injection branch 2 times, most recently from 05e1c75 to dfda9b8 Compare January 22, 2021 17:38
@consideRatio consideRatio marked this pull request as ready for review January 22, 2021 17:44
@consideRatio
Copy link
Member Author

consideRatio commented Jan 22, 2021

Feature planning

The general idea is to add a volume and a volumeMount to the pod in question, but there are some deliberations along the way.

  • Use subPath?
    Using subPath allow you to mount a specific key from a k8s configmap/secret into a specific file, rather than mounting all keys of a configmap/secret into a directory. By using subPath, the mount will no longer update on changes to the configmap/secret, but by not using subPath you must will erase all other content in the mount directory.
    • Decision
      Rely entirely on usage of subPath, supporting dynamic updates for files injected into a folder is just a bit too much overhead and requires initContainers etc.
  • Use ConfigMaps or Secrets?
    • ConfigMaps are easier to inspect by being written in plain text, while Secrets are base64 encoded.
    • Secret's base64 encoding will make content take 4/3 of the space.
    • Secret's base64 encoding will allow for binary file mounts such as images for hub templates.
    • Secret's is a safer bet from a permission standpoint.
    • Decision
      Rely entirely on one or more k8s Secrets, don't complicate it by sometimes using a k8s ConfigMap.
  • How many Secrets should be used to pass the files?
    • One single, one per file, or one default + a dynamic amount of additional ones? The benefit of having multiple is that more data can be fit into them. Using a single file, we are limited to about ~3 MB of data due to k8s / etcd constraints.
    • Decision
      At this point, assume a single which won't shut the door from allowing a dynamic amount of additional ones later.
  • What k8s Secret resources for hub/singleuser should be used?
    • Option A: a new dedicated k8s Secret containing both hub/singleuser extraFiles?
    • Option B: two new dedicated k8s Secrets containing hub and singleuser extraFiles repsectively?
    • Option C: re-use the hub k8s Secret to contain both hub/singleuser extraFiles?
    • Option D: re-use the hub k8s Secret and create a new dedicated k8s Secret for singleuser.
    • Decision
      I'm going towards Option D for now.
  • Configuration option name?
    • Is extraFiles good? I like that it includes files in the name, but perhaps being more explicit and saying extraFileMounts is worth considering, hinting it will influence volumeMounts (and volumes) of a pod spec also.
    • Decision
      hub.extraFiles / singleuser.extraFiles it is!
    • Is mountPath / data / stringData / binaryData good?
    • Decision
      I think so, let's go with it for now.
  • Implicit and/or explicit data format?
    • When the user passes data as a data structure, we will in the end need to render it to a string. Helm provides us with toYaml, toToml, and toJson to do that. We could decide what data structure to render to using the filename extension, but perhaps there is an edge case where we need to support an explicit data format? If we need an explicit format, we could name it dataFormat where yaml, json, toml would be the valid options.
    • Decision
      For now let's only support the implicit format and let Helm template rendering fail if a filename is passed without a supported filename extension when data is used.
  • Expose mode?
    When declaring a SecretVolumeSource as a Volume you can set defaultMode such as 0644, and you can set an explicit map of files and mode for them. It can be critical for some files, such as SSH keys.
    • Decision
      I'll try implement mode on a file to file basis and excluding defaultMode which I think will be fine as it is, and would cause complications with the decision to re-use the hub k8s Secret.
  • Explicit name notation?
    As extraFiles declare files in many different folders, the having the name as a key could lead to name conflicts. Due to that, it would make sense to have a name field next to mountPath allowing multiple files of the same name to be mounted to different folders.

Passing files?

Its straightforward on how to pass dictionary like configuration, but, how to go about passing a large .py file or a binary image file?

A standalone text file
Just pass --set-file singleuser.extraFiles.my_file_key.stringData=./my_file.py to the helm upgrade command.

A standalone binary file
With binary files like an image, using --set-file wouldn't work as it doesn't know how to convert it to a string, which it must be. This is a challenge.

There is a PR open for --include-file etc which would make files accessible to the Helm template rendering through .Files which would save the day I think. It is being considered for Helm v3.6 at the earliest.

It would be possible for users to base64 encode their files and then use --set-file on the base64 encoded version and then passing --set-file singleuser.extraFiles.my_image.binaryData=./my_image_base64_encoded.txt.

Decision
For now, we don't create some suggested solution to mounting standalone binary files directly but instead await if for example helm/helm#8841 becomes included in a future version of Helm.

References

@consideRatio
Copy link
Member Author

consideRatio commented Jan 22, 2021

Implementation work items

  • Define a named template named jupyterhub.extraFiles.data to render a passed context of hub.extraFiles or singleuser.extraFiles binaryData items to be rendered within a k8s Secret's data field.
  • Define a named template named jupyterhub.extraFiles.stringData to render a passed context of hub.extraFiles or singleuser.extraFiles data and stringData items to be rendered within a k8s Secret's data field.
  • Conditionally create a new dedicated k8s Secret for singleuser based on singleuser.extraFiles is empty or not
  • Populate hub / singleuser k8s Secret with associated extraFiles entries
  • Update hub Deployment's volumes to include all extraFiles with their respective mode
  • Update hub Deployment's volumeMounts to include all extraFiles with their respective mountPath
  • Update c.KubeSpawner.volumes to include all extraFiles with their respective mode
  • Update c.KubeSpawner.volume_mounts to include all extraFiles with their respective mountPath
  • Define a set of extraFiles for lint-and-validate-config.yaml and dev-config.yaml
    • Use duplicated entries in lint-and-validate-config.yaml to verify template rendering can handle it
    • Prepare for testing of modes, data, stringData, binaryData, mountPath
  • Support explicit name notation to support having multiple files with the same filename in different folders.
  • Let a CI test verify files are made available for the hub pod and have the right mode
  • Let a CI test verify files are made available for the singleuser pod and have the right mode
  • Update schema.yaml to describe this configuration option
  • Add basic validation to enforce not multiple data fields are provided
  • Document how to pass external files
  • Let JupyterHub automatically load files in a certain folder
  • Test /etc/jupyterhub.d config loading
  • Tolerate line wraps in the base64 encoding
  • Load configuration files in alphabetical order.
  • Rename mountPath to mountDir
  • Remove name and mountDir as well as logic to let the ...extraFiles.my_file_key be an implicit file name. mountPath was added instead as a required field.

@consideRatio consideRatio force-pushed the pr/file-injection branch 2 times, most recently from 61c8c01 to 1233ff3 Compare January 22, 2021 21:09
@consideRatio consideRatio changed the title Allow extraFiles to be injected to hub / singleuser pods Allow extraFiles to be injected to hub / singleuser pods and automatically load config in /etc/jupyterhub.d Jan 22, 2021
@consideRatio
Copy link
Member Author

consideRatio commented Jan 25, 2021

Review points

  • Does /etc/jupyterhub.d/ make sense for us to put extra configs in that jupyterhub automatically loads?
    Answer: No, but /usr/local/etc/jupyterhub/jupyterhub_config.d make sense!
  • The default location for templates and static files are under /usr/local/share/jupyterhub/. Perhaps we should provide an example where we override a template to reference an image we mount in static using extraFiles?
    image

@choldgraf
Copy link
Member

I think that the documentation of this field looks good to me, though I am not 100% sure why this would be useful I'm assuming it'd be obvious to someone trying to deploy their JupyterHub. Is the main reason for this to add extra configuration within the hub that one might want for other purposes? If there is a quick one-liner to explain when it is useful, perhaps that could be added to the top of the section.

@yuvipanda
Copy link
Collaborator

yuvipanda commented Jan 27, 2021

Does /etc/jupyterhub.d/ make sense for us to put extra configs in that jupyterhub automatically loads?

This should be next to wherever jupyterhub_config.py is, and should be called jupyterhub_config.d. This matches what most debian packages do (I think), and is also what we do in TLJH.

Alternatively, we could move jupyterhub_config.py to /usr/local/etc, but IMO it's not worth that. Also note that this should be in /usr/local/etc and not /etc, since /etc should be reserved for system (apt) packages. The Linux FHS spec is helpful here.

An example on overriding templates would be wonderful!

@consideRatio
Copy link
Member Author

@yuvipanda thank you for your input!! We currently mount it in /etc/jupyterhub following #1407.

            - mountPath: /etc/jupyterhub/jupyterhub_config.py
              subPath: jupyterhub_config.py
              name: config

If done from scratch, you would mount files something like...

/usr/local/etc/jupyterhub/jupyterhub_config.py
/usr/local/etc/jupyterhub/jupyterhub_config.d/my_extra_config.py

Is that correct?


I think this is won't be breaking actually. Only those that use a custom command to start jupyterhub and have hardcoded the config location would need to make a change afaik. I suggest we update to the sensible structure. What do you think?

@yuvipanda
Copy link
Collaborator

I suggest we update to the sensible structure. What do you think?

I agree! Let's do it :)

@minrk
Copy link
Member

minrk commented Jan 27, 2021

though I am not 100% sure why this would be useful I'm assuming it'd be obvious to someone trying to deploy their JupyterHub

The key point is mentioned in this bit in the docs:

To avoid embedding entire files in the Helm chart configuration

It eliminates the need to do things like inline python/js/html/etc. in yaml config files like we do in binderhub, which loses syntax highlighting, editor support, linters, etc. Instead, you can write regular files and get all your familiar tooling, and then use this config to mount those files into the container. A big help when you have nontrivial things in there!

@consideRatio
Copy link
Member Author

consideRatio commented Jan 27, 2021

Woops I got a bit confused between PRs and pushed things regarding #1993. This PR is ready for review / merge again.

@consideRatio consideRatio force-pushed the pr/file-injection branch 2 times, most recently from e9baa27 to f1ca325 Compare January 27, 2021 13:31
@consideRatio consideRatio changed the title Allow extraFiles to be injected to hub / singleuser pods and automatically load config in /etc/jupyterhub.d Allow extraFiles to be injected to hub / singleuser pods and automatically load config in /usr/local/etc/jupyterhub_config.d Jan 27, 2021
@minrk
Copy link
Member

minrk commented Feb 1, 2021

This is great! I'm not sure the FHS doc supports switching from /etc/jupyterhub to /usr/local/etc/jupyterhub in a container/k8s/helm context (are we the "system administrators" in question, or the host system providers? I'd argue we are closer to the host system providers, based on the discussion of which files change during software updates).

This is the relevant FHS statement on /usr/local:

The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated.

Translating that to our context, I think that means we should be guaranteeing to our users that we never modify files in /usr/local across helm upgrades. the 'system administrator' vs 'host system provider' is a fuzzy distinction in container image providers.

Since the work is already done, I don't object to it, but based on my reading of FHS, I think it was probably more appropriate and less surprising to use /etc/ than /usr/local/etc, though /etc/opt would be the most appropriate for us as an 'add-on' (with jupyterhub in /opt) per 3.7.4 / 3.13. This is getting a little too close to bikeshedding, so I'm happy to merge as-is as this point.

Great work @consideRatio!

@minrk minrk merged commit c841983 into jupyterhub:master Feb 1, 2021
consideRatio pushed a commit to jupyterhub/helm-chart that referenced this pull request Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable injection of files to the singleuser pods and the hub pod
5 participants