
Commit 68eab1f

committed "referece_tutorial_links_added"
1 parent de194c3

4 files changed: +35, -20 lines

docs/conf.py (+6, -1)

@@ -30,16 +30,20 @@
 html_theme = 'sphinx_book_theme'
 html_title = 'JAX AI Stack'
 html_static_path = ['_static']
+html_css_files = ['css/custom.css']
+html_logo = '_static/ai-stack-logo.svg'
+html_favicon = '_static/favicon.png'
 
 # Theme-specific options
 # https://sphinx-book-theme.readthedocs.io/en/stable/reference.html
 html_theme_options = {
     'show_navbar_depth': 2,
     'show_toc_level': 2,
     'repository_url': 'https://github.com/jax-ml/jax-ai-stack',
-    'path_to_docs': 'docs/',
+    'path_to_docs': 'docs/source/',
     'use_repository_button': True,
     'navigation_with_keys': True,
+    'home_page_in_toc': True,
 }
 
 exclude_patterns = [
@@ -67,6 +71,7 @@
 
 suppress_warnings = [
     'misc.highlighting_failure', # Suppress warning in exception in digits_vae
+    'mystnb.unknown_mime_type', # Suppress warning for unknown mime type (e.g. colab-display-data+json)
 ]
 
 # -- Options for myst ----------------------------------------------
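
For context, a minimal sketch (not part of the diff) of how the static-asset settings added above resolve, assuming the conventional layout where `docs/_static/` sits next to `conf.py`:

```python
# Sketch only: how Sphinx resolves the paths added in this commit, assuming
# the assets live under docs/_static/ (the directory listed in html_static_path).
html_static_path = ['_static']            # docs/_static/ is copied into the built site
html_css_files = ['css/custom.css']       # relative to html_static_path -> docs/_static/css/custom.css
html_logo = '_static/ai-stack-logo.svg'   # relative to the conf.py directory
html_favicon = '_static/favicon.png'      # likewise relative to the conf.py directory
```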

docs/data_loaders_for_multi_device_setups_with_jax.ipynb (+13, -9)

@@ -23,9 +23,13 @@
 "* [**Grain**](https://github.com/google/grain)\n",
 "* [**Hugging Face**](https://huggingface.co/docs/datasets/en/use_with_jax#data-loading)\n",
 "\n",
-"You'll see how to use each of these libraries to efficiently load data for a simple image classification task using the MNIST dataset.\n",
+"You'll learn how to use each of these libraries to efficiently load data for an image classification task using the MNIST dataset.\n",
 "\n",
-"Building on the [Data Loaders on GPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_gpu_with_jax.html) tutorial, this guide introduces optimizations for distributed training across multiple GPUs or TPUs. It focuses on data sharding with `Mesh` and `NamedSharding` to efficiently partition and synchronize data across devices. By leveraging multi-device setups, you'll maximize resource utilization for large datasets in distributed environments."
+"Building on the [Data Loaders on GPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_gpu_with_jax.html) tutorial, this guide covers advanced strategies for multi-device setups, such as data sharding with `Mesh` and `NamedSharding` to partition and synchronize data across devices. These techniques help you manage the complexities of distributed systems while optimizing resource usage for large-scale datasets.\n",
+"\n",
+"If you're looking for CPU-specific data loading advice, see [Data Loaders on CPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_cpu_with_jax.html).\n",
+"\n",
+"If you're looking for GPU-specific data loading advice, see [Data Loaders on GPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_gpu_with_jax.html)."
 ]
 },
 {
@@ -57,7 +61,7 @@
 "id": "TsFdlkSZKp9S"
 },
 "source": [
-"### Checking TPU Availability for JAX"
+"## Checking TPU Availability for JAX"
 ]
 },
 {
@@ -99,7 +103,7 @@
 "id": "qyJ_WTghDnIc"
 },
 "source": [
-"### Setting Hyperparameters and Initializing Parameters\n",
+"## Setting Hyperparameters and Initializing Parameters\n",
 "\n",
 "You'll define hyperparameters for your model and data loading, including layer sizes, learning rate, batch size, and the data directory. You'll also initialize the weights and biases for a fully-connected neural network."
 ]
@@ -141,7 +145,7 @@
 "id": "rHLdqeI7D2WZ"
 },
 "source": [
-"### Model Prediction with Auto-Batching\n",
+"## Model Prediction with Auto-Batching\n",
 "\n",
 "In this section, you'll define the `predict` function for your neural network. This function computes the output of the network for a single input image.\n",
 "\n",
@@ -182,7 +186,7 @@
 "id": "AMWmxjVEpH2D"
 },
 "source": [
-"Multi-device setup using a Mesh of devices"
+"## Multi-device setup using a Mesh of devices"
 ]
 },
 {
@@ -210,7 +214,7 @@
 "id": "rLqfeORsERek"
 },
 "source": [
-"### Utility and Loss Functions\n",
+"## Utility and Loss Functions\n",
 "\n",
 "You'll now define utility functions for:\n",
 "- One-hot encoding: Converts class indices to binary vectors.\n",
@@ -1676,9 +1680,9 @@
 "source": [
 "## Summary\n",
 "\n",
-"This notebook has introduced efficient methods for multi-device distributed data loading on TPUs with JAX. You explored how to leverage popular libraries like PyTorch DataLoader, TensorFlow Datasets, Grain, and Hugging Face Datasets to streamline the data loading process for machine learning tasks. Each library offers distinct advantages, allowing you to select the best approach for your specific project needs.\n",
+"This notebook introduced efficient methods for multi-device distributed data loading on TPUs with JAX. You explored how to leverage popular libraries like PyTorch DataLoader, TensorFlow Datasets, Grain, and Hugging Face Datasets to optimize the data loading process for machine learning tasks. Each library offers unique advantages, enabling you to choose the best approach based on your project’s requirements.\n",
 "\n",
-"For more detailed strategies on distributed data loading with JAX, including global data pipelines and per-device processing, refer to the [Distributed Data Loading Guide](https://jax.readthedocs.io/en/latest/distributed_data_loading.html)."
+"For more in-depth strategies on distributed data loading with JAX, including global data pipelines and per-device processing, refer to the [Distributed Data Loading Guide](https://jax.readthedocs.io/en/latest/distributed_data_loading.html)."
 ]
 }
 ],
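
The `Mesh`/`NamedSharding` approach mentioned in the new intro paragraph amounts to laying the available devices out on a one-dimensional mesh and placing each host batch onto it. A rough sketch under those assumptions (not the notebook's exact code, which chooses its own batch size and mesh setup):

```python
# Illustrative sketch: shard a host batch across all available devices along a
# one-dimensional 'device' mesh axis using Mesh, PartitionSpec and NamedSharding.
import numpy as np
import jax
from jax.sharding import Mesh, PartitionSpec, NamedSharding

mesh = Mesh(np.array(jax.devices()), ('device',))   # 1-D mesh over all devices
sharding_spec = PartitionSpec('device')             # shard the leading (batch) axis
sharding = NamedSharding(mesh, sharding_spec)

# Batch size chosen as a multiple of the device count so it splits evenly.
batch = np.zeros((jax.device_count() * 8, 28 * 28), dtype=np.float32)
sharded_batch = jax.device_put(batch, sharding)     # each device holds an equal slice of rows
print(sharded_batch.sharding)
```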

docs/data_loaders_for_multi_device_setups_with_jax.md (+13, -9)

@@ -25,9 +25,13 @@ This tutorial explores various data loading strategies for **JAX** in **multi-de
 * [**Grain**](https://github.com/google/grain)
 * [**Hugging Face**](https://huggingface.co/docs/datasets/en/use_with_jax#data-loading)
 
-You'll see how to use each of these libraries to efficiently load data for a simple image classification task using the MNIST dataset.
+You'll learn how to use each of these libraries to efficiently load data for an image classification task using the MNIST dataset.
 
-Building on the [Data Loaders on GPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_gpu_with_jax.html) tutorial, this guide introduces optimizations for distributed training across multiple GPUs or TPUs. It focuses on data sharding with `Mesh` and `NamedSharding` to efficiently partition and synchronize data across devices. By leveraging multi-device setups, you'll maximize resource utilization for large datasets in distributed environments.
+Building on the [Data Loaders on GPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_gpu_with_jax.html) tutorial, this guide covers advanced strategies for multi-device setups, such as data sharding with `Mesh` and `NamedSharding` to partition and synchronize data across devices. These techniques help you manage the complexities of distributed systems while optimizing resource usage for large-scale datasets.
+
+If you're looking for CPU-specific data loading advice, see [Data Loaders on CPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_cpu_with_jax.html).
+
+If you're looking for GPU-specific data loading advice, see [Data Loaders on GPU](https://jax-ai-stack.readthedocs.io/en/latest/data_loaders_on_gpu_with_jax.html).
 
 +++ {"id": "-rsMgVtO6asW"}
 
@@ -44,7 +48,7 @@ from jax.sharding import Mesh, PartitionSpec, NamedSharding
 
 +++ {"id": "TsFdlkSZKp9S"}
 
-### Checking TPU Availability for JAX
+## Checking TPU Availability for JAX
 
 ```{code-cell}
 ---
@@ -58,7 +62,7 @@ jax.devices()
 
 +++ {"id": "qyJ_WTghDnIc"}
 
-### Setting Hyperparameters and Initializing Parameters
+## Setting Hyperparameters and Initializing Parameters
 
 You'll define hyperparameters for your model and data loading, including layer sizes, learning rate, batch size, and the data directory. You'll also initialize the weights and biases for a fully-connected neural network.
 
@@ -90,7 +94,7 @@ params = init_network_params(layer_sizes, random.PRNGKey(0))
 
 +++ {"id": "rHLdqeI7D2WZ"}
 
-### Model Prediction with Auto-Batching
+## Model Prediction with Auto-Batching
 
 In this section, you'll define the `predict` function for your neural network. This function computes the output of the network for a single input image.
 
@@ -121,7 +125,7 @@ batched_predict = vmap(predict, in_axes=(None, 0))
 
 +++ {"id": "AMWmxjVEpH2D"}
 
-Multi-device setup using a Mesh of devices
+## Multi-device setup using a Mesh of devices
 
 ```{code-cell}
 :id: 4Jc5YLFnpE-_
@@ -139,7 +143,7 @@ sharding_spec = PartitionSpec('device')
 
 +++ {"id": "rLqfeORsERek"}
 
-### Utility and Loss Functions
+## Utility and Loss Functions
 
 You'll now define utility functions for:
 - One-hot encoding: Converts class indices to binary vectors.
@@ -714,6 +718,6 @@ train_model(num_epochs, params, hf_training_generator)
 
 ## Summary
 
-This notebook has introduced efficient methods for multi-device distributed data loading on TPUs with JAX. You explored how to leverage popular libraries like PyTorch DataLoader, TensorFlow Datasets, Grain, and Hugging Face Datasets to streamline the data loading process for machine learning tasks. Each library offers distinct advantages, allowing you to select the best approach for your specific project needs.
+This notebook introduced efficient methods for multi-device distributed data loading on TPUs with JAX. You explored how to leverage popular libraries like PyTorch DataLoader, TensorFlow Datasets, Grain, and Hugging Face Datasets to optimize the data loading process for machine learning tasks. Each library offers unique advantages, enabling you to choose the best approach based on your project’s requirements.
 
-For more detailed strategies on distributed data loading with JAX, including global data pipelines and per-device processing, refer to the [Distributed Data Loading Guide](https://jax.readthedocs.io/en/latest/distributed_data_loading.html).
+For more in-depth strategies on distributed data loading with JAX, including global data pipelines and per-device processing, refer to the [Distributed Data Loading Guide](https://jax.readthedocs.io/en/latest/distributed_data_loading.html).
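
The "Utility and Loss Functions" section referenced in the hunks above lists one-hot encoding among the helpers it defines. A hedged sketch of what such a helper typically looks like in this style of JAX MNIST tutorial (the notebook's own version may differ in name and dtype):

```python
# Hypothetical helper: convert integer class indices to one-hot (binary) vectors.
import jax.numpy as jnp

def one_hot(labels, num_classes, dtype=jnp.float32):
    return jnp.array(labels[:, None] == jnp.arange(num_classes), dtype)

# Example: three MNIST labels -> a (3, 10) one-hot matrix.
print(one_hot(jnp.array([3, 0, 9]), 10))
```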

docs/tutorials.md (+3, -1)

@@ -1,6 +1,8 @@
 # Tutorials
 
-*Note: this is a work in progress; visit again soon for updated content!*
+```{note}
+This is a work in progress; visit again soon for updated content!
+```
 
 The following tutorials are meant as an intro to the full stack:
 