From 7e2e6cacd4aff54f9f37376cd82919d714872830 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Mon, 15 Mar 2021 21:43:55 -0700
Subject: [PATCH 01/13] Fix broken external link to mybinder.

---
 content/pairing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/pairing.md b/content/pairing.md
index 7e45a031..2d226de9 100644
--- a/content/pairing.md
+++ b/content/pairing.md
@@ -59,7 +59,7 @@ output.
 > supports a variety of restructured text directives. These Sphinx
 > markdown directives will render when NumPy tutorials are built into a
 > static website, but they will show up as raw code when you open in
-> Jupyter locally or on [Binder](mybinder.org).
+> Jupyter locally or on [Binder](https://mybinder.org).

 Consider these two versions of the same __Simple notebook example__. You
 have three things in the notebooks:

From 876c5fc0e968012bea3eab79f2d3435eb07f2d62 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Mon, 15 Mar 2021 21:51:12 -0700
Subject: [PATCH 02/13] Add pairing tutorial to toctree in site.

Fixes sphinx warning about orphaned doc.
Replicate text from readme in index.
---
 site/index.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/site/index.md b/site/index.md
index 55d975f0..d425f332 100644
--- a/site/index.md
+++ b/site/index.md
@@ -62,6 +62,20 @@ used in the main NumPy documentation has two reasons:

 [rst]: https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html

+#### Note
+
+You may notice our content is in markdown format (`.md` files). We review and
+host notebooks in the [MyST-NB](https://myst-nb.readthedocs.io/) format. We
+accept both Jupyter notebooks (`.ipynb`) and MyST-NB notebooks (`.md`).
+If you want to sync your `.ipynb` to your `.md` file, follow the [pairing
+tutorial](content/pairing.md).
+
+```{toctree}
+:hidden:
+
+content/pairing
+```
+
 ### Adding your own tutorials

 If you have your own tutorial in the form of a Jupyter notebook (an `.ipynb`

From 71d0f2643beeb906f19f168ee0b546f6d69b3260 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Mon, 15 Mar 2021 22:16:15 -0700
Subject: [PATCH 03/13] Add requirements for rl tutorial.

---
 requirements.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/requirements.txt b/requirements.txt
index 8b9bde6a..8b830222 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -7,3 +7,4 @@ pytest
 nbval
 statsmodels
 imageio
+gym[atari]

From 0404f51d98529cd766a4badc385583fc23313817 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Mon, 15 Mar 2021 22:21:00 -0700
Subject: [PATCH 04/13] Bump execution timeout for CI.

---
 site/conf.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/conf.py b/site/conf.py
index b8ed7fa3..2871ed0b 100644
--- a/site/conf.py
+++ b/site/conf.py
@@ -37,7 +37,7 @@ exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'notebooks']

 # MyST-NB configuration
-execution_timeout = 600
+execution_timeout = 900

 # -- Options for HTML output -------------------------------------------------

From b7e8c3cb74bad409e66c72efc67a7f62bced62f3 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Mon, 15 Mar 2021 22:46:29 -0700
Subject: [PATCH 05/13] Add build dependencies for atari-py.
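atari-py compiles the Arcade Learning Environment from source at pip-install
time, which is why cmake must be present on the CI image before the Python
dependencies are installed.

A quick way to confirm the build succeeded is a smoke test along the lines of
the sketch below. This is an illustration only: the `Pong-v0` environment id
and the reset API reflect the gym releases current for this series and are
assumptions, not part of the patch.

```python
# Hedged sketch: check that the Atari backend built correctly after
# `pip install gym[atari]`. "Pong-v0" is assumed to be the environment
# id the RL tutorial uses; it is not defined by this patch.
import gym

env = gym.make("Pong-v0")
frame = env.reset()   # one raw Atari frame
print(frame.shape)    # (210, 160, 3): a 210x160 RGB screen
env.close()
```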
---
 .circleci/config.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.circleci/config.yml b/.circleci/config.yml
index 8fa08fc3..9610f866 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -11,6 +11,10 @@ jobs:
     steps:
       - checkout

+      - run:
+          name: Install deps for building atari-py
+          command: sudo apt-get install -y cmake
+
       - run:
           name: Install Python dependencies
           command: |

From b89cafa48ca9c318ab5e62209724718bdb23a0de Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Tue, 16 Mar 2021 10:27:10 -0700
Subject: [PATCH 06/13] Bump circleci context timeout limit.

---
 .circleci/config.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.circleci/config.yml b/.circleci/config.yml
index 9610f866..f6cd81f3 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -29,6 +29,7 @@ jobs:

       - run:
           name: Build site
+          no_output_timeout: 30m
           command: |
             source venv/bin/activate
             # n = nitpicky (broken links), W = warnings as errors,

From 7f5f31ac030e332510cca06e3d0f95ea1e65ba92 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Tue, 16 Mar 2021 10:41:01 -0700
Subject: [PATCH 07/13] Add ffmpeg dep for atari-py.

---
 .circleci/config.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.circleci/config.yml b/.circleci/config.yml
index f6cd81f3..88ecbb4e 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -13,7 +13,7 @@ jobs:

       - run:
           name: Install deps for building atari-py
-          command: sudo apt-get install -y cmake
+          command: sudo apt-get install -y cmake ffmpeg

       - run:
           name: Install Python dependencies

From b3cb85e2654054ba1d0776a8d78a4616e06b5451 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Tue, 16 Mar 2021 12:04:24 -0700
Subject: [PATCH 08/13] PERF: subsample test image set in mnist tutorial.

---
 content/tutorial-deep-learning-on-mnist.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/content/tutorial-deep-learning-on-mnist.md b/content/tutorial-deep-learning-on-mnist.md
index d7070e44..5f7826d1 100644
--- a/content/tutorial-deep-learning-on-mnist.md
+++ b/content/tutorial-deep-learning-on-mnist.md
@@ -201,12 +201,18 @@ print('The data type of training images: {}'.format(x_train.dtype))
 print('The data type of test images: {}'.format(x_test.dtype))
 ```

-**2.** Normalize the arrays by dividing them by 255 (and thus promoting the data type from `uint8` to `float64`) and then assign the train and test image data variables — `x_train` and `x_test` — to `training_images` and `train_labels`, respectively. To make the neural network model train faster in this example, `training_images` contains only 1,000 samples out of 60,000. To learn from the entire sample size, change the `sample` variable to `60000`.
+**2.** Normalize the arrays by dividing them by 255 (and thus promoting the data type from `uint8` to `float64`) and then assign the train and test image data variables — `x_train` and `x_test` — to `training_images` and `test_images`, respectively.
+To reduce the model training and evaluation time in this example, only a subset
+of the training and test images will be used.
+Both `training_images` and `test_images` will contain only 1,000 samples each out
+of the complete datasets of 60,000 and 10,000 images, respectively.
+These values can be controlled by changing the `training_sample` and
+`test_sample` variables below, up to their maximum values of 60,000 and 10,000.
```{code-cell} ipython3 -sample = 1000 -training_images = x_train[0:sample] / 255 -test_images = x_test / 255 +training_sample, test_sample = 1000, 1000 +training_images = x_train[0:training_sample] / 255 +test_images = x_test[0:test_sample] / 255 ``` **3.** Confirm that the image data has changed to the floating-point format: From 3695ab69a59fa6d3c709922942ea7b4df34edeb5 Mon Sep 17 00:00:00 2001 From: Ross Barnowski Date: Thu, 11 Mar 2021 21:20:23 -0800 Subject: [PATCH 09/13] Vectorize model evaluation in mnist tutorial. --- content/tutorial-deep-learning-on-mnist.md | 45 +++++++++++++--------- 1 file changed, 26 insertions(+), 19 deletions(-) diff --git a/content/tutorial-deep-learning-on-mnist.md b/content/tutorial-deep-learning-on-mnist.md index 5f7826d1..76c0c502 100644 --- a/content/tutorial-deep-learning-on-mnist.md +++ b/content/tutorial-deep-learning-on-mnist.md @@ -411,6 +411,8 @@ weights_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1 ``` **5.** Set up the neural network's learning experiment with a training loop and start the training process. +Note that the model is evaluated at each epoch by running the model on test +set, thus the model improvement can be tracked vs. epoch. Start the training process: @@ -425,6 +427,11 @@ store_test_accurate_pred = [] # This is a training loop. # Run the learning experiment for a defined number of epochs (iterations). for j in range(epochs): + + ################# + # Training step # + ################# + # Set the initial loss/error and the number of accurate predictions to zero. training_loss = 0.0 training_accurate_predictions = 0 @@ -473,26 +480,26 @@ for j in range(epochs): store_training_loss.append(training_loss) store_training_accurate_pred.append(training_accurate_predictions) - # Evaluate on the test set: - # 1. Set the initial error and the number of accurate predictions to zero. - test_loss = 0.0 - test_accurate_predictions = 0 - - # 2. Start testing the model by evaluating on the test image dataset. - for i in range(len(test_images)): - # 1. Pass the test images through the input layer. - layer_0 = test_images[i] - # 2. Compute the weighted sum of the test image inputs in and - # pass the hidden layer's output through ReLU. - layer_1 = relu(np.dot(layer_0, weights_1)) - # 3. Compute the weighted sum of the hidden layer's inputs. - # Produce a 10-dimensional vector with 10 scores. - layer_2 = np.dot(layer_1, weights_2) + ################ + # Testing step # + ################ + + # Evaluate model performance on the test set at each epoch. + + # Unlike the training step, the weights are not modified for each image + # (or batch). Therefore the model can be applied to the test images in a + # vectorized manner, eliminating the need to loop over each image + # individually: + + results = relu(test_images @ weights_1) @ weights_2 + + # Measure the error between the actual label (truth) and prediction values. + test_loss = np.sum((test_labels - results)**2) - # 4. Measure the error between the actual label (truth) and prediction values. - test_loss += np.sum((test_labels[i] - layer_2) ** 2) - # 5. Increment the accurate prediction count. - test_accurate_predictions += int(np.argmax(layer_2) == np.argmax(test_labels[i])) + # Measure prediction accuracy on test set + test_accurate_predictions = np.sum( + np.argmax(results, axis=1) == np.argmax(test_labels, axis=1) + ) # Store test set losses and accurate predictions. 
store_test_loss.append(test_loss) From e65ac70c862db4afb3620e52bce42fef0dad295c Mon Sep 17 00:00:00 2001 From: Ross Barnowski Date: Thu, 11 Mar 2021 21:25:33 -0800 Subject: [PATCH 10/13] Update wording and numbering in code comments. --- content/tutorial-deep-learning-on-mnist.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/tutorial-deep-learning-on-mnist.md b/content/tutorial-deep-learning-on-mnist.md index 76c0c502..928743c3 100644 --- a/content/tutorial-deep-learning-on-mnist.md +++ b/content/tutorial-deep-learning-on-mnist.md @@ -411,8 +411,8 @@ weights_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1 ``` **5.** Set up the neural network's learning experiment with a training loop and start the training process. -Note that the model is evaluated at each epoch by running the model on test -set, thus the model improvement can be tracked vs. epoch. +Note that the model is evaluated against the test set at each epoch to track +its performance over the training epochs. Start the training process: @@ -480,9 +480,9 @@ for j in range(epochs): store_training_loss.append(training_loss) store_training_accurate_pred.append(training_accurate_predictions) - ################ - # Testing step # - ################ + ################### + # Evaluation step # + ################### # Evaluate model performance on the test set at each epoch. @@ -505,7 +505,7 @@ for j in range(epochs): store_test_loss.append(test_loss) store_test_accurate_pred.append(test_accurate_predictions) - # 3. Display the error and accuracy metrics in the output. + # Summarize error and accuracy metrics at each epoch print("\n" + \ "Epoch: " + str(j) + \ " Training set error:" + str(training_loss/ float(len(training_images)))[0:5] +\ From 43c9e02bef881d0ee523ac23b6209a51aaf7b5e6 Mon Sep 17 00:00:00 2001 From: Ross Barnowski Date: Tue, 16 Mar 2021 12:18:22 -0700 Subject: [PATCH 11/13] Only apply one-hot encoding to subsets. --- content/tutorial-deep-learning-on-mnist.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/tutorial-deep-learning-on-mnist.md b/content/tutorial-deep-learning-on-mnist.md index 928743c3..18041dbd 100644 --- a/content/tutorial-deep-learning-on-mnist.md +++ b/content/tutorial-deep-learning-on-mnist.md @@ -263,8 +263,8 @@ def one_hot_encoding(labels, dimension=10): **3.** Encode the labels and assign the values to new variables: ```{code-cell} ipython3 -training_labels = one_hot_encoding(y_train) -test_labels = one_hot_encoding(y_test) +training_labels = one_hot_encoding(y_train[:training_sample]) +test_labels = one_hot_encoding(y_test[:test_sample]) ``` **4.** Check that the data type has changed to floating point: From 69075f010a4c72a65147d09f97a56ce7173568fd Mon Sep 17 00:00:00 2001 From: Ross Barnowski Date: Tue, 16 Mar 2021 13:27:41 -0700 Subject: [PATCH 12/13] Fix broken image links in RL tutorial. 
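The broken references are replaced with standard Markdown image syntax and a
path relative to the tutorial source, which renders both in Jupyter and in
the MyST-NB site build.

A throwaway check along the lines of the sketch below can flag local image
targets that do not exist. It is a hypothetical helper, not part of this
patch; the regex and the `content/` layout are assumptions.

```python
# Hedged sketch: list Markdown image references whose local targets are
# missing. Run from the repository root; assumes tutorials live in content/.
import re
from pathlib import Path

image_ref = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
for md in Path("content").glob("*.md"):
    for target in image_ref.findall(md.read_text()):
        if not target.startswith("http") and not (md.parent / target).exists():
            print(f"{md.name}: missing image {target}")
```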
--- ...orial-deep-reinforcement-learning-with-pong-from-pixels.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md b/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md index 0fd4b02b..bdfb5acf 100644 --- a/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md +++ b/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md @@ -22,7 +22,7 @@ This tutorial demonstrates how to implement a deep reinforcement learning (RL) a Pong is a 2D game from 1972 where two players use "rackets" to play a form of table tennis. Each player moves the racket up and down the screen and tries to hit a ball in their opponent's direction by touching it. The goal is to hit the ball such that it goes past the opponent's racket (they miss their shot). According to the rules, if a player reaches 21 points, they win. In Pong, the RL agent that learns to play against an opponent is displayed on the right. -
+![pong_rl](tutorial-deep-reinforcement-learning-with-pong-from-pixels.png) This example is based on the [code](https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5) developed by [Andrej Karpathy](https://karpathy.ai) for the [Deep RL Bootcamp](https://sites.google.com/view/deep-rl-bootcamp/home) in 2017 at UC Berkeley. His [blog post](http://karpathy.github.io/2016/05/31/rl/) from 2016 also provides more background on the mechanics and theory used in Pong RL. @@ -480,7 +480,7 @@ The pseudocode for the policy gradient method for Pong: - Maximize the probability of actions that lead to high rewards. -
+![pong_rl](tutorial-deep-reinforcement-learning-with-pong-from-pixels.png)

You can stop the training at any time and/or check the saved MP4 videos of plays on your disk in the `/video` directory. You can set the maximum number of episodes to whatever is most appropriate for your setup.

From 14eb9689ce81fea2e39f55234e25478f6ef225e2 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Tue, 16 Mar 2021 14:12:56 -0700
Subject: [PATCH 13/13] PERF: reduce RL episodes and batch size from 10 to 3.

---
 ...l-deep-reinforcement-learning-with-pong-from-pixels.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md b/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md
index bdfb5acf..381943d7 100644
--- a/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md
+++ b/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md
@@ -51,7 +51,7 @@ This tutorial can also be run locally in an isolated environment, such as [Virtu
 3. Create the policy (the neural network) and the forward pass
 4. Set up the update step (backpropagation)
 5. Define the discounted rewards (expected return) function
-6. Train the agent for 100 episodes
+6. Train the agent for 3 episodes
 7. Next steps
 8. Appendix - Notes on RL and deep RL

+++ {"id": "gD6XBqUqfNOV"}

-1. For demo purposes, let's limit the number of episodes for training to 10. If you are using hardware acceleration (CPUs and GPUs), you can increase the number to 1,000 or beyond. For comparison, Andrej Karpathy's original experiment took about 8,000 episodes.
+1. For demo purposes, let's limit the number of episodes for training to 3. If you are using hardware acceleration (CPUs and GPUs), you can increase the number to 1,000 or beyond. For comparison, Andrej Karpathy's original experiment took about 8,000 episodes.

```{code-cell} ipython3
:id: TdRXrc37Rfvo

-max_episodes = 10
+max_episodes = 3
```

+++ {"id": "ORj7JFGB0Gy8"}

2.

```{code-cell} ipython3
:id: eKLLYUKbG-5A

-batch_size = 10
+batch_size = 3
 learning_rate = 1e-4
```
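As context for the two knobs this final patch turns down: in the tutorial's
training loop, `batch_size` sets how many episodes of accumulated gradients go
into each parameter update, and `max_episodes` caps the total number of games
played. The sketch below is a minimal, self-contained illustration of that
control flow only; `run_episode` and the toy arrays are stand-ins, not code
from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)


def run_episode():
    """Stand-in for one full game of Pong: return a fake policy gradient
    and a fake final score. The tutorial computes these from game frames."""
    return rng.normal(size=3), int(rng.integers(-21, 22))


max_episodes = 3   # total games to play; this patch lowers it from 10
batch_size = 3     # games per parameter update; also lowered from 10
learning_rate = 1e-4

weights = np.zeros(3)
grad_buffer = np.zeros(3)

for episode in range(1, max_episodes + 1):
    grad, reward = run_episode()
    grad_buffer += grad                    # accumulate gradients per episode
    if episode % batch_size == 0:          # update once per batch of episodes
        weights += learning_rate * grad_buffer
        grad_buffer[:] = 0.0               # reset the buffer for the next batch
        print(f"episode {episode}: applied update, last score {reward}")
```

With both values at 3, CI plays a single batch of three episodes, which keeps
the tutorial within the execution timeouts raised earlier in the series.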