@@ -127,17 +127,15 @@ SELECT department, culture, link_resource
LIMIT 200
```

- You can enter these strings on the Google BigQuery console to see the data.
- The journey also provides convenient script to query the attributes.
- First clone the journey git repository:
+ You can enter these strings on the Google BigQuery console to see the data. The journey also provides a convenient script
+ to query the attributes. First, clone the journey git repository:

```
cd ~
git clone https://github.com/IBM/tensorflow-kubernetes-art-classification.git
```

- The script to query Google BigQuery is bigquery.py.
- Edit the script to put the appropriate SQL string and run the script:
+ The script to query Google BigQuery is bigquery.py. Edit the script to insert the appropriate SQL string, then run it:

```
cd tensorflow-kubernetes-art-classification
@@ -262,12 +260,12 @@ within a reasonable amount of time. In practice, you would use a larger dataset
such as multiple CPU cores and GPUs. Depending on the available compute resources, the training can run for days
or over a week.

- Next follow this [instructions](https://console.bluemix.net/docs/containers/cs_cluster.html#bx_registry_other) to
+ Next, follow these [instructions](https://console.bluemix.net/docs/containers/cs_cluster.html#bx_registry_other) to
1. create a namespace in Bluemix Container Registry and upload the image to this namespace
2. create a non-expiring registry token
3. create a Kubernetes secret to store the Bluemix token information

- Update met-art.yaml file with your images name and secret name
+ Update the train-model.yaml file with your image name and secret name:

```
apiVersion: v1
@@ -299,39 +297,98 @@ spec:
    persistentVolumeClaim:
      claimName: met-art-logs
  imagePullSecrets:
-   - name: bluemix-token
+   - name: bluemix-secret
  restartPolicy: Never
```

```
# For Mac OS
- sed -i '.original' 's/registry.ng.bluemix.net\/tf_ns\/met-art:v1/registry.<region>.bluemix.net\/<my_namespace>\/<my_image>:<tag>/' met-art.yaml
- sed -i '.original' 's/bluemix-token/<my_token>/' met-art.yaml
+ sed -i '.original' 's/registry.ng.bluemix.net\/tf_ns\/met-art:v1/registry.<region>.bluemix.net\/<my_namespace>\/<my_image>:<tag>/' train-model.yaml
+ sed -i '.original' 's/bluemix-secret/<my_token>/' train-model.yaml
# For Linux platforms
- sed -i 's/registry.ng.bluemix.net\/tf_ns\/met-art:v1/registry.<region>.bluemix.net\/<my_namespace>\/<my_image>:<tag>/' met-art.yaml
- sed -i 's/bluemix-token/<my_token>/' met-art.yaml
+ sed -i 's/registry.ng.bluemix.net\/tf_ns\/met-art:v1/registry.<region>.bluemix.net\/<my_namespace>\/<my_image>:<tag>/' train-model.yaml
+ sed -i 's/bluemix-secret/<my_token>/' train-model.yaml
```
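The `sed` commands above differ between macOS and Linux only in how `-i` handles the backup suffix. As a platform-independent alternative, here is a minimal Python sketch of the same two substitutions. The function names `customize_manifest` and `customize_file` are hypothetical and not part of the journey repository; the default image string and secret name are the ones shown in the manifest above.

```python
import shutil
from pathlib import Path

# Defaults as they appear in the manifest shown in this README.
DEFAULT_IMAGE = "registry.ng.bluemix.net/tf_ns/met-art:v1"
DEFAULT_SECRET = "bluemix-secret"


def customize_manifest(text, image, secret):
    """Return the manifest text with the default image and secret replaced."""
    # Plain string replacement, so the dots in the registry host name
    # are matched literally rather than as regex wildcards.
    return text.replace(DEFAULT_IMAGE, image).replace(DEFAULT_SECRET, secret)


def customize_file(path, image, secret):
    """Rewrite a manifest in place, keeping a '.original' backup copy."""
    path = Path(path)
    shutil.copyfile(path, path.with_name(path.name + ".original"))
    path.write_text(customize_manifest(path.read_text(), image, secret))
```

For example, `customize_file("train-model.yaml", "registry.<region>.bluemix.net/<my_namespace>/<my_image>:<tag>", "<my_token>")` would apply both edits, with the placeholders filled in with your own values.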

Deploy the pod with the following command:

```
- kubectl create -f met-art.yaml
+ kubectl create -f train-model.yaml
+ ```
+
+ Check the training status with the following command:
+
+ ```
+ kubectl logs train-met-art-model
```

Along with the pod, a local volume will be created and mounted to the pod to hold the output of the training.
This includes the checkpoints, which are used for resuming after a crash and saving a trained model,
and the event file, which is used for visualization. Further, the restart policy for the pod is set to "Never",
because once the training completes there is no need to restart the pod.

+ ### 7. Evaluate model performance
+
+ Evaluate the model from the last checkpoint in the training step above:
+
+ ```
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: eval-met-art-model
+ spec:
+   containers:
+   - name: tensorflow
+     image: registry.ng.bluemix.net/tf_ns/met-art:v1
+     volumeMounts:
+     - name: model-logs
+       mountPath: /logs
+     ports:
+     - containerPort: 5000
+     command:
+     - "/usr/bin/python"
+     - "/model/eval_image_classifier.py"
+     args:
+     - "--alsologtostderr"
+     - "--checkpoint_path=/logs/model.ckpt-100"
+     - "--eval_dir=/logs"
+     - "--dataset_dir=/data"
+     - "--dataset_name=arts"
+     - "--dataset_split_name=validation"
+     - "--model_name=inception_v3"
+     - "--clone_on_cpu=True"
+     - "--batch_size=10"
+   volumes:
+   - name: model-logs
+     persistentVolumeClaim:
+       claimName: met-art-logs
+   imagePullSecrets:
+   - name: bluemix-secret
+   restartPolicy: Never
+ ```
+ Update the eval-model.yaml file with your image name and secret name, just like in step 6.
+
+ Deploy the pod with the following command:
+
+ ```
+ kubectl create -f eval-model.yaml
+ ```
+
+ Check the evaluation status with the following command:
+
+ ```
+ kubectl logs eval-met-art-model
+ ```
+

- ### 7. Save trained model
+ ### 8. Save trained model

Copy the files from the Kubernetes local volume.

The trained model is the last checkpoint file.
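
Picking the last checkpoint by hand means finding the highest step number among the copied files. The sketch below assumes the `model.ckpt-<step>` file prefix seen in the evaluation step (`model.ckpt-100`); the function name `latest_checkpoint` and its directory argument are hypothetical. It inspects file names only, whereas TensorFlow's own `tf.train.latest_checkpoint` reads the `checkpoint` index file instead.

```python
import re
from pathlib import Path

# Matches names such as model.ckpt-100, model.ckpt-100.index or model.ckpt-100.meta.
_CKPT_RE = re.compile(r"^(model\.ckpt-(\d+))(\..+)?$")


def latest_checkpoint(log_dir):
    """Return the checkpoint prefix with the highest step number, or None."""
    best_step, best_prefix = -1, None
    for entry in Path(log_dir).iterdir():
        match = _CKPT_RE.match(entry.name)
        # Compare steps numerically so that step 100 ranks above step 20.
        if match and int(match.group(2)) > best_step:
            best_step, best_prefix = int(match.group(2)), match.group(1)
    return best_prefix
```

The returned prefix can then be appended to the log directory to form the `--checkpoint_path` value for evaluation or inference.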


- ### 8. Visualize
+ ### 9. Visualize

The event file copied from the Kubernetes local volume contains the log data for TensorBoard.
Start TensorBoard and point it to the local directory with the event file:
@@ -342,7 +399,7 @@ tensorboard --logdir=<path_to_dir>

Then open your browser with the link displayed from the command.

- ### 9. Run inference
+ ### 10. Run inference

Now that you have trained a model to classify art images by culture, you can provide
a new art image to see how it will be classified by the model.