CS663 | computer vision

Training Options - Local versus Google Cloud

TIP: as long as you periodically save checkpoints, you can restart a stopped training job and it will pick up from the latest checkpoint.
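To make the tip concrete, here is a minimal sketch of the "pick up from the latest checkpoint" bookkeeping using only the Python standard library. The file naming scheme (ckpt-epoch-NNNN) and the helper names find_latest_checkpoint and starting_epoch are assumptions made for illustration; they are not part of the course materials or of TensorFlow.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical naming scheme, e.g. "ckpt-epoch-0007.weights" (an assumption).
CKPT_PATTERN = re.compile(r"ckpt-epoch-(\d+)")


def find_latest_checkpoint(ckpt_dir: str) -> Optional[Tuple[int, Path]]:
    """Return (epoch, path) of the newest checkpoint, or None if there is none."""
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    found = []
    for path in directory.iterdir():
        match = CKPT_PATTERN.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # The checkpoint with the highest epoch number is the one to resume from.
    return max(found, key=lambda item: item[0]) if found else None


def starting_epoch(ckpt_dir: str) -> int:
    """Epoch to resume at: one past the last saved epoch, or 0 for a fresh run."""
    latest = find_latest_checkpoint(ckpt_dir)
    return 0 if latest is None else latest[0] + 1
```

In a Keras training script you would typically load the weights from the returned path and pass the resume epoch as the initial_epoch argument of model.fit. For checkpoints written in TensorFlow's own format, tf.train.latest_checkpoint does a similar lookup for you.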

1) Train Locally w/CPU: train on your own machine using only the CPU. This will take a long time (up to days) unless you train for a small number of epochs on little data.

2) Train Locally w/GPU: if your machine supports it (you need a supported NVIDIA GPU), training will run much, much more quickly; rather than days, it can take hours. You must follow Google's online instructions to set up TensorFlow w/GPU (not hard, but it takes a little time, and you only do it once).
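Once the GPU setup is done, a quick check like the following sketch can confirm whether TensorFlow actually sees a GPU. It assumes TensorFlow 2.x; the helper name visible_gpus is made up for this example, and it simply reports no GPUs if TensorFlow is not installed.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or an empty list if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {[g.name for g in gpus]}")
else:
    print("No GPU visible to TensorFlow; training would run on the CPU.")
```

If this reports no GPU on a machine that has a supported NVIDIA card, the usual cause is an incomplete driver or library setup, so revisit the setup instructions before starting a long training run.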

One thing that can occur when you train on a GPU is that, compared to the CPU, you may more quickly run into OOM (out of memory) errors, because training is limited to the GPU's own memory and it may not be enough.
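A common response to OOM errors is to reduce the batch size. The sketch below shows one way to automate that: retry training with the batch size halved after each out-of-memory error. The function name train_with_backoff and its parameters are hypothetical, and the plain-Python MemoryError stands in for whatever error your framework raises.

```python
from typing import Callable, Tuple, Type


def train_with_backoff(
    train_fn: Callable[[int], None],
    batch_size: int,
    min_batch_size: int = 1,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> int:
    """Call train_fn(batch_size), halving batch_size after each out-of-memory error.

    Returns the batch size that finally worked; re-raises the error if even
    min_batch_size does not fit.
    """
    while True:
        try:
            train_fn(batch_size)
            return batch_size
        except oom_errors:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory; retrying with batch_size={batch_size}")
```

With TensorFlow you would likely pass oom_errors=(tf.errors.ResourceExhaustedError,), since that is the error raised when the GPU runs out of memory. Retrying in the same process does not always free GPU memory cleanly, so in practice you may need to restart the job with the smaller batch size (resuming from your latest checkpoint).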

3) Free Google Colab: you can use the free Google Colab environment, but you will get one GPU with limited memory and a limited amount of time to run your Colab notebook (which is like a Jupyter notebook).

4) Use your Google Free Cloud Credits: train on a Google Cloud VM using a Jupyter Notebook, making sure you launch a VM that is set up for machine learning. The process is to set up and launch a machine-learning VM (one that has TensorFlow etc.), then connect to it and launch your Jupyter Notebook. See the official Google site for machine-learning VMs, https://cloud.google.com/deep-learning-vm ; these are preconfigured VMs for deep learning applications.

ALSO see "How to run deep learning models on Google Cloud Platform in 6 steps": https://medium.com/google-cloud/how-to-run-deep-learning-models-on-google-cloud-platform-in-6-steps-4950a57acfa5

OTHER options NOT really recommended.

5) Purchase a Google Colab Pro Subscription: if you purchase a $10/month Colab Pro account, you will get double the memory and resources of the free Google Colab solution (which may or may not be enough).

6) Possibly run your Google Colab on a hosted machine-learning Google VM, communicating through SSH (so you run the Colab "locally" but channel it through an SSH connection that talks to the VM). Really, this is kind of a stupid solution, as you can more simply do option 4. It is complicated to talk through SSH just so you can use a Colab.