Training Options - Local versus Google Cloud
TIP: as long as you periodically save checkpoints, you can restart a stopped training job and it will pick up from the latest checkpoint.
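The tip above can be sketched with a toy, framework-agnostic training loop. This is only an illustration under stated assumptions: the names `train`, `latest_checkpoint`, the `ckpt-<epoch>.json` file naming, and the single `weight` value standing in for model state are all hypothetical. In TensorFlow the saving and restoring would typically be handled by utilities such as `tf.train.CheckpointManager` or the Keras `ModelCheckpoint` callback rather than hand-written JSON.

```python
import json
import os
import tempfile


def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) of the newest checkpoint, or (0, None) if none exist."""
    best = (0, None)
    for name in os.listdir(ckpt_dir):
        if name.startswith("ckpt-") and name.endswith(".json"):
            epoch = int(name[len("ckpt-"):-len(".json")])
            if epoch > best[0]:
                best = (epoch, os.path.join(ckpt_dir, name))
    return best


def train(ckpt_dir, total_epochs, save_every=2, stop_after=None):
    """Toy training loop; 'weight' is a stand-in for real model state."""
    os.makedirs(ckpt_dir, exist_ok=True)
    start_epoch, path = latest_checkpoint(ckpt_dir)
    state = {"epoch": 0, "weight": 0.0}
    if path is not None:
        # Resume: load the state saved by an earlier (stopped) run.
        with open(path) as f:
            state = json.load(f)
    for epoch in range(start_epoch + 1, total_epochs + 1):
        state["weight"] += 0.5  # stand-in for one epoch of training
        state["epoch"] = epoch
        if epoch % save_every == 0:
            # Periodic checkpoint.
            with open(os.path.join(ckpt_dir, f"ckpt-{epoch}.json"), "w") as f:
                json.dump(state, f)
        if stop_after is not None and epoch == stop_after:
            break  # simulate the job being stopped early
    return state


if __name__ == "__main__":
    ckpt_dir = tempfile.mkdtemp()
    train(ckpt_dir, total_epochs=10, stop_after=5)  # job stops at epoch 5
    final = train(ckpt_dir, total_epochs=10)        # restart picks up from epoch 4
    print(final)
```

Note that the restarted run resumes from the last saved checkpoint (epoch 4 here), not from the epoch at which the job stopped, so any work done after the last save is repeated. Saving more often reduces that lost work at the cost of more disk writes.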
1) Train Locally w/CPU: on your own machine using only the CPU. This will take a long time (up to days) unless you train for a small number of epochs on little data.
2) Train Locally w/GPU: if your machine supports this (it needs a supported NVIDIA GPU chipset), training will run much more quickly; rather than days it can take hours. You must follow Google's online instructions to set up TensorFlow with GPU support (not hard, but it takes a little time, and you only do it once).
- When you train on your GPU, which uses its own memory, you may run into OOM (out of memory) errors, where there is not enough memory for training, more quickly than you would on the CPU.
3) Free Google Colab: you can use the "free" Google Colab environment, but you will get one GPU with limited memory and a limited amount of time allocated to run your Colab notebook (which is like a Jupyter notebook).
4) Use your Google Cloud Free Credits: train on a Google Cloud VM using a Jupyter Notebook, making sure you launch a VM that is set up for machine learning. The process is to set up and launch a machine-learning VM (one that has TensorFlow etc. installed), then connect to it and launch your Jupyter Notebook (or
- official Google site for machine learning VMs: https://cloud.google.com/deep-learning-vm ) These are preconfigured VMs for deep learning applications.
Also see "How to run deep learning models on Google Cloud Platform in 6 steps": https://medium.com/google-cloud/how-to-run-deep-learning-models-on-google-cloud-platform-in-6-steps-4950a57acfa5
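Whichever of the GPU options above you use (local GPU, Colab, or a cloud VM), it can help to confirm that TensorFlow actually sees a GPU before starting a long job. The following is a minimal sketch, not an official setup check: the function name `visible_gpus` is hypothetical, and it assumes a TensorFlow 2.x install (it returns an empty list if TensorFlow cannot be imported). It also turns on memory growth, which makes TensorFlow allocate GPU memory as needed instead of reserving it all up front.

```python
def visible_gpus(enable_memory_growth=True):
    """Return the names of GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf  # assumption: TensorFlow 2.x is installed
    except ImportError:
        return []  # no TensorFlow available, so nothing to report

    gpus = tf.config.list_physical_devices("GPU")
    if enable_memory_growth:
        for gpu in gpus:
            try:
                # Allocate GPU memory on demand rather than all at once.
                tf.config.experimental.set_memory_growth(gpu, True)
            except RuntimeError:
                # Raised if the GPU was already initialized; growth must
                # be configured before TensorFlow first uses the device.
                pass
    return [gpu.name for gpu in gpus]


if __name__ == "__main__":
    names = visible_gpus()
    if names:
        print("TensorFlow sees GPU(s):", names)
    else:
        print("No GPU visible to TensorFlow; training would run on the CPU.")
```

Memory growth does not give the GPU any more memory. If training still fails with an OOM error, the usual remedy is to reduce the batch size or use a smaller model.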