TF2 Object Detection API: training and evaluation using the SAME script, selected via parameters
Training+Validation
Once you have prepared the configuration file and the input TFRecord files for both the training dataset and the validation dataset, you train by invoking the model_main_tf2.py script. Some of its parameters:
- --checkpoint_every_n saves a checkpoint every n steps while training, where n is the value given to this parameter
- --model_dir output directory from training, where results (checkpoints, etc.) go
- --pipeline_config_path very important configuration file that stipulates many parameters of training (e.g. batch size, number of steps, TFRecord for training data, TFRecord for validation data, etc.)
- --num_train_steps number of training steps
- --num_eval_steps after this many steps, evaluate with the validation dataset. REMEMBER what validation is: while training is running, the system stops periodically, takes the latest model checkpoint, and runs it on the validation dataset (which is different from the training dataset). The results are used to evaluate the model accuracy at this checkpoint using the validation set (from the validation TFRecord file). This helps us monitor the training progress by printing the validation mAP on the terminal and by using a GUI monitoring package like TensorBoard.
NOTE: these are just SOME of the available parameters. Look at model_main_tf2.py and its underlying code, as well as online TF2+ Object Detection API examples, to learn about more options.
Read the configuration file and its values related to training
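As a rough illustration of what "values related to training" look like, the sketch below scans a small, made-up excerpt of a pipeline configuration for the batch size, the number of steps, and the TFRecord paths. The field names (train_config, batch_size, num_steps, train_input_reader, eval_input_reader, tf_record_input_reader, input_path, label_map_path) are ones used by the pipeline config format; every value and path is hypothetical, and the helper functions find_int and find_strings are invented for this example. It is a plain text scan with regular expressions, not a protobuf parser, so treat it only as a quick way to eyeball a file.

```python
import re

# Hypothetical excerpt of a pipeline config (protobuf text format).
# Field names follow the pipeline config format; all values are made up.
PIPELINE_EXCERPT = """
train_config {
  batch_size: 8
  num_steps: 16000
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/val.record"
  }
}
"""


def find_int(text, field):
    """Return the first integer assigned to `field`, or None if absent."""
    match = re.search(rf"\b{field}:\s*(\d+)", text)
    return int(match.group(1)) if match else None


def find_strings(text, field):
    """Return every quoted string assigned to `field`, in file order."""
    return re.findall(rf'\b{field}:\s*"([^"]*)"', text)


batch_size = find_int(PIPELINE_EXCERPT, "batch_size")
num_steps = find_int(PIPELINE_EXCERPT, "num_steps")
record_paths = find_strings(PIPELINE_EXCERPT, "input_path")

print("batch size:", batch_size)
print("training steps:", num_steps)
print("TFRecord paths (training first, then validation):", record_paths)
```

In this excerpt the first input_path belongs to the training reader and the second to the validation reader, simply because of the order in which the two blocks appear.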
    # create output directory
    out_dir='C:\whatever\outputdirectory'
    mkdir -p "$out_dir"

    # call script to train
    python model_main_tf2.py --alsologtostderr --model_dir="$out_dir" --checkpoint_every_n=500 \
        --pipeline_config_path=../models/ssd_mobilenet_v2_raccoon.config \
        --eval_on_train_data 2>&1 | tee "$out_dir/train.log"

    # I am using the path format consistent with Windows machines, but you would change it accordingly if running
    # on other operating systems.
    # IMPORTANT: you can place your configuration file where you wish
    python C:\whereever\models\research\object_detection\model_main_tf2.py \
        --pipeline_config_path=C:\whereever\models\research\deploy\pipeline_file.config \
        --checkpoint_every_n=500 \
        --model_dir=C:\pathToWhereSaveResults\training \
        --alsologtostderr \
        --num_train_steps=16000 \
        --sample_1_of_n_eval_examples=1 \
        --num_eval_steps=500

>>>Here is an example on a Mac/Unix machine. The last line redirects the printed text output to a file rather than standard output.
    python3 /home/whatever/TensorFlow/models/research/object_detection/model_main_tf2.py \
        --pipeline_config_path="/home/whatever/TensorFlow/Deployments/mymodel/pipeline_file.config" \
        --model_dir="/home/whatever/TensorFlow/SavedTraining/mymodel" \
        --alsologtostderr \
        --num_train_steps=40000 \
        --sample_1_of_n_eval_examples=1 \
        --num_eval_steps=500 \
        &> /home/whatever/TensorFlow/Logs/mymodelTrainingTextOutput.txt
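Because line continuation and output redirection differ between shells, one option is to assemble the same kind of command in Python and let Python write the output to a log file. The following is a minimal sketch under stated assumptions: build_command and run_and_log are helper names invented here, the relative paths are placeholders, and only flags that already appear in the commands above are used.

```python
import shlex
import subprocess
import sys


def build_command(script, flags):
    """Turn a dict of flag names and values into an argument list.

    A value of True emits a bare switch (e.g. --alsologtostderr);
    any other value is emitted as --name=value.
    """
    args = [sys.executable, script]
    for name, value in flags.items():
        if value is True:
            args.append(f"--{name}")
        else:
            args.append(f"--{name}={value}")
    return args


def run_and_log(command, log_path):
    """Run `command`, writing both stdout and stderr to `log_path`.

    Returns the exit code of the process.
    """
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    return result.returncode


# Hypothetical paths; substitute your own locations.
train_flags = {
    "pipeline_config_path": "deploy/pipeline_file.config",
    "model_dir": "training",
    "checkpoint_every_n": 500,
    "num_train_steps": 16000,
    "sample_1_of_n_eval_examples": 1,
    "alsologtostderr": True,
}
command = build_command("object_detection/model_main_tf2.py", train_flags)

# Show the command that would be run; nothing is launched here.
print(shlex.join(command))

# To actually start training and capture its output in a file:
# exit_code = run_and_log(command, "training_output.txt")
```

Passing the arguments as a list means no shell is involved, so paths containing backslashes or spaces need no extra quoting.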
Evaluation
>>>For the previous training example on a Mac/Unix machine, here is an example of running the script to perform evaluation. Notice that some of the parameters are missing and some new parameters (and one with a different value) are specified, such as:
- --checkpoint_dir this is the SPECIFIC directory where the checkpoints are stored
- --model_dir the VALUE of this must be the location where the trained model is stored, specifically the checkpoints (so in this case it is actually the same value as checkpoint_dir)
- --run_once for evaluation you only run once!!!
READ the configuration file and read the GREEN section for evaluation configuration values
    # for a change, showing you paths in Unix/Mac OS type specification
    # directory for model_dir must be where you stored checkpoints during training
    python3 /home/whatever/TensorFlow/models/research/object_detection/model_main_tf2.py \
        --pipeline_config_path=/home/whatever/TensorFlow/Deployments/mymodel/pipeline_file_validation.config \
        --model_dir=/home/whatever/TensorFlow/FineTunedModels/mymodel/checkpoint/ \
        --checkpoint_dir=/home/whatever/TensorFlow/FineTunedModels/mymodel/checkpoint/ \
        --run_once=True \
        --alsologtostderr
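The difference between the training and evaluation invocations can be summarized as a small transformation of the flag set. The sketch below is only an illustration: to_eval_flags and TRAIN_ONLY_FLAGS are names invented here, the paths are placeholders, and the choice of which flags count as training-only is an assumption drawn from the example commands in this section rather than from the script's source. It covers only the --checkpoint_dir, --model_dir, and --pipeline_config_path changes.

```python
# Flags treated here as training-only. This split is an assumption based
# on the example commands in this section, not on the script's source.
TRAIN_ONLY_FLAGS = {"num_train_steps", "checkpoint_every_n"}


def to_eval_flags(train_flags, checkpoint_dir, eval_config_path=None):
    """Derive evaluation flags from a dict of training flags.

    Training-only flags are dropped, and both checkpoint_dir and model_dir
    are pointed at the directory that holds the training checkpoints.
    """
    eval_flags = {
        name: value
        for name, value in train_flags.items()
        if name not in TRAIN_ONLY_FLAGS
    }
    eval_flags["checkpoint_dir"] = checkpoint_dir
    eval_flags["model_dir"] = checkpoint_dir
    if eval_config_path is not None:
        eval_flags["pipeline_config_path"] = eval_config_path
    return eval_flags


# Hypothetical values standing in for a real training run.
train_flags = {
    "pipeline_config_path": "deploy/pipeline_file.config",
    "model_dir": "training",
    "num_train_steps": 40000,
    "alsologtostderr": True,
}
eval_flags = to_eval_flags(
    train_flags,
    checkpoint_dir="training",
    eval_config_path="deploy/pipeline_file_validation.config",
)

# Print the flags as they would appear on the command line.
for name, value in eval_flags.items():
    print(f"--{name}" if value is True else f"--{name}={value}")
```

Keeping the training flags in one dict and deriving the evaluation flags from it makes it harder for the two runs to drift apart, for example by pointing the evaluation at a different checkpoint directory than the one training wrote to.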