Error Encountered During Model Training Step (OSError: Checkpoint is Expected to be an Object-Based Checkpoint) #120

mutti2324 opened this issue Nov 3, 2024 · 1 comment



mutti2324 commented Nov 3, 2024

| Setup | Option (using GPU Support) |
|---|---|
| OS | Windows 10 Pro 22H2 |
| CPU | Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz |
| Python | 3.9.20 |
| Anaconda | 4.10.1 |
| TensorFlow | 2.10.1 |
| GPU-Support | yes |
| CUDA | cuda_11.2.2_461.33_win10.exe |
| Model Builder Test | ✅👍 PASSED - Ran 24 tests in 23.995s |
| Model Training Test | ❌👎 FAILED |

Hello @sglvladi, first of all, thank you for your tutorial! I was able to follow it step by step. However, I ran into an issue in the "Training the Model" section of the training.rst file. When I ran the command you provided:

python --model_dir=C:/Users/marco/TFOD/TensorFlow/models/my_ssd_resnet50_v1_fpn --pipeline_config_path=C:/Users/marco/TFOD/TensorFlow/models/my_ssd_resnet50_v1_fpn/pipeline.config

I received the following error message:

C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow_addons\utils\ UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).

For more information see:

C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow_addons\utils\ UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.12.0 and strictly below 2.15.0 (nightly versions are not supported). 
 The versions of TensorFlow you are currently using is 2.10.1 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
2024-11-03 08:13:14.709629: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-03 08:13:15.160587: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4609 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1103 08:13:15.378028 68556] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I1103 08:13:15.381019 68556] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I1103 08:13:15.381019 68556] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\ StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W1103 08:13:15.405953 68556] From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\ StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['C:/Users/marco/TFOD/TensorFlow/workspace/training_demo/annotations/train.record']
I1103 08:13:15.412935 68556] Reading unweighted datasets: ['C:/Users/marco/TFOD/TensorFlow/workspace/training_demo/annotations/train.record']
INFO:tensorflow:Reading record datasets for input file: ['C:/Users/marco/TFOD/TensorFlow/workspace/training_demo/annotations/train.record']
I1103 08:13:15.413933 68556] Reading record datasets for input file: ['C:/Users/marco/TFOD/TensorFlow/workspace/training_demo/annotations/train.record']
INFO:tensorflow:Number of filenames to read: 1
I1103 08:13:15.413933 68556] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1103 08:13:15.414930 68556] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\builders\ parallel_interleave (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `, cycle_length, block_length,` instead. If sloppy execution is desired, use ``.
W1103 08:13:15.420913 68556] From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\builders\ parallel_interleave (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `, cycle_length, block_length,` instead. If sloppy execution is desired, use ``.
WARNING:tensorflow:From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\builders\ DatasetV1.map_with_legacy_function (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `
W1103 08:13:15.434876 68556] From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\builders\ DatasetV1.map_with_legacy_function (from is deprecated and will be removed in a future version.
Instructions for updating:
Use `
WARNING:tensorflow:From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\util\ sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W1103 08:13:20.280873 68556] From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\util\ sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\util\ sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W1103 08:13:22.383233 68556] From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\util\ sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\util\ to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1103 08:13:23.653824 68556] From C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\util\ to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
Traceback (most recent call last):
  File "C:\Users\marco\TFOD\TensorFlow\workspace\training_demo\", line 114, in <module>
  File "C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\tensorflow\python\platform\", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\absl\", line 308, in run
    _run_main(main, args)
  File "C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\absl\", line 254, in _run_main
  File "C:\Users\marco\TFOD\TensorFlow\workspace\training_demo\", line 105, in main
  File "C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\", line 605, in train_loop
  File "C:\Users\marco\anaconda3\envs\VENV\lib\site-packages\object_detection\", line 396, in load_fine_tune_checkpoint
    raise IOError('Checkpoint is expected to be an object-based checkpoint.')
OSError: Checkpoint is expected to be an object-based checkpoint.

Python Packages List using pip list

(VENV) C:\Users\marco\TFOD> pip list

Package Version
absl-py 1.4.0
apache-beam 2.46.0
array-record 0.4.1
astunparse 1.6.3
avro-python3 1.10.2
bleach 6.2.0
cachetools 5.5.0
certifi 2024.8.30
charset-normalizer 3.4.0
click 8.1.7
cloudpickle 2.2.1
colorama 0.4.6
contextlib2 21.6.0
contourpy 1.2.0
crcmod 1.7
cycler 0.11.0
Cython 3.0.11
dm-tree 0.1.8
docopt 0.6.2
etils 1.5.2
fastavro 1.9.7
fasteners 0.19
flatbuffers 24.3.25
fonttools 4.51.0
fsspec 2024.10.0
gast 0.4.0
gin-config 0.5.0
google-api-core 2.22.0
google-api-python-client 2.151.0
google-auth 2.35.0
google-auth-httplib2 0.2.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
googleapis-common-protos 1.63.1
grpcio 1.34.1
h5py 3.1.0
hdfs 2.7.3
httplib2 0.21.0
idna 3.10
immutabledict 4.2.0
importlib_metadata 8.5.0
importlib_resources 6.4.0
joblib 1.4.2
kaggle 1.6.17
keras 2.10.0
keras-nightly 2.5.0.dev2021032900
Keras-Preprocessing 1.1.2
kiwisolver 1.4.4
libclang 18.1.1
lvis 0.5.3
lxml 5.3.0
Markdown 3.7
MarkupSafe 3.0.2
matplotlib 3.9.2
mkl_fft 1.3.10
mkl_random 1.2.7
mkl-service 2.4.0
numpy 1.24.3
oauth2client 4.1.3
oauthlib 3.2.2
object_detection 0.1
objsize 0.6.1
opt-einsum 3.3.0
orjson 3.10.11
packaging 24.1
pandas 2.2.3
pillow 10.4.0
pip 24.2
ply 3.11
portalocker 2.10.1
promise 2.3
proto-plus 1.25.0
protobuf 3.19.6
psutil 6.1.0
py-cpuinfo 9.0.0
pyarrow 9.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycocotools 2.0.7
pydot 1.4.2
pymongo 3.13.0
pyparsing 2.4.7
PyQt5 5.15.10
PyQt5-sip 12.13.0
python-dateutil 2.9.0.post0
python-slugify 8.0.4
pytz 2024.2
pywin32 308
PyYAML 5.4.1
regex 2024.9.11
requests 2.32.3
requests-oauthlib 2.0.0
rsa 4.9
sacrebleu 2.2.0
scikit-learn 1.5.2
scipy 1.13.1
sentencepiece 0.2.0
seqeval 1.2.2
setuptools 75.1.0
sip 6.7.12
six 1.15.0
tabulate 0.9.0
tensorboard 2.10.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.10.1
tensorflow-addons 0.22.0
tensorflow-datasets 4.9.0
tensorflow-estimator 2.10.0
tensorflow-hub 0.16.1
tensorflow-io 0.31.0
tensorflow-io-gcs-filesystem 0.31.0
tensorflow-metadata 1.13.0
tensorflow-model-optimization 0.8.0
tensorflow-text 2.10.0
termcolor 1.1.0
text-unidecode 1.3
tf-keras 2.15.0
tf-models-official 2.10.1
tf-slim 1.1.0
threadpoolctl 3.5.0
toml 0.10.2
tomli 2.0.1
tornado 6.4.1
tqdm 4.66.6
typeguard 2.13.3
tzdata 2024.2
unicodedata2 15.1.0
uritemplate 4.1.1
urllib3 2.2.3
webencodings 0.5.1
Werkzeug 3.1.1
wheel 0.44.0
wrapt 1.12.1
zipp 3.20.2
zstandard 0.23.0
@mutti2324 mutti2324 changed the title "Training the Model" described in the file "training.rst" Error Encountered During Model Training Step (OSError: Checkpoint is Expected to be an Object-Based Checkpoint) Nov 3, 2024

mutti2324 commented Nov 3, 2024

I was able to solve this issue.

In the pipeline.config file, I had written

fine_tune_checkpoint: "C:/Users/marco/TFOD/TensorFlow/workspace/training_demo/pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0.index"

instead of

fine_tune_checkpoint: "C:/Users/marco/TFOD/TensorFlow/workspace/training_demo/pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0"

So removing .index from the path solves the issue. The model training process is running now!

INFO:tensorflow:Step 100 per-step time 6.250s
I1103 09:21:13.362221 70996] Step 100 per-step time 6.250s
INFO:tensorflow:{'Loss/classification_loss': 0.15669851, 'Loss/localization_loss': 0.03190955, 'Loss/regularization_loss': 0.2535239, 'Loss/total_loss': 0.44213197, 'learning_rate': 0.014666351}
I1103 09:21:13.363219 70996] {'Loss/classification_loss': 0.15669851, 'Loss/localization_loss': 0.03190955, 'Loss/regularization_loss': 0.2535239, 'Loss/total_loss': 0.44213197, 'learning_rate': 0.014666351}
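
In case it helps anyone who hits the same error: a TensorFlow 2 checkpoint is stored on disk as files such as ckpt-0.index and ckpt-0.data-00000-of-00001, but fine_tune_checkpoint should name the shared prefix (ckpt-0), not one of those files. Below is a minimal sketch of a pre-flight check along those lines. The helper names checkpoint_prefix and find_fine_tune_checkpoint and the example path are hypothetical, not part of the Object Detection API, and the regex-based parsing only assumes the value sits in double quotes on one line of pipeline.config.

```python
import re

# File suffixes that TensorFlow appends to a checkpoint prefix on disk.
# (Assumption: the usual ".index" / ".data-XXXXX-of-XXXXX" naming.)
_CHECKPOINT_FILE_SUFFIX = re.compile(r"\.(index|data-\d+-of-\d+)$")


def checkpoint_prefix(path: str) -> str:
    """Strip a trailing .index or .data-N-of-M suffix, leaving the prefix."""
    return _CHECKPOINT_FILE_SUFFIX.sub("", path)


def find_fine_tune_checkpoint(config_text: str):
    """Return the quoted fine_tune_checkpoint value from config text, or None."""
    match = re.search(
        r'^\s*fine_tune_checkpoint:\s*"([^"]*)"', config_text, re.MULTILINE
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical config snippet; replace with the contents of pipeline.config.
    config_text = (
        'fine_tune_checkpoint: "pre-trained-models/example_model/'
        'checkpoint/ckpt-0.index"\n'
    )
    value = find_fine_tune_checkpoint(config_text)
    fixed = checkpoint_prefix(value)
    if fixed != value:
        print("fine_tune_checkpoint should be the prefix:", fixed)
```

This only checks the spelling of the path. It does not open the checkpoint or verify that the files exist.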
