This repository was archived by the owner on Jul 7, 2023. It is now read-only.
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint #1930
Open
Description
Runtime error while training with:
t2t-trainer --generate_data --data_dir=/t2t_data --output_dir=/t2t_train/deque --problem=text2text_copyable_tokens --model=neural_deque_model --hparams_set=neural_deque --train_steps=100 --eval_steps=5
Environment information
OS: Ubuntu 18.04.5
$ pip freeze | grep tensor
mesh-tensorflow==0.1.21
tensor2tensor==1.15.7
tensorboard==1.15.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==1.15.0
tensorflow-addons==0.19.0
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.1.0
tensorflow-hub==0.13.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-metadata==1.12.0
tensorflow-probability==0.7.0
tensorstore==0.1.28
$ python -V
Python 3.7.12
For bugs: reproduction and error logs
# Steps to reproduce:
...
# Error logs:
Traceback (most recent call last):
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 35, in <module>
    tf.app.run(main)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/snoop/tracer.py", line 173, in simple_wrapper
    return function(*args, **kwargs)
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 30, in main
    t2t_trainer.main(argv)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 418, in main
    execute_schedule(exp)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 371, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 468, in continuous_train_and_eval
    self._eval_spec)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1495, in _train_with_estimator_spec
    any_step_done = True
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 861, in __exit__
    self._close_internal(exception_type)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 894, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 600, in end
    self._save(session, last_step)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value) # updates self.eval_result
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 544, in _evaluate
    'Eval status: {}'.format(self.eval_result.status))
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint
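The final message says the evaluator found no checkpoint in the output directory once training ended. As a hedged diagnostic, the sketch below shows one way to check by hand whether /t2t_train/deque actually contains a usable checkpoint. It uses only the Python standard library and assumes the TF1 text-format `checkpoint` state file (a line such as `model_checkpoint_path: "model.ckpt-100"`) and V2-format checkpoints, which write a `<prefix>.index` file. `find_latest_checkpoint` is a hypothetical helper, not part of tensor2tensor or TensorFlow.

```python
import os
import re
from typing import Optional

# Assumed format: the `model_checkpoint_path` entry of TF's text-format
# `checkpoint` state file. The ^ anchor skips `all_model_checkpoint_paths`.
_STATE_RE = re.compile(r'^model_checkpoint_path:\s*"(.*)"\s*$', re.MULTILINE)


def find_latest_checkpoint(output_dir: str) -> Optional[str]:
    """Hypothetical helper: return the latest checkpoint prefix, or None.

    Assumes V2 checkpoints, i.e. a `<prefix>.index` file is written
    alongside the data shards.
    """
    state_path = os.path.join(output_dir, "checkpoint")
    if not os.path.isfile(state_path):
        return None  # no state file: nothing was recorded as saved here
    with open(state_path, encoding="utf-8") as f:
        match = _STATE_RE.search(f.read())
    if match is None:
        return None  # state file exists but names no checkpoint
    prefix = match.group(1)
    # Relative prefixes in the state file are resolved against output_dir.
    if not os.path.isabs(prefix):
        prefix = os.path.join(output_dir, prefix)
    # The state file can point at files that were never written or were removed.
    return prefix if os.path.isfile(prefix + ".index") else None


if __name__ == "__main__":
    ckpt = find_latest_checkpoint("/t2t_train/deque")
    print(ckpt if ckpt else "no usable checkpoint found")
```

If this reports no usable checkpoint after a run, the problem is most likely in where (or whether) the training checkpoint was written rather than in evaluation itself; this is an inference from the error text, not a confirmed root cause.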