This repository was archived by the owner on Jul 7, 2023. It is now read-only.

RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint #1930

Open
@919294AkshatSharma

Description

A RuntimeError is raised while training with the following command:

t2t-trainer --generate_data --data_dir=/t2t_data --output_dir=/t2t_train/deque --problem=text2text_copyable_tokens --model=neural_deque_model --hparams_set=neural_deque --train_steps=100 --eval_steps=5

Environment information

OS: Ubuntu 18.04.5

$ pip freeze | grep tensor

mesh-tensorflow==0.1.21
tensor2tensor==1.15.7
tensorboard==1.15.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==1.15.0
tensorflow-addons==0.19.0
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.1.0
tensorflow-hub==0.13.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-metadata==1.12.0
tensorflow-probability==0.7.0
tensorstore==0.1.28

$ python -V
Python 3.7.12

For bugs: reproduction and error logs

# Steps to reproduce:
...
# Error logs:
Traceback (most recent call last):
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 35, in <module>
    tf.app.run(main)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/snoop/tracer.py", line 173, in simple_wrapper
    return function(*args, **kwargs)
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 30, in main
    t2t_trainer.main(argv)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 418, in main
    execute_schedule(exp)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 371, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 468, in continuous_train_and_eval
    self._eval_spec)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1495, in _train_with_estimator_spec
    any_step_done = True
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 861, in __exit__
    self._close_internal(exception_type)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 894, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 600, in end
    self._save(session, last_step)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value)  # updates self.eval_result
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 544, in _evaluate
    'Eval status: {}'.format(self.eval_result.status))
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint
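
One thing that may be worth checking (a guess from the traceback, not a confirmed diagnosis): the "missing checkpoint" status suggests that the evaluator found no checkpoint in the directory passed as --output_dir when training ended. The sketch below uses only the Python standard library. It assumes the usual TensorFlow 1.x layout, in which the output directory holds a text file named `checkpoint` containing a `model_checkpoint_path: "..."` line. The helper name `latest_checkpoint_prefix` is hypothetical and is not part of tensor2tensor or TensorFlow.

```python
import os
import re


def latest_checkpoint_prefix(model_dir):
    """Return the checkpoint prefix recorded in <model_dir>/checkpoint, or None.

    Hypothetical helper: it assumes the TF 1.x checkpoint state file format,
    i.e. a text file with a line such as
        model_checkpoint_path: "model.ckpt-100"
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        # No state file at all, so nothing has been saved in this directory.
        return None
    with open(state_file) as f:
        for line in f:
            # Anchored at line start, so "all_model_checkpoint_paths" is skipped.
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"', line)
            if match:
                prefix = match.group(1)
                # Relative entries are resolved against the model directory.
                if not os.path.isabs(prefix):
                    prefix = os.path.join(model_dir, prefix)
                return prefix
    return None


if __name__ == "__main__":
    # Path taken from the --output_dir flag in the command above.
    output_dir = "/t2t_train/deque"
    prefix = latest_checkpoint_prefix(output_dir)
    if prefix is None:
        print("No checkpoint recorded in", output_dir)
    else:
        print("Latest checkpoint prefix:", prefix)
```

If this reports no checkpoint after a run, the next things to look at would be whether the output directory is writable and whether any `model.ckpt-*` files were written there at all.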
