This repository has been archived by the owner on Jun 26, 2021. It is now read-only.

remove timestamp from save path #200

Open. Wants to merge 13 commits into base: master.

Conversation

@gedoensmax (Contributor)

To resolve issue #197

@mibaumgartner (Member) left a review comment

Added some very minor requests.

(Inline review comments on delira/training/utils.py; resolved.)
@mibaumgartner added the labels "changes requested" and "ready for review" on Sep 6, 2019.
@ORippler (Collaborator) left a review comment

I think we should first define the expected default behaviour of our experiment (trying to resume vs. re-running from scratch).

IMO, name should serve as a UID given by the user (if they want, they can put a timestamp there), and the default behaviour should be to try to resume the experiment state if os.path.join(save_path, name) exists, rather than changing the path to be unique. The reason is that this path is passed forward to experiment._setup_training as a default, and that state is ultimately used by the trainer to search for a previous state.

If we don't want to do that, I see no major advantage in refactoring the timestamp into an increasing integer number.

Thoughts?

@mibaumgartner (Member) commented Sep 6, 2019

I think that is a problem with the current design of the experiment.resume function, which takes a separate save_path as its argument. Maybe it would be more suitable to design the resume function as a static method that also restores the original experiment settings (i.e. the arguments passed to __init__), as sketched below. Obviously, it would then be necessary to save those settings inside the experiment.

I thought the increasing integer value would be a nice way to decrease the length of the experiment names 🙈
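
A minimal sketch of the static-method idea above, assuming the experiment pickles its __init__ arguments up front; the class layout and the settings.pkl file name are hypothetical, not delira's actual implementation:

```python
import os
import pickle


class Experiment:
    def __init__(self, save_path, name):
        self.save_path = save_path
        self.name = name
        # Persist the constructor arguments right away so a later
        # resume can rebuild the experiment from disk alone.
        run_dir = os.path.join(save_path, name)
        os.makedirs(run_dir, exist_ok=True)
        with open(os.path.join(run_dir, "settings.pkl"), "wb") as f:
            pickle.dump({"save_path": save_path, "name": name}, f)

    @staticmethod
    def resume(run_dir):
        # Restore the original __init__ arguments and rebuild.
        with open(os.path.join(run_dir, "settings.pkl"), "rb") as f:
            kwargs = pickle.load(f)
        return Experiment(**kwargs)
```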

@justusschock (Member)

I would prefer a version that ensures unique names if a path already exists. I usually have one naming convention and rely on the timestamp part to avoid overwriting previous checkpoints. My proposal: we add a flag to the experiment (called automatic_resume or something like this), which defaults to True and indicates whether to resume or to create a unique path.
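
A minimal sketch of how such a flag could behave; the helper name resolve_save_path and its signature are hypothetical, not delira's actual API:

```python
import os


def resolve_save_path(save_path, name, automatic_resume=True):
    """Hypothetical helper illustrating the proposed flag."""
    # With automatic_resume=True the existing path is reused, so the
    # trainer can pick up its latest checkpoint from there.
    run_path = os.path.join(save_path, name)
    if automatic_resume or not os.path.exists(run_path):
        return run_path
    # Otherwise derive a unique path by appending an increasing integer.
    idx = 1
    while os.path.exists("%s_%d" % (run_path, idx)):
        idx += 1
    return "%s_%d" % (run_path, idx)
```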

@gedoensmax (Contributor, Author)

This flag is what I had in mind too, but we will have to adjust some things for this, as pickle currently dumps all information about the experiment setup at the end of training, right? It might be better to dump the configuration first and then train the network, so that we then just have to search for the latest checkpoint.

@justusschock (Member)

Searching for the latest checkpoint is already handled by the trainer.

@mibaumgartner added the label "changes requested" and removed the labels "changes requested" and "ready for review" on Sep 24, 2019.
@gedoensmax (Contributor, Author)

Format to use for the stamp:

sequence-number_year_month_day

If the run should be continued and one is already present, the first sequence number is reused.

@justusschock (Member)

This is a minor issue, but can we maybe put the sequence number at the end? That seems more intuitive to me.

@gedoensmax (Contributor, Author)

We briefly agreed on having it this way in the meeting today; of course, the other way round makes sense too. I kind of like having it this way, so that a simple numeric sorting is recognised at first sight, but either way is fine for me.

@justusschock (Member)

Okay, that's a valid reason too, but this way you get the first run from different days sorted together instead of a "real" development over time. What do the others think about this? @haarburger @mibaumgartner?

@gedoensmax (Contributor, Author)

Oh, I would continue numbering even if the day changes.

@justusschock (Member)

That's something I wouldn't like. If we create an experiment, the date of creation should be used for all runs, but if we re-initiate an experiment, I would also choose a new date.

@gedoensmax (Contributor, Author)

> That's something I wouldn't like. If we create an experiment, the date of creation should be used for all runs, but if we re-initiate an experiment, I would also choose a new date.

OK, to clarify: if I start an experiment today, it will be 00_date_today, and 01_date_today if I start the same one an hour later.
If I start it again tomorrow, or in a month, it would be 02_date_then, but only if the experiment name is still the same, which I think should be changed anyway if something else is tried at a later point.

@justusschock (Member) commented Sep 24, 2019

Ah okay, I would do it like

date_today_1, date_today_2 etc., but then I would restart with date_then_1.

Another problem is that we need some management of adding zeroes. E.g. if we already have 9 successful runs (named in one of the ways above; it doesn't matter which one exactly) and we start the 10th run, we need to move the previous runs to paths with leading zeroes to keep the sorting correct (and the same when we reach the 100th, 1000th, etc. run). This shouldn't be hard to realize, but it is something we need to be aware of.

EDIT: the naming convention you proposed is also okay for me, as long as we take care of the necessary zeroes.

@mibaumgartner (Member) commented Sep 24, 2019

After looking at the pros and cons, I would prefer the number at the end (I did not think about sorting during our meeting). Furthermore, I would like to start from 0 each day and zero-pad to three digits: 001 ... 010 ... 100 ... (no padding beyond that) ... 1000. I think most users won't run more than 1000 experiments per day (the only drawback would be a suboptimal ordering of the experiments when running more than 1000).
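
A minimal sketch of this convention; the helper next_run_dir is hypothetical and only illustrates the naming, not delira's implementation:

```python
import os
from datetime import date


def next_run_dir(base_path, name):
    # Proposed scheme: <name>_<YYYY_MM_DD>_<NNN>, with the counter
    # restarting at 0 each day and zero-padded to three digits.
    stamp = date.today().strftime("%Y_%m_%d")
    prefix = "%s_%s_" % (name, stamp)
    existing = []
    if os.path.isdir(base_path):
        existing = [d for d in os.listdir(base_path)
                    if d.startswith(prefix)]
    return os.path.join(base_path, "%s%03d" % (prefix, len(existing)))
```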

@haarburger (Contributor) commented Sep 24, 2019

I'd prefer the implementation that @mibaumgartner just suggested.

@gedoensmax removed the "changes requested" label and added the "ready for review" label on Oct 1, 2019.
Travis AutoPEP8 Fixes and others added 2 commits on October 2, 2019.
@gedoensmax (Contributor, Author)

@mibaumgartner, after I fixed the style check the sklearn backend now crashes. Could you check whether that is a known bug? It seems to me like it could be one.

(Merge commit with conflicts in delira/training/base_experiment.py.)
@justusschock (Member) commented Oct 17, 2019

The test failure is caused by the fact that in Python 3.5 the ordering of a dict is not preserved.
Due to this, we have to rewrite the test so that it only checks whether both lists contain the same elements, without checking their order.

EDIT: I addressed this in #227
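
A minimal sketch of such an order-insensitive check using unittest's assertCountEqual; the test class and lists are illustrative, not the actual delira test:

```python
import unittest


class TestKeyOrdering(unittest.TestCase):
    def test_same_elements_ignoring_order(self):
        # assertCountEqual passes if both sequences contain the same
        # elements, regardless of their order, which is exactly the
        # behaviour needed when dict ordering is not guaranteed.
        actual = ["loss", "accuracy", "lr"]
        expected = ["lr", "loss", "accuracy"]
        self.assertCountEqual(actual, expected)


if __name__ == "__main__":
    unittest.main()
```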

@justusschock (Member) left a review comment

I think the current implementation does not completely reflect what we've been discussing, since the mode to also generate a non-unique save path is missing; that mode would allow automatic resumes of failed trainings.

@gedoensmax (Contributor, Author)

OK, I would actually suggest that they should still be somewhat unique, so that a k-fold continuation is possible, in which we just restart the run with the highest number.
Meaning: we omit the timestamp in case we want to continue training. Or what do you have in mind?

@justusschock (Member)

Yeah, something like that. But we don't need to take care of the k-fold in this case, because inside the save_path the experiment will also create a separate folder for each run.

@gedoensmax (Contributor, Author)

As this has been open for a while now, should we just add training continuation?
But someone mentioned it is already implemented, so it would be great to share the idea if there is already a fixed plan for this.

Labels: ready for review
Projects: none yet
Development: successfully merging this pull request may close linked issues.

5 participants