New sanity check for "invalid bottom" blobs breaks valid networks #601
Thanks for the report. The check was introduced to catch cases when one would want to use the label blob during inference (i.e. in the deploy network).
A way to solve your problem would be to add an `include { phase: TRAIN }` rule to your flatten layer.
However, that would mean you wouldn't see the validation loss, and this is probably not what you want. There is already a test in DIGITS to prevent loss functions from being added to the deploy network.
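A minimal sketch of that suggestion, assuming the layer and blob names from the MWE (`flatten`, `label`); these names are not confirmed in the thread:

```
layer {
  name: "flatten"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"
  # Restrict this layer to the TRAIN phase, so it is never
  # instantiated in the TEST or deploy networks.
  include { phase: TRAIN }
}
```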
We could add the `train_` prefix workaround here. It's hacky, but it works.
So with this change @Shadfield would just need to rename her/his flatten layer to `train_flatten`.
To use a layer for training and validation but not deploy, he'd set the name to a prefixed variant. The way it works now is:
(table of prefix rules not preserved in this copy)
Not exactly what I'd call obvious.
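As an illustration only, a sketch of the naming convention under discussion; the exact rule set is an assumption here:

```
# Hypothetical names showing the prefix convention: the "train_" prefix
# keeps the layer out of the deploy network, and DIGITS strips the
# prefix from the files it generates.
layer {
  name: "train_flatten"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"
}
```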
@lukeyeager One thing which I perhaps didn't make clear: I was testing with generic models, not classification networks. In answer to the earlier question by @gheinrich: yes, when doing "test a single image" in the earlier DIGITS revision, I get the caffe error you pointed out (and it crashes the DIGITS server). Regarding the proposed solution: I previously thought the prefix worked differently, but according to the table it seems like the `train_` prefix should do exactly what I need.
I've attached the logfile, and the train_val and deploy files generated by DIGITS (the "train_" prefix has been automatically removed). Looking at the logfile, it seems like it built the train network successfully and died while building the test network (which still has the flatten layer in it). Maybe someone else could try to replicate this and see if the prefix works for them?
Check BVLC/caffe#3394
Looks like all four of your data layers are reading from the same source? That can't be what you want.
Oh sorry, that's my mistake. After moving back and forth between different revisions of DIGITS, I tried to save time creating the dataset by re-using the same lmdb file for all 4 data sources, just for the purposes of this MWE. When I create a proper dataset and use the suggested `train_` prefix, everything works as expected. I suppose this "bug report" was a bit of a waste of everyone's time; there was a working solution even if it wasn't obvious! Perhaps instead I could make a feature request: to stop other people running into the same difficulties, maybe it would be useful to unify the behaviour of include phases and prefixes? And also to add "validation" as a third option, similar to @gheinrich's suggestion.
Replacing the entire phase/stage framework with layer name prefixes would make me sad. But you're right - we could add even more hackery to make that option more usable. I'm still crossing my fingers for BVLC/caffe#3211 (comment).
Not at all! If I don't know something is bothering you, I can't fix it!
In case someone hits a problem like that mentioned in NVIDIA#601 for a classification network. Once Caffe implements input layers and phase control from Python we should be able to remove those workarounds.
Closing this issue. You should be able to get around any failing sanity checks with the new power of #628.
@lukeyeager Can you please put together a tutorial on this sanity check feature? Does one need to prepend `train_` (or `deploy_`) only in the `name` field, or does the `top` field also need to be changed? The following snippet is not working for me.
Hi @khurram-amin, you should preferably use the new stage-based rules from #628.
Try reading through this popup and see if it helps: #628 (comment)
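For reference, a sketch of the stage-based rules that #628 enables. The stage names ("val", "deploy") follow the DIGITS all-in-one convention, but treat the exact spelling as an assumption:

```
layer {
  name: "flatten"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"
  # Keep the layer in the train and val networks,
  # but leave it out of the deploy network.
  exclude { stage: "deploy" }
}
```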
Revision d3cbdca introduced a new sanity check into digits/model/tasks/caffe_train.py, at line 1515 onwards.
I don't understand the code well enough to pinpoint the error further than this, but there seems to be some problem with the implementation of the sanity check.
Below is an MWE network definition. In previous revisions of DIGITS this executes successfully, and plots the performance on the train and validation (test) data. Obviously this MWE has nothing to learn, so the error would be constant for all epochs.
In the new revision of DIGITS the training process quits before it even starts, with the error message:
The sanity check is complaining that the flatten layer requires the label blob, even though the label blob is available in both the train and test phases. I've also tried including two flatten layers (one specified with "phase: TEST" and the other with "phase: TRAIN"), with no luck.
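A sketch of that two-layer attempt, for anyone trying to reproduce it; the layer and blob names are assumptions, not the exact MWE:

```
# Train-phase copy of the flatten layer.
layer {
  name: "flatten_train"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"
  include { phase: TRAIN }
}
# Test-phase copy, writing to the same top blob.
layer {
  name: "flatten_test"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"
  include { phase: TEST }
}
```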
If we remove the flatten layer from the MWE (so label is connected straight to the loss layer) it passes the sanity check, and correctly computes the validation loss on the test data. So the label blob is definitely available at test time, even in the most recent DIGITS revision.
MWE network definition copied from the train_val.prototxt that DIGITS creates, with filepaths anonymized.
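The attached prototxt did not survive in this copy. A minimal sketch consistent with the description above (paired train/test data and label layers, with a Flatten on the label blob feeding the loss); every name, the LMDB paths, and the loss type are assumptions, and the data and label blobs are assumed to share a shape:

```
# Four data layers: data and label for each of the TRAIN and TEST phases.
layer { name: "data"  type: "Data" top: "data"  include { phase: TRAIN }
        data_param { source: "train_data_db"  backend: LMDB } }
layer { name: "label" type: "Data" top: "label" include { phase: TRAIN }
        data_param { source: "train_label_db" backend: LMDB } }
layer { name: "data"  type: "Data" top: "data"  include { phase: TEST }
        data_param { source: "val_data_db"    backend: LMDB } }
layer { name: "label" type: "Data" top: "label" include { phase: TEST }
        data_param { source: "val_label_db"   backend: LMDB } }
# The flatten layer that trips the sanity check: it consumes the label blob.
layer { name: "flatten" type: "Flatten" bottom: "label" top: "flat_label" }
# A loss with no learnable parameters upstream, so there is nothing to learn.
layer { name: "loss" type: "EuclideanLoss" bottom: "data" bottom: "flat_label" top: "loss" }
```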