
training checkpoint - 5500 (1 hour on 3090) #1

Open
johndpope opened this issue Jun 27, 2024 · 3 comments

@johndpope (Owner) commented on Jun 27, 2024


Screenshot from 2024-06-28 00-43-05

I had to rework the generator to use fewer layers, and to resize images to 64×64.

Screenshot from 2024-06-28 00-45-17
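Roughly what the 64×64 resizing looks like on the data side (a minimal sketch with torchvision transforms; the rest of the pipeline here is an assumption, not the repo's exact code):

```python
from torchvision import transforms

# Train at low resolution first so iterations are cheap on a single 3090.
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # assumed normalization
])
```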

@johndpope (Owner, Author) commented on Jun 27, 2024

@fenghe12 / @JaLnYn / @ChenyangWang95
This might actually work.

In MegaPortraits I use a custom ResNet-50; it's probably safer to switch that in here, because otherwise the model may just discard the updates. I'll check in the morning.
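For reference, a minimal sketch of swapping in a torchvision ResNet-50 trunk as the feature extractor (the class name and weight choice are assumptions, not the MegaPortraits code itself):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class ResNet50Features(nn.Module):
    """Keep everything up to the last conv stage so we get a spatial feature map."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
        self.body = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc

    def forward(self, x):
        # For a 224x224 (or larger) input this returns [B, 2048, H/32, W/32].
        return self.body(x)

print(ResNet50Features()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2048, 7, 7])
```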

@francqz31 commented

@johndpope is it just me, or is Sonnet 3.5's machine learning code output actually way more readable than Opus's? Feels like actual working code this time!

@johndpope (Owner, Author) commented on Jun 27, 2024

Something may not be quite right. I'm training overnight; this is still epoch 0:

checkpoint-86500
recon_step_86500

Screenshot from 2024-06-28 07-49-49

I changed the code back to 512×512, resumed training, and got this:

recon_step_87000
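Resuming looks roughly like this (a minimal sketch; the checkpoint file name and dict keys are assumptions, not the repo's exact format):

```python
import torch

def resume(model, optimizer, path="checkpoint-86500.pt"):  # hypothetical file name
    ckpt = torch.load(path, map_location="cpu")
    # strict=False tolerates layers whose shapes changed when switching back to 512x512.
    model.load_state_dict(ckpt["model"], strict=False)
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt.get("step", 0)  # continue the step counter from the checkpoint
```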

I'm seeing newer, clearer images as epoch 1 advances, even after a few more cycles; I'll update here later. I think by epoch 4 it will probably be fairly decent.

I added some TensorBoard logging and surfaced the losses.

recon_step_126000.png
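The logging is along these lines (a minimal sketch; the loss names and log directory are assumptions):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/experiment")  # hypothetical log dir

def log_losses(step, losses):
    # losses: dict such as {"recon": ..., "perceptual": ..., "adv": ...}
    for name, value in losses.items():
        writer.add_scalar(f"loss/{name}", value, global_step=step)
```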

UPDATE: my bad, I was overfitting to one image. I just pushed an updated dataloader.
New debug image:
debug_step_164000

Starting training again. I was seeing OOM errors; check your num_of_workers.
debug_step_168000
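A minimal sketch of the DataLoader settings in question (dataset path, batch size, and worker count are hypothetical; too many prefetching workers can tip memory over the edge):

```python
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

dataset = ImageFolder(
    "data/celeba",  # hypothetical path
    transform=transforms.Compose([transforms.Resize((512, 512)), transforms.ToTensor()]),
)
loader = DataLoader(
    dataset,
    batch_size=8,     # hypothetical
    shuffle=True,     # iterate over the whole dataset, not one cached image
    num_workers=2,    # lower this if you hit OOM or worker crashes
    pin_memory=True,
)
```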

UPDATE: I restarted training and changed the generator to use ResBlocks; maybe that will help it recreate the image better.

Screenshot from 2024-06-28 16-40-33

debug_step_4000
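The residual blocks are along these lines (a minimal sketch; channel counts and normalization choice are assumptions, not the exact generator design):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # The skip connection lets the block learn a residual correction,
        # which tends to make deeper generators easier to train.
        return self.act(x + self.block(x))
```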

UPDATE (Sunday)
I rebuilt the code to do progressive training with resolution upscaling: 64, 128, 256, 512.
Added TensorBoard losses.
Screenshot from 2024-06-30 05-43-40
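The progressive schedule is roughly this shape (a minimal sketch; the steps-per-stage value is made up for illustration):

```python
import torch.nn.functional as F

RESOLUTIONS = [64, 128, 256, 512]
STEPS_PER_STAGE = 50_000  # hypothetical

def current_resolution(global_step):
    stage = min(global_step // STEPS_PER_STAGE, len(RESOLUTIONS) - 1)
    return RESOLUTIONS[stage]

def resize_batch(images, global_step):
    # Downscale each batch to the resolution of the current training stage.
    size = current_resolution(global_step)
    return F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)
```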

I've given up on training across CelebA for now; I'm overfitting to one pair of images instead....

Training progress so far:
Screenshot from 2024-06-30 14-10-09

UPDATE (Sunday night)

I had some battles with gradient explosions. I ended up having to add gradient accumulation steps, which helped stabilize things; see #3.
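The accumulation plus clipping looks roughly like this (a minimal sketch; accum_steps and max_norm are illustrative values, not the settings from #3):

```python
import torch

def train_with_accumulation(model, optimizer, dataloader, loss_fn,
                            accum_steps=4, max_norm=1.0):
    optimizer.zero_grad()
    for step, (x, y) in enumerate(dataloader):
        loss = loss_fn(model(x), y)
        (loss / accum_steps).backward()  # scale so accumulated grads match one large batch
        if (step + 1) % accum_steps == 0:
            # Clip before stepping to tame the gradient explosions.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
            optimizer.step()
            optimizer.zero_grad()
```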

Looks like the learning rate is getting things into a minimum....
debug_step_3500_resolution_64

UPDATE: I switched to 256×256 because ResNet-50 can't return rich 2048×7×7 features for images smaller than 224×224.

Screenshot from 2024-07-01 09-50-51
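A quick check of the claim above: a ResNet-50 trunk halves the spatial size five times, so the feature map is roughly input/32 per side (a minimal sketch):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

trunk = nn.Sequential(*list(resnet50().children())[:-2]).eval()
with torch.no_grad():
    for size in (64, 128, 224, 256):
        out = trunk(torch.randn(1, 3, size, size))
        print(size, tuple(out.shape))
# 64  -> (1, 2048, 2, 2)   too coarse to be useful
# 128 -> (1, 2048, 4, 4)
# 224 -> (1, 2048, 7, 7)   the "rich" 2048x7x7 map
# 256 -> (1, 2048, 8, 8)
```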
