Improved Precision And Recall Metric For Assessing Generative Models


  1. Precision and Recall GAN Precision and Recall.png
  2. Revisiting Classifier Two-Sample Tests (2017)
    • Use binary classifier (DNN, or just KNN) to perform two-sample test / evaluate distribution between generated and real data distributions
    • Also cannot deal with mode collapse
  3. Approximating Manifold using k nearest neighbours GAN knn manifold approximation.png knn precision recall.png
  4. Pareto frontiers
    • Do not make assumptions / explicit preference in multi-objective optimization (precision vs recall) Multiobjective Optimization
    • Using snapshots during training


StyleGAN alterations (black dots are lowest FID)

stylegan precision recall.png A: original StyleGAN1 B: no minibatch std -> less variation C: also less R1 discriminator regularization -> less variation D: no progressive growing -> hurts FID E: randomly translates output image before passing to discriminator -> better precision F: no instance normalization -> more variety and better FID (lol somehow, but style mixing probably does not work)

Realism score

stylegan realism score.png truncation performed by only retraining hyperspheres with radii below median to perform over-conservative estimate (pruning prevents having very wrong scores for generated samples that fall in underrepresented samples where radii are overly loose)

R >= 1 means generated sample falls in real distribution

Evaluate Interpolations

Do linear interpolation on StyleGAN latent space and investigate realism score