Variational Autoencoders



Views: 67,755 | Rating: 4.97 | View Time: 15:05 minutes | Likes: 2,869 | Dislikes: 20
In this episode, we dive into Variational Autoencoders, a class of neural networks that can learn to compress data in a completely unsupervised way!

VAEs are a very hot topic right now in unsupervised modelling of latent variables and provide a unique solution to the curse of dimensionality.

This video starts with a quick intro to normal autoencoders and then goes into VAEs and disentangled beta-VAEs.
I also touch on related topics like learning causal latent representations, image segmentation and the reparameterization trick!
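For reference, here is a minimal sketch of the reparameterization trick in PyTorch (the function name and variable conventions are my own, not taken from the video):

```python
import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness is moved into
    # eps, so gradients can flow through mu and log_var during backprop.
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + eps * std
```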

Get ready for a pretty technical episode!

Paper references:
– Disentangled VAEs (DeepMind 2016):
– Applying disentangled VAEs to RL: DARLA (DeepMind 2017):
– Original VAE paper (2013):

If you enjoy my videos, all support is super welcome!

44 thoughts on “Variational Autoencoders”

  1. Sublime text editor is so aesthetic. Anyway, yes, great point, the input dimensionality needs to be reduced. Even the original Atari DeepMind breakthrough relied on a smaller (handcrafted) representation of the pixel data. With the disentangled variational autoencoder it may be feasible or even an improvement to deal with the full input.

  2. The shown loss is not a loss but the negative of one – we are maximising the expected likelihood according to the cited paper, not minimising it. Be careful 🙂
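To make the sign convention from this comment explicit: what gets minimised in code is the negative of the likelihood bound, so maximising the ELBO and minimising the plotted loss are equivalent. In the usual notation:

```latex
\text{loss}(\theta, \phi; x)
  = -\,\mathrm{ELBO}(\theta, \phi; x)
  = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
    + D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```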

  3. In the original VAE paper they take a covariance matrix instead of a variance vector,
    and also in the beta-VAE paper by Irina Higgins.
    I don't know if the disentangling with beta > 1 even works with just a variance vector, so I think that's an important detail.
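For reference, with a diagonal Gaussian posterior (a variance vector, i.e. a diagonal covariance matrix) and a standard normal prior, the KL term in the VAE loss has the closed form given in the original paper:

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\big\|\, \mathcal{N}(0, I)\big)
  = \tfrac{1}{2} \sum_{j=1}^{J} \big(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big)
```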

  4. Brilliant video, you explain it way better than most of the literature on the subject! It is guys like you who make the world (and me) smarter! Thanks 🙂

  5. So a mean and a std are taken over all the outputs [the outputs resulting from each of the training samples] for each output neuron of the encoder?

    So if there are 10 outputs of the encoder and 100 training samples, is a mean and a std calculated for each of the 10 outputs using the 100 values that resulted at each output from the 100 samples?
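For what it's worth, in a standard VAE the mean and std are not statistics computed across the training set: the encoder outputs a mean vector and a (log-)variance vector for each individual input sample. A minimal sketch, assuming a 10-dimensional latent space and made-up layer sizes:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=10):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        # Two separate heads: one for the means, one for the log-variances.
        self.mu_head = nn.Linear(hidden_dim, latent_dim)
        self.log_var_head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        # Each input sample gets its own 10-dimensional mu and log_var.
        return self.mu_head(h), self.log_var_head(h)
```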

  6. I would like to see more videos from you. Clear explanation of concept and gentle presentation of math. Great job!

  7. Really liked it. First giving an intuition of the concept and its applications, then moving to the objective function while explaining its individual terms in a way everyone can understand – it was simply professional and elegant. Nice work and thanks!

  8. I'm clear on the purpose of autoencoders, viz. compression of a high-dimensional space to a lower-dimensional latent space. Still unclear on why a VAE? What's the advantage of learning the mean and variance of a distribution?

  9. Hi, thanks for this nice tutorial. In fact, I have a question on VAEs. Hope you have time to answer it. When we use the decoder network after training is finished, we use N(0, 1), not N(m, s), where m and s came from the encoder during training. So I think we use a different distribution as input to the decoder than at training time. Can you explain this more clearly? I read some articles on this but I cannot get it. What I guess is as follows: during training we use the KL cost to push z towards N(0, 1), so when the model is trained well enough, the mean and standard deviation should be close to 0 and 1. So, under that assumption, we can use z ~ N(0, 1), with m = 0, s = 1. Did I understand roughly correctly? Thanks for reading, and more thanks if you can teach me on this.
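On the question above, a rough sketch of the two sampling regimes (assuming the usual unit-Gaussian prior; names and sizes are illustrative): during training z comes from the per-sample posterior N(mu, sigma) via the reparameterization trick, while at generation time there is no encoder input, so z is drawn from the prior N(0, I) that the KL term has pulled the posteriors towards.

```python
import torch

LATENT_DIM = 10  # illustrative latent size

def sample_training(mu, log_var):
    # Training: z ~ q(z|x) = N(mu, sigma), reparameterized so it stays differentiable.
    return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

def sample_generation(num_samples):
    # Generation: no input x, so z ~ p(z) = N(0, I), the prior used in the KL term.
    return torch.randn(num_samples, LATENT_DIM)
```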

  10. Why do we need a bottleneck in the case of a denoising autoencoder or neural inpainting? Here we don't use the compressed representation for anything, in comparison to dimensionality reduction or visualization, where we have a clear use for it.

  11. Great video. Thank you so much for the explanation. I am trying to implement disentangled variational autoencoders to regenerate grid layouts. So I started by implementing variational autoencoders first, but from a programming perspective I couldn't figure out what the beta we are talking about is. Is it the one in the sampling function? Thank you in advance.
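On the beta question above: in a typical beta-VAE implementation, beta does not appear in the sampling function; it is just the weight on the KL term in the loss. A minimal sketch (the beta value and the binary cross-entropy reconstruction term are illustrative choices):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    # Reconstruction term (binary cross-entropy summed over pixels, as for MNIST-like data).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL between N(mu, diag(sigma^2)) and the N(0, I) prior.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    # beta only scales the KL term; beta = 1 recovers the standard VAE.
    return recon + beta * kl
```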

  12. Amazing! Thank God! I was researching Recurrent Variational AEs and badly wanted to understand Variational AEs. Thank you!! I understood a lot! Please, please, please make more videos!!!!

  13. We needed a serious and technical channel about the latest findings in DL. That Siraj crap is useless. Keep going! Awesome

  14. Nice video. But about the DAE part, a larger beta leads to more focus on learning a distribution close to the prior. So what's your prior? I am assuming the prior is a zero-mean unit Gaussian. So what is the prior for the VAE? If it is a diagonal Gaussian, then the benefit from the DAE is simply a scaling factor.

  15. What a fantastic channel. Insta-sub.
    I have to replay a few parts twice because of the information density, but I guess that's a good thing!

  16. Question: in a Stanford video lecture Justin Johnson said (about two years ago) that Variational Autoencoders are a great idea, but they do not work that well in practice. Is this still true, or have VAEs become more useful over time?

  17. Awesome explanation, thanks. I still don't really get why it's useful to store things as distributions rather than point estimates, though?
