Paper Review: Vector-Quantized Variational Autoencoder (VQ-VAE)

Ryan S
3 min read · Apr 9, 2022

In this quick review, we’ll look at VQ-VAE, a novel autoencoder model that improves the reconstruction and compression quality of samples by using a discrete (vector-quantized) latent space together with a learned prior.

Autoencoders work by finding an information-maximizing compressed representation in a space of lower dimension than their original input. Photo by JJ Ying on Unsplash.

Overview

Tags: Compression, Deep Learning, Computer Vision, Data Augmentation

Year Published: 2017

Research Gap(s) Filled: Improves the generative and compressive capabilities of VAE models through (i) a discrete, vector-quantized latent space and (ii) a learned prior over the latent codes.

Links:

  1. arXiv
  2. GitHub (original)
  3. Tutorial (Keras/TensorFlow 2)
  4. A great blog post explaining VQ-VAE

Abridged Summary

The Vector-Quantized Variational Autoencoder (VQ-VAE) is an unsupervised machine learning model that builds upon Variational Autoencoders (VAEs) through the use of a vector-quantized, discrete latent space [1, 2]. This latent representation yields higher-quality input reconstructions than standard VAE models, particularly in the computer vision domain.
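To make the quantization step concrete, here is a minimal PyTorch sketch of the vector-quantization bottleneck. This is not the authors’ original code (their implementation is in TensorFlow); names such as VectorQuantizer, num_embeddings, and beta are illustrative assumptions. The sketch shows the nearest-neighbour codebook lookup, the codebook and commitment losses, and the straight-through gradient trick described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal sketch of the VQ bottleneck: map each encoder output vector
    to its nearest codebook entry and copy gradients straight through."""

    def __init__(self, num_embeddings=512, embedding_dim=64, beta=0.25):
        super().__init__()
        self.beta = beta  # weight on the commitment loss (hyperparameter)
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)

    def forward(self, z_e):
        # z_e: encoder outputs with trailing dimension embedding_dim, flattened to vectors
        flat = z_e.reshape(-1, z_e.shape[-1])

        # Euclidean distance to every codebook vector, then nearest-neighbour lookup
        dists = torch.cdist(flat, self.codebook.weight)
        indices = dists.argmin(dim=1)
        z_q = self.codebook(indices).reshape(z_e.shape)

        # Codebook loss pulls embeddings toward encoder outputs;
        # commitment loss keeps the encoder close to its chosen codes
        vq_loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())

        # Straight-through estimator: gradients bypass the non-differentiable argmin
        z_q = z_e + (z_q - z_e).detach()
        return z_q, vq_loss, indices
```

In the full model, z_e comes from a convolutional encoder, the quantized z_q is passed to the decoder for reconstruction, and a separate autoregressive prior (a PixelCNN in the paper) is then trained over the discrete code indices to enable sampling.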


Ryan S

Image Scientist, MIT CSAIL Alum, Tutor, Dark Roast Coffee Fan, GitHub: https://github.com/rmsander/