Text to image generation with Deep Learning

[Update 2022 Oct. 30] Added the recently introduced text-to-video models: Imagen Video and Phenaki.

Notation
Let's formulate the problem before going further.

Symbol: Meaning
$g_\theta$: Generator network with parameters $\theta$
$\mathbf{c}$: A caption, represented as a sequence of tokens
$x$: An input image, optionally fed to $g_\theta$ so that it can be modified
$y$: The output image, sampled from $g_\theta(\mathbf{c})$ or $g_\theta(\mathbf{c}, x)$
$\mathbf{z}$: A latent vector
$\mathbf{h}$: Hidden states, an intermediate representation of the input data

Intro and problem formulation
We refer to text-to-image generation as the task of generating visual content conditioned on some text description....

September 29, 2022 · 30 min · Nathan Fradet

Modern Natural Language Generation (NLG) techniques

Intro
The task of generating content with deep learning (statistical) models is quite different from other common machine learning tasks. The underlying objective is to train a model so that it can generate realistic content at inference time. For continuous domains, state-of-the-art models are mostly based on adversarial (Generative Adversarial Networks, GANs) and denoising diffusion training objectives. These models are capable of creating impressive and highly realistic results in a wide variety of styles....

August 18, 2022 · 33 min · Nathan Fradet