Hello,

👋 I’m a final-year PhD student at Sorbonne University (LIP6, Paris) working on deep learning techniques for symbolic music generation. My main interest lies in generative AI, i.e. methods to train and use deep learning models to generate content. I am especially interested in discrete modalities (language, symbolic music, code…) for the logic they convey, and in gradient-based generation methods for them, e.g. adapting diffusion processes to natural language or code. My secondary research interest is multimodal learning, in particular how coupling natural language with continuous modalities (image, audio) can help models learn better on downstream tasks.

  • 📒 I use this site to reference my research notes on topics in which I have a special interest. I usually pick a specific topic and write a comprehensive post on existing solutions and future directions.
  • 🔬 Interests: NLP/NLG, Gradient-based generative methods, Multimodal learning

Beyond autoregressive text generation

[Update 2022 Dec. 4] Added contrastive learning / decoding methods. [Update 2023 Mar. 10] Refactored and added RLHF and diffusion methods. [Update 2023 Dec. 22] Added DPO, RAG and EMNLP 2023 papers. Intro The task of generating content with deep learning models differs from other common tasks in the sense that 1) the model is often trained to replicate the training data for continuous modalities or to predict the next element for discrete ones; 2) at test time there is no single expected result; and consequently 3) its evaluation is often tricky....

October 18, 2022 · 51 min · Nathan Fradet
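As a rough illustration of point 1) on the discrete side (a minimal sketch, not code from the post itself), here is an autoregressive sampling loop in PyTorch, where `model` stands for any hypothetical decoder returning next-token logits:

```python
import torch

def sample_autoregressive(model, prompt_ids, max_new_tokens=32, temperature=1.0):
    """Sample tokens one at a time from a next-token prediction model.

    `model` is a hypothetical decoder mapping token ids of shape
    (batch, seq_len) to logits of shape (batch, seq_len, vocab_size).
    """
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                       # (batch, seq_len, vocab_size)
        next_logits = logits[:, -1, :] / temperature
        probs = torch.softmax(next_logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # (batch, 1)
        ids = torch.cat([ids, next_id], dim=-1)   # append the sampled token
    return ids
```

Training optimizes next-token prediction, while at test time the model produces open-ended samples like these with no single expected result, which is why point 3), evaluation, is tricky.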

Text to image generation

[Update 2022 Oct. 30] Added the recently introduced text-to-video models: Imagen Video and Phenaki. Notation: $g_\theta$, the generator network with parameters $\theta$; $\mathbf{c}$, a caption represented as a sequence of tokens; $x$, an input image optionally fed to $g_\theta$ to modify it; $y$, the output image, sampled from $g_\theta(\mathbf{c})$ or $g_\theta(\mathbf{c}, x)$; $\mathbf{z}$, a latent vector; $\mathbf{h}$, hidden states, i.e. intermediate representations of the input data. Intro and problem formulation We refer to text-to-image generation as the task of generating visual content conditioned on a text description....

September 29, 2022 · 30 min · Nathan Fradet
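To make the notation above concrete, here is a hedged sketch (placeholder names, not the post's actual interface) of how $y$ would be sampled from $g_\theta(\mathbf{c})$ or $g_\theta(\mathbf{c}, x)$ with a latent $\mathbf{z}$:

```python
from typing import Optional
import torch

def generate_image(g_theta: torch.nn.Module,
                   c: torch.LongTensor,               # caption c as token ids
                   x: Optional[torch.Tensor] = None,  # optional input image x
                   latent_dim: int = 128) -> torch.Tensor:
    """Sample y from g_theta(c) or g_theta(c, x); all names here are hypothetical."""
    z = torch.randn(c.shape[0], latent_dim)  # latent vector z
    if x is None:
        return g_theta(c, z)       # pure text-to-image: y sampled from g_theta(c)
    return g_theta(c, z, x)        # text-guided editing: y sampled from g_theta(c, x)
```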