Congratulations to Google on open-sourcing Gemma Diffusion!
I want to give a shout-out to a group of really talented Cornell students who developed, in the lab, many of the new ideas we see in this model:
@mariannearr -- Block diffusion is what enables Gemma Diffusion to generate arbitrary-length sequences and to support KV caching.
@mariannearr @SchiffYair -- Efficient encoder-decoder diffusion (E2D2) extends block diffusion and is part of what makes Gemma Diffusion so fast: it speeds up inference by running a smaller decoder model.
@SchiffYair @ssahoo_ @Guanghan__Wang -- Uniform diffusion LMs (UDLMs) are the family of discrete diffusion models that underlie Gemma Diffusion and define its noise process and training objective. This work builds on the simplified losses from our earlier masked diffusion LMs (MDLMs).
@ssahoo_ -- Uniform diffusion supports built-in error correction and is especially effective with distilled fast samplers like the ones introduced in Duo.
This is a great overview of Gemma Diffusion: https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-diffusiongemma
Check out the students' papers below: