Published on 6/27/2026

ByteDance's "iLLaDA" is a diffusion language model that keeps up with Qwen2.5

the-decoder.com · ai-productivity-automation · AI Tools & Product Updates

Insight summary

  • ByteDance and Renmin University developed iLLaDA, an 8B diffusion language model pre-trained on 12 trillion tokens.
  • iLLaDA slightly surpasses the autoregressive Qwen2.5 7B model on the benchmark average, scoring 63.9 versus 63.3.
  • Unlike autoregressive models, which generate text one token at a time, iLLaDA takes a diffusion approach: it starts from masked tokens and refines them using context from both directions.
  • iLLaDA improves substantially over its predecessor LLaDA, notably with a 21.6-point increase on the BBH reasoning test.
  • Compared with Google's DiffusionGemma, iLLaDA, a dense model trained from scratch, prioritizes quality over speed.
  • iLLaDA lags behind Qwen2.5 Instruct, scoring 67.1 versus 77.1, mainly because it lacks reinforcement learning alignment.
  • The study highlights ongoing challenges in directly comparing diffusion and autoregressive language models due to differing benchmarks and model scales.
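
To make the contrast with token-by-token generation concrete, here is a minimal toy sketch of masked-diffusion decoding. It is not iLLaDA's actual algorithm: `toy_predict`, `diffusion_decode`, the confidence formula, and the equal-share unmasking schedule are all hypothetical, and the "model" simply looks up a known answer. It only shows the loop shape: guess every masked position at once, commit the most confident guesses, and repeat.

```python
MASK = "<mask>"

def toy_predict(tokens, full_answer):
    """Hypothetical stand-in for the network: guess every masked slot at once.

    Confidence rises with each already-known neighbour on EITHER side,
    a crude imitation of bidirectional context.
    """
    guesses = {}
    for i, tok in enumerate(tokens):
        if tok != MASK:
            continue
        left_known = i > 0 and tokens[i - 1] != MASK
        right_known = i + 1 < len(tokens) and tokens[i + 1] != MASK
        confidence = 0.5 + 0.25 * left_known + 0.25 * right_known
        guesses[i] = (full_answer[i], confidence)
    return guesses

def diffusion_decode(prompt, answer, steps):
    """Start with the answer fully masked, then unmask it over a few steps."""
    tokens = list(prompt) + [MASK] * len(answer)
    full_answer = list(prompt) + list(answer)  # what the toy model "knows"
    history = []
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        guesses = toy_predict(tokens, full_answer)
        # Assumed schedule: unmask an equal share of what is left each step.
        k = -(-len(masked) // (steps - step))  # ceiling division
        # Commit the k most confident guesses; ties go to the earlier slot.
        best = sorted(masked, key=lambda i: (-guesses[i][1], i))[:k]
        for i in best:
            tokens[i] = guesses[i][0]
        history.append(list(tokens))
    return tokens, history

if __name__ == "__main__":
    final, history = diffusion_decode(
        ["The", "cat"], ["sat", "on", "the", "mat"], steps=2
    )
    for snapshot in history:
        print(" ".join(snapshot))
```

In this sketch several tokens are committed per step, so the four-token answer is filled in after two passes rather than four. A real diffusion language model would replace `toy_predict` with a trained network and would likely use a more careful unmasking schedule.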

Content details

Industry: ai-productivity-automation
Topic: AI Tools & Product Updates
Source: the-decoder.com
Language: en