ByteDance's "iLLaDA" is a diffusion language model that keeps up with Qwen2.5
the-decoder.com
Insight summary
•ByteDance and Renmin University developed iLLaDA, an 8B diffusion language model pre-trained on 12 trillion tokens.
•iLLaDA slightly outperforms the autoregressive Qwen2.5 7B model on the benchmark average, scoring 63.9 versus 63.3.
•Unlike autoregressive models, which generate text token by token, iLLaDA takes a diffusion approach: it refines masked tokens using bidirectional context.
•iLLaDA improves substantially over its predecessor LLaDA, notably with a 21.6-point increase on the BBH reasoning benchmark.
•Compared with Google's DiffusionGemma, iLLaDA prioritizes quality over speed: it is a dense model trained from scratch.
•iLLaDA lags behind Qwen2.5 Instruct, scoring 67.1 versus 77.1, mainly because it lacks reinforcement learning alignment.
•The study highlights ongoing challenges in directly comparing diffusion and autoregressive language models due to differing benchmarks and model scales.
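
To make the contrast between token-by-token generation and masked-token refinement more concrete, here is a minimal toy sketch of the general idea in Python. It is not iLLaDA's actual algorithm: `toy_predictor`, its fixed target sentence, the confidence formula, and the "commit the most confident positions first" rule are all assumptions made for illustration. The sequence starts fully masked, and each pass fills in the positions the predictor is most confident about, using both left and right neighbours as context.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

Prediction = Tuple[str, float]  # (proposed token, confidence)


def toy_predictor(seq: List[str]) -> List[Prediction]:
    """Hypothetical stand-in for a trained network (assumption, not a real model).

    It "knows" a fixed target sentence and reports higher confidence for
    positions whose left or right neighbour is already unmasked, which
    loosely mimics the use of bidirectional context.
    """
    target = ["diffusion", "models", "fill", "in", "masked", "tokens"]
    out = []
    for i, tok in enumerate(target):
        left = i > 0 and seq[i - 1] != MASK
        right = i < len(seq) - 1 and seq[i + 1] != MASK
        # The small position-dependent term only makes the toy scores differ.
        conf = 0.4 + 0.25 * left + 0.25 * right + 0.01 * (i % 3)
        out.append((tok, conf))
    return out


def diffusion_decode(
    length: int,
    predictor: Callable[[List[str]], List[Prediction]],
    steps: int,
) -> Tuple[List[str], List[List[str]]]:
    """Fill a fully masked sequence over several refinement passes."""
    seq = [MASK] * length
    per_step = -(-length // steps)  # ceiling division: positions per pass
    history = [list(seq)]
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        preds = predictor(seq)
        # Most confident masked positions first; ties go to the lower index.
        masked.sort(key=lambda i: (-preds[i][1], i))
        for i in masked[:per_step]:
            seq[i] = preds[i][0]
        history.append(list(seq))
    return seq, history


if __name__ == "__main__":
    final, history = diffusion_decode(6, toy_predictor, steps=3)
    for step, state in enumerate(history):
        print(step, " ".join(state))
```

Running the script prints the sequence after each pass. The point to notice is that positions are committed out of order rather than strictly left to right, which is what distinguishes this style of decoding from an autoregressive loop.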