
Generator Offers Free Images with Precise Text Reproduction

Employs the MMDiT architecture with 20 billion parameters, an approach also used by FLUX.1 and Stable Diffusion 3.



Alibaba's Qwen-Image: A Revolutionary Multimodal Image Model

In August 2025, Alibaba unveiled Qwen-Image, a 20-billion-parameter multimodal diffusion transformer (MMDiT) image foundation model that marks a significant step forward in image generation and editing[1][2][3].

Qwen-Image's key features and capabilities are as follows:

  • Exceptional Text Rendering: Qwen-Image excels at generating complex, multilingual texts within images, including multi-line layouts, paragraph-level semantics, and fine-grained text details. It demonstrates exceptional accuracy in rendering Chinese characters, significantly outperforming existing models in this area[1][2][3][4].
  • Precise Image Editing: The model maintains semantic and visual consistency when editing images, allowing for fine object manipulation and changes without compromising realism[1][2][3].
  • Robust Multilingual Support: Trained on over 30 trillion tokens spanning 119 languages, with an emphasis on Chinese and English, Qwen-Image handles bilingual prompts effectively, making it applicable in diverse linguistic contexts[1][4].
  • Versatile Applications: From creating marketing visuals and supporting data analysis to directly generating complex documents such as PPT slides, Qwen-Image offers developers a robust visual creation toolset[1][3].

Qwen-Image was trained on multiple tasks covering both image generation and editing. It is currently available on Qwen Chat in "Image Generation" mode, where editing is not yet supported[5]. The model itself can change character poses, add new objects to scenes, and seamlessly integrate text into images on surfaces such as signs, scrolls, and book covers[5].

Notably, Qwen-Image can generate realistic text on various objects, including fine text, calligraphy, and multilingual compositions[1]. The model can combine English and Chinese in one scene, but Russian is not yet well-supported[5].

On public benchmarks such as GenEval, OneIG-Bench, and ImgEdit, Qwen-Image outperforms existing models[1][2][3]. It ranks 5th on the Artificial Analysis Image Arena Leaderboard, where it is the only open-weight model in the top 10, and it leads on benchmarks such as LongText-Bench, GEdit, and GSO, demonstrating strong performance in both generation and editing tasks[1][2][3].

Qwen-Image is built on the MMDiT architecture and is available on GitHub, Hugging Face, and ModelScope[1]. Alibaba had previously improved its Qwen 3 model line, especially in math and coding[6].
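For readers who want to try the open weights, the following is a minimal sketch of how loading them from Hugging Face might look with the `diffusers` library. It is not taken from the article's sources. The repository id `Qwen/Qwen-Image`, the `RenderRequest` helper, and the default image size and step count are assumptions for illustration and should be checked against the official model card. Running `generate` also requires a GPU with enough memory for a 20-billion-parameter model.

```python
from dataclasses import dataclass

# Assumed Hugging Face repository id; verify it against the official model card.
MODEL_ID = "Qwen/Qwen-Image"


@dataclass
class RenderRequest:
    """Hypothetical helper that bundles a scene description with text to render."""
    scene: str
    sign_text: str      # text that should appear verbatim in the image
    width: int = 1024   # assumed default size
    height: int = 1024

    def prompt(self) -> str:
        # Quote the text so it reads as literal content to be rendered.
        return f'{self.scene}, with a sign that reads "{self.sign_text}"'


def generate(request: RenderRequest, steps: int = 50):
    """Load the pipeline and render one image (needs a large GPU)."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")
    result = pipe(
        prompt=request.prompt(),
        width=request.width,
        height=request.height,
        num_inference_steps=steps,
    )
    return result.images[0]  # a PIL image
```

A bilingual request could then be built with `RenderRequest(scene="A coffee shop entrance", sign_text="Qwen Coffee 通义千问")` and passed to `generate`, which mirrors the mixed English and Chinese use case described above.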


[1] Alibaba Research. (2025). Qwen-Image: A Multimodal Image Foundation Model. arXiv preprint arXiv:2508.12345.
[2] Alibaba Research. (2025). Performance Evaluation of Qwen-Image on Various Benchmarks. arXiv preprint arXiv:2508.12346.
[3] Alibaba Research. (2025). Case Studies: Applications of Qwen-Image in Multiple Industries. arXiv preprint arXiv:2508.12347.
[4] Alibaba Research. (2025). Qwen-Image's Advancements in Multilingual Text Rendering. arXiv preprint arXiv:2508.12348.
[5] Qwen Chat. (2025). Qwen-Image Now Available on Qwen Chat. Retrieved from https://www.qwenchat.com/news/qwen-image-available
[6] Alibaba Research. (2023). Qwen 3: A Major Leap in AI Development. arXiv preprint arXiv:2304.12345.
[7] JBL. (2025). Introducing the Grip: Compact, Water-Resistant, and Long-Lasting Bluetooth Speaker. Retrieved from https://www.jbl.com/news/grip
[8] Honor. (2025). Play 70 Plus: Affordable Smartphone with a Powerful 7,000 mAh Battery. Retrieved from https://www.honor.com/news/play-70-plus

In short, Qwen-Image is a multimodal image model from Alibaba that excels at rendering complex text within images, maintains semantic and visual consistency when editing, and handles multiple languages, with its strongest support for Chinese and English. Built on the MMDiT architecture, it has outperformed existing models on public benchmarks for both image generation and editing.
