Stable Diffusion 3: A Major Leap Forward in Text-to-Image Generation (Compared to SDXL & DALL-E)

If Midjourney is something of a toy in the text-to-image field, Stable Diffusion has always been the text-to-image model closest to a practical tool, thanks to its stability, controllability, and efficiency.

On February 22, 2024, Stability AI announced an early preview of Stable Diffusion 3. The model is not yet open for public testing.

Key updates

  • Significant improvements in image quality, handling of multi-subject prompts, and spelling (rendering text correctly inside images).
  • A new diffusion transformer architecture (similar to the one behind Sora), combined with flow matching and other improvements.
  • Model sizes ranging from 800M to 8B parameters, giving options for deployment on a variety of devices.
  • Safety measures integrated throughout model training, testing, evaluation, and deployment.
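The announcement only names flow matching. As a rough illustration of the general idea, and not Stability AI's code, the NumPy sketch below uses a straight-line path between data and noise. The model learns to predict the constant velocity `noise - x0` along that path, and generation integrates the velocity from pure noise back to data. The straight-line path, the Euler sampler, and every function name here are assumptions for illustration. In SD3 the velocity predictor would be the diffusion transformer. Here it is replaced by a hypothetical exact "oracle" for a dataset containing a single point.

```python
import numpy as np


def make_training_pair(x0, noise, t):
    """Build one flow-matching training example on a straight-line path.

    The path goes from clean data x0 (t = 0) to Gaussian noise (t = 1).
    Returns the noisy point x_t and the velocity the model should predict.
    """
    x_t = (1.0 - t) * x0 + t * noise  # point on the straight line at time t
    v_target = noise - x0             # d(x_t)/dt, constant along this path
    return x_t, v_target


def flow_matching_loss(predict_velocity, x0, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    noise = rng.standard_normal(x0.shape)
    t = rng.uniform(0.0, 1.0)
    x_t, v_target = make_training_pair(x0, noise, t)
    v_pred = predict_velocity(x_t, t)
    return float(np.mean((v_pred - v_target) ** 2))


def sample(predict_velocity, shape, steps, rng):
    """Generate a sample by integrating dx/dt = v from t = 1 (noise) to t = 0 (data).

    Uses plain Euler steps; this is an illustrative choice of ODE solver.
    """
    x = rng.standard_normal(shape)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * predict_velocity(x, t_cur)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([0.5, -1.0, 2.0])  # toy "dataset" with a single point

    # Hypothetical stand-in for a trained network: the exact velocity field
    # for a dataset containing only `target`, i.e. v(x, t) = (x - target) / t.
    def oracle_velocity(x, t):
        return (x - target) / t

    print(flow_matching_loss(oracle_velocity, target, rng))          # close to 0
    print(sample(oracle_velocity, target.shape, steps=10, rng=rng))  # close to target
```

With the oracle, the loss is essentially zero and sampling recovers the single data point. This is a quick sanity check that the path, the training target, and the sampler agree with each other. A trained network would only approximate this velocity field.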
[Image: Stable Diffusion 3 sample featuring donuts and astronauts]

Although the model is not yet open for testing, some Stability AI employees have already started sharing images on social media, and the results suggest SD3 is off to a good start.

SD3 vs. SDXL vs. DALL-E

[Image: SD3 vs. SDXL vs. DALL-E, red sphere on a blue cube with a cat and a dog]

Prompt: Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat.

[Image: SD3 vs. SDXL vs. DALL-E, three transparent glass bottles]

Prompt: Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.

[Image: SD3 vs. SDXL vs. DALL-E, anime-style illustration of a newsstand]

Prompt: Anime style illustration of a newsstand on top of a small grassy hill, on top of the newsstand we see the text “it’s here!”. In the background we see a big rain approaching.

[Image: SD3 vs. SDXL vs. DALL-E, horse balancing on top of a colorful ball]

Prompt: A horse balancing on top of a colorful ball in a field with green grass and a mountain in the background.

[Image: SD3 vs. SDXL vs. DALL-E, shipwreck on a beach]

Prompt: Wide photo of a shipwreck on the beach, lots of rust and moss on the ship contrasting with the beautiful blue of the ocean water and the peace that the beauty of nature conveys. The big waves are magnificent and touch the ship.

The Stable Diffusion 3 images above were shared by @andrekerygma and @EMostaque.

In these examples, Stable Diffusion 3 follows the prompts almost perfectly. In the horse image, you can even see the ball deforming under the horse's weight.

One focus of this update is rendering text inside images with correct spelling. For example:

[Image: SD3, 90's desktop computer with "welcome" on the screen]

Prompt: Photo of an 90’s desktop computer on a work desk, on the computer screen it says “welcome”. On the wall in the background we see beautiful graffiti with the text “SD3” very large on the wall.

[Image: SD3, embroidered cloth with "good night" and a baby tiger]

Prompt: Resting on the kitchen table is an embroidered cloth with the text ‘good night’ and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic.

Whether it is the CRT look of the text on the computer screen or the stitched look of the text on the cloth, Stable Diffusion 3 chose a fitting style for "welcome" and "good night" without the prompts explicitly asking for one. The text and the image blend seamlessly.

We will run our own hands-on tests once the technical details are published and the open beta begins. Stay tuned.

That said, formal testing is secondary. Judging from the capabilities it has shown so far, Stable Diffusion 3 already looks ready to serve as an everyday image-creation tool.

