If Midjourney is something of a toy in the text-to-image field, Stable Diffusion has consistently been the model closest to a practical tool, thanks to its stability, controllability, and efficiency.
On February 22, 2024, Stability AI announced an early preview of Stable Diffusion 3. The model is not yet open for testing.
Link: https://stability.ai/stablediffusion3
Key updates
- Significant improvements in image quality, multi-subject prompts, and word spelling capabilities.
- Utilizes a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.
- Model sizes range from 800M to 8B parameters, making it suitable for deployment on a variety of devices.
- Safety is integrated throughout the model training, testing, evaluation, and deployment process.
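To give a rough idea of what "flow matching" in the list above refers to, here is a minimal sketch of the training objective in plain Python. The announcement does not say which formulation Stable Diffusion 3 uses, so the straight-line path between noise and data is an assumption made for illustration, and the names `interpolate`, `target_velocity`, `flow_matching_loss`, and `predict_velocity` are hypothetical, not part of any Stability AI API.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity along the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x1, rng):
    """One-sample estimate of the loss for a single data point x1.

    predict_velocity(xt, t) stands in for the network (hypothetical).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # sample Gaussian noise
    t = rng.random()                          # sample a time in [0, 1)
    xt = interpolate(x0, x1, t)               # noisy point on the path
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    # Mean squared error between predicted and true velocity
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)

if __name__ == "__main__":
    rng = random.Random(0)
    zero_model = lambda xt, t: [0.0 for _ in xt]  # untrained placeholder
    print(flow_matching_loss(zero_model, [1.0, -2.0, 0.5], rng))
```

In a real system the placeholder model would be a neural network (here, reportedly a diffusion transformer) trained to minimize this loss, and images would be generated by integrating the learned velocity from noise toward data.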
Although it is not yet open for testing, some Stability AI employees have already started sharing images on social media, following the precedent Sora set.
SD3 vs. SDXL vs. DALL-E
Prompt: Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat.
Prompt: Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3.
Prompt: Anime style illustration of a newsstand on top of a small grassy hill, on top of the newsstand we see the text “it’s here!”. In the background we see a big rain approaching.
Prompt: A horse balancing on top of a colorful ball in a field with green grass and a mountain in the background.
Prompt: Wide photo of a shipwreck on the beach, lots of rust and moss on the ship contrasting with the beautiful blue of the ocean water and the peace that the beauty of nature conveys. The big waves are magnificent and touch the ship.
The above Stable Diffusion 3 images are from @andrekerygma and @EMostaque.
Judging from the images shown so far, Stable Diffusion 3 reproduces the prompts almost exactly. In the horse image, you can even see the ball deforming under the horse's weight.
One of the focuses of this update is the ability to spell words correctly. For example:
Prompt: Photo of an 90’s desktop computer on a work desk, on the computer screen it says “welcome”. On the wall in the background we see beautiful graffiti with the text “SD3” very large on the wall.
Prompt: Resting on the kitchen table is an embroidered cloth with the text ‘good night’ and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic.
Whether it is the CRT effect on the screen or the embroidery on the cloth, Stable Diffusion 3 rendered the words “welcome” and “good night” in the most fitting style, even though the prompts did not explicitly ask for it. The text blends seamlessly into the image.
We will run a hands-on test once the technical details and the open beta are released. Stay tuned.
That said, testing is secondary. Judging from the capabilities it has shown so far, Stable Diffusion 3 already looks capable enough to serve as an everyday image creation tool.