Diffusion Depot, the smart image manager for Stable Diffusion, Midjourney, and DALL-E
TL;DR: I built an app to manage images generated with ML models. Check it out.
Recently, I started playing with Stable Diffusion, using it to generate countless images. I have a beefy 24GB 3090 Ti that I've held onto even after I stopped streaming on Twitch, and I wanted to know how it would fare compared to my previous experiences with DALL-E 2.
The results were astounding. It generates batches of images in a matter of seconds, making it as fast as DALL-E, with much more impressive results. This is one area where OpenAI definitely doesn't have the lead.
However, within a few days, I had accumulated thousands of different images, some just slight variations on an existing prompt. If I wanted to return to a particular version of a prompt, digging it up was a bother. And what if I wanted to find all images that contained something specific, like all images of Patrick Stewart swimming in the ocean or horses wearing red hoodies (why not)?
It was impossible to manage.
I wanted to generate as many images as I wanted and try out as many prompt variations as I wanted without thinking about how to find them later or carefully curating all the generated images to keep only the "best".
Key features
I started thinking about building an app that could take care of it. What would it take? What features could it have that would make it worth the effort to build the app itself and use it regularly? Well:
- It needs to let me tag images. When I generate a big batch, I need to be able to quickly tag the pictures of that batch and move on. Bonus points if it could do some of the tagging automatically.
- It needs to be faster than the file manager. I can scroll down a long folder full of images, so whatever the app does, it needs to beat that. Maybe a scrollable list of all pictures in higher resolution than what we get with the file manager? It also needs to let me filter by tag, and it needs to be instant.
- It needs to list all prompts I've ever used. And it needs to let me see which images each prompt has generated so that instead of just seeing a bunch of words, I actually see the "typical output" for that prompt.
- It needs to integrate with Stable Diffusion. I want to be able to upscale images whenever I want and generate more images on demand, either based on an existing image or a prompt. Thankfully, all of that is possible with the Stable Diffusion Web UI, which has an API to control it.
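To give a rough idea of what that last requirement involves, here is a minimal sketch of a text-to-image request. It assumes the AUTOMATIC1111 Stable Diffusion Web UI started with its `--api` flag, listening on the default local port, which exposes `POST /sdapi/v1/txt2img` and returns base64-encoded images. The helper names and the option names (`buildTxt2ImgRequest`, `extractImages`, `batchSize`) are made up for illustration and are not Diffusion Depot's actual code.

```typescript
// Hypothetical helpers for talking to a locally running Web UI.
// Assumption: AUTOMATIC1111 Web UI launched with --api on port 7860.
interface Txt2ImgOptions {
  baseUrl?: string;   // where the Web UI is listening
  steps?: number;     // sampling steps
  batchSize?: number; // images generated per request
}

interface HttpRequest {
  url: string;
  init: { method: string; headers: { [name: string]: string }; body: string };
}

// Build the request without sending it, so it is easy to inspect and test.
function buildTxt2ImgRequest(prompt: string, options: Txt2ImgOptions = {}): HttpRequest {
  const baseUrl = options.baseUrl !== undefined ? options.baseUrl : "http://127.0.0.1:7860";
  const payload = {
    prompt: prompt,
    steps: options.steps !== undefined ? options.steps : 20,
    batch_size: options.batchSize !== undefined ? options.batchSize : 1,
  };
  return {
    url: baseUrl + "/sdapi/v1/txt2img",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// The response body is expected to carry an `images` array of base64 strings.
// Anything else is treated as "no images" rather than an error.
function extractImages(json: unknown): string[] {
  if (typeof json !== "object" || json === null) return [];
  const images = (json as { images?: unknown }).images;
  if (!Array.isArray(images)) return [];
  return images.filter((x): x is string => typeof x === "string");
}

// Usage, wherever a fetch implementation is available:
//   const req = buildTxt2ImgRequest("horses wearing red hoodies", { batchSize: 4 });
//   const res = await fetch(req.url, req.init);
//   const pngsAsBase64 = extractImages(await res.json());
```

Keeping the request builder separate from the network call means the same payload logic can be reused for other endpoints, such as upscaling, without touching the transport code.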
With those requirements in mind, I set out to build an app that could do it all. I figured that since this was an app for ML-generated images, I should keep with the theme and generate all the assets around it. I asked Lucas, one of my bots, to come up with a name, and he settled on Diffusion Depot. I also used IconifyAI to generate the app’s icon.
How it was built
The app itself was built in Electron, with React and Next.js. An SQLite database stores key metadata about all your pictures, prompts, and tags, grouping them all intelligently. By keeping everything in an SQLite database, all of the app’s data is easy to access and export. No lock-in.
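As an illustration of how pictures, prompts, and tags could fit together in SQLite, here is a small sketch. The table names, column names, and the `imagesWithAllTags` helper are hypothetical, not the app's real schema. It only builds SQL strings, so it makes no assumption about which SQLite driver runs them.

```typescript
// Hypothetical schema: every image belongs to at most one prompt and can
// carry any number of tags through the image_tags join table.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS prompts (
  id   INTEGER PRIMARY KEY,
  text TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS images (
  id        INTEGER PRIMARY KEY,
  path      TEXT NOT NULL UNIQUE,
  prompt_id INTEGER REFERENCES prompts(id)
);
CREATE TABLE IF NOT EXISTS tags (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS image_tags (
  image_id INTEGER NOT NULL REFERENCES images(id),
  tag_id   INTEGER NOT NULL REFERENCES tags(id),
  PRIMARY KEY (image_id, tag_id)
);
`;

interface Query {
  sql: string;
  params: string[];
}

// Build a parameterised query for images that carry ALL of the given tags.
// With no tags, every image is returned.
function imagesWithAllTags(tagNames: string[]): Query {
  // Drop duplicates so the HAVING count matches the number of distinct tags.
  const unique = tagNames.filter((name, i) => tagNames.indexOf(name) === i);
  if (unique.length === 0) {
    return { sql: "SELECT id, path FROM images ORDER BY id", params: [] };
  }
  const placeholders = unique.map(() => "?").join(", ");
  const sql =
    "SELECT i.id, i.path FROM images i " +
    "JOIN image_tags it ON it.image_id = i.id " +
    "JOIN tags t ON t.id = it.tag_id " +
    "WHERE t.name IN (" + placeholders + ") " +
    "GROUP BY i.id " +
    "HAVING COUNT(DISTINCT t.id) = " + unique.length + " " +
    "ORDER BY i.id";
  return { sql: sql, params: unique };
}
```

Calling `imagesWithAllTags(["horse", "red hoodie"])` yields a query with two `?` placeholders and the two tag names as parameters, ready to hand to whichever SQLite binding is in use.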
One of the key challenges that I quickly noticed while building Diffusion Depot was that loading a list of thousands of images was quite taxing on the computer. It would noticeably stutter and freeze while first loading the list. The fix involved two key bits of technology.
Firstly, I used a technique called windowing to limit the number of images rendered in the DOM at any one point. That way, even if the list has multiple thousands of images, only a handful are being rendered, which is a lot lighter on the computer.
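The core of windowing is a small piece of arithmetic: given the scroll position, work out which rows are on screen and render only those, plus a few extra on either side. The sketch below assumes fixed-height rows, and the `visibleRange` helper and its parameters are illustrative names rather than the app's own code. In practice a library usually handles this.

```typescript
interface WindowRange {
  first: number;     // index of the first row to render (inclusive)
  last: number;      // index of the last row to render (inclusive)
  offsetTop: number; // pixels of empty space to leave above the first rendered row
}

// Work out which rows of a fixed-height list need to be in the DOM.
// `overscan` rows are added above and below so fast scrolling shows no gaps.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 2,
): WindowRange {
  if (rowCount <= 0) return { first: 0, last: -1, offsetTop: 0 };
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.floor((scrollTop + viewportHeight - 1) / rowHeight);
  // Clamp both ends so the range never leaves [0, rowCount - 1].
  const first = Math.min(rowCount - 1, Math.max(0, firstVisible - overscan));
  const last = Math.min(rowCount - 1, Math.max(first, lastVisible + overscan));
  return { first: first, last: last, offsetTop: first * rowHeight };
}

// 1,000 rows of 200px each in a 600px viewport, scrolled down by 10,000px:
// only rows 48 to 54 are rendered, offset 9,600px from the top of the list.
const example = visibleRange(10000, 600, 200, 1000);
```

The list container keeps its full height (`rowCount * rowHeight`) so the scrollbar behaves normally, while the rendered rows are positioned using `offsetTop`.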
Secondly, I realised that the size of the images themselves contributed to the strain on the system. The images that come out of these models are not necessarily optimized, so even though it’s all local, they still carry a cost.
While I was testing this theory out, using my age-old PNG/JPG optimizers, I decided to switch to WebP, since it’s supposed to provide smaller images for the same level of quality. Electron is built on the Blink browser engine, which has supported WebP for a long time, so there was no reason for me not to.
But then it hit me: if I’m using the Blink browser engine and don’t need to worry about compatibility, I can go a step further and use AVIF [1]. AVIF compresses even better than WebP, and after testing it with a few images, I decided that it was the way to go.
The app generates two derived files for each image. One is a low-quality AVIF thumbnail, small enough to be embedded as a base64 data URI in the image list’s data. The other is an optimized AVIF version of the original image that is virtually indistinguishable from it. When I first rendered the images in the app, there was a very slight flash of unloaded content while they loaded. With the tiny base64 data URI as part of the list data, that flash went away, because a low-res version of each image can be displayed immediately while the higher-quality one loads. Even though the flash lasted just a few milliseconds, it was enough to make the experience feel jarring and unpolished, so getting rid of it was one of the highlights of developing this app.
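Here is a minimal sketch of how a list item could carry its placeholder and how the UI could choose what to show. The `ImageListItem` shape and the helper names are assumptions for illustration, not the app's real data model.

```typescript
// Hypothetical shape of one entry in the image list.
interface ImageListItem {
  id: number;
  placeholderBase64: string; // tiny low-quality AVIF thumbnail, base64-encoded
  optimizedPath: string;     // path or URL of the full-quality optimized AVIF
}

// Turn the embedded thumbnail into a data URI that an <img> can show
// immediately, with no extra file read or network request.
function placeholderDataUri(item: ImageListItem): string {
  return "data:image/avif;base64," + item.placeholderBase64;
}

// Pick the image source: the placeholder until the optimized file has
// finished loading, then the optimized file itself.
function displaySrc(item: ImageListItem, fullImageLoaded: boolean): string {
  return fullImageLoaded ? item.optimizedPath : placeholderDataUri(item);
}
```

In a React component, `fullImageLoaded` would typically be a piece of state flipped by the `onLoad` handler of a hidden or overlaid `<img>` pointing at `optimizedPath`.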
What's next?
Diffusion Depot is virtually complete. There are a few bugs outstanding that I want to tackle, but the last big hurdle will be making sure that it all works smoothly across platforms. Electron handles the app itself, but I need to make sure that things like the AVIF image encoder, and the Python age/gender detection library, are packaged properly so that it will all work regardless of the platform you run the app on.
With development so far along, I decided to take a short break to put together a nice website and get everything ready for when it actually launches. For now the website invites you to join a waitlist, but I intend to release the full app very soon, and when I do, I’ll add a bit more content, including a section showing off features in more detail, as well as a video of the app itself.
Off-topic: I’ve spent my life building things but never actually sharing them with anyone, and in 2023 I’ve started making a conscious effort to make sure that when I build something, I do the hard work of writing about it and putting it out there. Even if it doesn’t go anywhere, it’s something to point to when someone asks me “What sort of things have you done?”.
If you’ve read this far, thank you! I’d love to hear from you. Is there anything you’d like to see in Diffusion Depot? Anything I missed or that you’d like to know more about? Contact details are in the footer!
P.S. Check it out.
Footnotes
[1] AVIF was added to Chrome in 2020 and is already supported by most browsers. The tech world really does move fast (or maybe I'm just getting old). I remember when WebP was the hot new thing, and it's already been replaced by something better.