Last week I got my hands on DALL·E 2, a proprietary AI that can conjure up unique and realistic images from only a text prompt.
I’m bad at making art from scratch. However, I feel like I have a reasonable sense of aesthetics and composition. I can visualise something that looks awesome in my head, but when it comes to transferring that vision out through my hands and onto paper or screen, I just can’t do it any justice. Maybe I need more practice. I also don’t care enough to make that happen. So, perhaps in another life.
The closest I can get to realising what’s in my imagination is to work with an actual artist, iterating and guiding them toward a final result, or to start with something that already exists, maybe a drawing or photo: I have enough Photoshop skills to tweak the finer details to get what I want.
It was obvious from using my first few DALL·E credits that I would have to interact with it in basically this way. While impressive, many of the images it generated for me were missing key bits of my prompt or were in places just garbled messes. Often both. I realised that I could not expect it to read my mind or magically create complex and beautiful compositions without some manual editing involved.
To try this process out, I decided to use my last remaining credits for the month on creating a splash screen for my squirrel-based survival game get dem nuts.
Setting the scene
This would be my game’s splash screen, so I had a few requirements in mind:
- It needed to show the main characters and visually impart their actions and interactions. In order of importance: the protagonist, a red squirrel; the antagonist, a red fox; and the other NPCs, grey squirrels.
- It also needed to show the environment that the game is set in (a forest), and other important interactable entities. Namely, a nut (the only source of energy / resource that the player must compete with NPCs to obtain).
- Stylistically, I wanted it to be obviously “not representative of actual gameplay” and either cartoonish or broadly digital-painting-esque. The closer to the free-to-play mobile games that rely heavily on beautiful splash screens to tempt prospective players into clicking their ads, the better.
- It’s safe to say that DALL·E is bad at rendering coherent text, so I decided not to worry about where the game’s title text would go yet. That would be a challenge for another day.
Figuring that asking for all the above at once would be too risky for my precious credits, I decided to focus first on the overall style, the forest environment, and my red squirrel protagonist performing a couple of key actions. I tried out a few prompts to test the water:
digital painting of a panicked red squirrel dashing towards the camera, in a forest, with an acorn in its mouth, ground view
This wasn’t really what I was looking for: too reminiscent of a photograph with an overdone Photoshop Smart Filter. I wanted something more cartoonish. I also wanted the squirrel more side-on, running past the camera instead of directly towards it. Adjusting my prompt accordingly:
anime cartoon of a panicked red squirrel dashing past the camera, in a forest, with an acorn in its mouth, ground view
Horrifying! Maybe if I’d asked for a daemonic red squirrel, I’d have liked these more. However, the style was going in the direction I wanted, and the leftmost image was close in many ways. I tried again, specifying a cleaner art style and emphasising that I wanted a wide-angle shot to ideally include the entire squirrel:
clean cartoon of a panicked red squirrel dashing past the camera, in a forest, with an acorn in its mouth, ground view, wide-angle shot
Nailed it. All of these are GREAT; however, the second particularly captured what I was looking for in the character of the squirrel. This was to become the base of my splash screen.
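Looking back, the three prompts above share one shape: a style, a subject, and a trailing list of scene and camera modifiers, with only one or two pieces changing per attempt. A minimal sketch of that structure in Python follows. The `build_prompt` helper and its parameter names are hypothetical, purely for illustration; DALL·E itself just takes the final string.

```python
def build_prompt(style, subject, details=(), camera=()):
    """Join a style, a subject and optional modifiers into one comma-separated prompt.

    Hypothetical helper: it only assembles text and knows nothing about DALL·E.
    """
    parts = [f"{style} of {subject}", *details, *camera]
    return ", ".join(parts)


# Rebuilds the third prompt from the text above, piece by piece.
prompt = build_prompt(
    style="clean cartoon",
    subject="a panicked red squirrel dashing past the camera",
    details=["in a forest", "with an acorn in its mouth"],
    camera=["ground view", "wide-angle shot"],
)
print(prompt)
```

Keeping the pieces separate makes it easier to change one thing per attempt (say, swapping "anime cartoon" for "clean cartoon") and see what that change alone does.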
Expanding the scene
My starting point suffers from a problem that I’ve seen a few people talk about[1]: the squirrel is very cropped! Even the nut is only half in view.
Luckily DALL·E can not only create brand new images, it can also “infill” blank areas of existing ones. It will make a best effort to transfer the same art style to any new areas of the canvas that it touches.
To achieve this, I used Photoshop to expand the bounds of my chosen image. The expanded areas must be transparent for DALL·E to know which areas to infill. By leaving space around the squirrel and under the nut, I hoped DALL·E on a second pass would complete the rest of the drawing.
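I did this in Photoshop, but the same canvas expansion could be scripted. Below is a rough sketch using the Pillow library, not the exact process I followed. The function names, file names, and margin sizes are all placeholders, and it assumes only what is described above: the new areas must be fully transparent.

```python
def padded_canvas(size, left=0, top=0, right=0, bottom=0):
    """Return (new_size, paste_offset) for a canvas grown by the given margins."""
    width, height = size
    new_size = (width + left + right, height + top + bottom)
    return new_size, (left, top)


def expand_with_transparency(src_path, dst_path, **margins):
    """Save a copy of src_path centred or offset on a larger, transparent canvas."""
    # Pillow is a third-party package; imported here so that padded_canvas
    # stays usable on its own.
    from PIL import Image

    original = Image.open(src_path).convert("RGBA")
    new_size, offset = padded_canvas(original.size, **margins)
    # Alpha 0 everywhere: these are the pixels left for infilling.
    canvas = Image.new("RGBA", new_size, (0, 0, 0, 0))
    canvas.paste(original, offset)
    canvas.save(dst_path, format="PNG")  # PNG preserves the alpha channel


# Example call (hypothetical file names and margins):
# expand_with_transparency("squirrel.png", "squirrel_expanded.png", right=300, bottom=100)
```

The margins are in pixels and decide where the original lands on the new canvas: extra space on the right and bottom leaves the squirrel in the top-left, with room for its tail and the rest of the nut.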
Uploading this to the DALL·E edit tool, I then painted out a little more of the right-hand side of the image to give it more room to work. It is also important to remove the coloured signature in the bottom right, lest it induce DALL·E to paint something a bit wacky in that corner:
Now, for the prompt. I decided to use this as an opportunity to not only expand the scene, but to introduce my antagonist:
a ravenous fox in the distance chases a panicked red squirrel
Hmmm, no… how about a slight rephrase?
a snarling red fox in the distance runs towards a panicked red squirrel
Great, now I have two squirrels. One more try, and let’s make room on the right-hand side to give it more space to work with:
clean cartoon of a fox in the forest closing in on a red squirrel, chasing, snarling, ravenous, drooling face
OK now we’re just getting straight-up Cronenberg’d! I began to suspect that DALL·E maybe isn’t great at painting multiple creatures at once, and in fairness, red squirrels and red foxes are similar in many ways. Or perhaps I used too many adjectives. I’m not sure.
I decided to try generating the crafty canine on its own. Again, I used the edit function, erasing most of the squirrel but keeping as much of the rest of the image as I dared in the hope that the style would remain consistent. In particular, I wanted some of the orange of the squirrel’s tail to seed the tail of the fox:
clean cartoon of a hungry red fox dashing towards the camera, ground shot
Yes!! The second image is perfect. Look at those hungry eyes!!
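I did all of my erasing (the signature earlier, and most of the squirrel here) by hand in the DALL·E edit tool. If you would rather prepare the image beforehand, a rectangular erase could be approximated with Pillow along these lines. This is a sketch under assumptions: the helper names, file names, and box coordinates are made up, and a rectangle is much cruder than painting with a brush.

```python
def clamp_box(box, size):
    """Clip a (left, top, right, bottom) box to the bounds of an image."""
    left, top, right, bottom = box
    width, height = size
    return (max(0, left), max(0, top), min(width, right), min(height, bottom))


def erase_region(src_path, dst_path, box):
    """Make a rectangular region of an image fully transparent and save it as PNG."""
    from PIL import Image  # Pillow, third-party

    image = Image.open(src_path).convert("RGBA")
    region = clamp_box(box, image.size)
    # Pasting a colour (instead of an image) fills the box with that colour;
    # alpha 0 marks the area as blank.
    image.paste((0, 0, 0, 0), region)
    image.save(dst_path, format="PNG")


# Example call (hypothetical values): blank out the bottom-right corner.
# erase_region("squirrel_expanded.png", "no_signature.png", (900, 950, 1024, 1024))
```

Everything outside the box is left untouched, which is the point: the remaining pixels are what keeps the style consistent on the next pass.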
I placed my squirrel on the left, the fox in the distance, and erased some of the boundary between the two images: DALL·E will also seamlessly merge images, particularly if the styles match. I went with a prompt that broadly summed up the scene, to infill the rest of the squirrel, nut, and paint in the gaps with appropriate background:
fox watching a squirrel with an acorn in the forest
*Chef’s kiss* to the rightmost result. I probably could have used Photoshop, but feeling suddenly rich in remaining credits, I erased the grassy foreground and the fox’s lump, and perfection was (almost) achieved:
Grey squirrels are tricky
As the final missing character, I needed a grey squirrel somewhere in the scene. I tried various prompts and combinations of seed images with and without the fox involved, but nothing worked very well:
clean cartoon of a grey squirrel sitting in a tree eating an acorn
Um, I’m not sure what I did to deserve these monstrosities.
clean cartoon of a grey squirrel sitting by a tree far away eating an acorn while a fox watches
OK now I’m getting pretty convinced you don’t know the difference between a squirrel and a fox … or a kangaroo??
clean cartoon of a grey squirrel with wide eyes, sitting in the forest far away eating an acorn while a fox watches
More marsupial features creeping in… possibly some koala in these attempts?
Time to go solo
And with that, my credits were all used up :'-(.
There was still some work to be done, but I was confident that I could do the rest manually with Photoshop:
- Some cropping, mirroring, and recolouring of one of the first images DALL·E had gifted me with and I had my grey squirrel. She dashes in the background behind the fox: not too obvious, but noticeably there and part of the forest. I re-added the shadow to show her mid-leap.
- I remembered quite late that the get dem nuts game window is a random 1.3125:1 aspect ratio, and of course DALL·E only generates square images. Photoshop content-aware scale to the rescue. I found that a combination of shortening the image vertically and stretching it horizontally produced a really good result that didn’t make any of the characters look short or fat.
- I added a silhouette of an owl in the sky (which technically appears as another antagonist in the game). This approached and almost exceeded the limits of my drawing ability, but I think it worked quite well.
- Finally, cleaning up of various artifacts and weirdness.
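The arithmetic behind the aspect-ratio fix above is easy to sketch. To turn a square into 1.3125:1, the width-to-height ratio has to grow by a factor of 1.3125, and that factor can be split between stretching horizontally and shortening vertically. The function below only computes target dimensions; the content-aware scaling itself was done in Photoshop. The `horizontal_share` parameter, the 1024-pixel example size, and the even split are illustrative assumptions, not the values I actually used.

```python
def rescale_to_ratio(width, height, target_ratio, horizontal_share=0.5):
    """Return (new_width, new_height) with new_width / new_height ~= target_ratio.

    horizontal_share is the fraction of the change absorbed by stretching the
    width: 1.0 means only widen, 0.0 means only shorten, 0.5 splits it evenly.
    """
    if target_ratio <= 0 or not 0.0 <= horizontal_share <= 1.0:
        raise ValueError("target_ratio must be positive and horizontal_share in [0, 1]")
    change = target_ratio / (width / height)  # factor by which width/height must grow
    stretch_x = change ** horizontal_share
    squash_y = change ** (horizontal_share - 1.0)  # equals 1 / change ** (1 - share)
    return round(width * stretch_x), round(height * squash_y)


# Widening only: a 1024x1024 square becomes exactly 1.3125:1.
print(rescale_to_ratio(1024, 1024, 1.3125, horizontal_share=1.0))  # (1344, 1024)

# Splitting the change evenly keeps both distortions smaller.
print(rescale_to_ratio(1024, 1024, 1.3125))
```

With an even split each axis changes by roughly 14%, rather than one axis taking the full 31%, which is consistent with the combined approach looking less distorted.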
The final splash screen:
DALL·E is very impressive. I did not expect to get such a complex and coherent result so easily and am somewhat astounded at how compliant DALL·E was with most of my requests. It still feels very “magic”, however, in a sometimes-frustrating way. It’s really hard to tell why some prompts work and some don’t – was it the prompt itself, or bad luck? Therefore, some human oversight and curation is absolutely required. It’s pretty clear that many showcased examples of DALL·E’s work are cherry-picked from a (much) larger set of imperfect results and downright garbled messes.
I yearned to interact with the system in greater detail, to give it specific instructions. Draw the rest of the squirrel’s tail. Move the fox more to the right and further away from the camera. Make the eyes wider, more like <an example>. I would not be surprised if this is where the technology is headed.
There is a slight anxiety that DALL·E may end up devaluing art and the practical skills of drawing, painting, sketching, etc. Part of me now feels even less incentive to learn these skills, as surely it would be pointless to compete with this machine?! I can only reassure myself that this same anxiety has been felt every time technology has encroached on art: people said the same thing when the camera was invented. Artists, of course, adapt and push boundaries to find new forms of expression.
Probably not me though, I’m bad at art. Best left to the experts!
- As of writing, DALL·E 2 is still in closed beta. You can join the waitlist at https://labs.openai.com/waitlist.
- The free DALL·E 2 Prompt Book by Guy Parsons is a must-read before even thinking about spending your coveted credits.
- DALL·E alternatives:
  - Craiyon, formerly DALL·E mini – not as good quality, but free.
  - Midjourney – very similar to DALL·E, in open beta, operated via Discord.
  - Stable Diffusion – another very similar system, publicly released, can run locally if you have a powerful enough GPU!