Over the last few days, I finally set about playing with Stable Diffusion, an open source model available through Hugging Face that has several direct competitors, most notably DALL·E 2. They all do much the same thing: they take a text prompt, convert it to an embedding, and use that embedding to guide the generation of novel images. Because generation is a stochastic ‘denoising’ process, different random seeds can produce totally different images from the same prompt. These tools have been around for a few years, but in the last year or so they have reached a level of usability and popularity that could reshape the entire graphic design field. They can create almost anything in any style, but they still have plenty of problems, struggling with faces, text, and, in my case, adding proper sails to square-rigged ships.
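To make that seed-dependence concrete, here is a rough sketch of what generation looks like with the Hugging Face diffusers library; the model checkpoint, prompt, and seed values are just illustrative, and this assumes a CUDA-capable GPU.

```python
# Minimal sketch, assuming the Hugging Face `diffusers` library and a CUDA GPU.
# The same prompt with two different seeds yields two different images.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint; any SD model ID works
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a square-rigged ship at sunset, oil painting"

for seed in (42, 1234):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"ship_seed_{seed}.png")
```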
I want to throw my ideas into the two biggest controversies about these models: that they will destroy jobs in the graphic design field, and that they are stealing the work of existing artists through their training data.
On the first point, I think no, they won’t destroy jobs. My reasoning is that the pictures aren’t good enough to stand alone professionally without editing. The image I created above required extensive editing in Pixelmator Pro, with my very-not-good photo editing skills, to clear out artifacts. These models do, however, help a ton with the idea generation phase of the work, so I see them becoming a powerful tool for graphic designers. Rather than destroying jobs, I think they will push the standard higher for what a “normal quality” image is at the same price – just as Photoshop and its peers made fancy editing relatively commonplace, whereas in decades past it was a highly specialized job. It still takes plenty of effort and ‘prompt engineering’ to create a good image as well.
On the stealing-ideas debate, I say: well, have you read any art history? There’s a reason we can divide art into distinct historical periods: the artists of each era were constantly copying one another’s styles. Movements like Impressionism didn’t just pop out of Monet’s head; they evolved over time alongside other artists. There is a lot of direct copying too. Andy Warhol’s image of Marilyn Monroe, for example, gets reused inside many other pieces of art. Art schools even have students explicitly study other works to learn from them. Why is it strange that an AI model is doing the same? Outright plagiarism is a problem, but heavy influence from other art is common and acceptable.
I should note that I am proud of my artistic creation above. I put a lot of work into shaping it, and I feel it definitely qualifies as a personal artistic creation. Really, these models are just making art creation more accessible, and increased accessibility, while sometimes scary, is a good thing overall.