Technology Tip

Building AI Content: Text-to-Image

It Takes a Human

In our most recent dive into the AI world, we looked at using AI to weigh the pros and cons of changing the payroll timing for a small business. We then used the results of that query to build a memo template introducing the payroll change. Although AI was effective in fleshing out the impacts, the decision to change the payroll timing was still a human one, and the exact content of the company-wide memo required human judgment, since any change to payroll systems is a sensitive issue for any business. That example showed that working cooperatively with AI is key to achieving an effective text-based result.

Image AI is a Similar, but Different, Tool

Besides text-to-text uses of AI, there are also text-to-image applications. In short, if you can describe something, AI can simulate an image of it. These programs naturally rely on a different type of machine learning, with algorithms built around a human understanding of the visual world. Just like their text-to-text counterparts, text-to-image programs respond to a prompt, but here the prompt is structured around visual description.

It’s All About Fidelity

One of the first things to check with a visual AI program is how faithfully it produces images of real objects. Here’s an example: I located a photo of a Western Bluebird, a fairly common bird found throughout the Western US. I then asked the AI to generate an image of a “western bluebird on a perch”. You can see the two images side by side below. The differences are very slight, which shows that the AI is capable of generating a photo-realistic image. Perhaps the most interesting thing to notice is that the AI chose to create an image of a male bluebird without being instructed to. It has a built-in bias, at least in this case, toward showing the male of the species. By the way, the photo is the picture on the right, with its AI counterpart on the left. The biggest differences are in the head and the wing feathers. The program used to generate this image was https://deepai.org/machine-learning-model/text2img .

[Images: AI-generated Western Bluebird (left) and photograph of a Western Bluebird (right)]

An Infinite Image Catalog

The text-to-image feature of generative AI is fairly good at reproducing objects from real life. This supports one of its most common uses, which is basically as an “infinite” image generator. You do have to be careful with copyright, so it is recommended that you perform a reverse image search to be sure your final image doesn’t infringe on another creator’s copyright. One other quirk to be aware of is that most AI image generators have difficulty with human hands, feet, and sometimes faces. This is common enough that several AI companies offer specific fixes that correct these anomalies as a secondary step after the original image is generated.

Using Your Imagination

Some ideas are simply best expressed visually, and this is where text-to-image AI really shines (and sometimes frustrates). If you are already familiar with many of the terms used to describe an image, you will be able to write useful prompts fairly quickly. If not, it can take a while to go from a concept to a finished idea. In fact, a common application is to use the AI to express a visual concept before turning the job over to a graphic designer to produce the finished image. Let’s see how this might work.

How Text-To-Image Works

Suppose you have an idea for promoting a new type of refreshing drink that will be “Out of this World”. You want to call it Rocket Soda and want it to have a bit of a throwback feel. So this is what you write for the prompt: “The main object in the image is an imaginary planet that looks similar to Saturn. A secondary object, about 75% of the size of the first image will look like a 1950’s toy rocket that is orbiting the planet. The rocket will have a large label, red with a blue background that covers most of the rocket body and has the words Rocket Cola in all caps.”

Using this prompt, here is the image that was generated. OK, so it’s not perfect, but it is enough to hand over to a graphic designer, who can read the prompt and get a feel for what you were going for. When building this illustration, it took me a few tries before the result was close enough to the concept for me to be happy with it.

With about the same amount of effort I could have made a pencil sketch and brought that to a designer, but having an actual image to start from just carries more impact.
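For readers who would rather script this than type into a web page, here is a minimal Python sketch of the same idea: describe the main object, then the secondary object, then the details, and join them into one prompt. The helper names (build_prompt, build_request, generate_image_url) are hypothetical and not part of any library. The endpoint address, the "api-key" header, and the "output_url" response field are assumptions about DeepAI's hosted API that should be checked against its current documentation before use. The image in this article was made on the web page, not with this code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DeepAI's hosted text-to-image endpoint; verify against its docs.
API_URL = "https://api.deepai.org/api/text2img"


def build_prompt(main_object, secondary_object, details):
    """Join a visual description into one prompt.

    Order matters for readability: main subject first, then the
    secondary object, then any finer details such as labels or colors.
    """
    parts = [main_object, secondary_object, *details]
    # Normalize each part so it ends with exactly one period.
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


def build_request(prompt, api_key):
    """Prepare (but do not send) a form-encoded POST request for the prompt."""
    data = urllib.parse.urlencode({"text": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        headers={"api-key": api_key},  # assumed header name
        method="POST",
    )


def generate_image_url(prompt, api_key):
    """Send the request and return the image URL from the JSON reply.

    Assumes the reply contains an "output_url" field.
    """
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        return json.load(resp)["output_url"]
```

To try it, pass the three sentences of the prompt above as the main object, the secondary object, and a one-item details list, then call generate_image_url with your own API key. Because each attempt usually needs a few rewordings, keeping the description in separate parts makes it easy to change one sentence and run it again.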

A Brand New Way of Working

It should be clear from this example that text-to-image AI is a powerful visual tool for creating anything from common everyday objects to fanciful versions of them, as well as a means of expressing new ideas. If your marketing imagery needs a refresh, or your website needs images to increase the impact of your online presence, you now have the tools to make this happen quickly and economically.
