
The Beginner's Guide to ChatGPT Part 2: Image Generation



This is part two of a series for beginners in AI — if you missed part one, you can read it here.


In the first edition of this series, we looked at some basic prompts to start creating resources you can use in your lessons. In part two, we’re going to shift our focus to image generation — but first, a confession: we’re going to explore a couple of different ways of doing this, not all of which use ChatGPT!


Image generation using AI is a speedy way of creating the precise picture prompt you need without spending hours trawling through Google images. But it’s also an engaging and fun way of exploring the relationship between language and imagery.


Much like we saw in the previous blog, getting exactly what you have in your head onto the screen involves patience, practice, and lots of feedback. Before we start exploring the most effective way of prompting, let’s consider the options available.


Choosing a platform


In keeping with the title of this series, ChatGPT is your first option. However, image creation isn’t available through the free version, so you’d need to upgrade to ChatGPT Plus ($20 / £16 per month) to access GPT-4, OpenAI’s flagship model.


From here, you can simply ask ChatGPT to make you an image of such and such, or, for a little help, head to ‘Explore GPTs’, where you can browse a wide range of community-built GPTs, each with a specific focus. For the purposes of this blog, we’ll be using ‘Image generator’, created by Naif J Alotaibi.


Option two comes courtesy of Microsoft, whose newly released Copilot offers a series of AI assistants programmed to complete a range of tasks, including image generation. Microsoft Copilot is free, but the associated app-specific Copilots can come with a monthly subscription. We’ve used the free version for this blog, which can be accessed with an existing Outlook account or a private email address. You can sign in to (or sign up for) Copilot here.


Creating your first prompt


As with text-based tasks, the success of your image creation with ChatGPT or Copilot lies in the quality of your prompting. The following guidelines should help (and there’s a short scripted sketch after the list for anyone who’d rather generate images through the API than the chat window):


  • Be as specific as you can in defining objects or actions you want included in your image e.g. ‘The image should include a teacher and a pupil, and the teacher should be showing the pupil an image on a computer screen’

  • Use time descriptors to help you — words like ‘traditional’, ‘futuristic’, ‘Victorian’, ‘modern’ or ‘ancient’ can all help refine the image you’re after

  • Reference artistic styles similar to the one you want to create e.g. ‘The image should be in the style of a film noir poster’ or ‘The image should use a pop art aesthetic’.

  • Define a colour palette if relevant; this is particularly useful if creating an image to fit with an existing brand or colour scheme. Confession: the CPD page of our website uses AI-generated images! Can you tell which colours we requested in the prompts?

  • Offer feedback on each attempt — you almost certainly won’t get the perfect outcome straightaway but be patient, clarify what’s wrong, and ask again.
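For anyone comfortable with a little Python, the same principles carry over if you generate images through OpenAI’s API rather than the chat window. The sketch below is only an illustration, not something covered in this series: it assumes the official openai Python library is installed, an API key is stored in the OPENAI_API_KEY environment variable, and the DALL·E 3 model is available on your account.

  # A minimal sketch of the prompting guidelines above, applied via OpenAI's
  # Python library. Assumptions: `pip install openai` has been run and an API
  # key is set in the OPENAI_API_KEY environment variable.
  from openai import OpenAI

  client = OpenAI()  # picks up OPENAI_API_KEY automatically

  prompt = (
      "A teacher showing a pupil an image on a computer screen, "  # specific objects and actions
      "set in a modern classroom, "                                # time descriptor
      "in the style of a pop art poster, "                         # artistic style
      "using a palette of teal, orange and cream."                 # colour palette
  )

  result = client.images.generate(
      model="dall-e-3",   # model choice is an assumption for this sketch
      prompt=prompt,
      size="1024x1024",
      n=1,
  )

  print(result.data[0].url)  # prints a link to the generated image

The prompt string deliberately stacks four of the ingredients from the list above (objects, era, style, palette); swapping any one of them out is the quickest way to see how much each contributes to the final image.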


Here’s one I made earlier


Let’s look at some examples! 


The task involves generating art inspired by poems from the GCSE anthology as a starting point for discussion of metaphor and imagery. It was inspired by @elucymay on Twitter — you can see her results here.


In this case I used the full text of ‘Sonnet 29’ by Elizabeth Barrett Browning, a poem which uses natural imagery to explore the hold the speaker’s lover has on her thoughts. The images below use the same initial prompt, with follow-ups as needed.


The prompt I used was: 


I need you to help me generate an image inspired by the poem 'Sonnet 29' by Elizabeth Barrett Browning.


Here is the poem:

"I think of thee!—my thoughts do twine and bud

About thee, as wild vines, about a tree,

Put out broad leaves, and soon there 's nought to see

Except the straggling green which hides the wood.

Yet, O my palm-tree, be it understood

I will not have my thoughts instead of thee

Who art dearer, better! Rather, instantly

Renew thy presence; as a strong tree should,

Rustle thy boughs and set thy trunk all bare,

And let these bands of greenery which insphere thee

Drop heavily down,—burst, shattered, everywhere!

Because, in this deep joy to see and hear thee

And breathe within thy shadow a new air,

I do not think of thee—I am too near thee"


I would like the image to be in the style of a Renaissance painter. I'd like it to focus on the natural imagery in the poem.



GPT-4 did a good job on the first attempt:




An AI-generated image of a palm tree in the middle of a lush, misty forest. The trunk has thick vines wrapped around it with leaves sprouting.

The same prompt using Copilot wasn’t quite so successful:



A panel of 4 AI-generated images, each of which features a woman looking thoughtfully out over a body of water. Each image has a floral border, with some faint images of trees in the background.


Copilot hasn't achieved the focus on natural imagery I asked for, so I follow up, this time specifying some quotes to draw from:



A screenshot of a rephrased prompt reading "Hmmm this isn't quite right. Please try again, but focusing on capturing the following parts of the poem: "My thoughts do twine and bud about thee, as wild vines, about a tree", "the straggling green which hides the wood", "let these bands of greenery which insphere thee drop heavily down — burst, shattered everywhere." This is followed by an apology from Copilot and an offer to try again.


Here's the output:



A series of 3 more AI-generated images, more abstract than the previous attempt. All three focus on trees with vines wrapped around them, and feature text below. The text is partly lifted from the poem but partly gibberish.


The images it’s created are closer to what I envisaged — in a lesson, I’d be able to use them to prompt discussion around what the description of “wild vines” and “broad leaves” might tell us about the poet’s feelings.


One thing I wasn’t so happy about was the text in the image: despite all their capabilities, these models are notoriously bad at combining coherent text with pictures, and anyone with a strong zoom will see that the text in the examples above is a mix of words from the actual poem and gibberish!


I asked Copilot to try again without the text, but the output was disappointingly similar to what I’d just seen:



A set of 4 more AI-generated images, similar to the previous set. Two of them feature wreaths rather than trees, and all feature text which is a mix of words from the poem and gibberish.


A second request to remove the text yields much the same result — why might this be?


The big weakness in AI image generators


Humans learn from a young age that while some collections of lines create pictures, others create symbols which we can in turn interpret as words and language. 


Imagine the difficulties in trying to train a computer to know the difference — think, for instance, how often we see circles and how many different meanings they can hold.


Imagine the precision needed to teach a computer when a tiny circle coloured black represents a full stop, when it represents a point on a graph, or when it represents a speckle on an egg. And if that same circle is hollow, when it represents the number 0, or the letter o, or an open mouth, or simply a circle.


But although this is a genuine challenge in training AI models, there’s another, even more potent, reason in this case: referencing a poem also ‘triggers’ all the forms of ‘a poem’ that the LLM will have seen in its training data.


Understanding training parameters


In the previous blog, we looked at how LLMs essentially generate outputs through prediction, providing the user with the most likely response based on what they have ‘learned’ so far. In interpreting a reference to a poem, the LLM has drawn on everything it knows about poems and images and all the contexts in which these things combine. If this includes poetry inscriptions, for instance, then the LLM might reasonably assume that an image based on a poem must include textual elements.


Training in this way generally provides the image we want — though as we’ve seen above, there is still room for confusion. 


When trying to tell real photographs from AI-generated images, the big clue used to be the hands — this basic part of the human anatomy had the computers baffled. A single, isolated hand is rarely the focus of an image, and often not all of the hand is visible. When an LLM is trying to ‘create’ a hand based purely on the images it’s been provided, it can therefore understandably struggle with something as simple as the number of fingers and thumbs on a single hand.




4 AI-generated images of handshakes — all include too many fingers, and one includes a double-ended, disembodied hand shaking hands at both ends.


Frustrating though these quirks might be if you were trying to create an image of a hand inspired by a poem, they also offer a wealth of opportunities for engagement in the classroom.


Prompting for success


If every image is a predicted ‘best guess’ based on every other image used to train the LLM, then why not work backwards and have pupils recreate the prompt, rather than the image? Challenge classes to identify the artistic style, colours, or objects referenced in an image.


This is also an excellent opportunity to explore bias in action: what does an AI-generated doctor look like, or an AI-generated town in Africa? This in turn opens up discussion of the AI applications already embedded in our lives, like the ‘Up next’ video choice on YouTube or a TikTok ‘For you’ page. Such uses quite literally shape our view of the world, and the more pupils recognise this, the more prepared they will be to interact with an increasingly online world.


Pupils might also experiment with changing a single word, or adding a single new detail, to a prompt to see how it affects the style and mood of the resulting picture. For instance (a short scripted version of this exercise appears after the list):


Girl walking to school.

Girl walking to school in the rain.

Girl walking to school through the set of Singin’ in the Rain.

Girl walking to school as an Escher painting. 
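If you want to run this variation exercise outside the chat window, the short sketch below (same assumptions as the earlier one: the openai Python library installed and an API key in OPENAI_API_KEY) requests one image per variant so the class can compare them side by side.

  # A small sketch that generates one image per prompt variant, so pupils can
  # see how a single added detail changes the output. Same assumptions as the
  # earlier example: openai library installed, OPENAI_API_KEY set.
  from openai import OpenAI

  client = OpenAI()

  variants = [
      "Girl walking to school.",
      "Girl walking to school in the rain.",
      "Girl walking to school through the set of Singin' in the Rain.",
      "Girl walking to school as an Escher painting.",
  ]

  for prompt in variants:
      result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
      print(prompt, "->", result.data[0].url)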


Small details can radically change the output; an activity like this one will help pupils develop precision in their language choices. Consider a homework exercise or class activity in which pupils explore a writer’s use of language in a phrase; asking “how different does the picture feel with one word altered?” could be a powerful way of engaging pupils in what writers conjure with their word choices, and why they matter.


In conclusion


Generating images with AI can be challenging to get right, but it’s often entertaining too and, as with text-based tasks, perseverance is key. Even better, it also offers a novel way of engaging with the bigger questions around how our ability to wield language shapes our view of the world.
