An Experiment in Generative AI: Craiyon (Formerly Dall-e Mini)

Dall-e mini, recently renamed Craiyon because its name resembled that of the unrelated Dall-e created by OpenAI, became an internet phenomenon in recent months for its image generation. For those unfamiliar with the technology, users type a prompt, however specific, into the image generator, and the AI generates images that attempt to match it.

What It Is

Craiyon, formerly Dall-e mini, is a text-to-image generator created by Boris Dayma, originally for a coding competition. He took inspiration from the same technology as OpenAI: the software is built on a machine learning algorithm that was trained on existing images. This means that the algorithm was fed a series of images and taught to discern their elements through the text associated with them. Because the AI is trained on an immense amount of visual material and draws on Natural Language Processing, it can connect language to the visuals that language suggests. Between the work of Dayma and open source AI communities on Twitter and GitHub, the technology became refined enough to produce recognizable images that gained traction on the internet.

Overview

This review covers only the browser version of the software. The mobile app is currently available only for Android users. The software was tested on the latest version of Chrome on macOS at the time of this article (105.0.5195.102).

A quick internet search for “dall-e” leads to the webpage pictured below, which also notes that the software will be moving to Craiyon.com.

Figure 1: Craiyon landing page. Screenshot by author.

From here, the software offers an intuitive process. The user is prompted to type a phrase into the text box and can then either press the orange crayon icon on the screen or hit Enter on the keyboard. Generation is not instantaneous, so the user must wait for the images to appear.

The webpage also provides an FAQ (pictured below) that answers a variety of user questions. At the bottom of the webpage, the site provides two contact email addresses, a newsletter sign-up, a donation button, and social media links.

Figure 2: FAQ section of Craiyon webpage. Screenshot by author.

Experiment 1: Simplicity

First, I decided to test the AI with a simple prompt, “cat.” The screen noted that the request would take about two minutes, but the images were generated a bit faster than that.

Figure 3: Images generated with the prompt “cat.” Screenshot by author.

The results present a series of almost-cats that are not quite anatomically correct, particularly in their facial features. Although the faces are clearly wrong, some details are generated fairly well, such as the fur in some images and the facial structure in others.

Experiment 2: A Little Extra

Next, I wanted to see how the integrity of the cat would hold if a second element were added. I typed the prompt “cat in a bed.” I chose this prompt because it requires two simple elements, as well as an interpretation of one object being placed on another. Again, I was shown a message that the generation should not take long, with a timer in the top right corner of the screen. The results took about a minute.

Figure 4: Image generation loading screen. Screenshot by author.

The new images were not much different from the previous ones, although some of the cats' integrity was sacrificed to make room for a vague bed background. Still, this indicates that the AI can distinguish the intention of a request, or at least that of a simple one.

Figure 5: Images generated with the prompt “cat in a bed.” Screenshot by author.

Experiment 3: Dabbling in Verbs

To observe how the AI holds up with added complexity, I used the same nouns but included a verb. I wanted to introduce some form of action and direction to see how the AI captures an extra layer of nuance. Thus, I chose to write “cat jumping into a bed.”

Even with this small increase in complexity, Craiyon was unable to retain the realism of a cat’s structure, and in some images even that of the bed. Regardless of realism, the images do provide the grounds for some typical internet humor.

Figure 6: Images generated with the prompt “cat jumping into a bed.” Screenshot by author.
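
For readers who want to repeat the first three experiments, the progression of prompts can be written down as a small ladder. The following is only a minimal sketch in Python: the names PROMPTS, word_count, and prompt_ladder are my own, and counting words is a crude stand-in for complexity that I am assuming for illustration, not something Craiyon itself measures.

```python
# Prompts from Experiments 1-3, in the order they were tried.
PROMPTS = ["cat", "cat in a bed", "cat jumping into a bed"]


def word_count(prompt: str) -> int:
    """Crude complexity proxy (an assumption for illustration): number of words."""
    return len(prompt.split())


def prompt_ladder(prompts):
    """Pair each prompt with its word count, ordered from simplest to most complex."""
    return sorted(((p, word_count(p)) for p in prompts), key=lambda pair: pair[1])


if __name__ == "__main__":
    for prompt, n in prompt_ladder(PROMPTS):
        print(f"{n} word(s): {prompt}")
```

Each prompt would still be typed into the Craiyon webpage by hand; the sketch only keeps track of the order in which elements were added.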

Experiment 4: Fine Arts

I would be remiss not to test visual artwork with this software, as AI art is now found in all corners of the art world. Therefore, I tested Craiyon by entering the titles of three famous artworks of differing styles: Girl with a Pearl Earring, by Johannes Vermeer; Composition with Red, Blue and Yellow, by Piet Mondrian; and Guernica, by Pablo Picasso. The aim was to test not only its re-creation abilities but also whether it recognizes a title as a reference to a specific artwork and attempts to recreate it.

Girl with a Pearl Earring

The image generator understood exactly what was being asked and delivered a recognizable approximation of the painting, although none of the results would pass as a duplicate. It is also worth noting that similar images were generated when the prompt was written entirely in lowercase.

Figure 7: Images generated with the prompt “girl with a pearl earring.” Screenshot by author.

Figure 8: Girl with a Pearl Earring. Source: Wikipedia.

Composition with Red, Blue and Yellow

Craiyon had a difficult time with this prompt. While this is still a very well-known work of art, its title doubles as a literal description in this context, which makes the request ambiguous.

Figure 9: Images generated with the prompt “Composition with Red, Blue and Yellow.” Screenshot by author.

Figure 10: Composition with Red, Blue and Yellow. Source: Wikipedia.

Because of this, I used the same prompt but added “Mondrian” to the end of the phrase to see whether the AI could resolve the ambiguity of the phrasing.

Figure 11: Images generated with the prompt “Composition with Red, Blue and Yellow mondrian.” Screenshot by author.

In doing so, a much more passable recreation of the work was generated.

Guernica

The AI did a passable job with this painting. While it does not, of course, provide nearly the level of detail that exists in the original, the general structure of the work is present and easily discernible. Another interesting result of this query is that, unlike the previous images, each generated image of Guernica seems to present it as a work in a museum or hanging in a room. This may be a result of the unusual dimensions of the work itself.

Figure 12: Images generated with the prompt “Guernica.” Screenshot by author.

Figure 13: Guernica. Source: Wikipedia.

Experiment 5: The Final Frontier

While it is interesting to see how machine learning responds to simple commands, complex requests, and attempts at re-creation, and how those responses may be refined in the future, it is equally interesting to see how the internet is taking advantage of the software. Below is a query that felt appropriate to the world of internet memes.

Figure 14: Images generated with the prompt “cat crying and playing a very tiny violin.” Screenshot by author.

Final Thoughts

While Craiyon is impressive in its ability to recall and generate images, even the simplest requests do not result in realistic images. My experiments suggest that the algorithm has some difficulty with requests that are abstract in nature yet specific. Additionally, the more complex the query is, the less true to life the image becomes. While more powerful tools, such as Dall-e, are not yet available to the general public, Craiyon is still an impressive piece of generative AI software and, at the very least, an amusing pastime for casual users.