Sunday, May 05, 2024 | Shawwal 25, 1445 H

The image in the imagination of artificial intelligence


Some of us try to translate our thoughts into images through imagination, and others, especially in childhood, attempt to turn these thoughts into pictures drawn on blank paper, hoping these visuals will help their deeply held wishes or desires come true. Thus they find a possible solace in the creativity of converting these emotions and thoughts into still images that, in the moment, transport them to a lost world.


This wish is no longer confined to children's scribbles and artists' innovations; it has become possible through artificial intelligence tools and models capable of generating images in much the same manner: through "imagination."


This article will focus on one of the most important image-generating tools in artificial intelligence, DALL·E, which is part of the ChatGPT-4 model. I will introduce the reader to this smart tool, its working mechanism and capabilities, and my personal experience using it, which may highlight its strengths and weaknesses.


OpenAI first introduced DALL·E in 2021 and later made it part of the smart tools associated with ChatGPT-4, which recently added other tools, some of them considered subsidiary under the umbrella of main tools such as DALL·E. Currently, there are about a dozen subsidiary tools related to DALL·E, all concerned with image generation. The name DALL·E is a blend of the Spanish surrealist painter Salvador Dalí (1904-1989), a prominent figure in the school of surrealism, and WALL-E, the character created by Pixar, a company specialised in computer-animated productions.


DALL·E employs a deep learning algorithm, which relies on mathematical systems at its crucial stages, particularly the "learning" or "training" phase. One of the key systems is backpropagation, along with the cost function (or loss function), which evaluates how well the algorithm has trained on the given data and measures its fit and suitability. These are general principles shared by most artificial intelligence algorithms based on deep learning. This article will delve into the working mechanism of the DALL·E tool, where the training phase is one of the first and most crucial tasks undertaken by the algorithm. Here, training on a wide variety of images allows the tool to understand diverse image patterns and imagine their artistic aspects, closely resembling the human brain's mechanism for imagining and mentally generating images, whether awake or dreaming.
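To make the two ingredients named above concrete, here is a minimal, purely illustrative sketch: a single artificial neuron trained with a mean-squared-error loss function and backpropagation. This is not DALL·E's actual training code (real image models have billions of parameters), but the principle of adjusting parameters to shrink the loss is the same.

```python
# Toy example: one linear neuron, trained by backpropagation.
# The loss is the squared error between prediction and target;
# each update moves the parameters against the loss gradient.

def train(data, lr=0.05, epochs=200):
    w, b = 0.0, 0.0                  # parameters start untrained
    for _ in range(epochs):
        for x, target in data:
            pred = w * x + b         # forward pass
            error = pred - target    # loss = error ** 2
            # backpropagation: gradient of the loss w.r.t. each parameter
            w -= lr * 2 * error * x
            b -= lr * 2 * error
    return w, b

# Learn the mapping y = 2x + 1 from four training examples
w, b = train([(0, 1), (1, 3), (2, 5), (3, 7)])
print(round(w, 2), round(b, 2))      # prints 2.0 1.0
```

After training, the loss is near zero: the neuron has "learned" the pattern in its data, just as DALL·E learns patterns across millions of images.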


The digital core of DALL·E is an auto-encoder, which branches into two systems: the encoder, which receives images, and the decoder, which generates images by decoding their numerical representations. This main system, on which DALL·E relies, is similar to the Transformer system the ChatGPT model uses to generate text. During DALL·E's training phases, the Transformer intervenes by converting the descriptive text of images (captions supplied by humans in text form) into another input that helps DALL·E imagine and generate images. This explains the significant computational capacity required by the DALL·E tool compared with the lesser capacity needed by the text-generating model.
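The encoder/decoder split can be illustrated with a deliberately simple sketch. Here the "encoder" compresses a tiny 4x4 image into a 2x2 code by averaging blocks of pixels, and the "decoder" rebuilds an image from that code. A real auto-encoder learns these mappings from data rather than using fixed averaging, so this is an analogy for the architecture, not DALL·E's method.

```python
# Toy encoder/decoder: compress a 4x4 "image" into a 2x2 code, then
# reconstruct it. Real auto-encoders learn the compression from data.

def encode(image):
    # each 2x2 block of pixels becomes one code value (its average)
    return [[(image[r][c] + image[r][c + 1] +
              image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, 4, 2)] for r in range(0, 4, 2)]

def decode(code):
    # expand each code value back into a 2x2 block of pixels
    return [[code[r // 2][c // 2] for c in range(4)] for r in range(4)]

image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [2, 2, 6, 6],
         [2, 2, 6, 6]]
code = encode(image)           # the compact code: [[0, 8], [2, 6]]
print(decode(code) == image)   # prints True: this image survives the round trip
```

The key idea is that the decoder can generate an image from a compact code alone; in DALL·E, text descriptions are turned into exactly such inputs, which is why image generation demands far more computation than text generation.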


My exploration of the DALL·E tool wasn't driven by a passion for art, but by curiosity about its algorithm and image-generation capabilities. I experimented with various interaction methods, providing descriptions in both Arabic and English, ranging from vague to highly detailed, and requesting modifications after generation. These experiments revealed the tool's strengths and limitations. It effectively translates detailed descriptions into realistic images, but struggles with vague ones and sometimes misses specific features despite modification requests. These challenges are typical of such digital models and are expected to ease with future advancements.


The ongoing digital evolution promises more breakthroughs in AI-driven image generation, making it widely accessible and capable of bringing any imagination to life.


Dr Muamar bin Ali al Tobi. The writer is an academic and researcher.

