Meta, the parent company of Instagram and Facebook, has unveiled a new A.I. tool called CM3leon, which the company is billing as a “state-of-the-art generative model for text and images.”
The company announced CM3leon (pronounced “chameleon”) in a blog post and simultaneously published a white paper about the tool’s technological developments. Meta did not reveal when, or whether, CM3leon would be released to the public.
Still, Meta’s research marks a significant breakthrough in the creation of multi-modal models, which can generate both text and images.
Currently, there is a divide between A.I. image generators and A.I. text generators, like OpenAI’s ChatGPT. Merging the two has been difficult: although OpenAI released its multi-modal GPT-4 in March, A.I. developers have otherwise had little success.
Meta’s tool bridges that divide with a model that accepts and generates both text and images, enabling the creation of captions (or image-to-text generation) as well as images with “super-resolution.”
Most A.I. image generators on the market, such as Stable Diffusion and OpenAI’s DALL-E, use diffusion models to generate images, a process that adds Gaussian noise to training data and then learns to remove it.
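As a rough illustration of the noising idea behind diffusion models, here is a minimal Python sketch. It assumes a textbook linear noise schedule and treats an image as a short list of pixel values. The function names, the schedule values, and the toy data are all hypothetical, and this is not code from Stable Diffusion, DALL-E, or CM3leon. In a real system, a trained neural network would predict the noise; here the true noise stands in for a perfect prediction.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): signal variance kept after each step."""
    kept, out = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        out.append(kept)
    return out


def add_noise(pixels, t, alpha_bars, rng):
    """Forward process: mix the data with Gaussian noise at step t."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


def remove_noise(noisy, predicted_noise, t, alpha_bars):
    """Invert the mix, given an estimate of the noise that was added."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise_scale * n) / signal_scale
            for x, n in zip(noisy, predicted_noise)]


rng = random.Random(0)  # fixed seed so the run is repeatable
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)

image = [0.1, 0.5, 0.9, -0.3]  # toy "image" of four pixel values
noisy, true_noise = add_noise(image, 500, alpha_bars, rng)

# Using the true noise stands in for a perfect model prediction.
recovered = remove_noise(noisy, true_noise, 500, alpha_bars)
```

With a perfect noise estimate the original pixels come back exactly, up to floating-point error. By the final step of this schedule almost none of the original signal remains, which is why generation can start from pure noise.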
Meta’s process, by contrast, uses a technique called supervised fine-tuning to train text-based transformer models on a dataset of licensed images and captions from Shutterstock, which helps the model parse complex text and objects and better follow a user’s inputs.
“Supervised fine-tuning is critical in training large language models like ChatGPT. Despite this, its application in multi-modal settings remains largely unexplored,” Meta researchers wrote in their paper.
The upshot is text-to-image generation resulting in “more coherent imagery that better follows the input prompts,” according to Meta. In its announcement, the company included highly compositional examples the generator produced based on prompts such as “a small cactus wearing a straw hat and neon sunglasses in the Sahara desert.”
Most notably, the model was able to generate a rather realistic human hand, save for a few glitches. Hands are something A.I. generators have historically struggled with.
Meta’s CM3leon also boasts better performance in text-guided editing (that is, using text prompts to direct what the tool should add or remove in an image) than previous models like InstructPix2Pix, due to its ability to recognize both text and visual content.
In the blog post, Meta showed off the tool’s text-guided image editing abilities using Vermeer’s Girl With a Pearl Earring (c. 1665) as the initial input and then generating images by using text such as “put on a pair of sunglasses” and “she should look 100 years old.”
Meanwhile, CM3leon does this with far less computing power than other transformer models: five times less than similar models. That achievement could create greater equity in the A.I. space, addressing one of the many concerns critics have voiced about artificial intelligence.
Meta has received praise online for fully licensing its data set from Shutterstock, a move the company said “demonstrates that strong performance is possible with a very different data distribution from what all previous models used.”
“By making our work transparent,” the blog post went on, “we hope to encourage collaboration and innovation in the field of generative A.I.”
Over the past few months, Meta has introduced a number of generative A.I. features across its platforms, including A.I.-generated stickers for Messenger, an A.I. sandbox for Facebook’s advertisers, and an A.I.-powered video generation system.