Some Notes on Generative AI

It's been some time since I made notes on my "Gen AI" adventure. Here is an update, in no particular order:

THEORY

The current wave of AI is based on "Large Language Models" (LLMs). Roughly speaking, these are models of the human language and how words (tokens) relate to each other. The LLM reflects the vocabulary of the particular AI application.

Many databases of pictures are "tagged" or described in words the AI "understands" because the words are in the LLM. To this, add the magic of graphics cards, and you will have the ability of AI to find pictures that "look like" any given picture. Given these pictures, you have the associated words or tokens, which can then be matched to the "prompt." Ultimately, you have a two-way conversation about pictures mediated by the AI's vocabulary (the LLM). The key idea is that the words AI uses to describe a picture are automatically part of its LLM, so they can be used to create a picture the AI understands. It adds up to a vocabulary of words and pictures like a magic mind.
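To make the "vocabulary of words and pictures" idea a bit more concrete, here is a toy sketch in Python. It is only an analogy: as far as I understand it, real generators work with learned numerical representations rather than literal word overlap. The file names, the tags, and the functions tag_overlap and best_match are all made up for illustration.

```python
def tag_overlap(prompt, tags):
    """Score a picture by the share of prompt words found among its tags."""
    words = set(prompt.lower().split())
    if not words:
        return 0.0
    return len(words & set(tags)) / len(words)


# Hypothetical tagged pictures; the file names and tags are invented.
pictures = {
    "couch.png": {"woman", "couch", "living", "room"},
    "dancer.png": {"dancer", "fabric", "swirling"},
    "dog.png": {"dog", "park"},
}


def best_match(prompt, pictures):
    """Return the name of the picture whose tags overlap the prompt the most."""
    return max(pictures, key=lambda name: tag_overlap(prompt, pictures[name]))


print(best_match("woman on a couch", pictures))
```

The same tags work in both directions: they describe a picture you already have, and they pick out a picture from a prompt.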

There are many ways of measuring the capacity of the human brain and the capacity of an LLM. Roughly speaking, they are on the same order of magnitude. The human brain "cheats" by forgetting almost all of our experiences and turning them into a kind of LLM - a compressed memory of the real world.

You can also think of it this way: in 2024, human language (English, for example) will be the programming language. 

Talk to your AI. Get to know it.

ORGANIZATION

"An artist buried under hundreds of paintings"
Microsoft Designer (BING)

I am not alone in generating tons of AI pictures. These need to be saved in some kind of structure, hopefully in the cloud, without downloading.

Microsoft Designer allows you to tag pictures in one or more categories. Images are saved if and only if you tag them. I would not be surprised if Microsoft makes this a paid option. It will probably be worth it.


Microsoft's facility to store projects. Note the long list of categories down the left.

You can go back to see the original prompt, which is a huge benefit in the long run since eventually, you start mixing and matching ideas.

Tengr.ai provides something similar but without categories:

Tengr.ai saved prompts with pictures.
Tengr.ai is a paid service.

Other generators have similar capabilities, but their pictures are usually not worth saving.

MODERATION

Microsoft is heavily "moderated," meaning many prompts are rejected with no comment other than "blocked." This is annoying since the internet, in general, is the Wild West. Tengr.ai is not American, so it's a generator for grown-ups.

Moderation is the curse of Perplexity.ai, which is otherwise fantastic. That means I can't get an AI-generated description of any pictures with a hint of naughtiness, which is annoying since naughtiness is one of my special interests. 

PATHWAYS

I have been using several pathways to create pictures with AI.

Inspiration

Text

With applications like Microsoft Designer, you can go straight from a vague idea to a great picture. For example, haikus and short prompts like "woman Klimt" produce great results.

There is a cottage industry for beginners in prompt generation, offering advice and packages of prompts. Some even consider prompts to be subject to some kind of "ownership," which is beyond the scope of this article. Even so, direct prompts can be fun and always produce surprising results. An endless supply of these can be found on Facebook or, if you insist, buy or obtain a list of a few thousand good prompts.

One issue with prompts is the context window, which can be small with Microsoft. For a more extended prompt, you must prepare it in a text editor and dump it into Designer.
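If you prepare prompts in a text editor anyway, a few lines of Python can check the length before you paste. This is a minimal sketch: I don't know the actual limit Designer applies, so max_chars is something you would have to find out by trial, and the function names are my own.

```python
def fits_limit(prompt, max_chars):
    """Return True if the prompt is no longer than max_chars characters."""
    return len(prompt) <= max_chars


def trim_to_limit(prompt, max_chars):
    """Cut the prompt at the last whole word that fits within max_chars."""
    if len(prompt) <= max_chars:
        return prompt
    cut = prompt[:max_chars]
    # If the cut landed in the middle of a word, drop the partial word.
    if prompt[max_chars] != " " and " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut.rstrip()


# The limit of 15 is an invented value, just to show the trimming.
print(trim_to_limit("woman in the style of Klimt", 15))
```

Trimming blindly can cut off the part of the prompt that matters most, so treat the result as a warning to edit rather than as the finished prompt.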

Tengr.ai allows for very long prompts, but it can be easy to lose track of what exactly is influencing the end result.

Slightly naughty prompts are useless with Microsoft but may work with Tengr.ai.

Pictures

Perplexity.ai can turn pictures into text. You can ask a question like "Make a prompt for DALL-E to do x" or feed the description into a generator like Tengr.ai. "Moderation" can be an issue.

Any picture will work, including anything taken from the Web, anything taken with a camera in "real life" or pictures taken as snapshots in Second Life.

Generation

I have mainly used two generators: Microsoft and Tengr.ai. Others, including Playground.ai and NightCafe, provide similar capabilities at similar costs. Microsoft seems "smarter" and creates more "artistic" interpretations if you can get past the moderator. Tengr.ai is for grown-ups and can swallow huge prompts.

Tengr.ai can also move from picture to picture. For example, if you have a picture with a nice pose and a general background, you can drastically change the model.

Disk Storage

Some processes require the picture to be stored on disk en route to the generator. Others can take the prompt as text directly to the generator. Even so, I now have 4,000 images on my local hard drive. That drive has 4 TB of storage, so it's not an immediate concern, but it's tempting to move the images to the cloud to avoid exceeding my local backup capacity. The wider question is: why store any of it? The same question applies to pictures in general. These should be brutally pruned. There are only so many dog pictures you need...
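Before deciding what to prune, it helps to know how much is actually there. The following is a rough sketch that counts image files under a folder and adds up their sizes. The list of file extensions is my assumption about what the generators hand out; adjust it to whatever you have.

```python
from pathlib import Path

# Assumed set of image extensions; extend it as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def tally_images(folder):
    """Return (count, total_bytes) for image files under folder, subfolders included."""
    count = 0
    total = 0
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            count += 1
            total += path.stat().st_size
    return count, total


# Example use (the folder name is hypothetical):
# count, total = tally_images("ai_pictures")
# print(f"{count} images, {total / 1e9:.2f} GB")
```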

Generative AI is a particular case. Pictures without prompts are of little value. The prompts are back with the generator. The bottom line is to use the generator's filing system and be prepared to pay in the long run. That's similar to Flickr. At some point, there is no point in trying to back it all up. Apple automatically backs up our pictures for a negligible cost, and if you insist, you can park your Generative AI stuff in iCloud or Microsoft OneDrive.
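For the pictures that do end up on a local drive, one way to keep them from becoming orphans is to save the prompt in a text file next to the picture. This is only a sketch of that idea, not something any of the generators do for you; the naming convention (same file name, .txt extension) and both function names are my own invention.

```python
from pathlib import Path


def save_prompt(image_path, prompt):
    """Write the prompt to a .txt file that sits beside the image."""
    sidecar = Path(image_path).with_suffix(".txt")
    sidecar.write_text(prompt, encoding="utf-8")
    return sidecar


def load_prompt(image_path):
    """Return the saved prompt for an image, or None if there is none."""
    sidecar = Path(image_path).with_suffix(".txt")
    if sidecar.exists():
        return sidecar.read_text(encoding="utf-8")
    return None
```

A picture for which load_prompt returns None is a good candidate for the brutal pruning mentioned above.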

But ask what you will do with all this stuff. Ultimately, it will come back and cost you time to manage.

EXAMPLE 1: SOURCE - SECOND LIFE - GENERATOR TENGR.AI


Captured from Second Life

Perplexity.ai

Perplexity.ai not only describes the picture, it suggests follow-ups and answers questions. This is the description I fed to Tengr.ai:


The image depicts a stylized virtual scene featuring a blonde woman sitting on a large brown couch in a living room setting. The woman has a youthful appearance and is wearing a colorful, patterned dress with various shapes and designs. She is seated casually on the couch, which is adorned with decorative pillows, including one with a crab motif.
The background consists of a textured wall resembling marble or stone, adding a touch of elegance to the scene. The floor is covered with a patterned rug, complementing the overall aesthetic.
The image has a whimsical and artistic quality, suggesting it may be a digital rendering or a scene from a virtual environment. The vibrant colors, patterns, and stylized elements create a visually appealing and imaginative atmosphere.
Elegance and Sophistication
The smooth, uniform texture and neutral color palette of the marble or stone wall create a sense of elegance and sophistication. The luxurious appearance of the wall contrasts beautifully with the patterned jumpsuit and decorative pillows, adding a touch of refinement to the overall composition.
Visual Interest
The juxtaposition of the solid, durable marble or stone wall against the vibrant patterns and colors of the other elements in the scene creates a visually striking and interesting contrast. This interplay of textures and materials adds depth and dimension to the image.
Timeless Appeal
Marble and natural stone are timeless materials that have been used in architecture and design for centuries. The presence of the marble or stone wall lends a classic and enduring quality to the overall aesthetic, making it feel both contemporary and timeless.
Cohesive Design
The marble or stone wall serves as a neutral backdrop that allows the other design elements, such as the patterned jumpsuit and decorative pillows, to take center stage. At the same time, its luxurious appearance complements and enhances the stylish and fashionable aesthetic of the scene, creating a cohesive and harmonious design.
Sense of Luxury
The use of marble or natural stone as a wall covering is often associated with high-end,

Along the way, Perplexity provides some education in interior design and photography.

 

Generator

Tengr.ai provides many options for the picture. The following is typical:

One of the unlimited interpretations by Tengr.ai

The following rendition suggests changes to the Second Life sky box and the model's dress. AI will likely have better "taste" than the Second Life sky box builder.


Sending the picture back through Tengr.ai's picture-to-picture option, we can produce drastic changes but preserve the pose:


Of course, drastic changes will totally change the set of pictures that AI draws upon, often in unexpected ways. AI is not resisting your suggestions. It is trying to find pictures in its "Vocabulary" that match what you ask for. Unusual requests produce unusual images.

A minor note: pictures that needed to be stored locally went to a folder I use for stuff that doesn't need to be kept once the idea has been tested. Of course, all images are saved by Tengr.ai, or by Gyazo in the case of the original.

EXAMPLE 2 - PICTURE FROM FACEBOOK (NO PROMPT)

Perplexity describes the picture as:

The image is a striking artistic photograph that captures a dancer in a whirlwind of flowing, translucent fabric. The dancer, a woman with long hair, is positioned in the center of the frame, her body arched backwards in a graceful pose. Swirling around her in a spiral pattern are numerous strands of sheer, gossamer-like material, giving the impression of movement and energy. The fabrics appear to be catching the light, creating a shimmering, ethereal effect.
The background is a warm, earthy tone, providing a contrasting backdrop that allows the dancer and the swirling fabrics to stand out prominently. The overall composition is dynamic and captivating, with the dancer's form and the flowing fabrics creating a sense of fluidity and motion frozen in time.
While the dancer's face is not clearly visible, the image conveys a sense of elegance, beauty, and the ephemeral nature of dance and movement. The photographer has skillfully captured a fleeting moment, transforming it into a visually stunning and thought-provoking work of art.
The background is a warm, earthy tone with hints of autumn foliage, providing a contrasting backdrop that allows the dancer and the swirling fabrics to stand out prominently. The overall composition is dynamic and captivating, with the dancer's form and the flowing fabrics creating a sense of fluidity and motion frozen in time.

Tengr.ai provides (among many others):


With fancy poses, you always need to count legs and fingers.

EXAMPLE 3 - CLASSIC PICTURE TO PICTURE






