Making pebbles

For a couple of months I’ve been wrestling with and discovering the power of generative AI. My aim is to create custom models (LoRAs) for image generation based on the fundamental properties formulated by design theorist and architect Christopher Alexander. The images serve as sketches for ceramic tableware such as cups, bowls and dishes. This tableware should support states of resonance in the user while eating lunch. (More on all these subjects later.)

In this post I want to share some successful results from my Pocket Project* Making pebbles. In this project I create a dataset of pebble photos with descriptions and use it to generate photorealistic images of pebble-like tableware. Pebbles adhere to the fundamental property of Simplicity and inner calm.

Fig. 1. Part of my pebble collection and an example of a dataset image.

I have a large collection of pebbles, gathered over many years (Fig. 1). On day one I photographed all the pebbles in the house, resulting in 63 images (Fig. 2).

On the second day I asked ChatGPT to generate detailed captions (descriptions) of the pebbles. I also looked at automated caption generation, but the results were either too general or focused on the wrong elements. I need very detailed descriptions of the pebbles: their shape, structure, distinguishing features and colour. Only then can the model use these details in new images. Some of the captions were very poetic: “A pebble with a compact, rounded triangular geometry, softly domed and grounded. The form feels stable and inward. Its surface is smooth and matte, warm brown in tone, marked by a lighter diagonal band that crosses the body, adding visual interest while remaining fully integrated into the overall coherent form.”
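A training set like this usually pairs each photo with a caption file. Below is a minimal sketch of a check that no pebble is left without a description. It assumes a common convention in which each caption lives in a `.txt` file with the same base name as its image; the folder name `pebbles` and the function name are hypothetical, not taken from my actual setup.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_missing_captions(dataset_dir, caption_ext=".txt"):
    """Return names of images that lack a non-empty caption file next to them."""
    missing = []
    for image in sorted(Path(dataset_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip caption files and anything that is not an image
        caption = image.with_suffix(caption_ext)
        # A caption counts as missing if the file is absent or only whitespace.
        if not caption.exists() or not caption.read_text(encoding="utf-8").strip():
            missing.append(image.name)
    return missing


if __name__ == "__main__":
    # "pebbles" is a hypothetical folder name (assumption).
    folder = Path("pebbles")
    if folder.is_dir():
        print(find_missing_captions(folder))
```

Running a check like this before training is cheap, and it catches the case where an image would silently be trained without its description.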

Fig. 2. All pebble images of the dataset.

On the third day I explored a new development environment. I had been using ComfyUI on Runpod, but I found the training and inference process lacking in transparency. Above all, Runpod had kept me waiting for an available GPU for many hours, even in the mornings. I was fed up with it. I settled on vast.ai as a GPU cloud service. The pricing is good, they support the templates I need and the availability of the European servers is much better.

The fourth day I spent figuring out Kohya-ss, a set of scripts with a graphical user interface for training a custom model. It uses Stable Diffusion, a family of models that generate images from text, as a base that is then trained further on your own dataset. Installing the interface was easy, as it was available as a template on vast.ai. But understanding all the settings and folder locations kept me busy for a day…

Fig. 3. Exploring the checkpoints, I chose checkpoint 8, third from the right.

On day five I could start the actual training with my dataset. In Kohya-ss it is easy to control every step of the training. I generated nine checkpoints, snapshots of the model at different stages of the training, to assess how much training is enough. Undertraining may result in sloppy, unrealistic images. Overtraining often generates artefacts. It is very helpful that you can let the program generate images during training to see how it is coming along. I chose the best checkpoint using a handy script which makes it easy to compare the checkpoint outputs (Fig. 3).
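To give an idea of what such a comparison can look like, here is a minimal sketch that lays out one sample image per checkpoint side by side in an HTML page. This is not the script I used; the function name, labels and file names are hypothetical, and it assumes the sample images were generated with the same prompt and seed so that only the checkpoint differs.

```python
from html import escape


def build_comparison_page(samples, title="Checkpoint comparison"):
    """Return an HTML page with one sample image per checkpoint in a single row.

    `samples` maps a checkpoint label to the path of an image generated
    with that checkpoint. Insertion order determines the column order.
    """
    cells = []
    for label, image_path in samples.items():
        cells.append(
            '<td><img src="{src}" width="256"><br>{label}</td>'.format(
                src=escape(str(image_path), quote=True),
                label=escape(label),
            )
        )
    return (
        "<html><head><title>{title}</title></head>"
        "<body><table><tr>{row}</tr></table></body></html>"
    ).format(title=escape(title), row="".join(cells))


if __name__ == "__main__":
    # Hypothetical file names for two checkpoints (assumption).
    page = build_comparison_page(
        {"checkpoint 1": "sample_01.png", "checkpoint 2": "sample_02.png"}
    )
    print(page)
```

Saving the returned string as an `.html` file next to the images and opening it in a browser gives a quick visual overview from undertrained to overtrained.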

Fig. 4. Porcelain cup-like shapes from the same seed.

On day six I started the inference process, the actual generation of images. To do that I needed another set of scripts with a graphical user interface: Automatic1111. I installed this template on another instance on vast.ai. I asked ChatGPT to supply prompts, starting settings and a test protocol for generating images of the tableware items. This went surprisingly well (Fig. 4). A1111 creates images and provides an overview which contains the prompt, all the settings and the seed used. The seed is the number that initialises the random noise an image starts from. With the same prompt and settings, it allows you to generate the exact same image again. Generating interesting images of dishes was the hardest. The shape of a dish is very far removed from the morphology of a pebble, so I needed to be much more explicit in the prompt. I ended up with a very useful set of images and a proof of concept of my new workflow (Fig. 5).
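Why a seed makes an image repeatable can be shown with a toy example. The sketch below is not A1111's code: it uses Python's standard `random` module in place of the noise a diffusion model really starts from, and the function name is hypothetical. The principle is the same: a generator initialised with the same seed produces the same sequence of numbers.

```python
import random


def starting_noise(seed, size=4):
    """Return `size` pseudo-random values from a generator initialised with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    first = starting_noise(1234)
    again = starting_noise(1234)
    other = starting_noise(1235)
    print(first == again)  # same seed, same starting noise
    print(first == other)  # a different seed gives different noise
```

This is also why the seed alone is not enough to reproduce an image: the prompt and all the other settings have to match as well.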

Fig. 5. Examples of a bowl, dish and cup generated with my own LoRA.

* A short project of one to seven days with a clear and reachable aim. View the manifesto for more information.
