Sunday, April 19, 2026

Agentic Projects - The Infinite Book

I want to talk about a project I started and abandoned a few years back. It was an early attempt at building an application that used generative AI for both text and image elements. I didn't want to write a children's book, I wanted to write thousands of children's books. The project failed, but outlined here are some of the steps I took and some of the lessons I learned.

Early Concept

One of my first stabs at agentic content creation, "making stuff with AI" as we called it then, was in September of 2024. I had played with image generation before that, in the age of DALL-E 2 and 3, but of course it was terrible. After playing with Stable Diffusion and ChatGPT-4, ideas started forming around content creation, trying to make something real. I had an 8-year-old at the time; she was just growing out of young kids' books, but I thought I could make her an infinite book. She would be able to turn a page, say what she wanted to have happen next, and see it come to life. This turned out to be too ambitious for the time, but something still came out of it. Let me show you the never-published app, Stories Together.

Stories Together - iOS Dashboard

Above you can see the Dashboard view of the iOS app Stories Together, with many of the generated test books. I never did get the layout right, but you can see the cover page for each book. Selecting a cover opens the book to pages like this:

A Reasonably Good Page

The idea was simple: I would use ChatGPT (via API) to generate the text of the pages, then use ChatGPT to generate a prompt for Stable Diffusion, and then generate the image for each page. The first versions of this failed pretty badly. The story text was often way too long, the prompts for generating the images were too long, and almost all of the stories were the same. Plus a lot of other stumbling blocks. It was time to start learning how to use AI to generate stuff that was not terrible.
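The chain described above can be sketched in a few lines. This is my own illustration, not the app's actual code; `call_llm` and `call_image_model` are stand-ins for the ChatGPT and Stable Diffusion APIs, and the prompt wording is approximate.

```python
# Two-stage pipeline: story text -> image prompt -> image.
# `call_llm` and `call_image_model` are stand-ins for real API calls.

def generate_page(page_idea, call_llm, call_image_model):
    # Step 1: ask the LLM for the page's story text.
    page_text = call_llm(
        f"Write one short page of a children's book about: {page_idea}")
    # Step 2: ask the LLM to turn that text into an image prompt.
    image_prompt = call_llm(
        f"Write a short Stable Diffusion prompt for this page: {page_text}")
    # Step 3: render the illustration from the derived prompt.
    image = call_image_model(image_prompt)
    return page_text, image

# Stub models so the flow can be exercised without API keys.
fake_llm = lambda prompt: f"LLM({prompt[:20]}...)"
fake_image = lambda prompt: f"IMAGE({prompt[:20]}...)"

text, image = generate_page("a raccoon with balloons", fake_llm, fake_image)
```

The failure modes I hit all lived inside those three calls: step 1 ran long, which made step 2 run long, which made step 3 produce mush.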

The first improvement I made was to generate things in pieces, instead of all at once. For example, I would generate the characters for the story first, before getting into the story at all. "Please generate a character for a children's book. If the character is an animal, make it humanoid. Please describe the character in some detail," something like that as a starting prompt; I would generate 2-4 of those as the main characters. Then for each character: "Please describe this character visually. Describe their clothes, their shoes, and anything on their head." This would produce a piece of text I could use to anchor the character. The stuff about the shoes and the head became critical later, as these descriptions would force Stable Diffusion to draw the character's feet and head, ensuring a picture of the entire character rather than a cropped one. This type of tweaking early in the process was required to get any kind of consistency in later generation steps, text or image.
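The character step, done in pieces, looks roughly like this. Again a sketch with a stubbed `call_llm`; the exact prompt wording above is from memory and the function names are mine.

```python
# Step-by-step character generation: personality first, then a
# separate visual-anchor pass. `call_llm` stands in for the ChatGPT API.

CHARACTER_PROMPT = (
    "Please generate a character for a children's book. If the character "
    "is an animal, make it humanoid. Describe the character in some detail.")

def describe_character(hint, call_llm):
    # First pass: name, personality, general description.
    character = call_llm(f"{CHARACTER_PROMPT} Base it on: {hint}")
    # Second pass: a visual anchor that explicitly covers clothes,
    # shoes, and headwear so the image model draws the full figure.
    visual = call_llm(
        "Please describe this character visually. Describe their clothes, "
        f"their shoes, and anything on their head: {character}")
    return {"description": character, "visual": visual}

fake_llm = lambda p: f"[{p[:25]}]"
cast = [describe_character(h, fake_llm) for h in ("raccoon", "tortoise")]
```

Each character ends up as a small bundle of reusable text chunks, which later steps can quote back to the models verbatim.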

After generating the character text chunks (name, personality, visual description, etc.), I then used ChatGPT to generate a story idea, an ethos for the story as a whole; again, something to anchor the coming page-text generations. Once the characters and theme were generated, I could combine the visual descriptions of the characters with the text of the page and generate a prompt for Stable Diffusion. Since I included details about each character, there was a chance I would get an identifiable character in each image. For you see, if you ask Stable Diffusion for an image of a cat named Fluffy driving a car, and then an image of the same cat buying bread, you will get two very different cats driving the car and buying the bread, even if you clamp down on the style of the image.
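One way to do that combining step, sketched below with a hypothetical prompt format (the post doesn't show the original): re-inject every character's visual anchor, plus a fixed style string, into every page's image prompt.

```python
# Assemble a Stable Diffusion prompt from page text plus each
# character's fixed visual description. Prompt format is illustrative.

STYLE = "children's book illustration, soft watercolor"  # clamp the style

def build_image_prompt(page_text, characters):
    # Repeat every character's visual anchor on every page so the
    # image model has a chance of drawing the same figures each time.
    anchors = "; ".join(c["visual"] for c in characters)
    return f"{page_text}. Characters: {anchors}. Style: {STYLE}"

characters = [
    {"name": "Fluffy",
     "visual": "a grey cat in a red raincoat and yellow boots"},
]
prompt = build_image_prompt("Fluffy drives a little blue car", characters)
```

Even with the anchors repeated verbatim on every page, the results were only sometimes consistent, as the pictures below show.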



In the above pictures you can see the heroes of the story, a raccoon, a tortoise, and a bear, doing something with balloons. Believe it or not, in these two pictures they are meant to be the same raccoon, tortoise, and bear. In some cases it worked, but it often failed with the tools I had on hand. This was just emerging in the world of AI as the Consistent Character problem; I was hardly the first to run into it. There were a host of complex, error-prone, and expensive solutions on the market, but this issue was dragging the project down. Though it was not my only issue.

Monetization

I had hoped, when the project started, to be able to make a little something by selling this app on the Apple iOS App Store. I was early to market, so I thought I could beat the rush and position myself as a name in agentic content creation, at least in the app space. Big dreams. It became very obvious early in the project that this was going to be a problem. I needed to run AI services, Stable Diffusion and ChatGPT, during app development, and that was a noticeable expense. It didn't break the bank, but an evening of coding was costing me 10 bucks in API access. How was I going to scale this up to a meaningful number of users, and how could I afford it?

My initial thought for the app was for it to be a simple, honest, one-time purchase. Maybe ten bucks, let the kid go crazy creating stories in the back of the car. I did some quick back-of-the-napkin math, and I realized it would cost me something like 45¢ per book! A bored kid would blow through 10 bucks in tokens in a few hours. I really didn't want to do in-app purchases for 'story tokens' or some such, though for most businesses that would be the right move: simply expose your downstream costs to the buyer. But I was not going to do that to parents, have their kid trying to get mom to buy another 10 books for 5 bucks or whatever. I needed a different solution.
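The napkin math goes something like this. The prices below are illustrative placeholders, not the actual 2024 API rates, but the shape of the problem is the same: per-book cost lands in the tens of cents, so a one-time purchase only covers a couple dozen books.

```python
# Back-of-the-napkin per-book cost. Prices are assumed placeholders.

PRICE_PER_IMAGE = 0.03        # assumed cost per generated image, USD
PRICE_PER_1K_TOKENS = 0.01    # assumed LLM cost per 1,000 tokens, USD

def cost_per_book(pages, tokens_per_page):
    text_cost = pages * tokens_per_page / 1000 * PRICE_PER_1K_TOKENS
    image_cost = pages * PRICE_PER_IMAGE
    return text_cost + image_cost

per_book = cost_per_book(pages=12, tokens_per_page=1500)   # 0.54
books_per_ten_dollars = 10 / per_book                      # ~18 books
```

A bored kid in the back of a car can burn through 18 books before lunch, which is why the one-time-purchase model fell apart.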

I explored the costs of self-hosting. Stable Diffusion (and many, many other) image generation models can be run on local hardware, some with surprisingly modest graphics cards. I could buy a couple of decent graphics cards and get the first version out the door, and there was always AWS if I needed to scale up quickly. Then came the question of text generation, and I admit this surprised me a lot: there was no way I was going to be able to host an LLM. There were options for running LLMs locally at the time, of course, but they were super dumb. I even tried to run some on the iPad; it was never going to work.

As I worked on the project, I would twist it one way or another, trying to cope with the unresolved monetization problem. What if I used a less expensive model? What if we don't give the user any choices? What if I fake new book generation by simply re-using old books? Can I 'cache' a story or an image with an index of embeddings? It was starting to fall apart.
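That last idea, caching against an index of embeddings, would look roughly like this: store each generated story under its request's embedding, and reuse the nearest stored story when a new request is close enough. A sketch with a toy two-dimensional "embedding"; a real version would use a proper embedding model and vector index.

```python
# Embedding-keyed story cache: reuse a stored story when a new
# request's embedding is similar enough to an old one.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class StoryCache:
    def __init__(self, threshold=0.9):
        self.entries = []          # list of (embedding, story)
        self.threshold = threshold

    def lookup(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding),
                   default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]         # hit: reuse an old story for free
        return None                # miss: caller pays to generate a new one

    def store(self, embedding, story):
        self.entries.append((embedding, story))

cache = StoryCache()
cache.store([1.0, 0.0], "A raccoon finds a balloon.")
hit = cache.lookup([0.99, 0.05])   # near-duplicate request
miss = cache.lookup([0.0, 1.0])    # unrelated request
```

The catch, of course, is that serving a cached story to a kid who asked for something specific is exactly the kind of faking that undermines an "infinite book."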

End of a Project

Between not being able to get dynamic character consistency right and not being able to come up with a realistic monetization path, I knew this project was simply too early, at least for my skill set and budget. It was hard to say goodbye to this one. By the end of the project, I had generated hundreds of characters and read many, many stories about them. Some were pretty good, better than they should have been. There was this weird experience of running the code, generating a story or a character or a page, realizing I needed to tweak the prompt, and also realizing that if I hit 'run' again in Xcode, this image or character would simply vanish. It was a little haunting.

I am writing this a year and a half after I put work into this project, and I know I made the right decision. I think the app could technically work today, by using 18-month-old models at a deep discount, but it would look like slop now. The newer models have raised the bar so much, and people are expecting more and more. Ironically, producing this app with the quality today's audience would expect is still too expensive; the app is still too early in some ways. One of the many contradictions in this early age of AI.





What I Learned

The most valuable part of this project was learning the basics of working with generative AI. I didn't have the words for it then, but I was starting to learn that the trick to generative AI is to provide guidelines for it to work with. This started to form when I broke the story generation into steps: first the characters, then their personalities, the theme of the story, the summary of the story, and then the text for each page. By the last step, the LLM has a bunch of great context to work from and is focused on the task at hand. My tooling for accomplishing this has evolved, but the core idea seems to be holding: if you want to play chess with an LLM, give it a chessboard.
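The "chessboard" idea, reduced to code: each generation step appends to a shared context, so the final page prompt carries everything decided so far. A sketch with a stubbed `call_llm`; the step names match the ones above.

```python
# Accumulating context across generation steps, so the last step is
# boxed in by everything decided before it. `call_llm` is a stand-in.

def run_pipeline(call_llm):
    context = []
    for step in ("characters", "personalities", "theme", "summary"):
        result = call_llm(f"Generate the story's {step}.", context)
        context.append(f"{step}: {result}")
    # By the page step, the model has four anchors to work within.
    return call_llm("Write the text for page 1.", context)

fake_llm = lambda prompt, ctx: f"({len(ctx)} facts) {prompt}"
page = run_pipeline(fake_llm)
```

The point is that the page-writing call never starts from a blank slate; it inherits every constraint the earlier steps produced.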










