AI Weeknotes #01

Getting the reps in!

kaigani
5 min read · Aug 23, 2023

This past week I moved into the next phase of my R&D into using AI tools for media production. For most of this year, I’d been experimenting with the various ‘off-the-shelf’ tools: Midjourney, RunwayML, ElevenLabs. However, given the pace at which new breakthroughs happen weekly, you quickly find yourself wanting to experiment with cutting-edge techniques rather than waiting for them to be assimilated into future releases of mainstream tools, especially once you realize that most of these tools simply put an easier-to-use wrapper around open source software and charge a subscription fee.

As I posted on LinkedIn, it was time to up my game when it comes to moving ahead on the AI learning curve. I still feel about 3–4 months behind the leading YouTube channels I watch for tutorials, and then beyond that I hope to leverage my (dusty) Computer Science background to dig deeper into the fundamental research and software that is driving so much of the change we see.

I was recently listening to the No Film School podcast interview with Alex Buono, and they spoke about this concept of ‘getting the reps in’ (hence my overly buffed deepfake above). Alex was talking about directing, but it applies equally to keeping up with the latest AI tools: you can learn all of the theory, but what really matters is doing, growing in expertise through repetition and experimentation. In Alex’s case, SNL let him experiment with making short films on a weekly basis. My takeaway is that it’s easy to be intimidated by the new AI papers being published weekly, but you’ve got to just get going and learn incrementally.

So these weeknotes are tracking what I’ve been experimenting with in AI. Right now, I’m still behind the curve of the flashy demos you’ll see on YouTube, but soon you’ll see an inflection point and hopefully I’ll be making some exciting stuff too.

Stable Diffusion

Stable Diffusion is like the open-source Midjourney. It’s the model that most of the image and video experimentation builds on. This is my basic technology ‘stack’ to get started:

  • Pinokio — Early in development, but with a responsive dev who is active in the Discord. Pinokio aims to become a 1-click installer for all the cool AI tools out there, installing them and spinning up their web apps.
  • Automatic1111 (A1111) — This is the popular web app for running Stable Diffusion and all its various extensions and plug-ins. I installed it with Pinokio (see the scripting sketch after this list).
  • ControlNet — This is a pretty important add-on to Stable Diffusion, and can be managed from A1111. As the name implies, it gives you more precise control over the composition of your images.
  • Civitai — An essential repository of community-created ‘add-ons’ to Stable Diffusion. Using Stable Diffusion ‘out of the box’ will not give great results, but the community has used various techniques to fine-tune the model for better outputs.
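If you’d rather drive A1111 from a script than from the browser, it exposes a simple HTTP API when launched with the --api flag. Here’s a minimal sketch; the port is A1111’s default (a Pinokio install may differ) and the prompt is just a placeholder:

```python
# Minimal sketch: generate an image through a locally running A1111 instance.
# Assumes A1111 was launched with the --api flag on its default port 7860.
import base64
import requests

payload = {
    "prompt": "portrait photo of an astronaut, sharp focus, 35mm",
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "width": 512,
    "height": 512,
    "cfg_scale": 7,
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# Generated images come back as base64-encoded PNGs.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"txt2img_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```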

Be warned, though: the subscription services keep you away from the bleeding edge for a reason. Once you step outside them, the price you pay is the frustrating time spent troubleshooting when things go wrong. I went down a rabbit hole for half a day trying to figure out how I had crashed A1111 by installing an outdated extension. ChatGPT actually helped me piece together how to solve the problem.

Face swapping

The first thing I tried, since it was pretty easy to get up and running, was Roop, an extension for A1111 that lets you swap faces using just a single image. It’s the poor man’s version of training a model to reproduce a person’s face from multiple images, a more advanced technique I’ll try out in the coming weeks. I just used Roop to have some fun as a reward for setting up all of the other tools.
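Under the hood, Roop builds on InsightFace’s face detection and ‘inswapper’ model. If you’re curious what a single-image swap looks like in plain Python, here’s a minimal sketch; it assumes the insightface, onnxruntime, and opencv-python packages are installed and the inswapper_128.onnx weights are available, and the file names are placeholders:

```python
# Minimal sketch of single-image face swapping with InsightFace,
# the library Roop builds on.
import cv2
import insightface
from insightface.app import FaceAnalysis

# Detect and embed faces using the standard 'buffalo_l' model pack.
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

# The swapper model pastes the source identity onto a detected face.
swapper = insightface.model_zoo.get_model("inswapper_128.onnx", download=True)

source_img = cv2.imread("my_face.jpg")    # placeholder: face to copy
target_img = cv2.imread("generated.png")  # placeholder: image to modify

source_face = app.get(source_img)[0]
result = target_img.copy()
for face in app.get(target_img):
    result = swapper.get(result, face, source_face, paste_back=True)

cv2.imwrite("swapped.png", result)
```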

I made the pic of myself above from a text-to-image prompt; the face swap of my wife as Hera from Marvel’s Thor film was done image-to-image.

Upscaling

After that, I was ready to experiment with more advanced techniques that would involve ControlNet. Upscaling is an important technique because many of the initial images produced by Generative AI won’t be very high resolution.
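I did my upscaling through A1111’s UI, but the same tile-based ControlNet idea can be sketched in code with Hugging Face’s diffusers library. This is an illustration of the technique, not my exact setup; the checkpoints named are the commonly used community ones and the file names are placeholders:

```python
# Minimal sketch of ControlNet 'tile' upscaling with the diffusers library.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("game_screenshot.png")
# Plain 2x resize first; the tile ControlNet then re-adds fine detail
# while keeping the composition anchored to the original image.
control = low_res.resize((low_res.width * 2, low_res.height * 2), Image.LANCZOS)

result = pipe(
    prompt="photorealistic, highly detailed, sharp focus",
    image=control,
    control_image=control,
    strength=0.75,
).images[0]
result.save("upscaled.png")
```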

There are some cool ways to experiment with it as well. I wanted to see if I could take some old games, with outdated graphics, and make them much more realistic.

Here are the results:

As you can see, details are still difficult to pin down with this technology, but I’m still pretty impressed by these results, especially when you see just how much detail is possible.

Next up: Video

This is what I’m building up to, but you have to understand how to get good image results before you can even start to think about video.

Hope you enjoy following me on this journey!

Get this via my newsletter: Kai on AI
