Garin Curtis 






Beyond Prompts: Developing An Expressive Tool for Real-Time Manipulation of Diffusion Models through Active Divergence








Abstract

This paper introduces a novel approach to real-time manipulation of diffusion models by integrating network bending techniques into the 2D Conditional U-Net model within the StreamDiffusion pipeline. Leveraging the flexibility of TouchDesigner, I developed an interactive tool designed for seamless integration into artistic workflows, enabling users to manipulate generative outputs dynamically and expressively. Unlike traditional text-to-image models, my tool facilitates open-ended exploration of the latent space, producing a diverse range of outputs that actively diverge from the training data. Through artistic experimentation, I demonstrate the tool’s ability to generate outputs ranging from subtle enhancements to abstract transformations, unlocking new creative possibilities. This research provides a foundation for advancing real-time, artist-driven interaction with generative AI models, bridging the gap between technical innovation and creative expression.







Introduction

Presently, diffusion-based generative tools for image creation focus primarily on text-to-image functionality, offering limited creative control beyond the initial input prompt. As noted in Ko et al. (2023), a fundamental limitation of text-to-image models arises from the text prompting mechanism itself, which inherently restricts the degree of control users can exert over the output. As a result, generated images often resemble the training data too closely, providing minimal novelty or divergence from the source material, as noted in research on the balance between generative deep learning and computational creativity (Berns and Colton, 2020). Current models emphasise replicating training data accurately, limiting their potential for creative exploration and expression. Aside from setting basic parameters before inference, these tools lack real-time interaction capabilities, restricting creative flexibility for digital artists. Real-time interaction is particularly important for artists practising live performance, improvisation, or iterative creative processes.

Furthermore, existing techniques for introducing active divergence into diffusion-based generative models, such as those described in Dzwonczyk et al. (2024), remain complex and require expert knowledge of the system architecture. Accessible user interfaces that enable expressive, real-time control over a model’s outputs are lacking, which limits the applicability of these approaches for digital artists unfamiliar with them. More accessible interfaces are therefore needed to allow real-time manipulation of parameters within the neural network itself, enabling users to achieve novel and interesting outputs. Currently, such tools are available only for Generative Adversarial Networks (GANs) (Goodfellow et al., 2020), like StyleGAN (Karras et al., 2021), and have been integrated into user-friendly interfaces such as AutoLume (Kraasch and Pasquier, 2022) and StyleGAN-Canvas (Zheng, 2023). However, a survey of current research indicates that no equivalent tool extends these capabilities to diffusion models like Stable Diffusion. Furthermore, network bending remains largely unexplored in the context of diffusion models. As such, my work also serves as an opportunity to investigate the underlying mechanisms of diffusion models in greater depth.

To address these limitations, I first applied novel network bending techniques (Broad et al., 2021) to the state-of-the-art StreamDiffusion pipeline (Kodaira et al., 2023), originally designed for real-time interaction via image-to-image webcam input. Next, I developed a prototype system within TouchDesigner (Derivative, 2024), enabling expressive, real-time manipulation of these network bending techniques through a more intuitive and accessible interface. Finally, I conducted artistic experiments using the tool to evaluate the outputs of these techniques and demonstrated their practical application in a real-world scenario through a live audiovisual recording, showcasing its potential for dynamic creative expression.
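To ground what network bending means in this context before the details appear later in the paper: intermediate activations of the U-Net are intercepted and transformed mid-inference. The sketch below shows one plausible way to inject such an operation into a diffusers-style 2D Conditional U-Net using a PyTorch forward hook; the model id, target layer, and transform are illustrative assumptions on my part, not the tool’s actual code.

```python
# A minimal sketch of network bending via a PyTorch forward hook.
# Model id, target layer, and transform are illustrative assumptions.
import torch
from diffusers import UNet2DConditionModel

# Load the 2D Conditional U-Net that StreamDiffusion drives during denoising.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/sd-turbo", subfolder="unet"
)

bend_amount = 0.0  # mutated in real time, e.g. from a TouchDesigner control


def bend_hook(module, inputs, output):
    # Network bending: transform the layer's intermediate feature maps.
    # Returning a tensor from a forward hook replaces the layer's output
    # for the remainder of the forward pass.
    return output * (1.0 + bend_amount) + bend_amount * torch.tanh(output)


# Which layer to bend is itself an expressive choice: the mid block tends
# to affect global structure, while other blocks alter finer detail.
handle = unet.mid_block.register_forward_hook(bend_hook)

# ... run the StreamDiffusion image-to-image loop with this U-Net ...
# handle.remove()  # detach the hook to restore unmodified behaviour
```

Because the hook reads `bend_amount` at call time, a user interface can change the value between frames and the next denoising pass immediately reflects it, which is what makes real-time, expressive manipulation possible.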

The primary outcomes of my research are:

• Introduction of novel network bending techniques to the 2D Conditional U-Net model, leveraging the StreamDiffusion pipeline to enable real-time, expressive manipulation of generated images.

• Development of a user-friendly tool within TouchDesigner, designed for seamless integration into existing workflows, with open access for the community, making it available to anyone familiar with the software (a sketch of how such a control could drive the bending parameters appears after this list).

• Production, through artistic experimentation, of a diverse range of creative outputs that actively diverge from the training data, highlighting the tool’s potential for innovation in digital art.

• A live audiovisual recording demonstrating the tool’s practical utility by employing it in a real-world scenario and showcasing its dynamic capabilities.
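As a companion to the earlier sketch, the following hypothetical TouchDesigner Execute DAT callback shows how a UI control could drive the bending parameter each frame. The operator name, its custom parameter, and the `bending` module are assumptions for illustration; this is one way such glue could look, not the tool’s actual structure.

```python
# Hypothetical TouchDesigner glue (Execute DAT callback), assuming the
# hook sketch above lives in a module named 'bending'. Runs inside the
# TouchDesigner Python environment, where op() is a builtin.
import bending  # illustrative module exposing the mutable bend_amount


def onFrameStart(frame):
    # Read a custom slider parameter each frame and forward it to the
    # bending hook; 'bend_controls' and 'Strength' are illustrative names.
    bending.bend_amount = op('bend_controls').par.Strength.eval()
```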






