Garin Curtis
Beyond Prompts: Developing An Expressive Tool for Real-Time
Manipulation of Diffusion Models through Active Divergence
Abstract
This paper introduces a novel approach to real-time manipulation of diffusion models by integrating network bending techniques into the 2D Conditional U-Net model within the StreamDiffusion pipeline. Leveraging the flexibility of TouchDesigner, I developed an interactive tool designed
for seamless integration into artistic workflows, enabling users to manipulate generative outputs
dynamically and expressively. Unlike traditional text-to-image models, my tool facilitates open
ended exploration of the latent space, producing a diverse range of outputs that actively diverge
from the training data. Through artistic experimentation, I demonstrate the tool’s ability to gener
ate outputs ranging from subtle enhancements to abstract transformations, unlocking new creative
possibilities. This research provides a foundation for advancing real-time, artist-driven interaction
with generative AI models, bridging the gap between technical innovation and creative expression.
Introduction
Presently, diffusion-based generative tools for image creation primarily focus on text-to-image functionality, offering limited creative control beyond the initial input prompt. As noted in Ko et al. (2023), a fundamental limitation of text-to-image models arises from the text prompting mechanism itself, which inherently restricts the degree of control users can exert over the output. As a
result, generated images often resemble the training data too closely, providing minimal novelty
or divergence from the source material, as noted in research on the balance between generative
deep learning and computational creativity (Berns and Colton, 2020). Current models emphasise
replicating training data accurately, limiting their potential for creative exploration and expression.
Aside from setting basic parameters before inference, these tools lack real-time interaction capabilities, restricting creative flexibility for digital artists. Real-time interaction is particularly
important for artists working in live performance, improvisation, or iterative creative processes.
Furthermore, existing techniques for introducing active divergence, such as those described in Dzwonczyk et al. (2024) for diffusion-based generative models, remain complex and require expert knowledge of the system architecture to apply. Accessible user interfaces that enable expressive, real-time control over a model's outputs are lacking, limiting creative experimentation for digital artists unfamiliar with these approaches. More accessible interfaces are therefore needed to allow real-time manipulation of parameters within the neural network itself,
enabling users to achieve novel and interesting outputs.
Currently, such tools are available only for Generative Adversarial Networks (GANs) (Goodfellow et al., 2020), like StyleGAN (Karras et al., 2021), and have been integrated into user-friendly
interfaces such as AutoLume (Kraasch and Pasquier, 2022) and StyleGAN-Canvas (Zheng, 2023).
However, a survey of current research indicates that no equivalent tool exists that extends these capabilities to diffusion models like Stable Diffusion. Furthermore, network bending remains largely
unexplored in the context of diffusion models. As such, my work also serves as an opportunity to
investigate the underlying mechanisms of diffusion models in greater depth.
To address these limitations, I first applied novel network bending techniques (Broad et al.,
2021) to the state-of-the-art StreamDiffusion pipeline (Kodaira et al., 2023), originally designed
for real-time interaction via image-to-image webcam input. Next, I developed a prototype system
within TouchDesigner (Derivative, 2024), enabling expressive, real-time manipulation of these
network bending techniques through a more intuitive and accessible interface. Finally, I conducted
artistic experiments using the tool to evaluate the outputs of these techniques and demonstrated
their practical application in a real-world scenario through a live audiovisual recording, showcasing the tool's potential for dynamic creative expression.
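To make this concrete, the following minimal sketch shows one plausible way to apply a network bending transform to the 2D Conditional U-Net that StreamDiffusion builds on, using a PyTorch forward hook to scale an intermediate block's activations. The checkpoint name, the choice of mid_block, and the scale parameter are illustrative assumptions, not the exact operators or layers used in my implementation.

from diffusers import UNet2DConditionModel

def bend_scale(output, scale):
    # A simple network "bend": uniformly scale a layer's feature maps.
    return output * scale

# Load the same 2D Conditional U-Net class used by StreamDiffusion.
# The checkpoint name is an assumption for illustration.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Returning a tensor from a forward hook replaces that block's output
# for the remainder of the forward pass.
scale = 1.5  # hypothetical parameter, e.g. bound to a UI slider
handle = unet.mid_block.register_forward_hook(
    lambda module, args, output: bend_scale(output, scale)
)

# ... run the StreamDiffusion img2img pipeline with this unet as usual ...

handle.remove()  # detach the hook to restore the unmodified network

In a live setting, a value such as scale would be updated between frames from a TouchDesigner control, so each denoising pass reflects the current state of the interface.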
The primary outcomes of my research are:
• Introduction of novel network bending techniques to the 2D Conditional U-Net model, leveraging the StreamDiffusion pipeline to enable real-time, expressive manipulation of generated
images.
• Development of a user-friendly tool within TouchDesigner, designed for seamless integration
into existing workflows with open access to the community, making it available to anyone
familiar with the software.
• Production of a diverse range of creative outputs through artistic experimentation that actively diverge from the training data, highlighting the tool's potential for innovation in digital
art.
• A live audiovisual recording, produced by employing the tool in a real-world scenario, demonstrating its practical utility and dynamic capabilities.
Interface