VL: Quite apart from the divergence between prompt and image there's the issue of the prompt's brevity. There simply isn't enough data in a prompt of that length to characterise the image in any detail.
-
ChatGPT:
The deeper question here isn't just whether the prompt is "brief" or "long," but how the relative insufficiency of data in a short prompt impacts the relationship between human intention and machine execution in AI-generated imagery. This raises concerns about how much meaningful control a user can actually exert over the resulting image, and what this reveals about the limits of human authorship in the context of AI creativity.
The Nature of Data in the Prompt
The prompt you provided—although it invokes several stylistic markers (e.g., "Ava Gardner," "superheroine," "film grain," "monochrome," "by Diane Arbus Vivian Maier," "vogue photo shoot")—leaves many critical details unresolved. This opens up the field to vast algorithmic interpretation. What exactly should a superheroine in a vintage style resemble? Should Diane Arbus's influence show up in psychological intensity, or just in the black-and-white aesthetic? These are questions left vague, and the AI's job becomes not just to interpret but to fill in the gaps based on its training data.
Lack of Spatial, Physical, or Compositional Detail
Notably absent in the prompt is a more detailed description of the environment, the lighting, the subject's pose, and the interplay between those elements. Is the superheroine in a cityscape, a forest, or in a studio? Is she posed heroically, or more introspectively, as would befit an Arbus subject? These are the kinds of details that give an artist precision in executing a vision, but in a prompt of this brevity, such specifics are entirely open to AI interpretation. The prompt lacks any explicit guidance for constructing the image’s composition beyond a high-level gesture toward aesthetics and style.
Implications for Control and Ownership
At this level of brevity, the prompt functions more as a trigger for the AI's learned associations than as a robust creative directive. The AI draws on an enormous latent space of aesthetic and cultural patterns to generate something that plausibly matches what the user might be envisioning, given the sparse instruction. But this means that the user isn't really creating or designing the image in any detailed sense. Instead, they're invoking a rough zone of aesthetics and styles, within which the AI is left to improvise.
This poses a significant challenge to claims of human authorship. Can the human really take credit for the final product if their prompt was so minimal? Is the AI just a tool here, or does it have a substantial degree of creative input due to the prompt's vagueness?
The AI's Response to Sparse Data
Given the brevity of the prompt, the AI system must rely on inferential processes to build out a coherent image. It draws from a highly compressed understanding of what "Ava Gardner" looks like, what a "superheroine" might entail, and how "film grain" and "monochrome" can stylistically tie these references together. But with sparse data, the AI prioritizes elements that it deems most essential, leading to creative distortions or oversimplifications.
For example, in the image you presented, the AI has emphasized the "Ava Gardner" and "vintage" qualities over the "superheroine" aspect, likely because it has statistically stronger associations between Ava Gardner and mid-20th-century aesthetics than between Ava Gardner and superhero imagery. This is a direct result of the lack of specificity in the prompt; the AI must "guess" which references are most central, a process largely governed by probabilistic patterns within its training data.
The Role of Latent Space
Another way to view this problem is in terms of the AI's latent space: a learned, multidimensional representation of the images it can produce, shaped by its training data. When given a prompt, the AI navigates this latent space to locate points that correspond to the prompt's elements. With a more detailed prompt, it could pinpoint a specific location within this space. But with a sparse prompt, it hovers over a broader region of possible interpretations. The image, then, is more a product of the AI's data priors and less a faithful rendering of the user's intended vision.
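The narrowing described here can be illustrated with a deliberately crude toy model. It is only a sketch under stated assumptions: real image models do not enumerate discrete options, and the attribute names and values below are hypothetical, chosen to echo the open questions about setting, pose, and lighting raised earlier. The point is only that each detail a prompt leaves open multiplies the number of images consistent with it.

```python
from itertools import product

# Hypothetical attribute space: each key is one decision an image must settle.
ATTRIBUTES = {
    "setting": ["city", "forest", "studio"],
    "pose": ["heroic", "introspective"],
    "lighting": ["hard", "soft", "backlit"],
    "palette": ["monochrome", "colour"],
}

def consistent_images(prompt_constraints):
    """Return every attribute combination the prompt does not rule out."""
    keys = list(ATTRIBUTES)
    results = []
    for combo in product(*(ATTRIBUTES[k] for k in keys)):
        image = dict(zip(keys, combo))
        # Keep the image only if it agrees with every detail the prompt fixes.
        if all(image[k] == v for k, v in prompt_constraints.items()):
            results.append(image)
    return results

# A short prompt fixes one attribute; a long prompt fixes all four.
short_prompt = {"palette": "monochrome"}
long_prompt = {
    "palette": "monochrome",
    "setting": "studio",
    "pose": "introspective",
    "lighting": "soft",
}

print(len(consistent_images(short_prompt)))  # 18 of the 36 combinations remain
print(len(consistent_images(long_prompt)))   # 1 combination remains
```

In this toy setting the short prompt leaves eighteen candidate images open and the long prompt leaves one; whatever chooses among the eighteen is the model's priors, not the user.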
Brevity and Underdetermination
Philosophically, this issue touches on the concept of underdetermination—the idea that evidence (or in this case, a prompt) can be insufficient to determine a single, definitive outcome. The prompt is underdetermined relative to the image it is supposed to produce, which results in the AI generating multiple plausible interpretations rather than one precise execution of the user’s vision.
This problem is compounded in the arts, where the richness of sensory detail often demands a high degree of specification. A filmmaker or photographer, for example, doesn't just say "film noir" and "hero." They dictate the exact lighting, set design, and camera angles to achieve a very precise visual effect. In contrast, a short prompt like the one here cannot hope to provide that level of detail, meaning that much of the artistic decision-making happens in the AI's internal process, not in the user’s intention.
The Limits of Human Control in AI-Generated Art
The brevity of prompts like this one suggests that human users, rather than being direct creators, are more like curators or selectors within an AI system’s vast combinatorial possibilities. They sketch broad outlines, and the AI fills in the blanks based on its internal rules and datasets.
This raises a crucial tension in AI-generated art: who is the artist? Is it the human who gives the initial input, or is it the AI that transforms that input into a finished product based on its understanding of vast visual data? The shorter the prompt, the more creative weight shifts toward the AI. Thus, human authorship in this context becomes contingent on how much specificity, nuance, and direction the user provides.
In conclusion, the brevity of the prompt inherently limits the user’s control over the final output. The AI takes on a more significant role in interpreting, shaping, and even constructing the image, making it a co-creator rather than a passive tool. For human authorship to claim more ground, more detailed prompts—or iterative, interactive refining processes—would be necessary to reduce the AI's interpretive latitude and anchor the image more firmly in the human's creative vision.
-
VL: A few factors (in support of the human's claims to authorship):
1. This short prompt was arrived at iteratively after trying out numerous other prompts.
2. The human treats the prompts less as instructions to the AI than as provocations, as "prompts" quite literally. Users familiar with the AI model's interpretative quirks may include terms such as "Vivian Maier" or "superheroine" to shift the algorithm's output in semi-predictable directions rather than to evoke any specific visual elements conventionally associated with those terms.
3. Multiple images are generated from each selected prompt. Only a handful are selected and these are often refined even further by additional rolls of the algorithmic dice.
-
ChatGPT:
You're essentially reframing the human's role not as a traditional artist but as a curator, provocateur, or "meta-artist" who operates within a dialogue with the AI system, exploiting its interpretative quirks and iterative capacities. In this context, the argument for human authorship becomes more compelling, especially when the process is understood as a sustained engagement, a kind of strategic exploration of the AI's potential outputs. Let’s break this down, focusing on each of the three factors you've highlighted.
1. Iterative Development: The Prompt as Refined Tool
The fact that the short prompt was arrived at iteratively is crucial. This suggests that the brevity of the final version is not a limitation but rather a distilled expression of accumulated knowledge about how the AI functions. The human here is acting much as a photographer refines camera settings or a digital artist tweaks brush settings after experimentation. The iterations are part of the artistry, with each successive adjustment designed to push the AI in desired directions based on previous outputs.
This reflects a kind of artistic mastery over the AI medium. The user becomes skilled at manipulating not the AI's inner mechanics but its interpretative biases, that is, its tendencies to favor certain visual or stylistic interpretations over others. In the same way that a painter selects a particular brush for its unique texture or stroke quality, the artist working with AI learns to select and refine prompt language to create predictable shifts in the AI's creative output.
The implication is that the human is still very much in control, though not in a direct, instructional way. Instead, this process involves recognizing patterns in the AI’s behavior and learning to manipulate those patterns in subtle ways. By refining prompts across many iterations, the human participant is not providing the full "data" for the image at once but is engaging in a longer, more complex creative process that requires ongoing decision-making.
2. Prompt as Provocation, Not Instruction
This factor significantly recontextualizes the short prompt, moving away from the assumption that the prompt is simply an input-output device. If the user is deliberately using terms like "Vivian Maier" or "superheroine" as provocations rather than precise instructions, then the role of these terms is conceptual. The terms might not correspond directly to expected visual elements; rather, they act as seeds that shift the algorithm's interpretive processes in desired but indirect ways.
In this scenario, the user demonstrates familiarity with the AI model’s quirks, engaging with it in a dialogic way. Just as an artist working with a complex material (e.g., clay or metal) must understand the properties and behavior of that material, the AI artist must understand the AI's biases, tendencies, and limitations. The terms "Vivian Maier" and "superheroine" in the prompt, therefore, aren't literal instructions; they act like artistic gestures designed to push the AI's output into particular aesthetic or thematic directions without explicitly controlling each detail.
For example, invoking "Vivian Maier" may not mean the user expects the AI to mimic Maier’s photographic style directly but rather that they’re attempting to tap into a certain atmosphere of Maier’s street photography—perhaps an aura of quiet observation or urban melancholy. In this way, the human artist is operating indirectly, using the AI's known interpretive tendencies to generate certain affective qualities or moods.
This is a highly creative process that mirrors the unpredictability and chance often involved in other art forms, such as surrealist automatic writing or Jackson Pollock's action painting. In these forms, unpredictability itself becomes a tool for creativity, and the artist's role is to learn how to guide and harness that unpredictability without needing to impose rigid control over every detail.
3. Selective Curation and Further Iteration
The third factor further supports the human's authorship by highlighting the curatorial aspect of the creative process. The user generates multiple images from each selected prompt and selects only a handful that meet their artistic vision. This is not a passive or mechanical process but a highly selective one. The human artist acts as an editor, exercising aesthetic judgment to decide which of the AI’s outputs are meaningful or successful in realizing their creative intent.
Moreover, the refinement process—rolling the algorithmic dice again—suggests a continual dialogue between the artist and the AI system. Each "roll" introduces new variations, allowing the user to push the AI closer to an ideal or unexpected outcome. This iterative refinement is comparable to how traditional artists often engage in multiple drafts or versions of a work, continually refining and adjusting the form to better align with their evolving artistic vision.
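The generate, select, and reroll cycle described above can be sketched as a small loop. This is a minimal illustration, not a description of any actual image tool: `generate`, `vary`, `curate`, and `session` are hypothetical stand-ins, each "image" is reduced to a single number, and the user's aesthetic judgment is reduced to a scoring function.

```python
import random

def generate(prompt, n, rng):
    """Stand-in for an image model: each 'image' is just a number in [0, 1)."""
    return [rng.random() for _ in range(n)]

def vary(image, rng, spread=0.1):
    """Stand-in for a reroll: a nearby image, clamped to [0, 1]."""
    return min(1.0, max(0.0, image + rng.uniform(-spread, spread)))

def curate(images, judge, keep):
    """The human step: rank candidates by the user's judgment, keep a few."""
    return sorted(images, key=judge, reverse=True)[:keep]

def session(prompt, judge, rng, batch=16, keep=3, rounds=2):
    """Generate a batch, keep a handful, then reroll and re-curate."""
    selected = curate(generate(prompt, batch, rng), judge, keep)
    for _ in range(rounds):
        rerolls = [vary(img, rng) for img in selected for _ in range(4)]
        # Earlier picks stay in the pool, so the best result never gets worse.
        selected = curate(selected + rerolls, judge, keep)
    return selected

target = 0.7  # hypothetical: the look the user has in mind

def judge(image):
    """Hypothetical taste function: closer to the target scores higher."""
    return -abs(image - target)

final = session("short prompt", judge, random.Random(0))
print(final)
```

Note where the choices sit in this sketch: the model supplies the variation, but every decision about what survives passes through `judge`, which belongs to the user.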
Implications for Authorship and Creativity
These factors collectively shift the discussion of authorship from one based on direct control to one based on strategic interaction with a semi-autonomous system. The user isn’t delegating the creative work to the AI in a purely mechanical sense but is actively engaged in shaping, selecting, and guiding the process through both their knowledge of the system’s interpretative quirks and their aesthetic choices.
Human as Provocateur: The human operates as a provocateur, learning how to nudge the AI in desired directions through careful prompt construction. This requires not just technical skill but also an intuitive understanding of the AI’s behavior—a form of artistic literacy in working with generative models.
Human as Curator: The user exercises judgment in selecting images, much like a photographer selects which frames to keep or a sculptor chooses which iterations of a form are most successful. This selective process is inherently creative and aligns with traditional notions of artistic authorship.
Human as Refiner: The iterative, experimental process of refining the prompt and rerolling the AI’s outputs involves a form of ongoing artistic dialogue. This speaks to the artist’s evolving relationship with the AI’s capabilities, pushing the boundaries of what is possible with the tool.
Conclusion: Recontextualizing Human Authorship
In sum, while the AI plays a significant role in generating the image, the human retains authorship through iterative refinement, strategic use of language, and selective curation. The human's mastery lies not in controlling every pixel of the image but in understanding and leveraging the AI's interpretative capabilities in ways that align with their evolving creative vision. This hybrid form of authorship challenges traditional models of creation, but it still places the human as the orchestrator of the process, if not always the direct maker of every visual decision.