Kristoffer Ørum - The Trojan image

Look at a generative image with a mirror in it. The figure stands at an angle to the glass, and the reflection should obey a set of rules that European picture-making has held to since the fifteenth century. The orthogonals running from the figure, the room and the mirror frame should converge on a shared vanishing point. The reflected body should sit on the same perspectival grid as the body in front of the mirror, displaced by a predictable geometry. In the AI image they do not. The mirror frame recedes toward one point, the room toward another, and the reflected figure occupies a space that cannot be reconciled with the space the original figure stands in. The lines do not converge where the eye, trained by five centuries of single-point perspective, expects them to converge.

This is the kind of failure that the forensic reading of AI images was built to detect. Look at the mirror, count the fingers, check the shadows. The technique presumes that perspectival coherence is the baseline against which the image should be measured, and that deviation from it is evidence of fraud. But the forensic reading of AI images is not a spontaneous public reflex; it is a trained one, and the training has identifiable sources. Journalism and fact-checking infrastructure (Reuters, AFP, AP, Bellingcat, Faktisk, TjekDet) has built a working method around hand-counting, reflection analysis, provenance checking and reverse image search, presenting these techniques as civic hygiene against synthetic disinformation. The C2PA coalition (Adobe, Microsoft, the BBC, camera manufacturers) has industrialised the same impulse upstream, proposing cryptographic signing as a way to restore a verifiable chain of custody for images. Academic and state forensics (the field around Hany Farid, NIST, defence-funded detection programmes) frames the work as a matter of scientific reliability in distinguishing the synthetic from the captured. And below all of this sits a vernacular layer: the count-the-fingers reflex on social media, the subreddits and influencer economies devoted to spotting AI. The stated goals differ (civic, commercial, scientific, self-protective), but a single premise runs underneath them. The photograph, and behind the photograph the perspectival picture, is the standard, and the task is to defend it.

What reaches the ordinary viewer from this apparatus is rarely the technique itself. Few people actually count fingers, examine reflections or study the geometry of a hand. What circulates instead is a moral imperative, a duty to distrust, with the forensic methods functioning as cultural signals rather than as practices. The paranoid reading is closer to a posture than a procedure, an instruction to remain on guard, and the techniques are cited as evidence that vigilance is possible rather than actually performed. This matters because the problem is not scrutiny. Wider visual critique and literacy would be welcome, both for generative images and for photographs, which have rarely received the kind of sustained reading they deserve. The difficulty is that the dominant form of scrutiny on offer is suspicion as virtue, and suspicion as virtue tends to foreclose looking rather than open it.

The work the institutional constituencies do is not wrong in its own register. A wire service needs to know whether a photograph from a war zone was captured or generated, and a provenance standard is a reasonable response to a real problem. The difficulty is that the moral charge attached to forensic reading has generalised into the only available posture toward generative images. What began as a professional method for adjudicating evidentiary claims has become a cultural reflex, and the viewer now approaches the generative image already armed, instructed to suspect before looking.

The posture is in part a symptom of the documentary image’s victory, in visual art as well as in broader visual culture. The figurative and the photographically coherent have so thoroughly displaced the fragmented and the abstract that the latter now read as failure rather than as legitimate visual modes. The twentieth century’s experiments with cubism, collage, montage and abstraction did not undo the dominance of the documentary picture; they were absorbed by it as style, as quotation, as historical episode. What returned to the centre was the figure, the scene, the legible space organised by something close to single-point perspective. The photograph carried this regime forward and lent it the additional authority of mechanical evidence. Generative models, which produce images that are figurative in subject but incoherent in their perspectival construction, fall into the gap this victory has opened. They look like documentary pictures and behave like fragmented ones, and the trained eye reads the mismatch as fraud rather than as form.

Behind this lies more than a century of conditioning by what could be called the normative image, and behind that the older perspectival regime the photograph inherited. The photographic tradition has culturally privileged the mechanical capture of light as a special form of evidence, and legal, historical and personal epistemologies have been built around the photograph’s indexical promise (the claim that the image bears a physical trace of what it records, like a footprint or an impression). This was here, and it looked exactly like this. The longing for certainty that follows from this promise is also a fear of the unstable image, and generative models produce inherently unstable imagery. Their visuals are not anchored to a physical moment in space and time but synthesised from a statistical latent space (the high-dimensional mathematical field in which a model represents the patterns learned from its training data). Because they threaten the borders of photographic and perspectival truth, the cultural reflex is to reject them, to mock the anatomical errors, to call them fraudulent and move on.

The issue is not that the forensic rules are wrong, but that one regime of looking has come to define legitimacy in advance. The forensic reading is not mistaken about what it sees; it sees the failure of perspectival convention accurately. What it does is treat that regime as the natural baseline and read everything else as deviation. The cost of inheriting this posture is that the contest is settled before it begins. When an AI image is evaluated by depth of field, lens flare and adherence to Newtonian physics in a mirror reflection, what gets obscured is the question of what the image is actually doing. A generative model is not a camera. It does not capture light; it maps concepts. When the perspective lines fail to converge, when the figure faces forward but the reflection looks away, it is not failing at photography or at the perspectival picture. It is doing something else, something closer to a composited conceptual image, holding the concept of a person and the concept of a mirror at once, untethered from the constraint of bouncing light and from the convention of a single vanishing point.

A reparative reading (in Eve Sedgwick’s sense, a practice that asks what an object might offer rather than scanning it for what it conceals) would stop asking how an AI image fails as a photograph and start asking what it achieves as a generative artefact. Every glitch opens a door to depicting the world in ways that differ from the photographic, from within the photographic tradition itself. The deviation from realism is where an uncomfortable truth surfaces: there has always been a gap between photographic convention and actual human vision. The generative image is not the eye, but neither is photography, and neither is the perspectival picture that stands behind it. We have consumed images organised by single-point perspective for so long that we have convinced ourselves they correspond to how humans see. They do not, and they never did. Photography is already computational in a broad sense. Exposure, framing, focal length, shutter speed and colour balancing are technical abstractions before any image arrives, and the perspectival grid the camera inherits from painting is itself a construction with a history. Human vision is temporal, partial, conceptual. The brain stitches saccades, memories, emotional states and contextual assumptions into a continuous illusion. Neither apparatus reproduces this. The photograph approximates it through one set of conventions, the generative image through another, and which set gets to count as the legitimate standard is itself the contested question.

If perception itself is partial, layered and assembled, then the perspectival window is only one way of giving it pictorial form, and the historical record contains many others. Abstraction is not a twentieth-century European invention; it has long histories in textile, ornament, calligraphy, ceramic, manuscript, weaving and architectural surface, traditions that have always understood the image as a field of colour, shape and texture before, or alongside, any account of figures or situations. These traditions were not absent from the centre of art history by accident; they were subordinated through hierarchies that bound the perspectival picture to forms of cultural, colonial and gendered authority, and that ranked the depicted scene above the worked surface. The post-Renaissance regime extended this by treating the picture primarily as a witness, a window onto a scene whose claim to attention rested on what it depicted. But every image is also a worked surface, and that surface carries at least as much weight, and at least as much political potential, as the figures and situations rendered on it. Pattern, rhythm, density and the distribution of colour across a plane are not subordinate to representation; they are themselves where a great deal of what an image does actually happens.

The generative image is in this sense something close to a Trojan horse inside the documentary tradition. It enters under photographic credentials, indistinguishable at first glance from the captured picture, and once inside it turns out to be ornament, pattern, repetition, a field in which fragments of the photographic recur, overlap and tile. The training corpus, billions of photographs absorbed and statistically redistributed, returns as surface rather than as scene. What looks like a single coherent view is in fact a worked plane in which photographic material has been reorganised according to logics closer to weaving or ornament than to the perspectival window. Return now to the mirror. Its failure is no longer a defect but a place where the construction becomes visible. The frame receding toward one point and the room toward another, the reflected figure occupying a space that does not connect to the space in front of the glass, these are the moments at which the worked surface declares itself. The mirror that was meant to confirm the perspectival picture instead exposes the plane on which photographic fragments have been laid down.

The unstable image does not threaten the truth of the world. It threatens the privileged position the photograph, and the perspectival and figurative tradition before it, has held in modern visual culture. The fight is not between truth and falsehood but between regimes of legitimacy, and to take the glitch seriously is to refuse to settle that fight in advance, to let the image be read as surface as well as scene, and to reconnect a medium currently treated as anomalous to traditions of abstraction far older and far more widespread than the perspectival window has been willing to acknowledge.