Unindexed study 3

A Python script reads the plain text of Attention Is All You Need, the 2017 paper that introduced the transformer, the design that underpins almost all contemporary AI, from chatbots and image generators to coding and translation tools. The script runs a small, untrained version of the paper’s own mechanism over it: a few lines of software that reproduce the attention calculation the paper describes, without the trained model that would normally carry it out.

The text runs as a single horizontal line through the centre of the screen, one large character at a time, drifting slowly from right to left. The pace is deliberate: a little over seven characters a second, so a minute reaches only the first few hundred characters, the title and the opening of the abstract. Behind the line, each character casts a vertical band of colour, its hue fixed by the character itself, blurred into a soft field so the present reads as a slowly shifting wash rather than discrete blocks. The letters themselves are dark, almost a silhouette against the colour.

For each new character the script measures how much weight it gives the characters before it, comparing it with the previous forty-eight and scoring how close each one is. The strongest of these connections are drawn as curved white lines reaching back along the line to the earlier characters they point to. The connections alternate: one arches above the line of text, the next below it, the next above, so the structure spreads symmetrically around the central line and makes visible how far back the mechanism reaches.

The script runs this comparison, but unlike a working language model it has never been trained. A trained model has been fed enormous amounts of text and adjusted itself until it can predict what tends to follow what, which is, in effect, how it learns what words mean. This script has had none of that. It can only register the plain facts of each character, whether it is a letter, a digit, or a mark of punctuation, and how recently the same one last appeared. The connections it draws are between characters that resemble each other, not between words that belong together in meaning. The web around the line is built from likeness alone.
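As a rough illustration of what an untrained comparison of this kind could look like, here is a minimal sketch in Python. It is not the script used for the piece. The feature list, the scaled dot-product scoring, the softmax, and the names `features`, `attention_weights` and `strongest` are all assumptions made for the example; only the forty-eight character window and the kinds of facts registered (letter, digit, punctuation, recency) come from the description above.

```python
import math
import string

WINDOW = 48  # how many earlier characters are compared, as described above


def features(text, i):
    """Plain facts about the character at position i (an assumed feature set)."""
    ch = text[i]
    last = text.rfind(ch, 0, i)  # most recent earlier occurrence, or -1
    recency = 1.0 / (i - last) if last != -1 else 0.0
    return [
        1.0 if ch.isalpha() else 0.0,              # letter
        1.0 if ch.isdigit() else 0.0,              # digit
        1.0 if ch in string.punctuation else 0.0,  # ASCII punctuation only
        recency,                                   # how recently it last appeared
    ]


def attention_weights(text, i, window=WINDOW):
    """Weights the character at i gives to each of the characters before it."""
    start = max(0, i - window)
    if start == i:
        return []  # nothing earlier to look back at
    query = features(text, i)
    scale = math.sqrt(len(query))
    scores = [
        sum(a * b for a, b in zip(query, features(text, j))) / scale
        for j in range(start, i)
    ]
    peak = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [(j, e / total) for j, e in zip(range(start, i), exps)]


def strongest(text, i, k=3):
    """The k heaviest links, the ones that would be drawn as curves."""
    return sorted(attention_weights(text, i), key=lambda p: p[1], reverse=True)[:k]
```

With no learned weights, a letter leans towards earlier letters and a comma towards earlier punctuation: likeness, not meaning.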

The sound is made the same way, from the characters rather than their meaning. Each character sets off three overlapping voices at once: a tone, a lower, rougher one, and a brief hiss of noise. The pitch comes from the character, and the mix between the three follows how that character’s connections fall: tighter links lean towards the clear tone, looser ones towards the noise. The voices are long and overlap into a slow, continuous drone rather than separate notes, and each sits between the left and right speakers according to its place in the line.
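One way such a mapping could work is sketched below, again as an assumption and not the piece's own code. Here "tightness" is taken to mean how concentrated a character's weights are, measured as one minus their normalised entropy, and the three voices are crossfaded on that single value. The functions `tightness`, `voice_mix` and `pan`, and the shape of the crossfade, are hypothetical.

```python
import math


def tightness(weights):
    """1.0 when all weight sits on one link, 0.0 when it is spread evenly."""
    n = len(weights)
    if n < 2:
        return 1.0
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    return min(1.0, max(0.0, 1.0 - entropy / math.log(n)))


def voice_mix(weights):
    """Gains for the three voices, normalised to sum to 1 (assumed crossfade)."""
    t = tightness(weights)
    tone = t                         # tight links favour the clear tone
    noise = 1.0 - t                  # loose links favour the hiss
    rough = 1.0 - abs(2.0 * t - 1.0) # the rougher voice peaks in between
    total = tone + noise + rough
    return {"tone": tone / total, "rough": rough / total, "noise": noise / total}


def pan(x):
    """Map a horizontal position x in [0, 1] to a stereo pan in [-1, 1]."""
    return 2.0 * min(1.0, max(0.0, x)) - 1.0
```

A character whose weight falls on a single earlier character would sound as almost pure tone; one whose weight is spread across the whole window would dissolve into noise.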

Here the calculation that drives modern AI is turned on the paper that proposed it. The line moves too quickly to be read with ease, so the text stops functioning as language and becomes a moving pattern of letters and connections. The sound, slow and continuous, offers the time the image withholds. Applied to the paper itself, the calculation registers only how alike one character is to another, not the meaning the sentences carry.

Attention Is All You Need was written by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, then mostly at Google, and published in 2017. It is freely available and has been read and built on more widely than almost any paper in the field. There is no model to run here and no interface to work around. The text is open, and the only thing imposed on it is the decision to read it this way.

This excerpt is one minute taken from the opening of the paper.

The video is 1080 × 1080, at 60 frames per second.

#unindexed #nearinference #openweights #softwarecinema

Kristoffer Ørum @Oerum