Pavilion Tokyo 2021 / Generative AI / Data Scraping

TOKYO 2020-2021

For this project, Daito Manabe was responsible for planning, concept development, installation design, and the selection of the data and AI libraries used in the work. The installation combined anonymous public comments, the Tokyo 2020 Games Vision, rinna's Japanese GPT-2 model, and VQGAN+CLIP to reconstruct Tokyo under pandemic conditions through generative AI.

2021.07.01-09.05 / Watari-um open space / Tokyo Tokyo FESTIVAL Special 13
TOKYO 2020-2021 installation at Pavilion Tokyo 2021
Photo: Muryo Homma (Rhizomatiks)

Context

Pavilion Tokyo 2021 proposed an urban landscape through architecture and objects. This work approached the same moment through layers of information, data, and generative technology, addressing the social tension created by the overlap of the Tokyo Olympics and COVID-19.

The source material was built from two contrasting types of language: official Olympic vision statements and anonymous news comments. By placing Olympic slogans and large public comment threads about the event, infection conditions, and media narratives into the same generative pipeline, the work visualized an information environment where celebration, politics, anxiety, and distrust were inseparable.

Generative AI was used not only to create finished images, but also as a medium that preserved the immaturity, repetition, failure, and risk of the models available at the time. The physical mosaic lens was part of the control system: passersby could not easily read the text, while viewers who intentionally approached the installation could decode it.

Technical Details

Based on the production notes, this section summarizes the collection scale, filter conditions, models, prompt design, and output-selection policy.

  • Articles: 577, selected mainly from Yahoo! News articles with large comment counts
  • Comments / Replies: approximately 1.25 million collected
  • Fine-tuning corpus: 9,725 comments and replies of 390 characters or more
  • Output selection: GPT-2 outputs of 300 characters or more were primarily used

Data Scraping

Node.js and Puppeteer were used to collect articles, comments, and replies, primarily from Yahoo! News items with roughly 1,000 or more comments. Later notes also considered selecting major topics from the comment rankings so that the corpus would not be limited to Olympic or COVID-related material.

  • Target: Yahoo! News comments and replies
  • Articles: 577
  • Collected scale: approximately 1.25 million comments / replies
  • Training data: 9,725 entries of 390 characters or more
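
The length filter described above can be sketched in a few lines. This is a minimal illustration rather than the production code: the `Entry` record, its field names, and the choice to strip surrounding whitespace before counting are assumptions, and only the 390-character threshold comes from the production notes.

```python
from dataclasses import dataclass

MIN_CHARS = 390  # threshold stated in the production notes


@dataclass(frozen=True)
class Entry:
    """One scraped comment or reply (hypothetical record layout)."""
    article_id: str
    kind: str  # "comment" or "reply"
    text: str


def build_corpus(entries, min_chars=MIN_CHARS):
    """Return the text of every entry with at least min_chars characters."""
    corpus = []
    for entry in entries:
        text = entry.text.strip()  # assumption: ignore surrounding whitespace
        # len() counts Unicode code points, so one kana or kanji counts as one.
        if len(text) >= min_chars:
            corpus.append(text)
    return corpus


sample = [
    Entry("a1", "comment", "あ" * 390),               # exactly at the threshold
    Entry("a1", "reply", "あ" * 389),                 # one character short
    Entry("a2", "comment", "  " + "い" * 400 + "\n"),  # padded with whitespace
]
corpus = build_corpus(sample)
print(len(corpus))  # 2
```

If the 9,725 training entries were drawn from the full collection of roughly 1.25 million comments and replies, the threshold kept a little under 1% of the scraped material.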

Language Model

The Japanese GPT-2 model `japanese-gpt2-medium` from rinna was fine-tuned with Hugging Face Transformers, using a SentencePiece tokenizer. Through fine-tuning, the model absorbed the repetition, abrupt logic shifts, and collective pressure characteristic of anonymous comment threads and reproduced them as a writing style.

  • Model: rinna / japanese-gpt2-medium
  • Framework: Transformers
  • Tokenizer: SentencePiece
  • Selection: outputs of 300 characters or more
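
The output-selection rule can be illustrated independently of the model. In the sketch below, `generate` is any zero-argument callable that returns one generated text; the stand-in generator, the `wanted` and `max_attempts` parameters, and the strict cutoff are assumptions for illustration. The production notes say outputs of 300 characters or more were used "primarily", so the real selection was not necessarily this strict.

```python
import random

MIN_OUTPUT_CHARS = 300  # selection threshold from the production notes


def select_long_outputs(generate, wanted, min_chars=MIN_OUTPUT_CHARS,
                        max_attempts=1000):
    """Call generate() until `wanted` texts of at least min_chars are kept.

    Gives up after max_attempts calls so a model that only produces
    short texts cannot loop forever.
    """
    selected = []
    for _ in range(max_attempts):
        if len(selected) >= wanted:
            break
        text = generate().strip()
        if len(text) >= min_chars:
            selected.append(text)
    return selected


def make_fake_generator(seed=0):
    """Hypothetical stand-in for the fine-tuned model.

    Returns a callable producing strings of 50 to 600 characters.
    """
    rng = random.Random(seed)
    return lambda: "文" * rng.randint(50, 600)


outputs = select_long_outputs(make_fake_generator(), wanted=5)
```

In practice `generate` would wrap a sampling call to the fine-tuned model; keeping the length rule separate makes the threshold easy to adjust without touching the generation code.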

Image Generation

Image generation used the VQGAN+CLIP workflow. Phrases selected from the official Tokyo 2020 Games Vision were translated with Google Translate and DeepL, then converted into prompts for the image-generation process.

  • VQGAN: CompVis taming-transformers
  • Guidance: OpenAI CLIP
  • Prompt source: Tokyo 2020 Games Vision
  • Output policy: VQGAN+CLIP outputs were used without cherry-picking
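
One way to turn translated phrases into a prompt list is sketched below. The source does not say how the Google Translate and DeepL results were combined, so the rule used here (one prompt per distinct translation, in input order) is an assumption, as are the `TranslatedPhrase` record and the placeholder strings, which are not actual Games Vision text. The translations are treated as values pasted in by hand, and no translation API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatedPhrase:
    """A selected phrase with its two machine translations (hypothetical)."""
    source_ja: str  # phrase chosen from the Games Vision (placeholder here)
    google_en: str  # result from Google Translate
    deepl_en: str   # result from DeepL


def build_prompts(phrases):
    """Return one prompt per distinct translation, preserving input order."""
    prompts = []
    seen = set()
    for phrase in phrases:
        for candidate in (phrase.google_en, phrase.deepl_en):
            prompt = " ".join(candidate.split())  # collapse stray whitespace
            key = prompt.casefold()               # case-insensitive duplicate check
            if prompt and key not in seen:
                seen.add(key)
                prompts.append(prompt)
    return prompts


phrases = [
    TranslatedPhrase("<phrase 1>", "placeholder translation A",
                     "placeholder translation B"),
    TranslatedPhrase("<phrase 2>", "Placeholder  translation C",
                     "placeholder translation c"),
]
prompts = build_prompts(phrases)
```

Because the output policy was to use results without cherry-picking, every prompt in such a list would be rendered and shown as generated, rather than being filtered after the fact.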

Exhibition Control

Generative models at the time had few of the safety mechanisms that are now common, so discriminatory or aggressive sentences could appear. The installation treated the mosaic lens as a physical readability filter, allowing only viewers who deliberately approached the work to read the generated text.

  • Public readability: difficult to read from the street
  • Viewer action: readable only at close range
  • Role: the exhibition space functions as a filter

Pipeline

The system was designed to collide images derived from official slogans with statistically plausible voices generated from anonymous comments.

01 Scrape: Collect Yahoo! News articles, comments, and replies with Puppeteer.
02 Filter: Extract entries of 390 characters or more to build the training corpus.
03 Fine-tune: Adapt rinna GPT-2 toward the tone of anonymous public comments.
04 Generate: Select longer generated texts, primarily 300 characters or more.
05 Prompt: Translate Olympic vision phrases and send them into VQGAN+CLIP.
06 Exhibit: Combine LED output and mosaic lenses to control reading distance.

Credits

Pavilion Tokyo 2021 / Tokyo Tokyo FESTIVAL Special 13.

  • Artist: Daito Manabe
  • Technical Direction: Motoi Ishibashi
  • Hardware Development: Kyohei Mori
  • LED Player: Yuta Asai
  • Image / Text Generation: 2bit
  • Technical Support: Toshitaka Mochizuki
  • Project Management: Tomoyo Obata
  • Producer: Takao Inoue