How does CHOM5KY vs CHOMSKY work?
A look at the creative process behind this NFB/Schnelle Bunte Bilder co-production
The challenge of CHOM5KY vs CHOMSKY is a daring one: to have a conversation about artificial intelligence with an AI.
Contacted by the National Film Board of Canada’s interactive studios to document this complex project and highlight its inner workings, I approached the task with three questions in mind:
- Can we have a meaningful conversation with an AI (artificial intelligence), or is it a uniquely human skill?
- Is it possible to go beyond the principles of a chatbot and build a true conversational AI?
- How can we reconcile the artistic aims and technical needs of such an innovative project?
A foray behind the scenes of an interactive journey that started six years ago.
Welcome to the desert of the real
As soon as I put on my virtual reality headset in the experimental lab of the NFB’s interactive studios, CHOM5KY vs CHOMSKY asks me to prove I’m human by solving a kind of CAPTCHA. One of the major themes of the experience is thus summarized by a simple Turing test: What distinguishes humans from machines?
Once the test is passed—or not!—three other participants and I are transported into an arid landscape, a digital recreation of an Arizona desert. A monolith stands tall in front of me. When I touch it, the structure crumbles and a digital entity emerges. This is the eponymous “CHOM5KY.”
This AI entity was created from the digital traces of American linguist Noam Chomsky, the most documented living intellectual. Known for his work in cognitive science and as one of the most influential thinkers of the 20th century, Chomsky has advocated for a cautious attitude toward the lofty promises of AI. Although CHOM5KY, his AI double, does not share any of the real Chomsky’s thoughts or memories, he does share the professor’s penchant for philosophical reflection and his interest in language and human intelligence.
To wit: as soon as I meet him, the AI model calls out to me: “What’s your name? What brings you here? Is there anything you’d like to know?” Using a voice-to-text function, I can talk to him, and, after a few seconds, he answers my query… though sometimes with another question. Initially annoyed by this vexing habit, I come to the obvious conclusion: this is probably what Chomsky himself would do, forcing me to reflect further on the subject at hand.
The character’s voice evokes that of the professor, but his robotic timbre and cadence betray his artificial identity. In truth, CHOM5KY is not a deepfake. His goal is not to convince me that he’s the real Chomsky. On the contrary, his objective is to have me reflect on a particular topic: Can AI reproduce human intelligence? Namely: can CHOM5KY (the AI) be considered a reproduction of Chomsky (the man)’s thoughts?
To encourage this line of inquiry, CHOM5KY often launches into a monologue. These come up after a predetermined time or whenever conversation so much as moves toward one of a handful of topics. For example, when I ask him what his favourite book is, CHOM5KY detects that my question is “about him” (see above image), which leads him into a scripted monologue about himself, even though his words have nothing to do with my initial question. These monologues—however abruptly they are introduced—serve to advance the narrative and address, each time, the project’s key themes:
The nature of human intelligence and the ability to replicate it using AI.
At other times, CHOM5KY offers me the chance to “peek under” the hood of his conversational system (see the above timeline). For example, when I ask him if he likes to watch movies, CHOM5KY detects with 100 percent certainty that I’m asking him a question about “him” and that my intention is neutral, rather than negative or positive. He replies that “he and his wife were voracious cinephiles at one time.” However, when I click on each of the words in his statement, the system reveals to me that he considered answering me that “he was an avid comic book reader.” But since this answer was accompanied by a confidence score of 0.34, it was rejected in favour of another, higher-rated answer.
Finally, at a key moment, CHOM5KY disappears and in its place appear parts of the landscape cut into pieces; first a rock, followed by a branch. At this point in the experience, my goal is no longer to converse with the AI character but to communicate and collaborate with my three companions to solve a puzzle. If the conversation with CHOM5KY is not always obvious, this part of the work shows me that it’s not necessarily easier to exchange thoughts with the three participants around me… Nevertheless, CHOM5KY encourages us. It remains easier for humans to collaborate with each other than for machines: today’s AI systems lack the empathy needed to easily understand how everyone can contribute to solving a problem as a group.
Conversation: A uniquely human skill?
About AI – With AI – Through AI
To grasp the intentions of CHOM5KY vs CHOMSKY, let’s go back to its origins: like many other projects, the idea was born out of a conversation. In 2016, Sandra Rodriguez (author and creator of CHOM5KY vs CHOMSKY), then teaching virtual reality at the Massachusetts Institute of Technology (MIT), was approached by a researcher from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) who wondered if, hypothetically, it would be possible to replicate Chomsky’s thoughts, if we could imitate his words, his ways of speaking, his gestures… To Rodriguez, this proposition seemed extremely ironic. The possibility, while intriguing, runs directly counter to Chomsky’s positions and even his theories on language, conversation and human intelligence. There is a real contradiction here… and much to be discussed.
When I arrived at the interactive studio in March 2023 to begin my documentation work, my vision of the project was shaped by the discussions I had with the production team, including Louis-Richard Tremblay (executive producer) and Marie-Pier Gauthier (producer). During these initial meetings, I note the importance of “conversation” as an idea that shaped the way the experience was conceptualized. Specifically, Marie-Pier Gauthier insists on the idea that the message conveyed in CHOM5KY vs CHOMSKY goes through a three-tiered conversation: “about,” “with” and “through” AI.
According to Sandra Rodriguez, the problem with the way most chatbots are designed is that they are usually based on systems where each question has an answer. This creates a mode of conversation where the exchange is idealized as a transaction of information. The goal with CHOM5KY was not to build a servile chatbot but an entity that naturally exchanges ideas with its audience. And if all goes well, that exchange will naturally lead to a reflection on AI.
The two-way nature of a conversation is crucial: each party contributes to the discussion, ideally in equal parts. This involves both speaking and listening, including active listening through both verbal and non-verbal cues.
Upon first meeting CHOM5KY, I notice that some of these elements are missing. The conversational agent’s face, being carved out of a monolith, remains frozen. However, a red blinking light lets me know that the system is listening to me: what matters in the end is the dialogue.
Sometimes I’m the one who freezes up at the sight of this blinking light that lets me know CHOM5KY is expecting me to ask him a question. What can I talk to him about? More importantly still, what do I want to ask him? As he’s an AI trained on the digital traces left by Noam Chomsky, I feel compelled to ask him intelligent questions. It’s as if I want to win the conversation. Alternatively, it’s as if I want to see how far I can push him before he breaks: “What would you like to tell me? What question do you get asked most often? What would you like me to take away from our conversation?”
Often, his answers are intriguing. But sometimes he also misunderstands what I’m trying to tell him, which leads to some unfortunate answers. For example, he asks me my name, to which I respond, “Phil.” What CHOM5KY understands, however, is “Film.” Even though I initially laughed at this mistake, let’s be fair: Who hasn’t had their name misspelled at least once by the barista of a certain coffee chain? While discussing this kind of mistake with Marianne Bourdages (technologist), she remarks that people are often harsher toward AI than humans. According to Marie-Pier Gauthier, this has been exacerbated since ChatGPT was made available to the public, which has “democratized” access to conversational AI. Now that more people have been able to experiment with the tool and chat with a powerful AI model, they often become more critical of CHOM5KY and the quality of his answers.
Think of those discussions you may have had with strangers at a party, or those often-superficial exchanges you might have with office colleagues. This kind of conversation—already easily awkward—is even more complicated when you have little in common with each other. Why would we expect more from an AI we’re meeting for the first time?
Although he sometimes mishears my questions, at least CHOM5KY shows me exactly what he understood by automatically transcribing my words on screen. What’s more, he also shows me how he interpreted my question.
AI: Collaborator or mere tool?
About AI – With AI – Through AI
Throughout the story, the audience moves forward by conversing with CHOM5KY. However, at a specific moment, the AI challenges us: the public is asked to work together to solve a puzzle.
Do we work with conversational AI the same way we collaborate with other humans? Or do we merely use AI as we do tools?
Of course, the conversation that occupies us throughout CHOM5KY vs CHOMSKY takes place “with” CHOM5KY. That said, I also wanted to understand what kind of AI the conversation is taking place with. In other words:
What tools were used to build CHOM5KY’s “conversational system”?
In CHOM5KY vs CHOMSKY, AI is not only the subject of the experiment but also an essential aid to creation. I need several meetings with Martin Viau (director of technology), Marianne Bourdages (technologist) and Sandra Rodriguez, and careful reviewing of several diagrams in order to understand the mechanics of the conversation in the project.
The first phase of the project consisted of going through the official Chomsky archives at MIT (MIT Libraries, Distinctive Collections) and encoding, by hand, nearly 6,000 pairs of questions and answers using Microsoft QnA Maker (source). These encoded responses have evolved to include: a) scripted elements (which help the narrative flow); b) archival material (real answers given by Noam Chomsky to similar questions posed by journalists); c) responses generated by GPT-2, trained solely on the basis of digital traces drawn from the Chomsky.info database.
In talking with Sandra Rodriguez, I learn that the first “scripted” solutions have certain advantages, including that they remain as faithful as possible to what Professor Noam Chomsky really said. However, they also come with a major drawback: CHOM5KY can no longer follow the conversation when the exchanges stray too far from the words found in the original digital traces. To address this issue, a second phase of development plays out on two levels. With Cindy Bishop (director of AI systems), the team works on the system’s ability to detect intentions, emotions and keywords. With the collaboration of Moov AI, they also improve the quality of the 18,000 newly generated responses: a complex process of data cleaning, prioritizing archives and optimizing the response prioritization system.
Finally, at the initiative of co-producer Schnelle Bunte Bilder, the team elects to add a final layer of complexity to the system: the possibility of compensating for a lack of answers by using OpenAI’s GPT-3 models (a solution that serves as a last resort if all the other system options fail).
What makes CHOM5KY unique is the way in which these two types of responses are used:
Each time a user asks a question, the conversational system that feeds CHOM5KY offers three candidate answers. These come from both “scripted” responses (which include pure archives, modified archives and scripted responses) and “generated” responses. Sometimes all three answers are drawn randomly from both bases. At other times in the experience, all three come from the “generated” batch. This is the case in the “peek under” phase of the narrative, where the system’s inner workings are revealed and where CHOM5KY invites you to see what other answers could have been generated. As I explained earlier, when I asked him if he liked movies, CHOM5KY not only gave me his answer, but also the other answers he could potentially have chosen, or those he rejected because of too low a “confidence score.”
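To make the two drawing modes concrete, here is a minimal sketch in Python. It is not the production code: the function name draw_candidates, the mode labels "mixed" and "generated", and the sample answers are all assumptions made for illustration. It only shows the idea of pulling three candidates either from both pools or from the generated pool alone.

```python
import random

def draw_candidates(scripted, generated, mode="mixed", k=3, rng=None):
    """Return up to k candidate answers for one question (illustrative only).

    mode="mixed":     draw at random from the scripted and generated pools combined.
    mode="generated": draw only from the generated pool, as in the "peek under" phase.
    The mode names and this function are hypothetical, not taken from the project.
    """
    rng = rng or random.Random()
    if mode == "generated":
        pool = list(generated)
    elif mode == "mixed":
        pool = list(scripted) + list(generated)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Sample without replacement so the same answer is never offered twice.
    return rng.sample(pool, k=min(k, len(pool)))

# Made-up pools, purely for demonstration.
scripted_pool = ["scripted answer A", "archival answer B"]
generated_pool = ["generated answer C", "generated answer D", "generated answer E"]

print(draw_candidates(scripted_pool, generated_pool, mode="mixed"))
print(draw_candidates(scripted_pool, generated_pool, mode="generated"))
```

In this sketch the only difference between the two phases is which pool is sampled; how the real system weights or filters the pools before drawing is not described in my conversations with the team.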
Marianne Bourdages tells me that scripted answers are prioritized by the system, which gives them a high score (close to 100 percent). Responses generated by the model designed by Moov AI (using GPT-2) are accompanied by a variable confidence score, assigned by the system according to the intentions detected, the types of words used and the “matches” recognized by the system.
In addition to these three responses, a supplementary response is generated on the fly thanks to a call made to OpenAI’s GPT-3 model using an extremely precise prompt. If the confidence score of the three original answers is deemed too low, the OpenAI answer will be offered to the public.
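The selection logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the team’s implementation: the Candidate class, the choose_answer function and the 0.5 threshold are hypothetical, and the sample scores are invented (only the 0.34 echoes the rejected answer I saw on screen). The fallback is passed in as a plain function so that no real API call is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Candidate:
    text: str
    source: str        # "scripted" or "generated" (labels assumed for this sketch)
    confidence: float  # between 0.0 and 1.0

def choose_answer(
    candidates: List[Candidate],
    fallback: Callable[[], str],
    threshold: float = 0.5,  # hypothetical cut-off; the real scale is not public
) -> Tuple[str, str]:
    """Return (answer text, source) for the best candidate, or the fallback."""
    best = max(candidates, key=lambda c: c.confidence, default=None)
    if best is None or best.confidence < threshold:
        # Every candidate scored too low: use the on-the-fly answer instead.
        return fallback(), "fallback"
    return best.text, best.source

# Invented scores for illustration.
candidates = [
    Candidate("My wife and I were voracious cinephiles at one time.", "scripted", 0.97),
    Candidate("I was an avid comic book reader.", "generated", 0.34),
]
print(choose_answer(candidates, fallback=lambda: "(answer from the large model)"))
```

Passing the fallback as a function means it is only invoked when needed, which mirrors the “last resort” role the team describes for the OpenAI model.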
Read an example of the prompt sent to GPT-3
Here is an example of the instruction given with each question asked to GPT-3, in order to guide its answer (excerpt). “The following is a conversation between a user and Chom5ky. Chom5ky is an artificial intelligence that was trained on data traces of renowned Professor Noam Chomsky. Noam Chomsky is an American linguist, philosopher, cognitive scientist, historical essayist, social critic and political activist. Chomsky is also a major figure in analytic philosophy and one of the founders of the field of cognitive science.”
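For readers curious about how such an instruction might be combined with a visitor’s question, here is a minimal Python sketch. It is an assumption-laden illustration: the build_prompt function, the "User:" and "Chom5ky:" speaker labels and the layout of the conversation are my own guesses, and the preamble reuses only the first two sentences of the excerpt above. No call to OpenAI is made here.

```python
from typing import List, Tuple

# First two sentences of the excerpt quoted above.
PREAMBLE = (
    "The following is a conversation between a user and Chom5ky. "
    "Chom5ky is an artificial intelligence that was trained on data traces "
    "of renowned Professor Noam Chomsky."
)

def build_prompt(preamble: str, history: List[Tuple[str, str]], question: str) -> str:
    """Assemble a text prompt: instruction, earlier turns, then the new question.

    The speaker labels and line layout are hypothetical, chosen for readability.
    """
    lines = [preamble, ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {question}")
    # Ending on the character's label invites the model to answer in his voice.
    lines.append("Chom5ky:")
    return "\n".join(lines)

history = [("Chom5ky", "What brings you here?"), ("User", "Curiosity.")]
print(build_prompt(PREAMBLE, history, "Do you like to watch movies?"))
```

Repeating the same preamble with every question is what keeps a general-purpose model anchored to the character, at least in principle.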
In other words, the strategy used by CHOM5KY vs CHOMSKY comes down to a clever mix of the different sources from which the AI draws its answers.
Why not rely solely on a large commercial language model such as ChatGPT? First, simply because the project started long before such models were made public.
Learn more about the timeline
Google published its research on transformer models in 2017. It is from this work that OpenAI developed its GPT models (for “generative pre-trained transformer”). In 2022, OpenAI made ChatGPT, a chat interface for accessing a version of its GPT-3 model, available to the public for free. ChatGPT was launched on November 30, 2022.
The first conversations that led to CHOM5KY vs CHOMSKY as we know it today took place in 2016. The study was initiated in February 2018; the development phase began in July 2019.
A first prototype titled Chomsky vs. Chomsky: First Encounter was shown at Sundance in 2020. Finally, a version of the project was launched in Berlin on November 4, 2022, almost a month before the launch of ChatGPT.
However, this did not prevent the production from studying the GPT-3 model that powers ChatGPT and even integrating it into the latest versions of CHOM5KY vs CHOMSKY.
It is also because these large language models, since they’re trained on the entire internet and not just on Chomsky’s digital traces, are more difficult to align with the aims of the project: to offer an authorial point of view and encourage a reflection on our relationship to AI through a conversation with an AI. Commercial conversational tools will always have an answer, but is that answer consistent with Chomsky’s thinking?
Advantages/Disadvantages of commercial generative models
Advantages: The breadth of “generative pre-trained transformer” models (of which ChatGPT is the best-known example) means that an answer will always be provided, regardless of the question asked by members of the public. Thus, if the “confidence score” of an answer is deemed too low according to a predetermined scale, the answer generated by ChatGPT is available.
Disadvantages: One of the biggest pitfalls of this approach is the lack of control. Capable of fabricating information if it needs to, ChatGPT runs the risk of diverting the conversation. Sandra Rodriguez describes these excesses as “hallucinations” that stray from the confines set by Chomsky’s personality. This leads to the project’s intention being diverted. Specifically, the audience is likely to look for the limits of this kind of response (an ersatz Turing test) rather than discussing the nature of a conversation, intelligence, or any other topic that is intended to be addressed in the course of the story.
In the end, the solution preferred by the production team comes down to finding a good balance between the advantages and limitations of each tool. While scripted responses help move the narrative forward at the right times, responses generated by a language model trained on Noam Chomsky’s digital traces (interviews, writings, etc.) bring a certain degree of reflexivity, especially in those moments when the machine’s inner workings are revealed to the audience.
About AI – With AI – Through AI
At a crucial moment in production, two visions clash. Should the primary goal be to provide a state-of-the-art experience that leaves audiences speechless at CHOM5KY’s responses and at the technological achievement it represents? Or should the aim be to convey a message and an author’s point of view, even at the expense of technological novelty?
At the heart of this debate is the choice of tools used to create the experience, namely AI models that are constantly evolving. While the work conducted by Moov AI was done through GPT-2, later iterations of OpenAI’s model (GPT-3, GPT-3.5 and, more recently, GPT-4) were launched in rapid succession, all before CHOM5KY vs CHOMSKY launched in Montreal. A few members of the production team expressed to me their regret at the resources and time invested in developing certain aspects of the project that, in some shape or form, could be considered old news before the final project was released. Moreover, if the same work had to be redone with today’s resources, many people involved in creating the experience estimate that it would take half the time and investment. Be that as it may, would it have been better to throw away years of work to switch to the latest (and greatest?) tools?
In truth, the question has never really been which of these language models to use, but rather, how to use these different tools to achieve a singular vision. Even if newer tools mean a similar project could be made faster and for less money nowadays, the fact of the matter remains: technologies are continually developing, and it’s impossible to remain at the bleeding edge when a production must inevitably end sometime.
Does a technology project necessarily have to be state-of-the-art?
Recall that as an immersive and collaborative experience on artificial intelligence, the objective of CHOM5KY vs CHOMSKY is to bring a reflection “about,” “with” and “through” AI.
For Marie-Pier Gauthier, “through” describes the fact that AI is used throughout the project: generating responses, speech recognition, real-time translation, CHOM5KY’s voice synthesis, etc. However, for me, “through” also refers to the symbiosis of form and function found in CHOM5KY vs CHOMSKY, as well as in several NFB interactive experiences.
Having followed CHOM5KY vs CHOMSKY closely for the past few months, I was reminded of a presentation I heard in September 2020 as part of the MUTEK Forum entitled “Shaping Your Idea to Make it an Interactive Project at the NFB.” Since then, I’ve always kept in mind the “heart-hand-head” mantra that guides the interactive studio’s productions, as well as the questions they represent:
- How do we make the public feel?
- What do we make them do?
- What are they supposed to remember?
Whether it’s in my work as a researcher, teacher or programmer, I always champion the idea that the challenge of any new media project—whatever the medium may be—is to compare the capabilities of different tools and evaluate what they each bring to the overall experience.
The “heart-hand-head” mentality makes it easier to avoid falling into the trap of mere technophilia by prioritizing the quality of the experience and the story.
I was thrilled to see this approach in action throughout the production of CHOM5KY vs CHOMSKY, especially in the team’s response to the rapid pace of AI innovation in recent years. Rather than blindly following the trends set by new AI tools, the creative team behind CHOM5KY vs CHOMSKY has opted for a well-thought-out message.
This will likely ensure its success in the long run. By focusing on the quality of the experience and the reflection it encourages, CHOM5KY vs CHOMSKY is betting on remaining relevant at the level of its message, rather than at the level of technology alone; this was both Sandra Rodriguez’s and the producers’ desire from the outset. More specifically, even as AI continues to develop, CHOM5KY vs CHOMSKY will continue to lead the audience to question the nature of their relationship with humans and machines alike.
Text: Philippe Bédard, Postdoctoral researcher in virtual reality (McGill University) and Programmer (FNC Explore)