GPT-2: A New Kind of AI-based Forgery?
Wed, 12 Jun 2019 || By Raka Wicaksono

Artificial Intelligence (AI) has progressed enormously since John McCarthy first coined the term in 1956. That progress has led to a string of extraordinary innovations in AI, one of which is GPT-2. Yet alongside GPT-2’s capabilities come concerns about its harmful potential, especially in the area of forgery. I believe that, despite its intelligence and sophistication, GPT-2 could give rise to a new kind of AI-based fraud. Here’s why.

GPT-2: An Advancement in AI-Based Language Modelling

GPT-2 was developed by OpenAI, a research organization focused on AI development. First introduced in February 2019, GPT-2 is a language model with 1.5 billion parameters, trained on a dataset of 8 million web pages scraped from the internet (a dataset OpenAI named WebText).[1] Trained on that much data, GPT-2 may be one of the leading innovations in the field of language modelling. Among its capabilities is conditional text generation: ‘triggered’ with an initial sentence-long prompt from a human, GPT-2 can automatically produce a lengthy continuation. It is also considered superior to other language models because it needs no domain-specific training dataset and can more easily be steered toward specific ends.
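To picture how this prompt-and-continuation process works, here is a minimal sketch. I am assuming the publicly released GPT-2 weights loaded through the Hugging Face transformers library (OpenAI’s original release used its own code), and the prompt and sampling settings are only illustrative.

```python
# Minimal sketch: prompt-conditioned generation with the released GPT-2
# weights via the Hugging Face transformers library (an assumption;
# OpenAI's original release shipped its own TensorFlow code).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # small public model
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a lengthy continuation conditioned only on the human prompt.
output = model.generate(
    input_ids,
    max_length=100,                  # total length, prompt included
    do_sample=True,                  # sample instead of always picking the top word
    top_k=40,                        # truncated sampling, as in OpenAI's demo samples
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```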

Given GPT-2’s superior capabilities, we should consider what it can eventually do. Trained on such an extensive dataset, GPT-2 can produce writing through its predictive-text capability. Although we have seen other AI programs that perform text prediction (like the iOS Predictive Text function or Google’s Autocomplete feature), GPT-2 surpasses other language models’ capabilities, specifically in what has been called ‘word depth.’[2]

The term ‘word depth’ refers to the program’s ability to dig deeper into the multiple meanings of a word. For example, if we write the word ‘moody,’ the program can identify from context whether it refers to someone’s mood or someone’s name. This is an advantage GPT-2 holds over other language models, which rely on relatively simple types of language modelling.
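As a rough illustration of this context sensitivity, the sketch below, again assuming the Hugging Face transformers port and using prompts of my own invention, compares the model’s top next-word predictions for the same word in two different contexts.

```python
# Hypothetical sketch: the same word, 'moody', yields different next-word
# predictions depending on context (Hugging Face transformers assumed).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompts = [
    "He has been very moody",      # 'moody' as an emotional state
    "Judge Moody entered the",     # 'Moody' as a surname
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]      # scores for the next token
    top_ids = torch.topk(logits, k=5).indices
    print(prompt, "->", [tokenizer.decode(int(i)) for i in top_ids])
```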

Beyond the features and capabilities above, GPT-2 can also answer specific trivia questions when given a dataset, or even just a passage of text, to analyze. This is a significant advancement, yet potentially a frightening one. From a positive point of view, the feature can help us understand anything better, considering that today we mostly use AI programs for simple predictive-text question-answering tasks, like asking the time or checking directions. But what if we look at GPT-2 from a negative point of view?
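In practice, this trivia-answering behaviour boils down to framing the passage and the question as a single prompt and letting the model write the continuation. The sketch below is a hypothetical illustration of that format (again via the transformers library); the small public model often answers unreliably, so the point here is the mechanism, not the accuracy.

```python
# Hypothetical sketch: zero-shot question answering by prompt format.
# The passage, question, and 'Q:/A:' layout are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

passage = "OpenAI introduced GPT-2 in February 2019."
prompt = f"{passage}\nQ: When was GPT-2 introduced?\nA:"

input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    max_new_tokens=10,               # a short factual answer
    do_sample=False,                 # greedy decoding, no sampling
    pad_token_id=tokenizer.eos_token_id,
)
# Print only the continuation, i.e. the model's 'answer'.
print(tokenizer.decode(output[0][input_ids.shape[1]:]))
```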


The Dangerous Side of GPT-2’s Advancement

We’ve seen the many ways GPT-2 could help humans excel in life. But alongside those real possibilities, GPT-2 also carries negative potential, as Jack Clark, policy director at OpenAI, emphasized: “Our hypothesis is that it might be a better and safer world if you talk about the dangerous side of GPT-2 before they arrive”.[3]

GPT-2’s capability to translate text without being explicitly programmed or supervised to do so invites the question of whether it could produce malicious content autonomously, especially since its base dataset was taken from the internet. As common knowledge tells us, the internet holds plenty of malicious content related to racism, violence, abuse, and other inappropriate material.[4] By this logic, if someone fed GPT-2 with such ‘bad and harmful’ texts and materials, it could become a sophisticated program that produces inappropriate content autonomously.

Beyond that, GPT-2 could also produce ‘fake writing’ convincing enough to persuade readers. Imagine the program generating text that tricks people into giving up their online credentials by impersonating a trusted person or institution. It was precisely because of these risks that the OpenAI team decided not to release the full code and model of the program to the public.[5]

Modern Problems Require Ancient Solutions

Having discussed the case of AI forgery, consider a simple yet ancient solution to it. Signatures have long been a reliable way for people to verify a person’s authenticity. Given that society has relied on this method for some 5,000 years, there is no harm in taking the same approach to cyberspace. There, we know it as the ‘digital signature’: a cryptographic method of ensuring that an item was not tampered with after it was signed.[6]
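To illustrate what “not tampered with after it was signed” means in code, here is a minimal sketch. I am assuming the Python cryptography library with Ed25519 keys, purely as an example, unrelated to any particular signing service mentioned below.

```python
# Minimal sketch of a digital signature, using the Python 'cryptography'
# library and Ed25519 (illustrative choices, not from the article).
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()

document = b"This article was written and signed by a human author."
signature = private_key.sign(document)       # sign the original bytes

# Verification succeeds only if the item is unchanged since signing.
public_key.verify(signature, document)       # passes silently

tampered = document.replace(b"human", b"machine")
try:
    public_key.verify(signature, tampered)
except InvalidSignature:
    print("Tampering detected: signature does not match the item.")
```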

Several services already certify items using digital signature features, such as DocuSign, Adobe Sign, and SignEasy. But those services alone cannot tackle the issue of AI forgery. Governments also need to fully certify the use of digital signatures.[7] Such certification would eventually create a common understanding in society that the digital signature is the legal method for determining whether a digital item has been forged. If those two challenging steps are taken, technology companies will follow by providing a digital environment that makes it seamless for us to sign and verify signatures, ultimately curbing the proliferation of AI-based forgery.

Editor: Janitra Haryanto



[1] Radford, A., Wu, J., Amodei, D., Amodei, D., Clark, J., Brundage, M., Sutskever, I. (2019). Better Language Models and Their Implications. OpenAI (Online). Available at: https://openai.com/blog/better-language-models/#sample1. [Accessed on: May 23rd, 2019].

[2] Vincent, J. (2019). OpenAI’s New Multitalented AI Writes, Translates, and Slanders. The Verge (Online). Available at: https://www.theverge.com/2019/2/14/18224704/ai-machine-learning-language-models-read-write-openai-gpt2. [Accessed on: May 23rd, 2019].

[3] Ibid.

[4] Kozh, M. (2018). The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. Malicious AI Report (Online). Available at: https://maliciousaireport.com. [Accessed on: May 23rd, 2019].

[5] Radford, A., Wu, J., Amodei, D., Amodei, D., Clark, J., Brundage, M., Sutskever, I. (2019). Better Language Models and Their Implications. OpenAI (Online). Available at: https://openai.com/blog/better-language-models/#sample1. [Accessed on: May 23rd, 2019].

[6] Etzioni, O. (2019). How Will We Prevent AI-Based Forgery? Harvard Business Review (Online). Available at: https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery. [Accessed on: May 23rd, 2019].

[7] Etzioni, O. (2018). Point: Should AI Technology Be Regulated?: Yes, and Here’s How. Communications of the ACM (Online). Available at: https://cacm.acm.org/magazines/2018/12/232893-point-should-ai-technology-be-regulated/fulltext. [Accessed on: May 23rd, 2019].