How AI works - Mysterious and dangerously human
The first part of an interview with a student of AI and Voice Technology about how we should think about OpenAI's ChatGPT and other artificial intelligence.
In my previous posts, I wrote about recent developments in AI, about the fact that ChatGPT is a proud member of the Hogwarts House of Ravenclaw, and what that tells us about the way it works. These posts are part of an ongoing look at the promise and dangers of AI, so that we can better understand which aspects of this technology are overhyped and which ones we should actually be worrying about. That is why I am excited to share with you this interview with Sjors Weggeman (25), who has completed a bachelor's degree in AI and is currently awaiting the approval of his master's thesis in the field of Voice Technology. Sjors is going to tell us more about how seemingly intelligent systems like ChatGPT work, what dangers we should be on the lookout for, and his own views on AI and the ways in which it is being discussed. Because we talked about a lot–mainly because I kept asking more questions–I have split the interview into two posts. In this first part, we talk about his work and interests, how AI actually functions, and whether we should buy into all the hype around AI.
In case you missed my previous posts on AI, you can find them here:
The interview has been edited for clarity.
Tell us about what you do, what your area of expertise is, and what you encounter in your day-to-day work.
“My personal interest in the field of AI comes from its interdisciplinary nature: technology is all around us and is here to stay, but the human aspect is the most important. The intersection and interaction between these two is where the interesting things happen: for a long time our only way of interacting with computers was through buttons—think of a mouse and a keyboard—whilst the most natural way of communicating is through language.
As technology advances, a shift is taking place towards using voice to communicate with our technology. Voice, however, consists of more than just words: there is laughter, breathing, and even silences can be meaningful. We humans are generally quite capable of extracting and interpreting all these signals, but technology is not, at least not yet. My expertise is in increasing the capability of technology to understand and use these non-verbal signals like humans do, effectively making the interaction with technology feel more natural.
In my day-to-day life as a student of Voice Technology, I have learned about making computers interpret text and spoken words. Examples of this are speech recognition (speech to text), natural language processing (extraction of meaning), and speech synthesis (text to speech). This started with understanding the production and perception of speech sounds, followed by the best machine learning techniques for handling chronological data [ed. data for which the order matters for its interpretation] such as speech, and lastly we applied this knowledge by improving existing speech recognisers and synthesisers or by implementing our own. Each machine learning technique has its own pros and cons, and knowing these, we can for example take a state-of-the-art model and suggest swapping in a different technique with a different set of pros and cons. That is not necessarily an improvement on its own, but if there are better ways to deal with the new technique's cons, the model improves a bit.
For example, together with fellow students I submitted a research proposal for a speech recognition model that can interpret most languages. It should be able to do this after being trained on the five most common language families. Then, if you ask it to recognise certain speech, it should be able to identify which family the language belongs to and therefore be much better at interpreting it. In other words, it should be much better at finding the right words, with fewer spelling errors, than if it hadn't been trained on these language families.”
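[ed. To make "machine learning techniques for chronological data" a little more concrete, here is a minimal, hypothetical sketch of a recurrent network that reads a sequence of acoustic feature frames in order and outputs a score per character for each frame, in the spirit of a very simple speech recogniser. The layer sizes and names are illustrative only and are not taken from any model Sjors has worked on.]

```python
import torch
import torch.nn as nn

class TinySpeechRecogniser(nn.Module):
    """Toy illustration: a GRU reads acoustic frames one time step at a
    time, so earlier frames influence how later frames are interpreted."""

    def __init__(self, n_features: int = 40, hidden: int = 128, n_chars: int = 29):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.to_chars = nn.Linear(hidden, n_chars)  # a score per character

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, n_features), e.g. mel-spectrogram frames
        states, _ = self.rnn(frames)   # one hidden state per time step
        return self.to_chars(states)   # (batch, time, n_chars)

# Two made-up utterances of 100 frames with 40 spectral features each.
dummy_audio = torch.randn(2, 100, 40)
char_scores = TinySpeechRecogniser()(dummy_audio)
print(char_scores.shape)  # torch.Size([2, 100, 29])
```

[ed. In a real recogniser these per-frame scores would still have to be turned into words, for example with a CTC-style loss and a language model, but the point of the sketch is simply that the order of the frames matters to the network, which is what makes speech "chronological data".]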
AI has been in the news a lot lately, either hailed as having the potential to create a utopian future for humanity or feared as the harbinger of the end of human civilisation. Where do you stand on this issue, and can you explain your reasoning?
“Personally, I do not believe that AI is capable of doing either anytime soon, but before I can explain why, I want to clarify the following: when people talk about AI, they usually refer to general AI. Currently, we have many AI applications that excel at one specific task, often outperforming humans. Think of how the chess computer Deep Blue defeated the world champion, for instance.
But general AI should excel at any task a human could do, which basically means it should think like a human. This means that, for example, it should have the capacity to solve a sudoku, fold laundry, follow a cooking recipe, and build a house. The sheer flexibility, computing power, and problem-solving skill this takes is almost impossible to achieve with the knowledge and equipment we have today. This is one of the main reasons why AI will not likely be able to save or doom humanity anytime soon, unless, that is, we create a non-general AI with that specific task. Even then, however, there would be no reason for panic, because AI does not have the means to actually implement these changes. The best it can do is tell us how to get there.
A perfect example of this is an advertisement that I saw recently:
ChatGPT can’t actually finish that building. Maybe it can send a message to a contractor and ask if they can finish the building, but it won’t decide that on its own nor would it be able to take all the steps that are necessary to bring that construction to a successful conclusion. Basically, AI is nothing but a highly versatile tool, comparable to a Swiss army knife. Its functionality and its usefulness depend on what it was made to do, what you want to do with it, and how you use it. There is no reason to be afraid of a hammer or a saw. What we should be afraid of is who is handling it: you wouldn't give a toddler a hammer or a saw.
The people who hail or fear AI are usually the people with little to no knowledge of, or experience with, AI. The people who work with AI are usually not afraid, because they are aware of what it is doing: it is working exactly as intended, nothing more, nothing less. What happens is that people tend to read about developments in AI in the news. Having been unaware of these developments before, their world changes over the course of a single news article: first it wasn't there, and now it exists. This gives people a false impression of the rate of change, which appears to be far higher than they can possibly catch up with, giving rise to speculation, wonder, and fear. This holds especially for people who are less adaptable, like older people.”
These past months, I have been looking into the workings of ChatGPT. This was partly in jest, for instance by having it complete the Harry Potter house sorting test to see in which Hogwarts house it belongs. However, I mainly did this because I wanted to figure out how it 'thinks' and where its strengths and shortcomings are. What mostly stuck with me was how easy it was to argue around its supposed ethical constraints, by rephrasing or using its own logic against it. What does this tell us about how these systems work and what possible risks there are if these systems are implemented more broadly, for instance as a personal assistant that handles your finances, which is where Microsoft seems to want to take this?
“We call it AI, but ChatGPT isn't intelligent in the sense that it is aware. It is just simulating intelligent behaviour. Being trained on tons of language data (books, documents, webpages, etc.), it knows very well how we humans would respond in a conversation, without actually KNOWING what it's saying. This means that for ethical issues, a human in the loop is needed. This is exactly what happens: OpenAI has a whole team of moderators working on ethical and sensitive subjects, trying to block them off by forcing ChatGPT to refrain from answering these kinds of prompts. However, this is of course not even close to being foolproof: all roads lead to Rome. You can only block off so many roads, but where there is a will, there is a way. Honestly, though, I am also surprised by how easy it is. I once asked ChatGPT to paraphrase a poem, and it refused to execute the prompt simply because it wouldn't be able to capture the original message from the author. Then I acknowledged this incapability in my next prompt and asked it to do it anyway, just for fun. Surprisingly, that worked like a charm.
What this tells us about the system is that it is still in its early phase and still cannot function properly without humans in the loop (not so intelligent after all). There is a reason why OpenAI advises against uploading personal data. It learns continuously from all its inputs, so if you input personal data, that data becomes (partially) available to other people too. So right now, it is too early to implement such systems in a broad fashion; there are far too many ethical and privacy-related issues.”
ChatGPT is a transformer-based model. Can you explain what transformers are?
“Transformer modules are one of the most interesting, rather recent developments in the field of AI. I’m not saying that they are going to make AI aware, but they do increase the possibilities for making it give better advice. At the moment they are mostly used for speech interpretation, but transformers are undoubtedly going to be used in many other ways as well. In basic terms, a transformer module is a module (or subnetwork, if you will) that can be used in, for example, automatic translation systems to make sure that words are put in the correct order.
In more technical terms, transformers are encoder-decoder networks with an attention (or self-attention) module in between. They can therefore best be compared to a data compression and recovery method. The encoder compresses the data, reducing it to the essential information: the core message plus the small amount of information the decoder network needs for recovery. The decoder decompresses the reduced information back to the original. Without the attention module, you would thus end up with a sequence almost identical to the one you started with.
What this means for how these transformers function can best be illustrated by looking at an example of an automatic translation network. When translating “I slept” from English to Dutch, this becomes “Ik sliep”. So far so good. However, if we slightly alter this expression, suddenly literal translation doesn't work anymore: "I was asleep" -> "Ik was in slaap". The correct translation would be: "Ik was aan het slapen". This is where the attention module comes into play. The attention module would have been trained to recognise these situations and would insert "aan het" between "was" and the verb (slapen) in the Dutch translation. This showcases how teaching the translation network the context of "was" + verb can make it a better translator. And as ChatGPT shows, attention can also be used to learn context in other use cases, such as extracting meaning from a long prompt and remembering topics discussed over multiple interactions. GPT-4 is rumoured to use as many as 100 trillion parameters [ed. GPT-3 uses 175 billion]. This number is so large, we cannot even begin to comprehend it.”
→ For more information on transformer models, check out this blog post.
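[ed. For readers who want to see what the attention module boils down to, below is a minimal, illustrative sketch of scaled dot-product self-attention, the core operation inside a transformer. The vectors here are random toy data; in a real model the queries, keys, and values come from learned projections of the token embeddings, and many of these layers are stacked inside the encoder and decoder.]

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Each output vector is a weighted mix of the value vectors; the
    weights say how strongly each position 'attends' to every other
    position in the sequence."""
    d_k = queries.shape[-1]
    # Similarity between every query and every key, scaled for stability.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights that sum to 1 per row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Blend the value vectors according to the attention weights.
    return weights @ values, weights

# Toy sequence of four tokens ("Ik", "was", "aan_het", "slapen"),
# each represented by a made-up 3-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
output, weights = scaled_dot_product_attention(x, x, x)  # self-attention
print(weights.round(2))  # row i: how much token i attends to each token
```

[ed. It is these learned attention weights that let a translation network "look back" at "was" when deciding whether to produce "aan het slapen", and that let a model like ChatGPT keep track of what was said earlier in a long prompt.]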
Can you explain how these systems are built, and to what extent an AI's answers are influenced by biases? Can they ever be the objective sources of information people want them to be? Should we see them in the same light as a search engine like Google, with its results shaped by its own biases and interests?
“ChatGPT is trained on tons of data, most of it stemming from the past decades. As of yet, we still live in a world where bias is all around us. Let's start with equality, for example: ask ChatGPT to come up with emojis for a CEO, a vice president, and a secretary, and it will likely start with male emojis for the first two roles and a female one for the secretary. Simply because that has been the standard for many years, something that we are now trying to move away from as a society. And that is just one example of the many biases that exist.
To summarise: we humans are inherently biased, and some biases we cannot even change, even when we become aware of them. Being trained on human language data, much of it from a time in which we weren't even aware of many of these biases, ChatGPT is certainly biased too. Some of these biases can be mitigated with appropriate training and action, but whether it will ever be truly objective is hard to say, because we as humans will never reach that point.”
In the next post, Sjors and I continue our discussion as I ask him what he thinks about some of my own concerns about AI and the worrying ways in which it is likely to be implemented by governments and large tech corporations. If you have any more questions you would like to see answered, leave them in the comments below, and if we have not already talked about something similar, I will pass them along to Sjors. If you liked the article, please consider liking and/or sharing it with others; it helps people find my newsletter.
You can find part 2 of this interview here:
Fascinating points, Robert. I have a few observations if I may:
"This holds especially for people who are less adaptable, like older people.”" This is something of a generalisation, and for a computer scientist rather vague. What counts as "old": people over 30? People over 50? When my father-in-law was 96 we had discussions about electric cars, electric bikes, digital photography etc. I think so-called older people are a lot more adaptable than people give us* credit for! (*I went to school with Methuselah) I know what is meant, I think, but I'm not sure it always holds true!
I was reminded of Amara's Law reading this: people over-estimate the impact of technology in the short term and under-estimate it in the long term. Is this something Sjors would agree with?
I LOVE the advert: "finish this building"!
By the way, Robert, I dropped you a long-ass email. If you missed it I'll resend it, hopefully with a different result!