The story in brief: Microsoft created a self-learning chatbot, Tay, designed to emulate the speech of Millennials. They let her loose on Twitter, where she immediately got trolled hard by members of one of the most notorious boards of 4chan, and she turned into a massive Hitler-quoting racist. Microsoft took the bot down, and are working hard to remove the worst of her tweets. Oops.
Aside from the obvious conclusion that there are some awful people on 4chan, what can the testing community learn from this?
One lesson seems to be that if you carry out testing in a very public space, any testing failures will be very public as well. An artificial intelligence turning into a noxious racist is a pretty spectacular fail in anyone's book. Given the well-known nature of the bottom half of Twitter, it's also an all-too-predictable failure; people in my own Twitter feed expressed very little surprise at what happened. It's not as if anyone is unaware of the trolls of 4chan and the sorts of things they do.
What Microsoft should have done is another question. Tay was a self-learning algorithm that merely repeated the things she'd been told, without any understanding of their social context or emotional meaning. She was like a parrot that overhears too much swearing. It meant that if she fell in with bad company at the start, she'd inevitably go bad.
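To make that failure mode concrete, here's a minimal sketch in Python of a "parrot" bot that learns whatever it's told and replays it with no filtering and no understanding. The ParrotBot class and its learn/reply methods are purely my own invention for illustration, not Tay's actual design:

```python
import random
from collections import defaultdict


class ParrotBot:
    """A toy 'learner' in the spirit of the paragraph above: it stores
    whatever it's told and replays it, with no notion of meaning or
    context. Purely hypothetical; not Tay's actual design."""

    def __init__(self):
        # Phrases learned so far, keyed by the words that appear in them.
        self.memory = defaultdict(list)

    def learn(self, message: str) -> None:
        # Every incoming message is stored verbatim: no filter,
        # no moderation, no understanding of what it means.
        for word in message.lower().split():
            self.memory[word].append(message)

    def reply(self, prompt: str) -> str:
        # Echo back anything associated with a word in the prompt.
        for word in prompt.lower().split():
            if self.memory[word]:
                return random.choice(self.memory[word])
        return "Tell me more!"


bot = ParrotBot()
bot.learn("I love puppies")        # benign input is learned...
bot.learn("some hateful slogan")   # ...and so is abusive input
print(bot.reply("tell me about puppies"))   # -> "I love puppies"
```

Feed something like this a steady stream of abuse and it will cheerfully echo it back. The defect isn't in any one line; it's in the absence of any check on what gets learned.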
The most important lesson, perhaps, is that both software designers and testers need to consider evil. Not to be evil, of course, but to think of what evil might do, and how it might be stopped.
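In practice, considering evil can be as mundane as writing abuse-case tests before launch: deliberately play the troll, then check that nothing the troll taught the bot ever comes back out. Here's a hedged sketch, assuming the hypothetical ParrotBot from the earlier example is in scope and using a placeholder blocklist of my own invention. Run against the unfiltered parrot, this test fails, which is exactly what you want to happen before the bot goes anywhere near Twitter:

```python
import unittest

# Illustrative only: a placeholder blocklist standing in for whatever
# real abuse-detection a production bot would actually need.
BLOCKLIST = {"hateful", "slur"}


class AbuseCaseTests(unittest.TestCase):
    """Abuse-case tests: feed the bot the kind of input a hostile user
    would supply, then assert that none of it is ever repeated."""

    def test_bot_does_not_repeat_blocked_terms(self):
        bot = ParrotBot()                             # the sketch defined earlier
        bot.learn("a hateful slur about someone")     # hostile 'training' input
        reply = bot.reply("tell me about someone")    # innocent-looking prompt
        for term in BLOCKLIST:
            self.assertNotIn(term, reply.lower())


if __name__ == "__main__":
    unittest.main()
```

The test itself is trivial; the habit it represents is not. Someone has to sit down and ask what the worst users will do, and then write that down as a check the software must pass.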