AI Voice Cloning for Gaming


Transcribed from an interview with Usiku Games CEO Jay Shapiro, recorded in Nairobi, Kenya, in September 2021.


Hi, I’m Jay Shapiro. I’m CEO and founder of Usiku.Games, based here in Nairobi, Kenya.

What does gaming mean to you, particularly for Africa?

Gaming to me is all about storytelling: the narrative arc, using emotions like challenge, frustration, joy, and curiosity to ultimately tell a story.

When was Usiku Games founded? What does the organization focus on?

Usiku Games started about three years ago. We are what’s called a social impact gaming company, and we’re really focused on what we call #GamingForGood: harnessing the power of gamification to try to do some good in the world through education and behavioral nudges, and by accomplishing things around healthcare, education, women’s empowerment, and climate change.

What are some of the games you have worked on?

We’ve done a lot of different games over the years across Africa, focused on things like COVID with Unilever and deforestation with Seedballs. We did an agricultural game with the World Bank, and we’ve just done one on sports for development with the German government. So, a lot of different things. We try to keep it fun, but accomplish a mission.

Tell us more about the election game.

It came out of the realization that when you look at the election cycles in Kenya, and really across a lot of countries in Africa since multiparty elections started, every time we get into an election year, GDP growth drops substantially and the rate of violent acts goes up substantially. What we decided to do was borrow a term from COVID, “to flatten the curve”, and apply it to the next election cycle in 2022. We believe that creating a game oriented towards Kenya’s youth, educating them on the principles of peaceful participation in democracy, anti-tribalism, critical thinking, and not selling your vote, all in a fun way but with a real mission, will accomplish a lot for the Kenyan economy and society.

What does the game involve? 

In the game, you play a citizen journalist who is doing investigations. You create the character by first making it look like you, using Snap’s Bitmoji engine, so the character looks exactly as you like. Then you choose your politician. Before you can interview them, you first have to chase after them through the roads. When you catch them, you get to ask them questions about the election cycle. After they’ve answered, we ask “What would you do?” That is where the real educational learning happens, around the values that we’re trying to convey.

What interesting technology was used in the game?

It is really interesting because it’s the first game we’ve done that is really powered by AI. Using a platform called Overdub by Descript, we were able to create the voices of the main characters, which will sound quite familiar to most of our players. Most people who see the game will know who the characters are meant to be, without us actually stating their names. The Descript platform was originally designed for content producers, videographers, podcasters, and that sort of thing. It is meant for filling in gaps where maybe they forgot a word or the audio cut out. As far as I know, this is the first time it has really been used for voice cloning a full script with multiple AI-based virtual characters, but the tool was fantastic for that.

How did you arrive at using voice cloning technology?

What you have to do is first train the AI for each voice. We had voice actors come in and record what was about a 45-minute script of standardized words. Frankly, it doesn’t make a lot of sense; it’s kind of gibberish. What it does, though, is teach the AI all of the sounds, the phonemes, and the intonations, so that the AI can say basically anything phonetically. The biggest challenge with the voice cloning was really around the dialects. We chose to go with English, but we really wanted to do Sheng or Swahili. That was our original intention, and we tried it first; however, the AI just couldn’t get the pronunciations right. It sounded like a computer. In English, it was more natural, because I guess that’s what the AI had been trained on in the lab. Because we used local Kenyan voice actors for the different voices, it picked up the nuances of the Kenyan voice and the correct accent, so it sounds more natural.

How do you avoid copyright or personality-rights infringement suits over the different voices used?

We have a disclaimer at the beginning of the game, as you would on all games, frankly, and a lot of TV shows, saying that the characters in the game are purely fictional: even though they may seem like other people, they are not those people and they’re not meant to be. So there’s no suggestion that we’re trying to spoof real people. Furthermore, we are using animated characters, so it’s pretty obvious that these are not the real people actually speaking.

Do you see this technology being adopted more widely by other creators in Africa? 

As a social impact gaming company, we’re dedicated to #GamingForGood, and it’s in our mission to use technologies like this AI responsibly, to accomplish good, and to try to make a positive social impact. At the same time, we’re not naive enough to think that there aren’t people who are going to use it for evil purposes. I think we’ll see, as we’ve seen in other countries around the world, that when the next election rolls around, there will be people using this technology to create social posts spoofing famous politicians saying things that they didn’t really say, and that is something we as a society will have to deal with, now and into the future.

When it came to deciding whether to build in-house or leverage existing systems, what was the deal-breaker?

When we first started planning out the game, we did a lot of research on the various voice cloning platforms that are out there. We realized that there are some great technologies built by substantial teams who had spent millions of dollars over several years. We just don’t have the resources to reinvent the wheel and create a whole new platform from the ground up, especially when great voice cloning technologies are already available off the shelf.

What were some of the key challenges you faced while experimenting with and implementing this solution?

Fundamentally, all of our games have to be fun as our first priority; otherwise, people won’t want to play them. We always start by making the game fun: the animation, the fun characters and the Bitmojis, and the running and chasing through the streets all make the experience fun. At the same time, though, we also have a real educational objective that we want to accomplish. Striking the balance between getting the messaging across and still making the game very playable, fun, and replayable is definitely critical.

Any other games that you’re working on that leverage AI?

One of the games that we are working on now, which is really interesting from an AI perspective, is called SIMburu, which is sort of the African version of The Sims. Its main character, “Lulu”, is a 3D animated character created in Unreal Engine, living in a fictitious county called SIMburu instead of Samburu. The players of the game will be teenage girls in secondary schools. The game will be packaged through the Ministry of Education to teach life lessons on financial management, sexual and reproductive health, and generally making good life choices. The player can only control Lulu, but as they say, “it takes a village”, so we have many different characters wandering around. All of those other non-player characters (or “NPCs”) are AI-driven. They all have personalities and needs. They walk around the village doing different things. Some of them are farmers, some of them have their cows, some of them run small shops, or “Dukas”, and that sort of thing. They all go about their daily lives and have dialogues with Lulu and with each other in very real ways, to make the village seem like a vibrant community.

What would be your advice to other solution creators in the African space, who want to experiment with such technology? 

Wherever you look, there is evidence that gaming is absolutely the future of Africa. There are over 350 million connected smartphones already in Sub-Saharan Africa. That is more than the USA, Mexico, and Canada combined. Before COVID, that number was growing at 20% year over year. AI is going to be a critical component of that development. Anybody who is thinking about getting into this space needs to recognize, first of all, that coding and development (what people normally think of when it comes to gaming and AI) are actually just a small piece of it. There are also the illustrators, the animators, the music composers, and the scriptwriters. There are so many different parts that go into creating a game, and particularly an AI-driven game. I would say people who are interested in gaming should go to school, maybe even study AI from a theoretical perspective, or at least understand its first principles, and then apply that in a really fun way to the growing area of gaming. And if you do, we’ll happily hire you. Then we will have an industry full of talented people with the experience needed to start creating stories and content that are our own: African, with local, relevant stories and legends. We have thousands of years of great stories here on the continent, and it’s time for them to flourish in the gaming industry.

