Zuckerberg said in a Jan. 3 Facebook post that his personal challenge for 2016 was to use artificial intelligence to create a personal assistant, which he described as his own version of Jarvis from Iron Man. On Monday, he offered a detailed update on Jarvis, and highlights follow.
He provided the following overview of Jarvis:
So far this year, I’ve built a simple AI that I can talk to on my phone and computer; that can control my home, including lights, temperature, appliances, music and security; that learns my tastes and patterns; that can learn new words and concepts; and that can even entertain Max. It uses several artificial intelligence techniques, including natural language processing, speech recognition, face recognition and reinforcement learning, written in Python, PHP and Objective-C. In this note, I’ll explain what I built and what I learned along the way.
Zuckerberg explained the challenges of connecting his home to Jarvis:
Before I could build any AI, I first needed to write code to connect these systems, which all speak different languages and protocols. We use a Crestron system with our lights, thermostat and doors; a Sonos system with Spotify for music; a Samsung TV; a Nest cam for Max; and of course my work is connected to Facebook’s systems. I had to reverse-engineer application programming interfaces for some of these to even get to the point where I could issue a command from my computer to turn the lights on or get a song to play.
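The glue layer he describes, one interface routing commands to devices that each speak their own protocol, might be sketched roughly like this in Python. The adapter classes and method names below are hypothetical stand-ins, not the actual Crestron or Sonos APIs:

```python
# A minimal sketch of a unified command layer over heterogeneous home
# systems. Each adapter stands in for a device that speaks its own
# protocol; a real integration would wrap that device's actual API.

class CrestronAdapter:
    """Hypothetical stand-in for a reverse-engineered lighting API."""
    def send(self, command, value):
        return f"crestron: {command}={value}"

class SonosAdapter:
    """Hypothetical stand-in for a music-playback API."""
    def send(self, command, value):
        return f"sonos: {command}={value}"

class HomeController:
    """Routes high-level commands to the right device adapter."""
    def __init__(self):
        self.adapters = {
            "lights": CrestronAdapter(),
            "music": SonosAdapter(),
        }

    def execute(self, device, command, value):
        adapter = self.adapters.get(device)
        if adapter is None:
            raise KeyError(f"no adapter for {device!r}")
        return adapter.send(command, value)

home = HomeController()
print(home.execute("lights", "power", "on"))   # → crestron: power=on
print(home.execute("music", "play", "jazz"))   # → sonos: play=jazz
```

The point of the pattern is that once every device sits behind the same `execute` call, the AI layer above it never has to know which protocol is on the other end.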
Further, most appliances aren’t even connected to the internet yet. It’s possible to control some of these using internet-connected power switches that let you turn the power on and off remotely. But often that isn’t enough. For example, one thing I learned is it’s hard to find a toaster that will let you push the bread down while it’s powered off so you can automatically start toasting when the power goes on. I ended up finding an old toaster from the 1950s and rigging it up with a connected switch. Similarly, I found that connecting a food dispenser for Beast or a grey T-shirt cannon would require hardware modifications to work.
On Jarvis’ use of facial-recognition technology, he wrote:
To do this, I installed a few cameras at my door that can capture images from all angles. AI systems today cannot identify people from the back of their heads, so having a few angles ensures that we see the person’s face. I built a simple server that continuously watches the cameras and runs a two-step process: First, it runs face detection to see if any person has come into view, and second, if it finds a face, then it runs face recognition to identify who the person is. Once it identifies the person, it checks a list to confirm I’m expecting that person, and if I am, then it will let them in and tell me they’re here.
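The two-step pipeline he outlines, cheap detection first and costlier recognition only when a face is found, followed by a check against an expected-guest list, can be sketched as control flow. The detector and recognizer below are stubs and the guest names are invented; a real system would plug in trained face-detection and face-recognition models:

```python
# Sketch of the door-camera pipeline: detect, then recognize, then
# check against a list of expected visitors. Stubs stand in for the
# actual ML models; guest names are made up for illustration.

EXPECTED_GUESTS = {"alice", "bob"}

def detect_face(frame):
    """Stub: report whether a face-like region appears in the frame."""
    return frame.get("has_face", False)

def recognize_face(frame):
    """Stub: return the identity of the detected face, if known."""
    return frame.get("identity")

def process_frame(frame):
    if not detect_face(frame):          # step 1: cheap face detection
        return "no face"
    person = recognize_face(frame)      # step 2: costlier recognition
    if person in EXPECTED_GUESTS:       # step 3: expected-visitor check
        return f"open door for {person}"
    return f"alert: unexpected visitor {person}"

print(process_frame({"has_face": True, "identity": "alice"}))
# → open door for alice
```

Running detection before recognition is the standard economy here: most frames contain no face at all, so the expensive identification step runs only when there is something to identify.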
Zuckerberg also created a Messenger bot for Jarvis, and he wrote:
I can text anything to my Jarvis bot, and it will instantly be relayed to my Jarvis server and processed. I can also send audio clips and the server can translate them into text and then execute those commands. In the middle of the day, if someone arrives at my home, Jarvis can text me an image and tell me who’s there, or it can text me when I need to go do something.
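The bot he describes is essentially a text-command dispatcher: a message arrives, the server parses it, and a handler executes the command. A rough, hypothetical sketch of that server-side routing, with made-up command names and handlers rather than anything from the real Jarvis:

```python
# Sketch of server-side routing for a text-command bot: parse the
# first word as a verb and dispatch to a matching handler. Handler
# names and commands here are illustrative, not the real Jarvis set.

def handle_lights(args):
    return f"lights {args}"

def handle_music(args):
    return f"playing {args}"

HANDLERS = {"lights": handle_lights, "play": handle_music}

def process_message(text):
    """Split 'verb rest-of-message' and dispatch to a handler."""
    verb, _, rest = text.strip().partition(" ")
    handler = HANDLERS.get(verb.lower())
    if handler is None:
        return f"unknown command: {verb}"
    return handler(rest)

print(process_message("play some jazz"))  # → playing some jazz
```

Audio clips would feed the same path: a speech-to-text step transcribes the clip, and the transcript goes through the identical dispatcher, which is why the bot can treat voice and text commands interchangeably.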
One thing that surprised me about my communication with Jarvis is that when I have the choice of either speaking or texting, I text much more than I would have expected. This is for a number of reasons, but mostly, it feels less disturbing to people around me. If I’m doing something that relates to them, like playing music for all of us, then speaking feels fine, but most of the time, text feels more appropriate. Similarly, when Jarvis communicates with me, I’d much rather receive that over text message than voice. That’s because voice can be disruptive and text gives you more control of when you want to look at it. Even when I speak to Jarvis, if I’m using my phone, I often prefer it to text or display its response.
He also described adding speech-recognition technology to Jarvis:
To enable voice for Jarvis, I needed to build a dedicated Jarvis application that could listen continuously to what I say. The Messenger bot is great for many things, but the friction for using speech is way too much. My dedicated Jarvis app lets me put my phone on a desk and just have it listen. I could also put a number of phones with the Jarvis app around my home so I could talk to Jarvis in any room. That seems similar to Amazon’s vision with Echo, but in my experience, it’s surprising how frequently I want to communicate with Jarvis when I’m not home, so having the phone be the primary interface rather than a home device seems critical.
Finally, Zuckerberg shared his thoughts on the future of Jarvis:
Although this challenge is ending, I’m sure I’ll continue improving Jarvis, since I use it every day and I’m always finding new things I want to add.
In the near term, the clearest next steps are building an Android app, setting up Jarvis voice terminals in more rooms around my home and connecting more appliances. I’d love to have Jarvis control my Big Green Egg and help me cook, but that will take even more serious hacking than rigging up the T-shirt cannon.
In the longer term, I’d like to explore teaching Jarvis how to learn new skills itself rather than me having to teach it how to perform specific tasks. If I spent another year on this challenge, I’d focus more on learning how learning works.
Finally, over time, it would be interesting to find ways to make this available to the world. I considered open-sourcing my code, but it’s currently too tightly tied to my own home, appliances and network configuration. If I ever build a layer that abstracts more home automation functionality, I may release that. Or, of course, that could be a great foundation to build a new product.
Readers: What do you think Zuckerberg’s New Year’s resolution will be for 2017?
Video courtesy of Daniel Terdiman, Fast Company.