Friday, March 20, 2015

I, ROBOT - part 2


The apple cannot be stuck back on the Tree of Knowledge; once we begin to see, we are doomed and challenged to seek the strength to see more, not less. - Arthur Miller

The somewhat menacing figure above is not a prop in a science fiction film. This is Atlas, developed as part of a DARPA program by Boston Dynamics. Designed originally for search & rescue tasks (such as pulling people out of a burning building and that sort of thing), this 6', 330 lb vehicle is proving to be highly adaptable, as it is the hardware platform of choice for a significant number of entrants in the DARPA Robotics Challenge beginning this June. Most of the participants are university research teams scattered across the USA, APAC and Europe. A slide show of many of the contenders for the competition's prize can be found here:

The claim that Atlas will be used only for civilian purposes is laughable, of course, when one considers who its original developers are. Since the first story of automatons was dreamed up by Man, there have been those who have forewarned of the dangers of machine servants employing their mechanical strength and robustness to overthrow their masters or being used by the unscrupulous and power-hungry to impose their will on others. Such concerns are still with us today and evidently not everyone is prepared to welcome our new robot overlords:

What we are interested in discussing, though, is not the threat of android war machines, but the possibilities offered by sentient robots - machines that can think. There are any number of interesting developments to watch, as robotics is clearly a growth sector. For the moment, we'll focus on the efforts of a few of the companies that are covered in the quarterly "By The Numbers" financial reporting posts.


The beginnings of all things are small. - Cicero

In early 2014, Google began to attract sustained public attention on its AI efforts with the purchase of DeepMind Technologies for somewhere between $400M and $500M. DeepMind was developing Deep Learning neural networks and became a part of the Google X group upon acquisition to support the Google Brain initiative.

In a sense, Google Brain describes the company as a whole. AI is a natural extension of what Google does; after all, its search engine is a kind of machine learning software. As a consequence, Google Brain and its Deep Learning research touch upon just about everything the firm is doing.

As an example: Google Maps previously relied on a beleaguered team of people to manually sort thru street level photographs and individually screen numbers on the sides of buildings to determine whether or not they actually indicated a unique address. Now, thanks to a Machine Vision capability developed by the Google Brain folks, this screening has become automated. 

Google now has voice recognition in Android and image search in Google+. There are efforts underway to apply these capabilities to Google Translate.

An intriguing research program is afoot stemming from the outcome of an experiment with two neural networks - one for image recognition, the other for text translation. The machine vision software was capturing images fed to it and storing mathematical representations of them in order to build a library of general identification/pattern matching models. When the mathematical abstractions were fed to the translation tool, the combined neural network became partially capable of identifying images fed to it and printing a text description of that image. When fed a fresh picture, the system was able to correctly identify it (at least in a generic sense) about 2/3 of the time. The most fascinating thing about this system is that the Google Brain team hasn't yet figured out how it actually works.

Further advances in Google's AI capabilities are being primarily driven by Google Now, the digital assistant built into the Android OS and now available on iOS as well. Google Now uses a voice recognition interface to perform an impressive set of functions - tracking packages, retrieving flight information or reservations at hotels and restaurants, scheduling calendar events, getting weather updates, setting personal reminders, interacting with social media, recognizing musical passages and a further dozen or so utilities and apps. These activities are all directly supported by the various basic AI capabilities residing on Google's servers.

Yet upon further scrutiny, one can deduce that there are fundamental flaws in the approach of Google Brain R&D on AI. The Machine Vision endeavor starkly illustrates these deficiencies. Google's researchers consider it a remarkable achievement that they made a network of 16,000 processor cores peruse 10M images and learn on its own to recognize cats in them. The research team is following this up by collecting billions more images as inputs to the neural network to provoke it into learning more things. The defect in this method is that this is not at all how the human mind works.

As an illustrative example, think back to when you were a wee tot of 2 or 3. As a toddler, you one day might have noticed the family cat walking into the room. Your mother probably saw you looking at the cat and said 'cat' while pointing at the critter. She likely repeated this a few times until you repeated the word 'cat' as best as you could. Undoubtedly your mother then praised you for this and then probably called somebody to discuss the momentous event. She might have even taken a photo of you with the cat to commemorate the occasion. 


Perhaps over the course of the next few weeks your mom repeated the lesson a few times. Maybe a neighborhood cat came walking by a window you were staring out of and your mom again repeated the lesson. Furthermore, the family dog might have strolled into the room at a certain point and you, calling it 'cat', were corrected by your mother, who pointed and called it 'dog.' Soon after, while mom was pushing you in your stroller down the sidewalk, you noticed the neighborhood varmints strolling around and you alternately pointed and said 'cat' or 'dog.'

Notice that learning to visually identify cats and dogs along with associating a word to each takes a human child perhaps a dozen attempts - not tens of millions. This suggests that the Google researchers are not anywhere close to building a true AI. Google Brain appears to have quite a long road to travel before it can be said that it has achieved even basic Awareness, let alone Perception/Cognition or Consciousness.


As the births of living creatures at first are ill-shapen, so are all Innovations, which are the births of time. - Francis Bacon

If one needed any further evidence that a major part of Microsoft's long term strategy involves painting a bull's-eye onto Google's back, it could be found in the company's AI work. Cortana, the digital assistant that competes with Google Now and Apple Siri, also drives much of the AI effort for Microsoft. Though it is restricted for the moment to Windows phones, MSFT has made clear its intention to port Cortana to iOS and Android.

Cortana has an impressive functionality set. The voice-activated digital assistant can apparently read and understand email. The software supports Windows 8.1 search capability. The assistant can even be presented with an image captured by the phone's camera and be asked to identify it.

The many functions of Cortana are backed up by Adam, Microsoft's counterpart to Google Brain. For instance, the image identification utility described above is done by Adam, using Cortana as the interface to the company "AI." 

Microsoft researchers claim Adam's machine vision requires 30x fewer images to correctly identify an object. If true, then Microsoft AI developers have made a breakthrough.

Architecturally, Adam differs markedly from Google's AI implementation. Adam is a neural network optimized for the servers that form Microsoft's Azure Cloud offering. The servers operate independently and pool their computational results asynchronously. The architecture is scalable; the more servers added for a task, the greater the accuracy of results. 
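To make the idea of servers working independently and pooling results asynchronously a bit more concrete, here is a toy sketch. It is emphatically not Microsoft's code: the names (SharedModel, worker, train_async), the one-weight model, the learning rate and the use of threads on a single machine in place of servers are all assumptions made purely for illustration. Each worker reads the shared weight, computes a gradient on its own slice of data, and writes an update back without waiting for anyone else.

```python
import random
import threading


def make_data(n, true_w, seed):
    """Noise-free samples of y = true_w * x (hypothetical toy data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


class SharedModel:
    """One weight shared by every worker; a stand-in for pooled parameters."""

    def __init__(self):
        self.w = 0.0


def worker(model, shard, lr, epochs):
    # No locks and no barriers: a worker may read a slightly stale weight
    # and may overwrite another worker's update. That is the asynchronous part.
    for _ in range(epochs):
        for x, y in shard:
            w = model.w                    # possibly stale read
            grad = 2.0 * (w * x - y) * x   # gradient of the squared error
            model.w -= lr * grad           # unsynchronized write


def train_async(num_workers=4, true_w=3.0):
    data = make_data(400, true_w, seed=0)
    # Give each worker its own interleaved slice of the data.
    shards = [data[i::num_workers] for i in range(num_workers)]
    model = SharedModel()
    threads = [
        threading.Thread(target=worker, args=(model, shard, 0.05, 20))
        for shard in shards
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.w


final_w = train_async()
print("learned weight:", final_w)
```

Despite the sloppy bookkeeping, the shared weight should still settle very near the true value of 3.0 in this toy case, which is the intuition behind letting machines update a common model without waiting on each other.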

Microsoft's architectural breakthrough is complemented by further applied research underway focused on the hardware aspect of AI. The company has found that GPUs in its servers can get saturated in isolation when processing their advanced algorithms if, for instance, a model is too big for one GPU or cannot be effectively partitioned across all of a server's GPUs. As a result, systems sometimes nose over and seize up. Making matters worse is that the data center networks are sometimes unable to feed data fast enough to the GPUs. MSFT is trying to fix this with novel hardware-software combinations and is even looking beyond GPUs to high end FPGAs, as discussed in previous editorials.

Steps and Missteps

Progress has not followed a straight ascending line, but a spiral with rhythms of progress and retrogression, of evolution and dissolution. - Goethe

Both Google and Microsoft are evidently tackling AI aggressively. There seems to be no fear within their research ranks of adopting neural networks and the complex, non-linear mathematical concepts which regulate them.

Nevertheless, both are making subtle errors in their approach to developing AI. Notice, for instance, that their focus is still compartmentalized. For the most part, voice recognition and machine vision are treated separately from each other and in total isolation.

The picture below glaringly illustrates another of the deficiencies in the current approaches of both companies. The diagram depicts the algorithmic architecture of Microsoft's image recognition/machine vision concept.

The left 3/4 of the diagram is a variant on Deep Learning networks known as a Convolutional Network. The right 1/4, though, is based on Gaussian probability distributions. Microsoft researchers are still trying to reduce a fundamentally non-linear process to a linear model.
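For readers who have not met a Convolutional Network before, the core operation is simple enough to sketch in a few lines. What follows is a generic illustration, not Microsoft's architecture: the function names (conv2d_valid, relu), the tiny image and the edge-detecting kernel are all made up for the example. A small grid of weights slides across the image, and each output value says how strongly that patch of the image matches the pattern in the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Weighted sum of the image patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses, zero out the rest (a common non-linearity)."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The only non-zero responses sit in the middle column, exactly where the dark-to-bright edge is. A real network stacks many such layers and learns the kernel weights from data rather than having them written in by hand.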

Granted: Microsoft's method reduces image learning - at least in the 'cat' example - from the 10M or more of Google Brain down to perhaps 340,000 instances. But that's still far higher than the same function in a human brain. 

Viewed in that light, one can readily (and sadly) conclude that Microsoft and Google are not actually working on AI. They are instead simply making automatons that run increasingly sophisticated routines for low level tasks. They are, in the end, merely more complicated elaborations on a basic Difference Engine. As a result, the 'AI' development efforts at both companies are still stuck at something less than an insect level of intelligence - that of Awareness.

There are some core capabilities missing from the Adam and Brain programs:
1. A base set of values akin to the instinctive impulses residing in the cerebellum and brain stem. Isaac Asimov's three laws of robotics aren't anywhere near sufficient to serve such a function. There needs to be the synthetic equivalent of things such as the Fight vs Flight response, the need to belong to a group, the warm sense of satisfaction that one gets from a slice of prime rib or a well made slice of cheesecake, etc...
2. An interpretive mechanism which can weigh fresh data inputs against the base proclivities and their relative strengths, adjusting them dynamically (within certain ranges.)
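The two items above can be sketched in code, if only to show that the idea is concrete enough to write down. This is a thought experiment and nothing more: the Drive class, the appraise function, the drive names and every number here are hypothetical. Each base drive carries a strength that is allowed to move only within fixed bounds, and each new stimulus is first scored against the current strengths and then used to nudge them.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """A base proclivity whose strength may vary only within [low, high]."""
    name: str
    strength: float
    low: float
    high: float

    def adjust(self, delta):
        # Clamp so that experience can shift a drive but never erase it.
        self.strength = min(self.high, max(self.low, self.strength + delta))


def appraise(drives, stimulus, rate=0.1):
    """Score a stimulus against the drives, then nudge each drive's strength."""
    score = sum(d.strength * stimulus.get(d.name, 0.0) for d in drives)
    for d in drives:
        d.adjust(rate * stimulus.get(d.name, 0.0))
    return score


# Hypothetical drives and a hypothetical stimulus that threatens safety
# while mildly satisfying the need to belong.
drives = [
    Drive("safety", strength=0.8, low=0.5, high=1.0),
    Drive("belonging", strength=0.5, low=0.2, high=0.9),
]
stimulus = {"safety": -1.0, "belonging": 0.5}

score = appraise(drives, stimulus)
print("appraisal:", score)
print("strengths:", {d.name: d.strength for d in drives})
```

Feed the same stimulus in repeatedly and the strengths drift until they hit their bounds and stop, which is the "within certain ranges" part of item 2.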

Without these two functionalities integrated into their AI programs, I fear the Adam and Brain teams will increasingly find themselves chasing their own tails. Their efforts will never generate a true, independent Consciousness in a machine that is capable of self-directed learning.

Next week, we'll look at two more R&D programs from our portfolio companies. :-)
