Friday, January 16, 2015

Machine Vision, Part 1 - Hephaestus


I believe in intuition and inspiration. Imagination is more important than knowledge. - Albert Einstein

Smithing, metalworking, masonry, architecture and all manner of engineering and craftsmanship were highly prized among the ancient Greeks. Real and fabled examples of how illustrious and revered the practice of engineering was among the Hellenes include names such as Archimedes (who may have been the first to theorize the basic concepts of calculus) and even the mythical Pygmalion, whose beautiful statue - sculpted by his own hand - was brought to life by Aphrodite.

So honored were these professions amongst the Greeks that there was even a deity in the pantheon dedicated to creating and crafting - Hephaestus, the God of Fire. Known to the Romans as Vulcan, the Fire God could produce anything with his hammer, forge and innate skill - the winged sandals of Hermes, the chariot of the Sun God Helios or the armor of mighty Achilles. It was from the forge of Hephaestus that Prometheus stole the flame which he gave to Man, gifting mortals with the light of curiosity and the drive of creativity by which they might begin to bend nature to their will. Another of the Fire God's creations was the box which Pandora, in her overpowering curiosity, opened to unwittingly unleash all the ills and evils of mortals upon the world, with only Hope remaining within.

The skill of Hephaestus was such that he was capable even of crafting creatures of bronze and iron that could serve as assistants or guards. In this aspect of his talents, both the spirit of Hephaestus and the fire of his forge infuse the hopes and dreams of today's high-tech engineers. It is these modern-day craftsmen who conceive the circuits, systems and software that permit devices to absorb information from their surroundings through sight, temperature, pressure and sound, analyze the data, and then decide on and execute actions as a consequence. All of this spans a variety of technologies such as AI, voice activation and the subject of today's editorial - Machine Vision.


The fringed curtains of thine eye advance,
And say what thou seest yond. - Shakespeare, "The Tempest"

Both existing and envisioned applications for Computer/Machine Vision are as varied as the range of inventions and mechanisms contrived by man. In the end, though, they can all be characterized in general functionality and scope as systems for taking pictures, extracting detailed information, comparing data against models, using predetermined criteria to arrive at some sort of decision for action and then executing on that imperative.

The spectrum of applications can be broken down into certain broad categories based on the information that is sought in a captured image. These categories include:
- Motion detection & tracking
- Object recognition
- Pattern recognition
- Event detection
- Learning

The image information itself can vary to quite a degree, ranging from 2D to 3D and even 4D (if one includes the element of time for a sequence of images or a video stream). It stands to reason that digital signal and graphics processing requirements are the crux of the matter for proper system design, in terms of the volume of data that must be processed and the computing load for rendering models or reconstructions of those images.

In every object there is inexhaustible meaning; the eye sees in it what the eye brings means of seeing. - Thomas Carlyle


Naturally, there is more to machine/computer vision than just having an optical sensor. One cannot credibly market a cellphone with a camera lens as a platform with "machine vision." For many applications, captured images must undergo filtering and/or contrast changes in order to highlight certain features or shapes. These altered images are then compared to a library of models to match against. 
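That filter-then-match pipeline can be sketched in a few lines of plain Python. This is a toy illustration only - the 3x3 high-pass kernel, the tiny "images" and the two-entry model library are all invented for the example - but it shows the shape of the workflow: sharpen features with a filter, then score the result against each stored model.

```python
def convolve(img, kernel):
    """Apply a 3x3 kernel to a 2D image (no padding: output shrinks by 2)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += img[y - 1 + ky][x - 1 + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out

def match_score(img, model):
    """Similarity = negative sum of squared differences (higher is better)."""
    return -sum((a - b) ** 2
                for row_a, row_b in zip(img, model)
                for a, b in zip(row_a, row_b))

# A high-pass (edge-highlighting) kernel: emphasizes intensity changes.
EDGE = [[0, -1, 0],
        [-1, 4, -1],
        [0, -1, 0]]

# A tiny 4x4 "capture" containing a bright 2x2 blob, plus a model library.
capture = [[0, 0, 0, 0],
           [0, 9, 9, 0],
           [0, 9, 9, 0],
           [0, 0, 0, 0]]
features = convolve(capture, EDGE)            # 2x2 feature map
library = {"blob":  [[18, 18], [18, 18]],     # edge response of a 2x2 blob
           "blank": [[0, 0], [0, 0]]}
best = max(library, key=lambda name: match_score(features, library[name]))
print(best)  # -> blob
```

A real system would of course use optimized libraries and far richer model libraries, but the structure - filter, extract features, score against models, pick the best match - is the same.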

For most developers, the current challenge is creating a sufficiently large and detailed library of these primitives. The work is quite difficult and fraught with subtle issues for modelers. As an example: if you or I see a ladybug on a flower petal, we can distinguish between the small creature and the petal. However, a computer/machine vision system can have a great deal of trouble separating the ladybug from its background, often confounding the distinction between the two and 'seeing' the ladybug as a colored spot in the fabric of the petal's tissue.

Since Mother Nature has been developing visual systems for a much longer time than we have, a great deal of research has gone into revealing the details of how biological systems capture, process and recognize visual sensory input. The human eye has a complex architecture and an intricate array of optical sensing tissues. The transmission of this data to the visual center of the brain and its subsequent interpretation is a whole other level of mystery and is still incompletely understood.

Consequently, the replication of such sensing, processing and comparative functions in silicon & software is, as yet, an unrealized goal. Nevertheless, the implementations that have been achieved to date are still quite impressive. 


The history of science is full of revolutionary advances that required small insights that anyone might have had, but that, in fact, only one person did. - Isaac Asimov, "The Three Numbers"

The most common applications today originated over twenty years ago for industrial process and quality control. A robot on a manufacturing line checks a manufactured object for defects by comparing a newly captured image with a stored reference model and accepts or rejects the object on that basis. 
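The accept/reject step of such an inspection system reduces, at its simplest, to a pixel-wise comparison against a golden reference with some tolerance for lighting noise. The following sketch is illustrative only - the images, the pixel tolerance and the defect budget are invented numbers:

```python
def inspect(captured, reference, pixel_tol=10, max_defects=2):
    """Accept the part unless too many pixels stray from the reference."""
    defects = sum(1
                  for cap_row, ref_row in zip(captured, reference)
                  for cap, ref in zip(cap_row, ref_row)
                  if abs(cap - ref) > pixel_tol)
    return defects <= max_defects

reference = [[100, 100], [100, 100]]
good_part = [[103,  98], [101, 100]]   # small deviations, within tolerance
bad_part  = [[ 30,  30], [ 25, 100]]   # three pixels far off the reference

print(inspect(good_part, reference))   # True  -> accept
print(inspect(bad_part, reference))    # False -> reject
```

Production systems add alignment, lighting normalization and region-of-interest masks before the comparison, but the decision logic is this simple at its core.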

These simple applications of identifying an object against a stock/original photo have been extended into everyday life. License plate identification has been used by police departments for traffic monitoring since the early 1990s. Facial recognition - either by sorting an image database or picking out a face in a crowd - is somewhat similar in terms of computational load and is a very appealing concept to law enforcement agencies. Current systems are being enhanced to include 3D recognition that can account for variations in facial expression, capture unique details in skin texture and features, or see through disguises. The technology extends naturally to fingerprint and iris recognition systems.

Applications have continued evolving to higher levels of sophistication over the last two decades. Automotive-based computer vision systems for collision avoidance have been available for a good half decade. These are intricate solutions that involve image capture, motion estimation, monitoring rates of change in relative positions & road gradients and so forth. Such vehicles can also include lane change/drift alerts that have their own sets of calculations, estimates and model libraries. 

The everyday task of keeping a car within its prescribed lane on a road or highway and navigating safely belies the complexity of mimicking it with computer vision systems. There are quite a few calculations that need to be performed dynamically - gradient determinations, probabilities and so forth. Even the primitives libraries for such an application are quite elaborate and extensive. The newest developments for vehicles, still in the prototyping stage, boast the android-like quality of navigating on their own. Hence, their computational needs for computer/machine vision are the most Daedalean, as environmental factors, road/traffic conditions and relative position change continually over time.
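One small piece of those dynamic calculations can be sketched: given lane-marking points extracted from a camera frame, a least-squares line fit yields the lane's gradient, and the car's lateral offset falls out of the fitted line. The points and the camera-center value below are invented for illustration; a real system fuses many such estimates over time.

```python
def fit_line(points):
    """Least-squares fit of y = m*x + b through a list of (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # gradient (slope)
    b = (sy - m * sx) / n                          # intercept
    return m, b

# Hypothetical lane-marking pixels: x = row down the image, y = column.
lane_points = [(0, 10.0), (1, 10.5), (2, 11.0), (3, 11.5)]
m, b = fit_line(lane_points)    # m = 0.5: lane drifts right across the frame
camera_center = 10.0            # column where the camera's axis points
offset = b - camera_center      # lateral offset at the nearest row
print(m, b, offset)             # -> 0.5 10.0 0.0
```

Repeating this fit frame after frame, and watching how the slope and offset change over time, is what turns a static measurement into a lane-drift alert.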

Medical applications under development are quite revolutionary. Taking a 3D CAT scan or MRI of a patient can lead to the imaged area being reconstructed as a simulation model to study injuries or pathologies as well as assist in diagnoses and courses for treatment. Other applications include detection of abnormalities - for instance, cancerous growths or damaged tissue.

The military has been a big fan of computer/machine vision from its earliest days - ever since the US Air Force and Navy started deploying Tomahawk cruise missiles that used Xilinx FPGAs for landmark-based navigation in the late 1980s and early 1990s. The FPGAs would periodically load maps stored in memory and compare them to the landscape the weapon was traversing in order to make in-flight course adjustments without needing direct intervention by a remote operator.

Newer missiles include guidance systems that can dynamically interpret more detailed geographical features as well as select & track moving targets. Inevitably, this is leading military researchers to the next step in unmanned aerial and submersible vehicles that would, in effect, become combat robots able to pick out targets on their own and independently act on the information. Who knows - maybe there is even something equivalent to a T-800 Model 101 under development in an industrial or government laboratory somewhere.

Of course, to truly reach the level of operability like the servants or helpers Hephaestus created at his forge, the robots will have to be able to see, understand, act and learn from the experience. This will ultimately require an AI. And that, dear readers, is a topic which we'll explore in a future post. ;-)