The Big Data Frontier - EBN

High Tech "State of the Union", Q3 2015

Greeks sacrificing a boar, c. 500 BC, pottery collection of the Louvre (source: Wikipedia)

"An nescis, mi fili, quantilla prudentia mundus regatur?"

Don't you know, my son, with how little wisdom the world is governed? - Latin proverb, multiple attributions

In these last 19 months we've made quite a number of visits to Delphi to receive the augury of the Pythia regarding the future of High Tech. Though we have left gold in the temple of Apollo and brought laurel branches to the seance, we have consistently neglected one of the oracle's primary traditions - that of bringing a goat for sacrifice.

Animal sacrifice was widely practiced by the ancient Greeks who, depending on the deity to be appeased and the purpose of the ceremony, would bring deer, goats, cattle and other animals to the altar. The behavior of the victim before the sacrifice was considered significant, and reading the entrails afterwards was also thought to be important (a practice shared by the Romans, particularly in their reading of omens in bird organs before battle.)

Legends tell of the Greeks first learning sacrificial rites from Prometheus, the Titan who was a patron of Mankind, protecting mortals from the early machinations of Zeus and gifting them the secret of Fire stolen from the very halls of Olympus itself. Tricking Zeus by exploiting his vanity, Prometheus taught humans to leave only the bones and hides of sacrificed animals on the altar while keeping the edible parts for themselves.

Oral traditions passed down from Homer, however, tell of a darker period in Greek history during the era of the Mycenaeans, predecessors to what we consider the classical ancient Greeks. Before setting sail for Troy, Agamemnon appeased an angry Artemis, goddess of the hunt, by sacrificing his eldest daughter Iphigenia. Even after the war's genocidal conclusion, the Mycenaeans sacrificed Polyxena, youngest daughter of Priam, over the grave of Achilles, slain by a poison arrow from the bow of Paris, brother to Polyxena, in order to appease the angry ghost of the semi-divine warrior.

Fortunately, we no longer need to engage in bloody rituals to pacify deities and receive prophecies. In their place we have mathematics and statistics to guide our way.

From previous "State of the Union" installments, everyone is already well acquainted with the backgrounds of the 17 covered companies. This time the individual company commentaries will be brief and the focus will be on revenue data. Select macroeconomic indicators will be used subsequently to place the current state of affairs of High Tech in perspective against the general economy.

The Big Iron

Cisco is the pick of the litter in this group, showing relatively steady if uninspiring growth. The Q413 year-long dip was provoked by revelations of NSA tampering with Cisco hardware, with predictable consequences to international sales. The impact of new executive leadership is not yet evident in the numbers, and I suspect we will need at least a full year of fresh data before we can even begin making any assessments along those lines. Cisco's future is potentially wonderful - the IoT, automotive, robotics and AI all drive Big Data, which should play right into the company's hands and hopefully drive its growth more energetically. Naturally, time will tell.

HP is now officially two companies - a smaller Enterprise firm and a significantly larger & far less profitable consumer arm for personal computing and printers. Interestingly, all product revenues declined except for the Server group. Services, PCs, printers and Software are all 'sucking wind.' 

It is now 'do or die' time for HP - revenues have been steadily declining for 19 quarters, employee morale is terrible and privately held Dell is obviously regrouping in both personal and enterprise computing (as its $67B buyout of storage powerhouse EMC attests.) Perhaps the memristor/RRAM - based "Machine" will be the savior of both subsidiaries. Whatever HP does to rescue itself from its predicament, it had better do it quickly.

USS Essex under kamikaze attack off the Philippines, November 1944 (source:

IBM continues to slide, unsurprisingly, thanks to the most stunningly incompetent and willfully value-poisoning executive team in the history of private enterprise. As is plainly evident from the revenue graph, earnings are deteriorating at an accelerating rate. Ginni Rometty and her Keystone Cops continue to undermine the server hardware business - the very heart of the company - in favor of low margin cloud computing and analytics services which Rometty inexplicably describes as 'high value.' When asked about the fact that IBM's hardware business was decreasing faster than its 'strategic imperatives' were growing, she responded with the mind-boggling claim that the company has been shrinking these last four years 'by design.' And thru all this, the $1.5B stock buyback program continues unabated so that Rometty can reward herself and her staff for the bang-up job they're doing. Unless this executive team is replaced wholesale by the end of the year, I think we may be seeing the end of Big Blue - and from all self-inflicted wounds.

German battleship Graf Spee, burning and sinking after being scuttled by its officers following the Battle of the Rio de la Plata off the coast of Uruguay, 1940 (source:


Intel has evidently recovered from its 1st quarter slump but continues to struggle with the albatross of the Personal Computing market strung around its neck. The decline in desktops, laptops and tablets has been at least somewhat offset by Intel's gains in the datacenter business. Nevertheless, with Microsoft making Windows 10 available for free, it is unlikely that there will be any near term recovery in the PC market. On the bright side, the Skylake CPU launch seems to be reassuring customers who had found the Broadwell line to be a major disappointment. The X-Point MRAM/RRAM project with Micron also holds much promise for the future. I remain cautiously optimistic about the long term prospects of the company.

Microsoft, on the other hand, has taken a decided turn for the worse. Once the clear strong horse of this pair, MSFT is smarting from revenue losses incurred by offering Windows 10 for free as well as from a 40% collapse in its mobile phone business. The Nokia phone acquisition turned into a total failure, and MSFT's main segment for its handsets is in very cash-restricted, low margin developing economies. It will be some time before the company will have an opportunity to reap higher gains from this niche with more advanced smartphone offerings. 

Despite these problems, the enterprise software and cloud services businesses are growing, as well as the subscription base for Office 365. Overall, Microsoft seems intent on diluting its consumer profile and is taking on an increasingly B2B character. In the short and medium term, I am expecting MSFT's revenue to enter a plateau much like that of Intel. The great strategic challenge that lies ahead is the contest between Surface Pro 4 and the iPad Pro for both consumer and enterprise markets. The stakes are enormous, as the winner of this contest will likely be the best prepared to champion and dominate an enormous future market where laptops, tablets and smartphones finally merge into a Portable Personal Processor line of hardware.

Area 51

Apple continues to define the future of computing with its iPhone 6/6s line. The above revenue curve is beautiful to behold. But there are clouds building on the horizon. The ADHD-addled press and fanboy pundits are oblivious to the fact that Apple doesn't appear to be much interested in talking about the Apple Watch anymore. Furthermore, iPad sales dropped 20% from last year. There are others who have seen thru the hype of Apple's financial talking points and have noticed indications of future weakness in the China geography. Details can be found at these links:

Perhaps the iPad Pro will revive the company's tablet fortunes and put Apple back on the path of discovering a product line that combines the laptop, tablet and smartphone markets in some sort of merged iPad/iPhone Personal Processor. Early reviews, however, suggest that the iPad Pro falls short of its principal rival, the Microsoft Surface Pro 4. As a consequence, my outlook for Apple in the near term is cautious, and, in contrast to the mentally deficient blabbermouths of the MSM, I suspect Apple's seasonal revenue spike in Q4 will fall short of last year's record (you heard it here first, folks.)

Google is now officially known as "Alphabet." The "Google" name now only applies to the search engine subsidiary, with all advanced technology research projects being treated in effect as a separate group of company-seeded startups. Alphabet's growth continues to be steady, but nonetheless not stellar - an increasing source of irritation to investors who continue to wait for an actual ROI from the company's many R&D projects. That growth should not be taken for granted, however - competition from Microsoft's Bing engine is maintaining pressure on Alphabet's advertising rates. In an effort to alleviate investor anger, Alphabet has announced that it will embark on a $5.1B stock buyback program. This is a particularly disturbing development, as it suggests Alphabet's executive lineup has developed the same disgraceful attitude of so many other corporate management teams who, lacking the necessary intelligence, skillset, dedication and ethics to grow the firm, decide instead to loot its value and line their own pockets. Because of this, I suspect Alphabet's growth will begin plateauing and anticipate a flat 2016.

The Vanara

As a whole, the programmable logic companies are not particularly interesting per se, but nevertheless serve as useful indicators of things to come. Xilinx and Altera gave notice 4-6 quarters ago that a downturn was in the making, and the fact that revenues continue to decline strongly suggests that this will become evident in the wider High Tech market and the global economy in general thru the last months of the year. 

Lattice stands out because of its steady growth against the trend. This has come about from an M&A effort of a strategic prescience rivaled only by Avago's purchase of Broadcom (see the May 29th editorial.)

Lattice has grown its IP portfolio to include specific low power and I/F capabilities particularly suited for the nascent Robotics and IoT markets. I expect Lattice will perform comparatively better than its much larger counterparts during the near term economic decline and emerge from it much faster and stronger than its rivals.

The Carolingians

The above chart is unfortunately incomplete, as Infineon (red line) has entered an extended 'quiet period' and will not be announcing Q3 results until the end of November. Nevertheless, I will stick my neck out for the company and anticipate that it will report at least modest growth for the quarter. 

All three companies have a more or less identical strategic vision - that of avoiding highly price sensitive and high volume consumer applications in favor of specialized niches in analog, mixed signal, power management and MCU. The earnings differences between them are issues of finer granularity. 

Infineon is executing beautifully and has the advantage of a quasi-lock on the German domestic market. NXP is by far the most aggressive of the trio. Its merger with Freescale should complete before the end of the year and we will naturally witness a significant jump in revenues afterwards. The Dutch firm has also been energetically divesting itself of product lines and divisions to sharpen its focus on value-added offerings for its target markets, yet has not shown a corresponding dip in revenues - an impressive feat. STMicro continues to be the 'sick man of Europe', though, as its serial quarterly revenue plateau suggests more than anything else that its deterioration will continue next quarter and perhaps even steepen. This is a company that needs an executive purge just as urgently as IBM. STMicro's executives do not seem to realize what it is to be leaders and are obviously quite content to function as mere administrative clerks.

The Stone Masons

Qualcomm, for so long a seemingly perennial champion in wireless semiconductors, is finally experiencing a swing towards the bottom end of the wheel of fortune. Based upon the company's R&D in Robotics, AI & the IoT as well as its growing interest in the datacenter, Qualcomm has indeed been preparing for the day that the smartphone market peaked and began to wane. The problem is that the peak in its established markets occurred before the R&D poured into nascent markets could bear fruit.

An aggravating factor is the targeting of Qualcomm by Beijing. Based on the open hostility of the Chinese central government to the California firm, China's handset makers likely feel a certain level of impunity in not reporting accurate sales numbers to Qualcomm and withholding payments. It stands to reason that they are also probably favoring the more culturally and politically well-aligned Mediatek for future designs. Several Chinese manufacturers are also busy developing their own applications and baseband processors, suggesting long term difficulties for Qualcomm in this largest of all mobile phone geographies.

I am still optimistic about the company's prospects in the long term. They are obviously not throwing in the towel on the future of mobile computing and are working to turn 5G into a standard that will be friendly to the company's own objectives in the IoT:

Yet despite the evident MSM idiocy in reporting Qualcomm's Q3 as a 'good quarter', one can see from the chart that the firm is in real trouble. Executive management announced a 15% layoff mid-summer; however, based on Q3 results, the continuing downward trend in revenues and management's own guidance for more bad news to come, I suspect there are several more RIFs in Qualcomm's future. The smart near and medium term move for Qualcomm is to use some of that $30.9B it has in excess cash to buy one or more companies that can quickly push the company into other established markets. It is, in fact, surprising that Qualcomm has not done so already, at least for datacenter products.

Broadcom surprised everyone - including me - by having a strong Q3. This came about despite continuing weakness for the company in networking. The Avago merger should close in Q1 2016, at which point revenues should jump by 20-25%. That is not likely to hold, though, as the merger is bound to cause disruptions to the management hierarchy and multiple points of friction, with the inevitable result that the company partially takes its eyes off the ball. How poorly the combined Broadcom-Avago executive team handles this will determine whether 2016 earnings are flat or down.

Mediatek is showing some life again. Revenue growth is mostly in China, though net income is sliding due to ferocious pricing pressures from Chinese handset makers. I suspect that Mediatek has been growing in the China handset market at Qualcomm's expense, since the San Diego company is clearly considered by Beijing to be a mortal enemy of China's domestic technology sector. 

Mediatek's focus on the IoT and its attempts at fostering an expanding ecosystem have both increased. Thus, the company is reasonably well positioned strategically in terms of technology initiatives, culture and the extant political atmosphere. Whether this is sufficient to bring high enough growth and overcome the negative effects of margin pressure remains to be seen. One quarter of good financial data is not enough to determine a trend.

Nvidia reported record quarterly revenues. All their growth, however, was in graphics chips and cards. The other product lines - professional graphics, datacenter and the Tegra line - all shrank. One can hypothesize that the slowdown in PC sales is provoking desktop and laptop owners to upgrade their current machines as they see fit rather than buy new tricked-out systems, leading to higher graphics processor sales for Nvidia.

The company continues more than ever before to be a one trick pony, despite its many and lengthy R&D efforts. Investors are going to lose patience with Nvidia management one day, just as they are beginning to do so with Google/Alphabet. Something further to take note of: going into the holiday season, Nvidia is forecasting flat revenues. I find that to be very revealing indeed.


Athenian amphora c. 530 BC

For the love of gain would reconcile the weaker to the dominion of the stronger, and the possession of capital enabled the more powerful to reduce the smaller cities to subjection. - Thucydides, "The Peloponnesian War"

The Persian invasion of Greece came to a definitive conclusion in 479 B.C. on the plain to the east of the tiny city of Plataea, where a combined 50,000-60,000 southern Greek hoplites, led by large contingents from Athens and Sparta, utterly routed a 360,000 man Persian force (half of whom were northern Greeks from territories under subjugation to the Persians and who were, shall we say, less than enthusiastic about being there.) The war was such a tremendous experience for the Greeks that, for a while, it united them spiritually as a people and initiated their Classical Age, producing famous historical figures such as Aeschylus, Sophocles, Aristotle, Herodotus, Xenophon and many others who generated art, literature, philosophy, science, architecture and sculpture that inspired cultures across Europe, North Africa and the Middle East for another 2400 years and counting.

"Non semper erit aestas."

It will not always be summer. - Roman proverb

Yet as Greek civilization blossomed splendidly at its zenith, the seeds of its demise were being sown. The Delian League, led by Athens, continued warring against a staggered Persian Empire, conquering territory in northern Greece and along the coast of Asia Minor, ejecting the Persians altogether from the Aegean and vaulting Athens into dominance, its maritime empire controlling roughly 1/3 of mainland Greece and the entire coastline of the Aegean Sea along with all its islands.

Athens succumbed to the vices and temptations of runaway success and began to exercise oppressive control over previously independent Greek cities and towns, even extracting tribute from former allies. Resentment, fear and envy began to build throughout the Greek world, culminating in a fateful gathering of the Peloponnesian League in 432 B.C. Egged on by Corinth and other Greek cities with their own ambitions and grievances against Athens and reinforced by recent arrogant imperial actions by the Athenians themselves, the Spartans took counsel of their fears concerning Athenian hegemony and declared the long peace to be at an end. 

And so did the Golden Age of Greece come to a close to begin a long tale of woe, where first Athens and then Sparta fell, neither to ever recover their former glory, while Greek cities rapidly entered and exited alliances of convenience as Thebes, Corinth and eventually Macedonia vied for hegemony over a war-torn and desperately impoverished Hellas. Battles, massacres, genocides, revolutions, assassinations, cruelties and injustices followed one upon the other beyond counting, until the Roman consul Lucius Mummius finally put an end to it all nearly three hundred years later by razing the city of Corinth and absorbing all of Greece into the Roman sphere.

Though clearly not as sanguinary, High Tech's participating firms are entering a comparable historic period, busily consolidating into larger competing entities and engaging in ruinous, margin-crushing price wars. M&A for Technology firms in 2015 has already surpassed $100B, a higher level than the previous six years combined:

We've already reviewed or reported on some of the bigger acquisitions that have happened during the year - Broadcom and Avago, Altera and Intel, Freescale and NXP and so forth. There are quite a few others, however, which have completed, are in process or are reputedly being negotiated:
 - Dell & EMC
 - Global Foundries & IBM's microelectronics group
 - Dialog & Atmel
 - Microsemi & Vitesse
 - Infineon & International Rectifier
 - Avago & LSI
 - KLA-Tencor & Lam Research
 - Microsemi and Skyworks bidding on PMC-Sierra
 - ADI and TI pursuing Maxim (rumored)
 - Western Digital & Sandisk
 - Infineon & Fairchild (rumored)

What we've also touched on in recent editorials is the strategic priority for Beijing in becoming self-reliant in High Tech, from the foundry level up to systems. Government-backed Chinese corporations and holding companies have bid on or executed several deals this year and further activity is virtually assured:

Converging with this trend are (1) the general decline in High Tech revenues and (2) a heavy round of layoffs across the industry. HP alone announced 30k in headcount reductions and Sprint is putting together a $2B expense cutting plan to be implemented by the end of January which will cost 'several thousand' people their jobs. To be explicit, it would not surprise me if, after the M&A surge tails off and the smoke clears, up to 100k High Tech employees will be out of work.

So little pains do the vulgar take in the investigation of truth, accepting readily the first story that comes to hand. - Thucydides

Buoyant (and supremely deluded) forecasts at the beginning of the year for 5%+ growth in semiconductors have, as predicted in previous posts, turned sour. Hype-prone IC Insights is now predicting -1% and Gartner is also projecting a 1% decline from 2014. I suspect the SIA/WSTS, late to the party as always, will follow suit before the year is out.

As one can see from the charts above, many of the industry-leading companies examined are shrinking or struggling just to stay even. Of the very few genuine growth companies, all but Lattice (and maybe Infineon) are quite obviously at risk and sailing into rising headwinds spawned from the end of the spectacular growth of mobile computing. 

All of this data stands in stark contrast to articles filed by the MSM during this quarter's financial reporting season. The published stories by and large reflect the posturing and positive spin of company executives. The fact that their verbal dreck is accepted passively or even occasionally reinforced by members of the financial press suggests either willing and active participation in the deception or profound vapidity on the part of the reporters.

"Historia est vitae magistra."

History is the tutor of life. - Roman proverb

And what of the global economic environment? Some would say that the height of the Dow Jones is all the indication one needs to assess the relative health of the economy. We all know better at this point, as the reason why the S&P and Nasdaq have done so spectacularly well has been covered in previous "State of the Union" reports, along with the deceptions in BLS employment statistics.

Let's return to the favorite measures we've used in the past - the CRB (both classic and Reuters-Jefferies) and the BDI.

First, the classic CRB going back to 1947:

As we can see, the index has been in near continual freefall since early 2014.

Now the Reuters - Jefferies CRB, from May 2008 to today:

The commodities mix is somewhat different and the scale is logarithmic, but the story is still nearly identical - a 'dead cat bounce' from the 2008-2009 recession to early 2011, then a steady decline until early 2014 with a steepening drop afterwards.

Finally, two versions of the BDI - a 3 year chart:

And the BDI from May 2008 until today:

Some observations:

1. The three year BDI shows the improvement in shipping rates that resulted from a mass retirement of old freight haulers in 2013. Note the very clear downward trend in the data that followed, as well as the truly piss-poor level of the index today - a period that should be experiencing a sharp spike for shipping final product to retailers for the holiday season. One can readily deduce that the decision of most retailers to break with the practice of opening on Thursday the 26th, so cheerfully reported by the happy-smiley MSM, has nothing to do with executives wanting their employees to spend time with families on Thanksgiving but rather reflects their expectation that low customer turnout this year will make paying workers overtime on Thanksgiving day pointless.

2. The logarithmic scale on the Y axis for the longer graph again compresses the peaks in the data. Nevertheless, the trend is clear - after the May 2008 all time high of 11k+, there was a severe crash in shipping rates which has been bottom-bouncing ever since, with spikes of steadily decreasing magnitude.

"Bene diagnoscitur, bene curatur."

A disease known is half cured. - Roman proverb

The 'time of troubles' is upon us, dear readers. Rare as the Koh-i-Noor are generations that experience no major trials during their lifetimes. But no one enters the field of High Technology without becoming reasonably inured to stress, difficulties, disappointments and severe tests of fortitude.

Before we emerge from this tempest, we will have to sail over its boiling mountains of water and canvas-shredding blasts of wind. I have colleagues who, most encouragingly, are finding that their 20-something direct reports are proving to be much more willing, capable and hardy than the way Millennials are being portrayed by the MSM. They do not seem to be wilting as they hear the first gales howling thru the lanyards and great waves roll up and crash into the bow. Yet it is the old sea salts who will need to guide the efforts of the novice swabbies and who will make the greatest difference in surviving this storm - those veterans of 15 or more years who bear the scars of many a battle with Poseidon and his monsters of the watery abyss. This essay is primarily directed at you, my fellow swashbuckling mariners, and I charge you with this duty:

"Ductus Exemplo."

Lead by Example. - Roman proverb

Greek Trireme on the Nile, section of a Roman Mosaic, 1st century B.C. (source:


Tools For the Big Data Frontier, Part 2

Alfred Jacob Miller, "Rendezvous on the Green River", 1837 (source:

And this I believe: that the free, exploring mind of the individual human is the most valuable thing in the world. And this I would fight for: the freedom of the mind to take any direction it wishes, undirected. And this I must fight against: any idea, religion, or government which limits or destroys the individual. - John Steinbeck

There are few symbols as iconic to a culture as the Mountain Man is to America. Fiercely independent and highly individualistic, these intrepid souls forsook the comfort and safety of civilized life in order to be free of any restraints or shackles and struck out into the wilds, depending only on their wits, raw courage and fieldcraft to survive against the elements, ferocious predators and a broad selection of mutually antagonistic native tribes.

At the height of the Mountain Man era during the opening of the American West, an annual spring gathering was organized that brought together fur trading companies and trappers to trade accumulated pelts, skins and furs for needed supplies & equipment. Held between 1825 and 1840 during the apex of the fur trade, the Mountain Man Rendezvous was almost always staged in a well known valley in Northern Utah or Western Wyoming.

Though animal pelts (especially those of the northern beaver) commanded attractive prices on the eastern seaboard and across the Atlantic, the mountain men needed an expensive collection of gear to harvest nature's bounty. The list of tools and implements, considered leading technology for the era, is almost endless - a Hawken muzzle loading rifle of 50 caliber or larger, lead shot, bars of lead with specialized tools to allow the trapper to make his own shot as needed, a powder horn with gunpowder, a toolkit for loading and maintaining the weapon, one or more single shot pistols, an assortment of knives, axes and hatchets, a sharpening stone, multiple iron spring traps for a variety of game, a hodgepodge of tools such as saws, needles, scrapers, awls, pliers, hammers and mauls, nails, a plane for shaping wood, augers, a flint with a piece of iron, shovels, picks, canteens or water skins, pots and pans for cooking, along with basic supplies including salt, sugar, coffee, cornmeal, tobacco and such.


In the first installment of this series, we discussed Hadoop and its suitability for exploring the new Big Data frontier. Today we will examine Hadoop more thoroughly - specifically, from the point of view of how it serves as a framework for an extensive assortment of specialized software tools to support data scientists as they explore massive datasets to discover hidden insights and value.

The Apache Hadoop Distribution

Just as a rifle and traps were the most vital pieces of equipment in a mountain man's kit, there are certain programs and utilities which are core to the effectiveness of Hadoop. This set of software consists of the following tools:

Hadoop Common - a collection of libraries, utilities and tools developed to keep Hadoop functionality and programming as simple, easy to use and as far from 'assembly' level as possible. Included are file system, serialization and RPC (remote procedure call) libraries. The RPC library is particularly robust, as it must support client demands to run a given program on one server which requires a subroutine that runs on another. Such diffusion of functions can be a problem in terms of execution, and the Apache distribution has taken this concern well into account.

Hadoop Distributed File System (HDFS) - we briefly touched on this in the previous editorial. HDFS distributes data across a server cluster and makes more than one copy. The default settings are 64MB data blocks replicated 3 times. The file system has been developed to be highly scalable and, of course, fault tolerant thru its redundancy and virtualization. The programming interface is marvelously flexible, supporting Java, C and many other languages.
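
To make the defaults quoted above concrete, here is a small back-of-the-envelope sketch in Python. It is not Hadoop code; the function name `hdfs_footprint` and the 1,000MB example file are my own illustrative assumptions, and only the 64MB block size and 3x replication come from the text.

```python
import math

BLOCK_SIZE_MB = 64   # default block size quoted in the text
REPLICATION = 3      # default replication factor quoted in the text


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block_replicas, raw_storage_mb) for a single file.

    Hypothetical helper for illustration only; not part of any Hadoop API.
    """
    # A file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied to `replication` different Data nodes.
    block_replicas = blocks * replication
    # A partial final block only occupies the space it actually holds,
    # so raw storage is simply the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


if __name__ == "__main__":
    blocks, replicas, raw_mb = hdfs_footprint(1000)
    print(f"{blocks} blocks, {replicas} block replicas, {raw_mb} MB of raw storage")
```

The point of the exercise is simply that redundancy is paid for in disk: under these defaults, every megabyte stored costs three on the cluster.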

MapReduce - a highly flexible framework for distributed computation, this tool is exceptionally powerful and requires a detailed explanation of its functions and duties. MapReduce organizes a server cluster hierarchically as one Master node and multiple Worker nodes, with 'node' being equivalent to 'server.' When a client requests support for a job, MapReduce uses the Master node as the Job Tracker, with each Worker node becoming an individual Task Tracker. The Job Tracker copies the client program to each node, parcels tasks amongst the Task Trackers based on proximity to the data and puts them to work. As each Task Tracker finishes its assignment, it reports back "all done, boss," and the Job Tracker then aggregates results.

HDFS follows this hierarchy in parallel, recognizing the Master node as a 'Name node' that holds the metadata for the entire file system, with individual Data nodes beneath it. At the commencement of a job, HDFS responds to the client query by finding the closest data block for the job, with the Data node responding back to the client. If data changes in that block, the first or 'primary' data node forces all the other data nodes (secondaries) to update their blocks. Then the primary Data node reports back to the client. Any changes to data block distribution or metadata are updated in the Name node.
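
The division of labor described above can be sketched as a toy, single-process simulation in Python. Nothing here is the Hadoop API: the node names, block names, sample text and the functions `map_task` and `run_job` are all hypothetical, and the word-count job is simply the customary teaching example. Following the description above, the "Job Tracker" hands each block to the node that already holds it and then merges the partial results.

```python
from collections import Counter, defaultdict

# Hypothetical cluster layout: which Worker node holds which data block.
BLOCK_LOCATIONS = {
    "block-0": "worker-a",
    "block-1": "worker-b",
    "block-2": "worker-a",
}

# Hypothetical block contents, standing in for data stored in HDFS.
BLOCK_CONTENTS = {
    "block-0": "beaver pelt beaver",
    "block-1": "pelt trap",
    "block-2": "trap trap beaver",
}


def map_task(text):
    """Work done by one Task Tracker: count the words in its local block."""
    return Counter(text.split())


def run_job(locations, contents):
    """Play the Job Tracker: assign tasks by data locality, then aggregate."""
    # Parcel out the blocks so that each node only works on data it holds.
    assignments = defaultdict(list)
    for block, node in locations.items():
        assignments[node].append(block)

    # Each Task Tracker processes its blocks and reports a partial result.
    partial_results = []
    for node, blocks in assignments.items():
        for block in blocks:
            partial_results.append(map_task(contents[block]))

    # Aggregation: merge every partial count into one final tally.
    totals = Counter()
    for partial in partial_results:
        totals.update(partial)
    return dict(assignments), dict(totals)


if __name__ == "__main__":
    assignments, totals = run_job(BLOCK_LOCATIONS, BLOCK_CONTENTS)
    print(assignments)
    print(totals)
```

In a real cluster the map tasks run in parallel on separate servers and the program travels to the data rather than the other way around; the loop above only imitates the bookkeeping.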

The hierarchy of servers and operations for both HDFS and MapReduce is illustrated in the diagram below.


The tools and utilities described above are part of an already far-flung and growing ecosystem, pictured below:

Source: Philippe Julio, "Big Data Analytics with Hadoop" (

One can readily discern the hierarchical nature of Hadoop. Nevertheless, the architecture is not a mess of latency-adding protocols and interfaces but is quite efficient. Granted - there are vulnerabilities in such an approach. But one must accept that while Hadoop is clearly fault-tolerant, there is no such thing as a software architecture which is fault-immune.

As evident from the above ecosystem diagram, there are quite a few other tools in the distribution essential to the proper operation of Hadoop. Below is a description of some of those tools (please note: as the Hadoop developer community is quite dynamic, the list and the individual capabilities of the tools continue to expand, and as of this writing there are at least another 20 development projects underway).

The common theme among the tools is that Hadoop developers are instinctively demonstrating the same flexibility, practicality and resilience as frontiersmen, in that they completely eschew the insecurity-based egotism of NIH (Not Invented Here) development and freely adopt good ideas from outside sources. As we can see below, Hadoop community developers also share the frontiersman's knack for inventing colorful names for people, places and things:

Data Access Tools

HBase - a clone, if you will, of Google's "Bigtable" database. The software is written in Java, but is accessible also with Ruby, C++ and other languages. Like many a database, it is column oriented and distributed. Hardware fault tolerance is coded in. In the software stack, HBase is layered over HDFS.
Though it is described as a database tool, HBase is not a classic RDBMS. It was developed with unstructured data in mind. The DB is, as you might expect, highly scalable. ROOT and META tables assist clients in navigation.
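
For readers who think in code, the difference from a classic RDBMS can be pictured as a sparse, sorted map of maps. The following Python toy is only a sketch of that idea; `ToyTable` and its methods are invented for illustration and are not the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a column-oriented table:
    row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Store one versioned cell; rows need not share any columns."""
        self.rows[row][column][ts] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Return row keys in sorted order within [start, stop)."""
        return [k for k in sorted(self.rows) if start <= k < stop]


# Hypothetical rows: note that the two users carry different columns.
table = ToyTable()
table.put("user#001", "info:email", "a@example.com", ts=1)
table.put("user#001", "info:email", "b@example.com", ts=2)
table.put("user#002", "info:name", "Ada", ts=1)
latest = table.get("user#001", "info:email")
```

There is no fixed schema beyond the row key: each row carries only the columns it needs, and older versions of a cell remain available under their timestamps.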

Hive - this tool rides on top of MapReduce and HDFS. A highly flexible utility that can be used with PHP, Python and Java, Hive allows clients to summarize data and perform ad-hoc queries.

PIG - a rather amusingly named programming language and framework for MapReduce. It was developed to handle much of the intimate detail for clients in order to simplify programming for jobs that benefit from parallelism.

HCatalog - this is a table and storage management service. The tool offers a table abstraction so that clients need not concern themselves with how or where data is stored.

Data Transfer Tools

SQOOP - for importing existing formatted RDBMS data into Hadoop and vice versa.

FLUME - collects, aggregates and moves large collections of log files. A fault tolerant utility, it supports batching, compression, filtering and transformation.

Management Tools

OOZIE - for batching and coordinating workflows.

CHUKWA - a tool which repurposes HDFS and MapReduce to manage large distributed systems.

ZOOKEEPER - detailed system management, including configuration information, synchronization, naming and a variety of other functions.

There are quite a few other tools available under the Apache Hadoop 'umbrella', many of which have been developed by commercial interests. These offerings tend to focus on optimizing, refining and enhancing the existing management/administration features of Hadoop and its current tools & utilities, improving its ability to handle Big Data problems, supporting parallelization in computing and facilitating the development of new tools. Below is a partial list, along with descriptions as applicable.

Mahout - the extension of Hadoop, with its virtualization and parallelism, to AI applications is an intuitively natural one. This tool was developed to support Machine Learning on Hadoop systems. It includes a variety of capabilities, including:
 - Clustering of text documents by topic
 - Classification of new documents
 - 'Item set mining', wherein items are grouped based on query activities by clients
 - 'Recommendation mining', where the tool proposes items to a client based on that client's behavior
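
As a rough illustration of the last item, recommendation mining can be approximated by counting which items tend to appear together in clients' histories. The Python sketch below is a single-machine toy with made-up data and function names; it is not Mahout's API, and Mahout's own algorithms are considerably more sophisticated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical interaction history: client -> set of items they touched.
history = {
    "ann": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "cat": {"tent", "lantern"},
    "dan": {"stove"},
}


def cooccurrence(history):
    """Count, for each ordered pair of items, how many clients touched both."""
    counts = Counter()
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(client, history, top_n=2):
    """Score unseen items by how often they co-occur with the client's items."""
    counts = cooccurrence(history)
    seen = history[client]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


suggestions = recommend("dan", history)
```

With this toy data, a client who has only looked at the stove is pointed first to the tent and then to the lantern, because those are the items other stove-browsers also touched. The counting step parallelizes naturally, which is why this kind of job sits comfortably on a Hadoop cluster.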

WHIRR - supports the deployment and management of cloud services.

AMBARI - a web-based utility for managing Hadoop clusters.

Cloudera Impala - this is Cloudera's SQL query engine for Hadoop, which runs interactive queries directly against data held in HDFS and HBase rather than going through MapReduce.

HUE - an acronym for Hadoop User Experience, begun by Cloudera and since turned over to the open source community. Fundamentally, it boils down to a web-based user interface.

Stinger - a community project that is making Hive more SQL-friendly and improving its ability to handle petabytes of data volume (yes, you read that right.)

POLYBASE - a Microsoft tool (yes, the Dark Side wants to play in this sandbox, too) which provides a framework for SQL work on both RDBMS and non-RDBMS data that bypasses MapReduce.

The above only scratches the surface of all the projects and endeavors swirling around and undertaken on behalf of Hadoop. There are many other capabilities being created, including backups for the Name node, incorporation of the Cassandra File System (CFS), disaster recovery, security, backup schemes, archiving improvements, data compression and so on. The opportunities for innovation appear to be endless.

A Blue Sky and a Range on the Horizon

Albert Bierstadt, "The Rocky Mountains, Lander's Peak," 1863 (source:

In Hadoop, we are seeing the story of Linux repeating itself - an industry-changing OS being developed by contributors not for a narrow commercial interest, but for the sake of advancing technology itself. I am far from being a collectivist, and despite the contradiction in their general voting patterns, very few scientists and engineers are either. There are things that we do as individuals in High Tech which evoke the same spirit that drove most Mountain Men - that fierce longing to explore a new frontier, regardless of the risk and hardship, simply because it's there.

There will be further editorials on Big Data over the coming weeks and months, including explorations of what it is that data scientists do and what kinds of topics are drawing their attention. Our adventures across the frontier are far from over, dear readers. :-)