Spark Muscles Into Big Data Processing
It comes as no great surprise that the Apache Spark architecture has been horning in on the batch processing domain once controlled by Hadoop's MapReduce. But that's only part of the story. With data processing, streaming and machine learning capabilities on its résumé, the open source engine is learning to get along entirely without Hadoop in certain applications. In fact, one industry analyst cautiously foresees a day when Spark could declare total independence, potentially breaking up Hadoop's dominance of big data clusters and pairing instead with other Apache technologies.
In this three-part handbook, senior news writer Jack Vaughan examines the distinct advantages the Apache Spark architecture has over MapReduce. He also highlights how Spark's ability to process and analyze streaming data is helping a major banking and credit card company detect fraudulent activity. Next, he details the upgrades coming in Spark 2.0 to analytics speed, machine learning libraries, SQL support and stream processing. To close, Vaughan and senior news writer Ed Burns look at combining Spark and NoSQL databases in operational analytics applications, a pairing that could help broaden the use of both technologies.