Insights & Data and IBM Analytics

By Weronika Skotnicka

The Big Picture on Big Data and Cognos

IBM has a long history of supporting major open source projects and the most widely adopted open standards. Its enterprise customers have benefited from the flexibility, choice, and innovation that come with the open source philosophy. Major projects include SOA (Service-Oriented Architecture), Linux, Eclipse and now Hadoop. The big data analytics open source offering is known as the IBM Open Platform with Apache Hadoop. The commercial side of this platform, announced in early 2015, is a suite of products for the enterprise branded as BigInsights.

To better understand IBM’s big data offerings around Hadoop and its open data platform, it helps to put them in the context of the overall vision for the platform and the three phases of the IBM Big Data Analytics lifecycle:

Pull in all types of data from disparate sources

Put the data into a business context

Produce intelligent, data-driven business outcomes, for example, operational efficiency, customer engagement or risk management

IBM endeavors to cover a lot of business territory with its analytics platform. For the enterprise IT department, the technology enables data integration, governance, security and regulatory compliance. For line of business managers, the analytics environment is the home of customer and operational intelligence. While analytics play an important role in increasing operational efficiency and eliminating business process bottlenecks, it is the customer-centric analytics that have captured the imagination of business executives. Big data analytics offers many opportunities for improving customer relationships and increasing engagement across marketing channels.

A common big data use case is delivering relevant promotions to customers. We all share the experience of receiving credit card offers in the mail from the bank and tossing the envelope directly into the recycling bin without even thinking about it. Despite the dismal response rate, it was cost effective for the bank to send the same direct mail piece to everyone. With a big data platform, it is possible to develop customer profiles and create targeted offers for each segment. For example, a customer who has a single account and a short history with the bank would be a candidate for a different array of promotions than someone who has been a customer for decades. The cost of amassing enough data and having the processing power to crunch the numbers in a timely fashion has dropped enough to make it profitable to do so.
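The segmentation idea described above can be sketched in a few lines of Python. This is a minimal illustration only: the field names, the two-year and ten-year thresholds, and the segment labels are assumptions made for the example and do not come from any bank's or IBM's actual model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    num_accounts: int    # hypothetical field: number of open accounts
    tenure_years: float  # hypothetical field: years as a customer


def assign_segment(c: Customer) -> str:
    """Return an illustrative promotion segment for one customer."""
    # Assumed rule: a single account and under two years of history.
    if c.num_accounts == 1 and c.tenure_years < 2:
        return "new_single_account"
    # Assumed rule: ten or more years counts as a long-standing customer.
    if c.tenure_years >= 10:
        return "long_standing"
    return "established"


def group_by_segment(customers):
    """Group customer ids by segment so each group can get its own offer."""
    segments = {}
    for c in customers:
        segments.setdefault(assign_segment(c), []).append(c.customer_id)
    return segments
```

In practice the rules would be derived from the data rather than hard-coded, but the shape of the output is the same: one list of customers per segment, each paired with a different promotion.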

With digital advertising and social media data, analysis is required on huge amounts of unstructured data. A couple of years ago this was experimental at best, but now Hadoop software enables capturing and processing unprecedented amounts of data. It complements the enterprise data warehouse and is an integral part of the business intelligence ecosystem.

Open data platform ODPi

The ODPi open data platform is a consortium of IBM and 18 other enterprise software vendors working together to maximize the adoption of technologies based on Apache Hadoop. The goal of ODPi is to accelerate software development by providing a standard Hadoop solution on which applications can be run, whether they are commercial software, open source, or custom code developed in-house. This gives enterprise customers assurance that they are not locking themselves into a single vendor’s Hadoop solution. It also permits using a Hadoop implementation with products from multiple vendors. For Hadoop to fulfill its role as an enterprise data source, it must accommodate a broad audience who will be using many different applications.

To that end, the ODPi provides a core platform of agreed-upon, tested Apache Hadoop modules. This is the ODPi standard, on which the vendors build their applications. For example, Hortonworks, IBM Open Platform 4.0 with Apache Hadoop, EMC Pivotal HD 3.0 and Infosys IIP all adhere to the ODPi standard. Analytics software vendors or in-house development shops can concentrate on developing applications further up the stack, knowing that the Hadoop core adheres to a standard and that their applications will interoperate with any compliant Hadoop system. This accelerates development, promotes code re-use, and simplifies the technical architecture. Implementing a Hadoop distribution that adheres to the ODPi standard means not being locked into a proprietary technology.

Only time will tell whether the ODPi will have a lasting impact as a standard. The organization has been criticized as being nothing more than a joint marketing effort for vendors pushing their own commercial flavor of Hadoop. Also notable are the big data vendors that are conspicuous by their absence: Cloudera, MapR and Amazon (AWS – EMR Elastic MapReduce).

IBM BigInsights and Cognos

On top of Hadoop, IBM has developed a suite of big data and analytics tools under the BigInsights brand. There are tools for scaling and managing the platform (BigInsights Enterprise Management), a machine learning engine (BigInsights Data Scientist – Decision Trees, PageRank, Clustering) and a data exploration and discovery tool (BigSheets). Of particular interest to Cognos customers is BigSQL, which runs SQL queries against Hadoop. In other words, BigSQL permits Cognos to use Hadoop as a data source.

This is interesting as data stored in Hadoop only becomes useful when it is put into a business context. Cognos Analytics (V11) is well suited for this role. It is a powerful tool for BI developers and business power users, enabling the presentation of Hadoop data in a visually appealing format for executives, managers and line of business staffers. Big data becomes much more valuable when it can be interpreted and understood by non-technical users.

Cognos supports connecting to Hadoop using Hive, which translates code from SQL to MapReduce to get results from Hadoop. There will always be some latency as Hive cannot change the nature of MapReduce, which distributes processing work across Hadoop nodes. The query is split into discrete chunks of work and the results are assembled as they are returned. SQL join conditions, which are commonplace in Cognos generated SQL, create an additional layer of complexity for MapReduce. This further increases the query processing time and will prevent some queries from running at all.
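To illustrate why joins add work in a MapReduce setting, here is a simplified, single-process Python sketch of a reduce-side inner join. It is not what Hive actually generates; the function names and the sample data layout (tuples of customer id plus one value) are assumptions made for the example. The point is that both inputs must be tagged, regrouped by join key, and only then combined.

```python
from collections import defaultdict


def map_phase(customers, orders):
    """Emit (join_key, tagged_record) pairs from both inputs."""
    for cust_id, name in customers:
        yield cust_id, ("C", name)    # tag marks a customer record
    for cust_id, amount in orders:
        yield cust_id, ("O", amount)  # tag marks an order record


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Per key, pair every customer record with every order (inner join)."""
    results = []
    for key in sorted(grouped):
        names = [v for tag, v in grouped[key] if tag == "C"]
        amounts = [v for tag, v in grouped[key] if tag == "O"]
        for name in names:
            for amount in amounts:
                results.append((key, name, amount))
    return results


def mapreduce_join(customers, orders):
    """Run the three phases in sequence on in-memory data."""
    return reduce_phase(shuffle_phase(map_phase(customers, orders)))
```

On a real cluster the shuffle moves data between nodes over the network, which is a large part of the latency described above, and each additional join in the query typically adds further stages of this kind.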

IBM addresses these problems with BigSQL. It works on the same Hive metastore, but produces faster and more reliable results. BigSQL is not just about performance, but also about ensuring that the SQL query will run at all. It optimizes SQL for MapReduce so that queries run faster and so that there is no need to modify the Cognos Framework Manager model or hand-code SQL inside Cognos. An alternative to Hive and BigSQL is Impala, which makes similar performance claims.

Success with big data requires getting key pieces to work together. With BigInsights and BigSQL, IBM is providing tools for facilitating Hadoop adoption, including interoperability with existing Cognos infrastructure and functionality.

Resources

Our on-demand webinar: Running Cognos on Hadoop

Video of Hive and BigSQL performance test results

IBM BigSQL technology sandbox demo cloud environment for Hadoop and BigSQL:

Thanks to David Currie for contributing this article. David is a long-time business analytics consultant. He blogs about business intelligence and big data at

DATA+ Big Data & Business Analytics:

Chief Data Scientist,

mBank

He is an associate professor at the Institute of Computer Science of the Warsaw University of Technology and a shareholder in Polidea, a company that builds custom software for mobile devices. Piotr Gawrysiak holds a PhD in computer science awarded by the Faculty of Electronics and Information Technology of the Warsaw University of Technology. He defended his habilitation thesis at the Faculty of History of the University of Warsaw, and he also completed a master's degree in business management at the Faculty of Management of the University of Warsaw. His research interests cover database management systems; data mining, with particular emphasis on text documents, natural language and the Internet; mobile IT technologies; knowledge discovery and management; and artificial intelligence. He is also interested in the relationships between technology and social and cultural processes. He has supervised more than 20 master's theses and is the author or co-author of more than 60 scientific publications, including four books, most of which were published in foreign or international journals and conference proceedings. In 2012 he took part in the Top500 program of the Ministry of Science and Higher Education, completing an internship at Stanford University. Prof. Gawrysiak has led numerous research and applied projects for industry (including Procter & Gamble, T-Mobile, Samsung and France Telecom) and for international organizations (UN, IFAD, UNEP), both as a university representative and as an independent consultant. He has served as an evaluator of projects under the European Commission's Framework Programmes. He currently also serves as Poland's representative on the TC14 committee of IFIP.

Driving Digital Transformation

IBM is one of our most strategic and co-operative partners for the Insights & Data global practice – together we combine our analytics and business insights expertise with best-in-class IBM technologies, such as Watson, to create a number of industry-specific solutions that benefit our joint clients in their digital transformation journey.

As an IBM Platinum Business Partner, Capgemini has access to exclusive, advanced training and architecture resources that allow us to build innovative big data and analytics solutions for our clients.

Built on Capgemini’s SMART Analytics Platform, a robust analytics engine powered by IBM technologies in NLP and Cognitive Analytics, this solution analyzes data generated from multiple customer touch points throughout the customer journey, primarily in the banking sector. This provides actionable information to improve customer experiences, minimize customer attrition, encourage customer loyalty, promote customer acquisition, and grow share of wallet.

Complying with the EU’s GDPR from May 2018 is both a regulatory and a strategic imperative. Capgemini and IBM have come together to help clients minimize the risk of non-compliance and maximize their customers’ trust. With a clear approach that takes away the complexity, our end-to-end solution provides a clear roadmap for meeting GDPR requirements including Privacy Impact Assessment, data discovery, protection and remediation.

Many companies are struggling to get full value from their data, or to obtain insights that contribute real competitive advantage. Our Insight-Driven Transformation is an innovative approach to disrupting operational processes using big data and predictive analytics. It is typically delivered as a service: we manage the complex data science tasks for your organization so you can focus on delivering the business benefits.

One such service is Smart Leakage Management. By integrating and analyzing previously untapped data, we can generate the insights needed to tackle key optimization problems. The Capgemini analytics platform, using IBM Watson® Data Platform, makes this possible, and has achieved dramatic results.

Another service, Energy Optimization, profiles the energy usage of industrial assets or large real estate to enable real-time changes that reduce energy consumption.

Capgemini’s Smart Asset Management solution is a business analytics solution that provides a 360-degree view into asset performance. It allows better tracking and efficient management of assets that are critical to organizational success.

Capgemini and IBM have developed a cutting-edge HR solution that leverages Capgemini’s cognitive expertise and is based on IBM’s Watson Explorer and BigInsights tools. It is an innovative and powerful answer to HR challenges around talent acquisition (attracting the right talent before the competition does) and mobility (matching internal resources with open positions).

Capgemini leaders from Insights & Data discuss the value of leveraging IBM technology to benefit clients in their digital transformation journey.

With over 600 dedicated analytics resources, Capgemini has established a mature competency center and lab focusing on IBM Watson Analytics and Cognos capabilities, including over 80 proven assets and accelerators to enable your organization to make the best possible business decisions.

To find out how you can harness Insights & Data and IBM Analytics to drive innovation in your organization, please contact:

Anne Aussems, IBM Analytics & Cognitive Lead, Capgemini