Precocity LLC
3400 N. Central Expressway, Suite 100
Richardson, TX 75080
Last week in Part 1, we looked at some of the technical reasons why big data projects fail. In Part 2, we shift the focus to the most common process and people challenges.
For the most part, “Agile” software development has replaced more traditional methods. In the majority of IT shops, the concepts of iterative development, frequent releases, and continuous integration / deployment are now considered de rigueur.
In order to be successful, Big Data projects must extend the Agile concepts a bit further.
In addition to the traditional implementation cycle, a successful Big Data methodology also includes the Experiment and Evaluate components. In the Experiment cycle, teams work at a smaller scale, with negligible risk, to mine insights from the data and optimize the solution based on business objectives. This process must iterate rapidly, unencumbered by normal IT processes and procedures, and it is the essential input to a successful implementation cycle.
Once it is operational, the solution must be constantly evaluated in order to determine overall performance based on changing conditions in business objectives or in the data itself. As Google pointed out, in order to avoid the accumulation of technical debt, analytics-focused solutions need constant attention and evaluation. Naturally, this feedback becomes a key input into the Experiment cycle as it uncovers the need to tune algorithms and methods.
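To make the Evaluate-to-Experiment feedback loop concrete, here is a minimal sketch, assuming the team tracks some per-period score for the live solution (accuracy, conversion rate, or whatever success metric the business agreed on). The function name `needs_retuning`, the idea of comparing a baseline window against a recent window, and the default `tolerance` of 0.05 are all hypothetical choices for illustration, not a prescribed method.

```python
from statistics import mean


def needs_retuning(baseline_scores, recent_scores, tolerance=0.05):
    """Hypothetical check: flag the solution for another Experiment cycle
    when the recent average score drops more than `tolerance` below the
    baseline average. Both inputs are lists of per-period metric values."""
    if not baseline_scores or not recent_scores:
        raise ValueError("both score windows must be non-empty")
    # A positive gap means the live solution is performing worse than it did.
    return mean(baseline_scores) - mean(recent_scores) > tolerance


# Example: the metric has slipped from about 0.81 to about 0.73.
if needs_retuning([0.82, 0.80, 0.81], [0.72, 0.74, 0.73]):
    print("Performance degraded; send back to the Experiment cycle")
```

A real deployment would likely watch several metrics and the input data itself, but even a simple threshold like this turns "constant evaluation" into a signal the Experiment cycle can act on.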
In traditional Business Intelligence projects, it has always been necessary to have a deep understanding of the data. The engineer must understand the sources, how they relate to each other, and the fundamental business requirements necessary to produce the report in question. For example, "Create a report showing year-over-year sales by department." We must understand where to find sales data, how it ties to a particular department, and how to apply any filters or special calculations; but there is no fundamental need to understand how and why customers make a purchase.
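As an illustration of how self-contained such a BI request is, the sketch below computes year-over-year sales by department. The record layout `(year, department, amount)`, the sample figures, and the function name `yoy_by_department` are hypothetical; the point is that the calculation needs only the sales data and its link to a department, with no model of customer behavior.

```python
from collections import defaultdict

# Hypothetical sales records: (year, department, amount).
SALES = [
    (2022, "Electronics", 1200.0),
    (2022, "Garden", 300.0),
    (2023, "Electronics", 1500.0),
    (2023, "Garden", 240.0),
]


def yoy_by_department(sales, year):
    """Return {department: fractional change versus the prior year}.
    Departments with no prior-year sales are skipped to avoid dividing by zero."""
    totals = defaultdict(float)
    for yr, dept, amount in sales:
        totals[(yr, dept)] += amount  # aggregate sales per (year, department)

    result = {}
    for dept in sorted({dept for _, dept, _ in sales}):
        prior = totals.get((year - 1, dept), 0.0)
        current = totals.get((year, dept), 0.0)
        if prior:
            result[dept] = (current - prior) / prior
    return result


print(yoy_by_department(SALES, 2023))
```

Filters (for example, excluding returns) would be applied to the records before aggregation; the structure of the report stays the same.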
Analytics projects take this to the next level. In addition to a fundamental understanding of the data sources, the data scientist or engineer must also have a thorough understanding of the business context. In many cases, the starting point is a high-level business objective. For example, “Increase online revenue via a distinctly personalized customer experience.” In order to fulfill this objective, we must start on a quest to become very familiar with the business landscape by asking a string of questions.
One may derive the answers to these questions by inspecting the data or through conversations with the relevant subject matter experts. Regardless of how the answers are collected, the understanding they represent is necessary in order to formulate and test the hypotheses that will make up the solution to the initial business objective.
IT-driven Big Data projects with a "build it and they will come" approach rarely succeed. Before acquiring a sizable cluster with a collection of the latest tools, products and frameworks installed, there must be alignment on the key business objectives and, equally important, a shared understanding of how success will be measured against those objectives.
Once we understand the business objectives and success criteria, the experimentation cycle can begin, to determine the extent to which we can meet the objectives and the optimal approach for doing so. This can be accomplished without significant investment in hardware or human resources, and it paves the way for a better-informed build-out of the final production environment.
Cloud-based environments tend to be well suited to the “start small” approach. They allow for the construction of a “right-sized” environment focused on research and experimentation that can quickly transform to a production-scale environment as necessary–all without the need for significant capital investment.
We can roughly divide a typical analytics project into the following tracks:
While each of the typical roles involved in the project will have its own set of responsibilities, it is essential that all of them have some involvement in, and knowledge of, each of the areas above. The data engineer, responsible for implementing the solution at scale in a production setting, needs to understand the underlying analytics. The data scientist, while crafting complex algorithms and machine learning routines, must keep in mind the platform on which the fruits of her labor will ultimately be deployed. A similar overlap of concerns applies to everyone else involved, from creative and UX/UI specialists to business analysts.
The “throw it over the wall” mentality is a non-starter for big data projects. Constant collaboration (not just cooperation) is required from start to finish.
There is no question that machine learning is currently all the rage. Engineers and data scientists are flocking to learn and apply the latest techniques in the hopes of unlocking some hidden insight in their data.
In many cases, machine learning is the clear and obvious solution for a particular problem domain. In others, it is not. Going blindly down this path without considering the entire business context, and without evaluating and comparing alternative approaches, can lead to suboptimal results–particularly when the business demands a clear explanation of how a particular insight or decision was derived.
While big data projects are certainly complex and somewhat non-traditional, embarking on the journey with a solid understanding of the common pitfalls goes a long way towards avoiding a disappointing result. Precocity, with a combination of solid technical expertise and ZipTie, our big data accelerator, can help reduce the time to value and increase your project’s chances of success.