Automated Data Ingestion and Blending Case Study

Neiman Marcus

Precocity was selected to work with Neiman Marcus to build an advanced analytics platform from the ground up, one that would ultimately store, synthesize, and analyze large amounts of data from numerous sources. The first goal of the platform was to provide a notion of customer identity purpose-built for driving 1-to-1 recommendations across all channels of customer interaction. To accomplish this identity resolution, we built a sophisticated data pipeline to automate the ingestion of multiple sources (clickstream, online sales, customer/clientele, point of sale, sales audit, email, etc.). The system was designed to onboard different source systems with minimal configuration changes, thereby decreasing the overall turnaround time to ingest a new data feed.

The batch component of this workflow landed about 450 GB of raw data per day in HDFS and staged it so that downstream processes stayed resilient when the source data format changed. The real-time component of the pipeline streamed updates through Kafka/Storm into Cassandra and Couchbase, followed by micro-batched ingestion into HDFS using Gobblin.
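As a rough illustration of what schema-resilient staging can look like, the sketch below projects each raw record onto a fixed set of expected fields. Missing fields become None and unexpected fields are kept in a side map, so a changed source format does not break downstream readers. The field names, the `_extra` key, and the `stage_record` function are hypothetical and chosen for illustration; this is not the production implementation.

```python
# Hypothetical expected schema for one feed; real feeds would define their own.
EXPECTED_FIELDS = ("customer_id", "email", "order_total")


def stage_record(raw, expected=EXPECTED_FIELDS):
    """Project a raw record onto the expected fields.

    Missing fields are filled with None; fields the schema does not know
    about are preserved under "_extra" instead of being dropped.
    """
    staged = {name: raw.get(name) for name in expected}
    staged["_extra"] = {k: v for k, v in raw.items() if k not in expected}
    return staged


# The source dropped two fields and added a new one; the staged shape is stable.
staged = stage_record({"email": "pat@example.com", "loyalty_tier": "gold"})
```

Downstream jobs written against the expected fields keep working, while anything new in the feed remains available for later inspection.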

After automating the ingestion of raw data from disparate systems, we normalized and blended it into a common format to extract identifiable customer attributes (fingerprinting), such as name, address, email, phone, IP address, and cookie. These fingerprints formed the basis for an unsupervised, graph-centric clustering that grouped related fingerprints together to essentially form a “customer”. For example, an email ID (from the website source system) and a first name, last name, and phone number (from a physical store point-of-sale system) could be grouped in the same cluster and hence define a customer and their attributes. The resulting clusters could also be compared to and reconciled with the CRM’s definition of a customer.
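One simple way to picture graph-centric grouping is as connected components: each fingerprint is a node, and fingerprints seen together in the same source record are linked. The sketch below uses a union-find structure to do this. It is a minimal illustration under stated assumptions, not the clustering used on the platform: the choice of linking fields (`email`, `phone`, `cookie`), the record layout, and the function name `cluster_fingerprints` are all hypothetical, and a real system would also normalize values and weigh how trustworthy each link is.

```python
from collections import defaultdict

# Assumption: only these fields are distinctive enough to link records.
LINK_FIELDS = ("email", "phone", "cookie")


def cluster_fingerprints(records, link_fields=LINK_FIELDS):
    """Group fingerprints that co-occur in records into connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for rec in records:
        keys = [(f, rec[f]) for f in link_fields if rec.get(f)]
        for key in keys:
            find(key)  # register every fingerprint, even if it stands alone
        for other in keys[1:]:
            union(keys[0], other)  # fingerprints in one record belong together

    clusters = defaultdict(set)
    for key in list(parent):
        clusters[find(key)].add(key)
    return list(clusters.values())


records = [
    {"email": "pat@example.com", "cookie": "c-123"},     # website
    {"phone": "555-0100", "email": "pat@example.com"},   # point of sale
    {"phone": "555-0199", "cookie": "c-999"},            # unrelated visitor
]
clusters = cluster_fingerprints(records)
```

Here the first two records share an email address, so their fingerprints fall into one cluster, while the third record forms its own.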

The diagram below is an example that highlights the different definitions of customer and the need for reconciliation:

[Figure: Scatter Plot of User Behavior]

This automated ingestion, blending, and cleaning provided Neiman Marcus with the foundation necessary to derive customer profiles and implement various personalization use cases, such as dynamic email, geo-fenced mobile recommendations, and recommendations to in-store associates. In fact, 80% of Neiman Marcus’ digital revenue utilizes our customer identity resolution and associated recommendation models. The platform is also scalable: it can onboard new data feeds quickly, and it provides the infrastructure and tools needed to troubleshoot production issues effectively.

Want to talk about how data science can grow your business?

Contact us today. Our data scientists are not standing by; they are doing data-sciencey things, but they will get back to you when the data compiles.