Feature Engineering vs. Machine Learning in Optimizing Customer Behaviour

By Richard Boire

The debate on this topic is not a new one. What is the secret sauce for improved modelling performance? Is it the inputs, features or variables of a given predictive model, or is it the specific mathematics used alongside those inputs or features?

Historically, practitioners, myself included, have tended to argue that it is the inputs, or the feature engineering component, that yields the most value when building models. In fact, several years ago I wrote a paper, published in the Journal of Marketing Analytics (May 2013), entitled "Is predictive analytics for marketers really that accurate?" In that paper, I argued that the noise, or random error, in the data is very high when we attempt to predict customer marketing behaviour. As a result, simpler and less advanced mathematical solutions are very effective in delivering the optimal solution. The real difference in building these predictive analytics solutions is the variables one uses as inputs to a given model. In fact, our latest research continues to demonstrate that the inputs or variables within the model are critical to model performance. This is consistent with both big and small data.
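
To make the feature engineering point concrete, here is a minimal, illustrative sketch of the kind of derived inputs we have in mind: rolling raw customer transactions up into recency/frequency/monetary-style variables. The table, column names and snapshot date are hypothetical assumptions for illustration, not our actual data or variable set.

```python
import pandas as pd

# Hypothetical raw transaction table: one row per purchase.
# Assumed columns: customer_id, order_date, amount.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(
        ["2023-01-05", "2023-03-20", "2023-02-11",
         "2023-01-02", "2023-02-15", "2023-03-30"]),
    "amount": [120.0, 80.0, 35.0, 60.0, 45.0, 200.0],
})

snapshot_date = pd.Timestamp("2023-04-01")  # assumed scoring date

# Roll transactions up to one row per customer with derived
# recency / frequency / monetary (RFM) style model inputs.
features = (
    transactions.groupby("customer_id")
    .agg(
        last_order=("order_date", "max"),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
        avg_order_value=("amount", "mean"),
    )
    .assign(recency_days=lambda df: (snapshot_date - df["last_order"]).dt.days)
    .drop(columns="last_order")
    .reset_index()
)

print(features)
```

It is derived variables of this sort, rather than the choice of algorithm, that in our experience move the needle most on model performance.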

Yet advances in machine learning, and in particular artificial intelligence (AI), have made continued examination of this question all the more necessary. These highly publicized technologies are not new and have been in existence for decades. In fact, neural nets, the fundamental mathematics behind artificial intelligence, yielded very marginal results during the 1990s. In those days, R&D applications focused on their ability to recognize images. No real breakthroughs were achieved, and the continued focus on AI remained largely R&D, with the realization that there were real barriers to leveraging the benefits of artificial intelligence. Chief among these barriers was the inability to process the large volumes of data that would have allowed practitioners to more fully exploit neural net technology. Essentially, this touches on the theme of deep learning, where practitioners are able to explore multiple hidden layers instead of a single layer. The ability to leverage more advanced and complex neural networks has been one of the many positive outcomes arising from advances in "Big Data" technology. This newfound capability in deep learning has been the breakthrough that yielded the dramatic improvements and has now brought AI into the public realm. In the area of image recognition, a focal point of most AI research, the National Institute of Standards and Technology (NIST) reports that results from its 2013 test of facial recognition algorithms show that accuracy has improved by up to 30 percent since 2010.
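
The shift from a single hidden layer to several stacked layers is easy to see in code. The sketch below uses scikit-learn's MLPClassifier on synthetic data purely to illustrate the architectural difference; the layer sizes and dataset are assumptions for illustration, not a recommended configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a customer-behaviour dataset.
X, y = make_classification(n_samples=20000, n_features=30,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# "Classic" 1990s-style neural net: a single hidden layer.
shallow = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))

# "Deep" variant: multiple stacked hidden layers.
deep = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=500, random_state=0))

for name, model in [("single hidden layer", shallow),
                    ("multiple hidden layers", deep)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))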

These significant performance improvements from AI are having transformative effects across all industries. With these advances, it was worth reexamining the benefits of AI versus our more traditional techniques. Specifically, we examined what this impact might be within our own experience of developing solutions for customer behaviour, covering both marketing response and insurance claim risk. As you might expect, there is no magic-bullet answer here; AI is not the optimal technique under all scenarios. Volume of data and signal-to-noise ratio are two key considerations in determining whether AI can provide significant benefits. As discussed in the paper I wrote several years ago, in the world of consumer behaviour the signal-to-noise ratio is in most cases very low, which suggests that simpler regression-type techniques perform very well compared to more advanced techniques. In fact, with smaller data volumes, such as below 300M records, we are not observing significant improvements in lift. However, with very large volumes, such as over 1MM records, we are observing significantly better results with neural net (AI) techniques, ranging from 25% to over 50% performance improvement. Despite potentially small signal-to-noise ratios, an extremely large volume of records seems to be the primary impetus behind these improved results. Furthermore, we also see that using neural net technology within a big data environment mitigates the impact of excluding certain features or variables from a model.
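
For readers who want to run this kind of head-to-head comparison themselves, here is a minimal sketch of one way to do it: score a holdout file with both a logistic regression and a neural net and compare top-decile lift. The data here is synthetic and deliberately noisy; the settings are illustrative assumptions, and the lift figures quoted above come from our own work, not from this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def top_decile_lift(y_true, scores):
    """Response rate among the top-scored 10% divided by the overall rate."""
    cutoff = np.quantile(scores, 0.9)
    return y_true[scores >= cutoff].mean() / y_true.mean()

# Synthetic, low signal-to-noise stand-in for a marketing-response file.
X, y = make_classification(n_samples=100000, n_features=40, n_informative=8,
                           weights=[0.95, 0.05], flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "neural net": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    print(name, "top-decile lift:", round(top_decile_lift(y_test, scores), 2))
```

The same comparison can be rerun at different sample sizes to see where, if anywhere, the neural net begins to pull away from the regression.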

One would think that significantly improved model performance would result in more deployment of these solutions within business applications involving consumer behaviour (marketing response and insurance claim risk). But this has not been the case. Our discomfort with rolling out these results stems from our inability to explain the meaning of the model and to identify which key variables have the most impact within the overall model. Neural net models are still considered "black box" solutions by most organizations today, and many organizations are uncomfortable applying solutions that do not provide at least a basic level of understanding. But from a data science standpoint, if these techniques are going to yield significantly improved results, then it is incumbent on us as practitioners to find approaches that allow us to better understand these solutions. More about how this might be accomplished will be the topic of a later discussion.
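
As a taste of what such approaches can look like, one widely used, model-agnostic option (not necessarily the one we will cover in that later discussion) is permutation importance: shuffle each input in turn and measure how much the model's performance degrades. A minimal sketch on synthetic data, using scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=20000, n_features=15, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))
model.fit(X_train, y_train)

# Shuffle each feature and measure the drop in AUC: the larger the drop,
# the more the "black box" relies on that feature.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {idx}: mean AUC drop = {result.importances_mean[idx]:.4f}")
```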


Tags: machine learning, feature engineering, data, analytics, ai, technology