This article is based on an in-depth study of the data science efforts in three large, private-sector Indian banks with collective assets exceeding $200 billion.
The study included onsite observations; semistructured interviews with 57 executives, managers, and data scientists; and the examination of archival records.
The five obstacles and the solutions for overcoming them emerged from an inductive analytical process based on the qualitative data.
More and more companies are embracing data science as a function and a capability. But many of them have not been able to consistently derive business value from their investments in big data, artificial intelligence, and machine learning.1 Moreover, evidence suggests that the gap is widening between organizations successfully gaining value from data science and those struggling to do so.2
To better understand the mistakes that companies make when implementing data science projects, and to discover how to avoid them, we conducted in-depth studies of the data science activities in three of India’s top 10 private-sector banks, all of which have well-established analytics departments. We identified five common mistakes, exemplified by the cases that follow, and we suggest a corresponding solution for each.
Mistake 1: The Hammer in Search of a Nail
Hiren, a recently hired data scientist in one of the banks we studied, is the kind of analytics wizard that organizations covet.3 He is especially taken with the k-nearest neighbors algorithm, which classifies data points based on their similarity to neighboring observations. “I have applied k-nearest neighbors to several simulated data sets during my studies,” he told us, “and I can’t wait to apply it to the real data soon.”
Hiren did exactly that a few months later, when he used the k-nearest neighbors algorithm to identify especially profitable industry segments within the bank’s portfolio of business checking accounts. His recommendation to the business checking accounts team: Target two of the portfolio’s 33 industry segments.
This conclusion underwhelmed the business team members. They already knew about these segments and were able to ascertain segment profitability with simple back-of-the-envelope calculations. Using the k-nearest neighbors algorithm for this task was like using a guided missile when a pellet gun would have sufficed.
In this case and some others we examined in all three banks, the failure to achieve business value resulted from an infatuation with data science solutions. This failure can play out in several ways. In Hiren’s case, the problem did not require such an elaborate solution. In other situations, we saw the successful use of a data science solution in one arena become the justification for its use in another arena in which it wasn’t as appropriate or effective. In short, this mistake does not arise from the technical execution of the analytical technique; it arises from its misapplication.
After Hiren developed a deeper understanding of the business, he returned to the team with a new recommendation: Again, he proposed using the k-nearest neighbors algorithm, but this time at the customer level instead of the industry level. This proved to be a much better fit, and it resulted in new insights that allowed the team to target as-yet untapped customer segments. The same algorithm in a more appropriate context offered a much greater potential for realizing business value.
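To make the customer-level approach concrete, here is a minimal sketch of k-nearest neighbors classification in Python using scikit-learn. The features, labels, and choice of k are invented for illustration; this is not the bank's actual model.

```python
# Hypothetical sketch: classifying individual customers (rather than
# industry segments) as high- or low-profitability with k-nearest
# neighbors. All features and data are synthetic.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Invented customer-level features: avg monthly balance, transaction
# count, and tenure, for 500 customers.
X = rng.normal(size=(500, 3))
# Invented label: 1 = high-profitability customer.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# KNN is distance-based, so features must share a common scale.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each new customer is classified by a vote of the 15 most similar
# labeled customers.
model = KNeighborsClassifier(n_neighbors=15).fit(X_scaled, y)

new_customer = scaler.transform([[1.2, 0.4, -0.3]])
print(model.predict(new_customer))
```

Note that the same algorithm Hiren used on 33 industry segments becomes far more informative at this granularity: with hundreds of thousands of customers rather than a few dozen segments, the neighborhood structure can surface groupings that back-of-the-envelope calculations cannot.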
It’s not exactly rocket science to observe that analytical solutions are likely to work best when they are developed and applied in a way that is sensitive to the business context. But we found that data science does seem like rocket science to many managers. Dazzled by the high-tech aura of analytics, they can lose sight of context. This was more likely, we discovered, when managers saw a solution work well elsewhere, or when the solution was accompanied by an intriguing label, such as “AI” or “machine learning.” Data scientists, who were typically focused on the analytical methods, often could not or, at any rate, did not provide a more holistic perspective.
To combat this problem, senior managers at the banks in our study often turned to training. At one bank, data science recruits were required to take product training courses taught by domain experts alongside product relationship manager trainees. This bank also offered data science training tailored for business managers at all levels and taught by the head of the data science unit. The curriculum included basic analytics concepts, with an emphasis on questions to ask about specific solution techniques and where the techniques should or should not be used. In general, the training interventions designed to address this problem aimed to facilitate the cross-fertilization of knowledge among data scientists, business managers, and domain experts and help them develop a better understanding of one another’s jobs.
In related fieldwork, we have also seen process-based fixes for avoiding the mistake of jumping too quickly to a favored solution. One large U.S.-based aerospace company uses an approach it calls the Seven Ways, which requires that teams identify and compare at least seven possible solution approaches and then explicitly justify their final selection.
Mistake 2: Unrecognized Sources of Bias
Pranav, a data scientist with expertise in statistical modeling, was developing an algorithm aimed at producing a recommendation for the underwriters responsible for approving secured loans to small and medium-sized enterprises. Using the credit approval memos (CAMs) for all loan applications processed over the previous 10 years, he compared the borrowers’ financial health at the time of their application with their current financial status. Within a couple of months, Pranav had a software tool built around a highly accurate model, which the underwriting team implemented.
Unfortunately, after six months, it became clear that the delinquency rates on the loans were higher after the tool was implemented than before. Perplexed, senior managers assigned an experienced underwriter to work with Pranav to figure out what had gone wrong.
The epiphany came when the underwriter discovered that the input data came from CAMs. What the underwriter knew, but Pranav hadn’t, was that CAMs were prepared only for loans that had already been prescreened by experienced relationship managers and were very likely to be approved. Data from loan applications rejected at the prescreening stage was not used in the development of the model, which produced a huge selection bias. This bias led Pranav to miss the import of a critical decision parameter: bounced checks. Unsurprisingly, there were very few instances of bounced checks among the borrowers whom relationship managers had prescreened.
The technical fix in this case was easy: Pranav added data on loan applications rejected in prescreening, and the “bounced checks” parameter became an important element in his model. The tool began to work as intended.
The bigger problem for companies seeking to achieve business value from data science is how to discern such sources of bias upfront and ensure that they do not creep into models in the first place. This is challenging because laypeople — and sometimes analytics experts themselves — can’t easily tell how the “black box” of analytics generates output. And analytics experts who do understand the black box often do not recognize the biases embedded in the raw data they use.
The banks in our study avoid unrecognized bias by requiring that data scientists become more familiar with the sources of the data they use in their models. For instance, we saw one data scientist spend a month in a branch shadowing a relationship manager to identify the data needed to ensure that a model produced accurate results.
We also saw a project team composed of data scientists and business professionals use a formal bias-avoidance process, in which they identified potential predictor variables and their data sources and then scrutinized each for potential biases. The objective of this process was to question assumptions and otherwise “deodorize” the data — thus avoiding problems that can arise from the circumstances in which the data was created or gathered.4
Mistake 3: Right Solution, Wrong Time
Kartik, a data scientist with expertise in machine learning, spent a month developing a sophisticated model for analyzing savings account attrition, and he then spent three more months fine-tuning it to improve its accuracy. When he shared the final product with the savings account product team, they were impressed, but they could not sponsor its implementation because their annual budget had already been expended.
Eager to avoid the same result the following year, Kartik presented his model to the product team before the budgeting cycle began. But now the team’s mandate from senior management had shifted from account retention to account acquisition. Again, the team was unable to sponsor a project based on Kartik’s model.
In his third year of trying, Kartik finally got approval for the project, but he had little to celebrate. “Now they want to implement it,” he told us, with evident dismay, “but the model has decayed and I will need to build it again!”
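Kartik's lament points to a real technical issue: predictive models decay as customer behavior drifts away from the data they were trained on. One common safeguard is to track a recent performance metric against the model's validation baseline and flag when retraining is due. The sketch below is a hypothetical illustration; the metric values and tolerance are invented.

```python
# Hypothetical sketch: flag a model for retraining when its recent
# performance (e.g., AUC measured on newly labeled data) decays past
# a chosen tolerance relative to its validation baseline.
def needs_retraining(baseline_auc, recent_aucs, tolerance=0.05):
    """Return True when average recent performance falls below baseline."""
    recent_avg = sum(recent_aucs) / len(recent_aucs)
    return recent_avg < baseline_auc - tolerance

# The attrition model scored 0.82 at validation; on the latest
# quarters it scores noticeably worse, so it should be rebuilt
# before deployment.
print(needs_retraining(0.82, [0.80, 0.74, 0.71]))  # True
```

Monitoring of this kind would not have solved Kartik's budgeting problem, but it makes the cost of a two-year delay visible early rather than at the moment of approval.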
The mistake that blocks banks from achieving value in cases like this is a lack of synchronization between data science and the priorities and processes of the business. To avoid it, better links between data science and the strategies and systems of the business are needed.
Senior executives can ensure the alignment of data science activities with organizational strategies and systems by more tightly integrating data science practices and data scientists with the business in physical, structural, and process terms. For example, one bank embedded data scientists in business teams on a project basis. In this way, the data scientists rubbed elbows with the business team day to day, becoming more aware of its priorities and deadlines — and in some cases actually anticipating unarticulated business needs. We have also seen data science teams colocated with business teams, as well as the use of process mandates, such as requiring that project activities be conducted at the business team’s location or that data scientists be included in business team meetings and activities.
Generally speaking, data scientists ought to be concentrating their efforts on the problems deemed most important by business leaders.5 But there is a caveat: Sometimes data science produces unexpected insights that should be brought to the attention of senior leaders, regardless of whether they align with current priorities.6 So, there is a line to be walked here. If an insight arises that does not fit current priorities and systems but nonetheless could deliver significant value to the company, it is incumbent upon data scientists to communicate this to management.
We found that to facilitate exploratory work, bank executives sometimes assigned additional data scientists to project teams. These data scientists did not colocate and were instructed not to concern themselves with team priorities. Instead, they were tasked with building alternative solutions related to the project. If these solutions turned out to be viable, the head of the data science unit pitched them to senior management. This dual approach recognizes the epistemic interdependence between data science and business professionals — a scenario in which data science seeks to address today’s business needs as well as detect opportunities to innovate and transform current business practices.7 Both roles are important if data science is to realize as much business value as possible.
Mistake 4: Right Tool, Wrong User
Sophia, a business analyst, worked with her team to develop a recommendation engine capable of offering accurately targeted new products and services to the bank’s customers. With assistance from the marketing team, the recommender was added to the bank’s mobile wallet app, internet banking site, and emails. But the anticipated new business never materialized: Customer uptake of the product suggestions was much lower than anticipated.
To discover why, the bank’s telemarketers surveyed a sample of customers who did not purchase the new products. The mystery was quickly solved: Many customers doubted the credibility of recommendations delivered through apps, websites, and emails.
Still looking for answers, Sophia visited several of the bank’s branches, where she was surprised to discover the high degree of trust customers appeared to place in the advice of relationship managers (RMs). A few informal experiments convinced her that customers would be much more likely to accept the recommendation engine’s suggestions when presented in the branch by an RM. Realizing that the problem wasn’t the recommender’s model but the delivery mode of the recommendations, Sophia met with the senior leaders in branch banking and proposed relaunching the recommendation engine as a tool to support product sales through the RMs. The redesigned initiative was a huge success.
The difficulties Sophia encountered highlight the need to pay attention to how the outputs of analytical tools are communicated and used. To generate full value for customers and the business, user experience analysis should be included in the data science design process. At the very least, user testing should be an explicit part of the data science project life cycle. Better yet, a data science practice could be positioned within a human-centered design frame. In addition to user testing, such a frame could mandate user research on the front end of the data science process.
While we did not see data science embedded within design thinking or other human-centered design practices in this study, we did find that the shadowing procedures described above sometimes functioned as a kind of user experience analysis. As data scientists shadowed other employees to understand the sources of data, they also gained an understanding of the users of their solutions and of the channels through which those solutions could be delivered.
Mistake 5: The Rocky Last Mile
The bank’s “win-back” initiative, which was aimed at recovering lost customers, had made no progress for months. And that day’s meeting between the data scientists and the product managers, which was supposed to get the initiative back on track, was not going well either.
Data scientists Dhara and Viral were focused on how to identify which lost customers were most likely to return to the bank, but product managers Anish and Jalpa wanted to discuss the details of the campaign to come and were pushing the data scientists to take responsibility for its implementation immediately. After the meeting adjourned without a breakthrough, Viral vented his frustration to Dhara: “If data scientists and analysts do everything, why does the bank need product managers? Our job is to develop an analytical solution; it’s their job to execute.”
By the next meeting, though, Viral seemed to have changed his mind. He made a determined effort to understand why the product managers kept insisting that the data scientists take responsibility for implementation. He discovered that on multiple occasions in the past, the information systems department had given the bank’s product managers lists of customers to target for win-back that had not resulted in a successful campaign. It turned out that using the lists had been extremely challenging, partly due to an inability to track customer contacts — so the product managers felt that being given another list of target customers was simply setting them up for another failure.
With this newfound understanding of the product managers’ perspective, Viral and Dhara added to their project plan the development of a front-end software application for the bank’s telemarketers, email management teams, branch banking staff, and assets teams. Those teams could feed information from their customer interactions into the application and thus make better use of the lists provided by the data science team. Finally, the project moved ahead.
Viral and Dhara’s actions required an unusual degree of empathy and initiative. They stepped out of their roles as data scientists and acted more like project leaders. But companies probably should not depend on data scientists in this way, and they may not want to — after all, the technical expertise of data scientists is a scarce and expensive resource.
Instead, companies can involve data scientists in the implementation of solutions. One bank in our study achieved this by adding estimates of the business value delivered by data scientists’ solutions to their performance evaluations. This motivated data scientists to ensure the successful implementation of their solutions. The bank’s executives acknowledged that this sometimes caused data scientists to operate too far outside their assigned responsibilities. However, they believed that ensuring value delivery justified the diversion of data science resources, and that it could be corrected on a case-by-case basis, if the negative impact on the core responsibilities of data scientists became excessive.
The mistakes we identified invariably occurred at the interfaces between the data science function and the business at large. This suggests that leaders should be adopting and promoting a broader conception of the role of data science within their companies — one that includes a higher degree of coordination between data scientists and employees responsible for problem diagnostics, process administration, and solution implementation. This tighter linkage can be achieved through a variety of means, including training, shadowing, colocating, and offering formal incentives. Its payoff will be fewer solution failures, shorter project cycle times, and, ultimately, the attainment of greater business value.
Read more: sloanreview.mit.edu