This point-of-view paper highlights selected papers and studies that applied machine learning to time series (event series) for risk management in financial institutions. The selected studies differ in the data streams (time series) and the processing they use. This paper is by no means an analysis of the state of the art in research; rather, it is a series of touchpoints that expose:
the increasing comfort in using these techniques in the financial industry.
the relative importance of events with various attributes.
the key techniques relevant to the Cerebri Value System.
Cerebri AI developed the Cerebri Value System, which is powered by machine learning and employs a radical new method of lifting customer success. Cerebri Values quantifies each customer’s commitment to a brand or product and also dynamically predicts “Next Best Actions” at scale, which helps large companies focus on the highest-ROI tactics for accelerating profitable growth. In the case of financial or insurance institutions, risk is an essential element of this design. It is in that context that linkage of these papers to Cerebri AI is highlighted.
Charpignon, Marie-Laure, Enguerrand Horel, and Flora Tixier, “Prediction of consumer credit risk.” Stanford University (2014)
Charpignon et al. used machine learning to predict consumer credit risk. They used four types of models (logistic regression, classification and regression trees, gradient boosting trees, and random forest), applied to a wide range of data points including the age of the borrower, number of dependents in the family, monthly income, monthly expenditures, ratio of total credit card balance to total credit card limits, and payment statistics. As expected, machine learning was able to predict defaults with good accuracy. The relevant element of this analysis from a design perspective was the overfitting of the random forest models used. This overfitting suggests that there are redundancies in the criteria used in traditional risk analysis. These criteria are often computed from raw data rather than being raw data themselves. Computed data streams are more likely to be correlated with one another and more prone to overfitting. Using time series (aka customer journeys) of raw data points should avoid this overfitting and improve performance. Reinforcement learning systems should be able to exploit raw data properly and, potentially, integrate business rules.
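The redundancy argument can be illustrated with a small sketch. The example below builds two computed criteria (a card utilization ratio and a debt-to-income ratio) from the same raw fields and compares their correlation with the correlation between two raw fields. The field names, the synthetic distributions, and the helper functions are assumptions made for illustration; they are not taken from Charpignon et al.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def make_borrowers(n, seed=0):
    """Generate synthetic borrowers with independent raw fields (assumed ranges)."""
    rng = random.Random(seed)
    borrowers = []
    for _ in range(n):
        limit = rng.uniform(1000, 10000)
        borrowers.append({
            "monthly_income": rng.uniform(2000, 8000),
            "card_limit": limit,
            "card_balance": rng.uniform(0, limit),
        })
    return borrowers


borrowers = make_borrowers(2000)

# Two computed criteria that share the same raw input (card_balance).
utilization = [b["card_balance"] / b["card_limit"] for b in borrowers]
debt_to_income = [b["card_balance"] / b["monthly_income"] for b in borrowers]

# Two raw fields that were generated independently.
raw_income = [b["monthly_income"] for b in borrowers]
raw_limit = [b["card_limit"] for b in borrowers]

corr_computed = pearson(utilization, debt_to_income)
corr_raw = pearson(raw_income, raw_limit)
print(f"computed criteria: {corr_computed:.2f}, raw fields: {corr_raw:.2f}")
```

In this synthetic setting the two computed ratios are clearly correlated because both depend on the card balance, while the independently drawn raw fields are not. Real data will behave differently, so the sketch only shows the mechanism.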
Khandani, Amir E., Adlar J. Kim, and Andrew W. Lo. “Consumer credit-risk models via machine-learning algorithms.” Journal of Banking & Finance 34 (2010): 2767-2787
Khandani et al. applied machine learning to events (time series) to perform risk analysis for consumer credit default and delinquency. They used transaction-level, credit-bureau, and account-balance data for individual consumers. The attributes included the type of expenditure (discretionary vs. non-discretionary, car-related, cash withdrawals, and the like), going beyond the slower-varying traditional scores. Their forecasts were very accurate in predicting events up to one year in advance. This result shows the promise of modeling raw event series whose attributes are of different natures. An important element the authors highlighted was how to manage the broad economic variations that affect, and set the baseline for, consumer credit.
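As a minimal sketch of how an expenditure-type attribute of this kind might be derived from transaction-level data, the example below computes the monthly share of discretionary spending. The category names, the record layout, and the function name are hypothetical; Khandani et al. do not prescribe this exact feature.

```python
from collections import defaultdict

# Assumed set of categories treated as discretionary spending.
DISCRETIONARY = {"restaurants", "entertainment", "travel"}

# Hypothetical transaction records: (month, category, amount).
transactions = [
    ("2010-01", "groceries", 300.0),
    ("2010-01", "restaurants", 100.0),
    ("2010-02", "groceries", 300.0),
    ("2010-02", "utilities", 100.0),
    ("2010-02", "travel", 400.0),
]


def discretionary_share_by_month(transactions):
    """Return {month: discretionary spend / total spend}, ordered by month."""
    totals = defaultdict(float)
    discretionary = defaultdict(float)
    for month, category, amount in transactions:
        totals[month] += amount
        if category in DISCRETIONARY:
            discretionary[month] += amount
    return {month: discretionary[month] / totals[month] for month in sorted(totals)}


shares = discretionary_share_by_month(transactions)
for month, share in shares.items():
    print(month, round(share, 2))
```

A monthly series like this varies faster than a traditional score, which is the property the paragraph above points to.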
Sousa, Maria Rocha, Joao Gama, Elisio Brandao, “Introducing Time-Changing Economics into Credit Scoring,” FEP working papers n. 513 November 2013 ISSN: 0870-8541
Sousa et al. compared a traditional (i.e., not time-series-based) framework with a framework using time-changing factors, including internal default analysis and macroeconomic trends. They concluded that introducing data streams is suitable for dealing with the temporal degradation of credit scoring models and for avoiding specific drifts.
Morrison, J.F. “Marrying Credit Scoring and Time-Series Data.” Risk Management Association journal (May 2010)
Morrison integrated credit scoring information with macroeconomic data organized as time series. Rather than using broad macroeconomic data lifted from third-party sources, which risks aggregation bias, he aggregated the time series describing the behavior of consumers in the same geographic area, who are thus subject to the same macroeconomic conditions. A combination of macro-derived data and aggregate data is probably the sturdiest approach going forward.
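The aggregation step described here can be sketched as a grouping of consumer-level records into one time series per geographic area. The record layout, region labels, and function name below are assumptions for illustration and do not reproduce Morrison's actual procedure.

```python
from collections import defaultdict

# Hypothetical consumer-level records: (region, month, was_delinquent).
records = [
    ("north", "2010-01", True),
    ("north", "2010-01", False),
    ("north", "2010-02", False),
    ("north", "2010-02", False),
    ("south", "2010-01", True),
    ("south", "2010-01", True),
    ("south", "2010-02", True),
    ("south", "2010-02", False),
]


def regional_delinquency_rate(records):
    """Aggregate consumer records into {region: {month: delinquency rate}}."""
    counts = defaultdict(lambda: [0, 0])  # (region, month) -> [delinquent, total]
    for region, month, delinquent in records:
        entry = counts[(region, month)]
        entry[0] += int(delinquent)
        entry[1] += 1
    series = defaultdict(dict)
    for (region, month), (bad, total) in counts.items():
        series[region][month] = bad / total
    # Sort each region's series by month so it reads as a time series.
    return {region: dict(sorted(months.items())) for region, months in series.items()}


regional_rates = regional_delinquency_rate(records)
for region, months in regional_rates.items():
    print(region, months)
```

Each regional series could then be used alongside a consumer's own score as a local proxy for macroeconomic conditions.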
Events in consumer journeys do not, for the most part, follow a regular pattern; payroll payments and automatic payments are obvious exceptions. Gaps between events create variability in the quality of the raw data. This variability can be handled by modeling the variance of the data (noisier data have larger variance). Variation in data quality has been understood and modeled through changes in the variance of the data in ARCH (autoregressive conditional heteroscedasticity) models. ARCH models are commonly employed in modeling financial time series that exhibit time-varying volatility and are well suited to macro trends. Cerebri AI uses memory-based designs, which are well suited to processing event series with time lags of unknown size and duration between important events in consumer journeys.
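To make the ARCH idea concrete, the sketch below simulates the textbook ARCH(1) process, in which the conditional variance at step t is alpha0 + alpha1 * eps[t-1]^2, so a large shock raises the variance of the next observation. The parameter values and function name are illustrative assumptions. This is the standard ARCH(1) recursion, not a description of Cerebri AI's memory-based design.

```python
import math
import random


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate ARCH(1): sigma2[t] = alpha0 + alpha1 * eps[t-1]**2, eps[t] = sqrt(sigma2[t]) * z[t]."""
    rng = random.Random(seed)
    eps_prev = 0.0  # assumed starting shock
    series, variances = [], []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * eps_prev ** 2  # conditional variance
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)  # shock with that variance
        variances.append(sigma2)
        series.append(eps)
        eps_prev = eps
    return series, variances


# Illustrative parameters; alpha1 < 1 keeps the unconditional variance finite.
ALPHA0, ALPHA1 = 0.2, 0.5
series, variances = simulate_arch1(n=20000, alpha0=ALPHA0, alpha1=ALPHA1)

sample_variance = sum(e * e for e in series) / len(series)
theoretical_variance = ALPHA0 / (1 - ALPHA1)  # long-run variance of ARCH(1)
print(f"sample: {sample_variance:.3f}, theoretical: {theoretical_variance:.3f}")
```

The conditional variance never falls below alpha0 and rises after large shocks, which is how the model represents periods of noisier data.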
The application of machine learning to event streams to predict risks is a maturing field. Cerebri has incorporated best-in-class techniques and its proprietary metric of customer commitment (the Cerebri Value) to deliver predictions and recommendations and to improve processes.
Alain Briançon is the Vice President of Data Science at Cerebri AI.