Financial data science is a subfield of data science that employs data-centric analysis to answer specific business questions and forecast possible future financial scenarios. More generally, it involves applying new and established data science methods and techniques to business and financial data, to gain insights into trends and relationships in the data that have not previously been revealed by standard financial theories and models. Using financial data science techniques to solve financial problems is not a replacement for the solutions proposed by financial and economic theories. These techniques are a set of complementary tools that allow us to extend our understanding of how financial variables behave and, consequently, to derive better forecasts of their future directions.
Financial Data Science Complements Financial and Economic Theories
In reality, most financial theories do not work very well in the real world, because they rely on assumptions that are often unrealistic or on abstractions of the real world that are too limiting to function in modern societies. Financial data science allows us to combine the power of modern computing resources, statistics, engineering, and programming to understand and solve a myriad of financial problems. These include financial decision-making, asset allocation, trading strategies, performance evaluation, capital sourcing, capital expenditure analyses, revenue and profit optimization, risk measurement and risk management, operations optimization, product pricing, and placing investment bets.
Aspects of Financial Data Science
Financial data science is a catchall term for a range of data science techniques that are applied to financial and business data. In my upcoming book, I discuss and showcase the theory and applications of the five aspects of these techniques using real-world examples.
Descriptive Analytics
This involves creating a summary of the key points in the data so that the end-user or decision-maker can quickly act on it (for example, dashboards and KPIs). Descriptive analytics also allows us to convey the value of the information embedded in the data compellingly enough to prompt actions or decisions from the recipient of the information (for example, using heatmaps to convey the intensities of events or relationships). The sub-categories of this aspect are visual descriptive statistics (or visualizations) and numerical descriptive statistics (commonly known as summary statistics). In many domains of finance, speed and succinctness are crucial to maximizing the operational value of financial information. Hence, we often rely on descriptive analytics to convey such information (for example, stock charts, technical indicators, and measures). Chapter Two of the book presents examples of how visual and numerical descriptive statistics are used for portfolio evaluation. Chapter Three of the book presents examples of how visualizations can be used to diagnose and resolve data quality issues.
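As a rough illustration of numerical descriptive statistics, the sketch below summarizes a short series of periodic returns. The function name `summarize_returns`, the choice of statistics, and the sample figures are all assumptions made for this example; they are not taken from the book.

```python
import statistics


def summarize_returns(returns):
    """Compute a few summary statistics for a list of periodic returns."""
    wealth = 1.0        # value of 1 unit invested at the start
    peak = 1.0          # highest wealth level seen so far
    max_drawdown = 0.0  # largest peak-to-trough decline, as a fraction
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        max_drawdown = max(max_drawdown, 1.0 - wealth / peak)
    return {
        "mean": statistics.mean(returns),
        "stdev": statistics.stdev(returns),  # sample standard deviation
        "min": min(returns),
        "max": max(returns),
        "cumulative_return": wealth - 1.0,
        "max_drawdown": max_drawdown,
    }


# Made-up monthly returns, for illustration only.
monthly_returns = [0.02, -0.01, 0.03, -0.04, 0.01, 0.02]
summary = summarize_returns(monthly_returns)
for name, value in summary.items():
    print(f"{name:>18}: {value: .4f}")
```

A table like this is the numerical counterpart of a dashboard tile: a handful of figures that a decision-maker can read at a glance.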
Diagnostic Analytics
Real-world phenomena often involve many complex interactions between events, agents, and nature. To study these phenomena in a controlled setting, we often rely on models, which are typically simplified replicas of the phenomena. While other fields of science might employ physical replicas of the real world, financial data scientists rely on mathematical replicas. The process of creating and running mathematical replicas of the real world is called simulation. In simulations, we generate synthetic data about the real world and then study that data to understand how the phenomena would behave if certain conditions were met. For example, financial advisors use simulation to estimate how much money you will have in your retirement account at retirement if you follow the recommended investment plan and market conditions do not deviate significantly from the norm. Other uses of simulation include capital investment analysis, risk management, and decision-making. In reality, simulations are used in a wide range of products and services that you interact with; you are just not aware that a simulation is generating that aspect of the product or service (for example, video games, thrill rides, lotteries, recommendation engines, training software, etc.). Chapters Four and Five of the book highlight common types of stochastic processes that are used for building simulations in finance, and the applications of simulations and resampling techniques for solving financial problems.
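The retirement example can be sketched as a small Monte Carlo simulation. This is a minimal sketch under strong assumptions that do not come from the book: annual returns are independent draws from a normal distribution, contributions are fixed and made at year end, and the function name `simulate_retirement_balance` and all input figures are hypothetical.

```python
import random
import statistics


def simulate_retirement_balance(initial, annual_contribution, years,
                                mean_return, volatility,
                                n_paths=2000, seed=42):
    """Simulate final account balances over many synthetic market paths."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    final_balances = []
    for _path in range(n_paths):
        balance = initial
        for _year in range(years):
            # Assumed return model: normal draw, floored at a total loss.
            annual_return = max(rng.gauss(mean_return, volatility), -1.0)
            balance = balance * (1.0 + annual_return) + annual_contribution
        final_balances.append(balance)
    return final_balances


# Hypothetical plan: 50,000 saved, 6,000 added per year, 30 years to go.
balances = sorted(simulate_retirement_balance(50_000, 6_000, 30, 0.06, 0.15))
p10 = balances[int(0.10 * len(balances))]
median = statistics.median(balances)
p90 = balances[int(0.90 * len(balances))]
print(f"10th percentile: {p10:,.0f}")
print(f"median:          {median:,.0f}")
print(f"90th percentile: {p90:,.0f}")
```

Reporting a range of percentiles rather than a single number is the point of the exercise: the synthetic paths show how widely outcomes can vary even when the plan is followed exactly.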
Financial Econometrics
Financial econometrics is a collection of inferential statistical methods that are used in financial and economic models of the real world. Econometrics involves the use of statistical, structural, and descriptive models to prove or disprove economic theories of real-world phenomena. In financial econometrics, we create a mathematical model of the real-world phenomenon based on the theory proposed. We then test whether the postulation of the theory actually holds given the assumptions and conditions described in the theory (in other disciplines, this is commonly known as inferential statistics). For example, a popular theory in finance known as the efficient market hypothesis states, in its weak form, that it is impossible to consistently obtain superior investment performance (on a risk-adjusted basis) by trading on publicly available market-related information. We can test this theory in an econometric model built using a trading rule that relies on publicly available market data. If the trading rule does not consistently outperform a passive investment strategy in multiple tests, then the postulation becomes an established theory. Unfortunately, we don’t have that many established theories in finance, because most financial theories don’t work very well in the real world. So many of the seminal questions raised in the finance profession remain unsettled to this day. Nevertheless, you should note that financial and economic theories still form the core basis for many of the techniques that you will employ as a financial data scientist. Chapters Three and Six of the book present examples of the common types of econometric models that are used by financial data scientists.
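One simple way to frame the trading-rule test is as a one-sample test on the rule's excess returns over the passive strategy. The sketch below is only an illustration: the function name `excess_return_test` and the return figures are hypothetical, the returns are not risk-adjusted, and the p-value uses a large-sample normal approximation that is not reliable for a sample as small as the one shown.

```python
import math
import statistics


def excess_return_test(strategy_returns, benchmark_returns):
    """Test whether mean excess return over the benchmark exceeds zero."""
    excess = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    mean_excess = statistics.mean(excess)
    std_error = statistics.stdev(excess) / math.sqrt(len(excess))
    t_stat = mean_excess / std_error
    # One-sided p-value for H0: mean excess return <= 0.
    # Assumption: normal approximation to the t distribution.
    p_value = 1.0 - statistics.NormalDist().cdf(t_stat)
    return mean_excess, t_stat, p_value


# Made-up periodic returns, for illustration only.
rule_returns = [0.01, 0.02, -0.01, 0.03]
passive_returns = [0.01, 0.01, 0.00, 0.01]
mean_excess, t_stat, p_value = excess_return_test(rule_returns, passive_returns)
print(f"mean excess return: {mean_excess:.4f}")
print(f"t-statistic:        {t_stat:.3f}")
print(f"one-sided p-value:  {p_value:.3f}")
```

A large p-value means the data give no reason to reject the idea that the rule fails to beat the passive strategy, which is the outcome the weak-form hypothesis predicts.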
Predictive (Financial) Analytics
Financial analytics is different from financial econometrics in the sense that, while financial econometrics is used to prove or disprove financial phenomena or relationships, financial analytics is used to find those financial phenomena and relationships that are opaque to the naked eye but may be buried in large amounts of data. The models we apply in predictive settings are comprehensive in the sense that they can be used to test theories and to predict the values of the phenomena, given what we know about the phenomena or the variables that may be connected to them. In general, predictive models fall into two broad categories: statistical or econometric models and machine learning models. Although many statistical/econometric and machine learning models share similar mathematical frameworks, they differ distinctly in many ways. Econometric models typically do not require a large amount of data, whereas machine learning models require large amounts of data. Econometric models typically employ additive parametric specifications, while machine learning models may or may not employ additive parameters. The number of parameters in econometric models is usually small to medium, while machine learning models typically have large numbers of parameters. The accuracy of econometric models is judged by confidence intervals, while machine learning models use fit statistics. Econometric models use p-values to draw inferences, while machine learning models use prediction accuracy. Econometric models are typically interpretable, while machine learning models are not directly interpretable, hence the reliance on prediction accuracy.
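The idea of judging a model by prediction accuracy can be sketched with a hold-out evaluation: fit on one part of the data and measure the error on data the model has not seen. The example below is a minimal sketch on synthetic data; the helper names `fit_simple_ols` and `rmse`, the data-generating line, and the 150/50 split are assumptions chosen for illustration, and a one-variable least-squares fit stands in for whatever predictive model is being evaluated.

```python
import math
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a one-variable linear model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic data: y = 0.5 + 2.0 * x plus noise (assumed, for illustration).
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [0.5 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]

# Fit on the first 150 observations, evaluate on the remaining 50.
x_train, y_train, x_test, y_test = x[:150], y[:150], x[150:], y[150:]
intercept, slope = fit_simple_ols(x_train, y_train)
in_sample_rmse = rmse(y_train, [intercept + slope * xi for xi in x_train])
out_of_sample_rmse = rmse(y_test, [intercept + slope * xi for xi in x_test])
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"in-sample RMSE={in_sample_rmse:.3f}, "
      f"out-of-sample RMSE={out_of_sample_rmse:.3f}")
```

An econometric reading of the same fit would focus on the estimated slope and its statistical significance; a predictive reading focuses on the out-of-sample error.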
Prescriptive Analytics (Financial Optimizations)
Prescriptive analytics involves the use of data and computational algorithms to identify the critical factors and the optimal course of action to pursue when solving business problems. Although its origin is in operations research, prescriptive analytics is fundamentally a data science technique, in the sense that it exploits the numerical relationships between the choice variables, and the impact of fluctuations in these variables on the decision pattern, to determine the optimal course of action. Prescriptive analytics is also sometimes needed to extend the insights acquired from predictive models. In finance, prescriptive analytics is mostly used to solve financial optimization problems. These include linear optimization problems such as revenue and profit maximization, non-linear optimization problems such as portfolio optimization, and stochastic and robust optimization problems, which are common in the risk management and governance domain. In reality, many of the estimation methods that are used in econometric and machine learning models involve the optimization of some type of likelihood or loss function. Therefore, you can regard them as special types of optimization problems. Chapter Six of the book presents common methods of estimation and their formulation as optimization problems. Chapter Eight of the book shows practical examples of how optimizations are used to solve various decision problems in finance, including bond portfolio allocations, asset-liability management (ALM), performance attribution, financial engineering, capital investment decisions, and profit maximization. Chapter Nine of the book focuses solely on the application of optimization to various types of portfolio problems, including mean-variance optimization (MVO), Black-Litterman optimization (BLO), risk-parity optimization (RPO), stochastic portfolio optimization (mean-VaR optimization), and robust portfolio optimization (scenario-based optimizations).
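The simplest portfolio optimization problem has a closed-form answer, which makes it a convenient illustration. The sketch below finds the minimum-variance split between two assets in a fully invested portfolio and checks the result against a brute-force grid search. The function names and the volatility and correlation inputs are hypothetical, and this is a two-asset special case rather than the general mean-variance problem covered in the book.

```python
def portfolio_variance(weight_a, vol_a, vol_b, correlation):
    """Variance of a two-asset portfolio with weights summing to one."""
    weight_b = 1.0 - weight_a
    cov_ab = correlation * vol_a * vol_b
    return (weight_a ** 2 * vol_a ** 2
            + weight_b ** 2 * vol_b ** 2
            + 2.0 * weight_a * weight_b * cov_ab)


def min_variance_weight(vol_a, vol_b, correlation):
    """Closed-form weight on asset A that minimizes portfolio variance.

    The result can fall outside [0, 1] (a short position) when the assets
    are highly correlated, and the formula is undefined when the
    denominator is zero (equal volatilities with correlation of 1).
    """
    cov_ab = correlation * vol_a * vol_b
    return (vol_b ** 2 - cov_ab) / (vol_a ** 2 + vol_b ** 2 - 2.0 * cov_ab)


# Hypothetical inputs: 20% and 10% volatility, uncorrelated assets.
vol_a, vol_b, correlation = 0.20, 0.10, 0.0
best_weight = min_variance_weight(vol_a, vol_b, correlation)
best_variance = portfolio_variance(best_weight, vol_a, vol_b, correlation)

# Cross-check with a brute-force search over long-only weights.
grid = [i / 1000 for i in range(1001)]
grid_weight = min(
    grid, key=lambda w: portfolio_variance(w, vol_a, vol_b, correlation))

print(f"closed-form weight on A: {best_weight:.3f}")
print(f"grid-search weight on A: {grid_weight:.3f}")
print(f"minimum variance:        {best_variance:.4f}")
```

With more assets or with constraints, there is generally no such shortcut and a numerical solver takes the place of the formula, but the structure of the problem (choice variables, an objective, and constraints) stays the same.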
Although some of these topics are fairly advanced mathematical concepts, the book provides non-technical explanations for many of the concepts, so that readers who are only interested in the practical applications of the concepts can review them and then proceed straight to the example sections.