Artificial intelligence (AI), machine learning, computation techniques and technology in general are increasing in sophistication at a rapid rate. Both scientists and investors are benefiting from these advances, as they are increasingly able to use technology to access, store and analyse huge amounts of data in ways which would have been unimaginable just a decade ago.
Investment managers have always used traditional data sets, such as economic indicators, profit & loss statements, and balance sheets, to make decisions. However, being able to analyse bigger data sets (larger, more complex data, often from new sources) is valuable because it allows for increasingly detailed insights. It is our growing ability to analyse alternative data sets in meaningful ways which has really begun to interest investment managers.
What do the terms ‘big data’ and ‘alternative data’ mean?
The term big data refers to data which comes in larger, more complex sets, often from new sources. It is data which contains greater variety and arrives in increasing volumes with higher velocity. Together with veracity, these characteristics are referred to as the four Vs of big data.
- Volume refers to the amount of data: high volumes of often unstructured data of unknown value, such as Twitter data feeds and clicks on a webpage or a mobile app. Data like this can amount to tens of terabytes or even hundreds of petabytes.
- Velocity refers to the rate at which data is received and potentially acted on, increasingly in real or near real time.
- Variety refers to the many types of data available: some structured, some unstructured, such as text, audio or video.
- Veracity refers to the quality of the content from which the data is sourced. This is particularly important now when alternative data is exploding, but only a few data sets and providers offer sufficient quality.
Alternative data is often seen as the intersection of big data and investment research. In the financial industry, alternative data sets refer to information not contained in traditional sources like annual reports or P&L statements. Rather, it is data compiled from sources such as mobile devices, satellite images, public records, and websites. Data from these sources is often less structured, larger and more complex and cannot be managed or analysed using traditional data processing software. This is why scientific, quant-based investors, like CFM, must invest so heavily in technology to collect, clean, and analyse large structured and unstructured data sources. The aim is to generate predictive insights and investment returns.
In recent years, the landscape of alternative data has expanded enormously. Its variety and quantity are almost incomprehensible, but not all of it is useful or trustworthy. There are many alternative data sets which could prove to be very valuable, so long as we are confident in their quality and in our ability to interpret them. However, many require complex algorithms and massive computation power to sort and interpret.
Because alternative data is a subset of big data, the two concepts are often discussed together. Our ability to understand and use both has evolved because of advances in technology, and the convergence of two things: greater access to a library of huge data sets, combined with the increasing sophistication and ease of computation.
Technology is the key to extracting value from big data
Technology lies at the heart of our ability to analyse big data. Satellite images which reveal numbers of cars in parking lots over time, size and types of crops, or planes on tarmacs in different areas, website usage, public records, email receipts, geolocation – these are all data points which can tell us something. But until very recently we have not had the ability to collect, store and find meaning in unstructured or ‘dirty’ data sets. Finding this meaning is what allows us to make investment decisions.
Artificial intelligence and machine learning are also key to our ability to analyse big data and alternative data sets. Analysts have long listened to the tone and language of CEOs, and made predictions based on their understanding of the language used and its underlying meaning. However, the results of this kind of analysis are subjective and linked to specific listeners, so they cannot be relied on as meaningful or ‘scientific’ in any consistent way. Now, AI and machine learning have made it possible to analyse this language in a sophisticated, objective and ‘scientific’ way.
Tools such as Google’s BERT, a neural network-based technique for natural language processing, can be used to understand the nuances and context of words in searches in a consistent way. In addition, it is open source, which means it can be accessed more readily and, with some adaptation, leveraged in the financial context. It can be used effectively, for example, to examine announcements from central banks and other influential financial institutions, and to make predictions about the future.
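The idea of reading language in a consistent way can be illustrated with a deliberately simple Python sketch. It does not use BERT: it swaps in a small word-list tone score, purely to show how a rule-based reader gives the same text the same score every time, whoever runs it. The word lists, the function name `tone_score` and the sample statements are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word lists, chosen for illustration only. A model such as BERT
# learns context from data instead of relying on fixed lists like these.
HAWKISH = {"tighten", "tightening", "raise", "hike", "inflation"}
DOVISH = {"ease", "easing", "cut", "accommodative", "stimulus"}


def tone_score(text: str) -> float:
    """Return a score in [-1, 1]: positive is hawkish, negative is dovish."""
    words = re.findall(r"[a-z]+", text.lower())
    hawkish = sum(word in HAWKISH for word in words)
    dovish = sum(word in DOVISH for word in words)
    total = hawkish + dovish
    if total == 0:
        return 0.0  # no listed words found: treat the text as neutral
    return (hawkish - dovish) / total


# Made-up sentences in the style of a central bank announcement.
statement_a = "The committee expects to raise rates as inflation persists."
statement_b = "The committee will cut rates and maintain accommodative policy."

print(tone_score(statement_a))
print(tone_score(statement_b))
```

A fixed word list cannot capture nuance or context, which is exactly the gap that neural language models are meant to close; the sketch only demonstrates the consistency of a machine reading.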
Alternative data in a pandemic
The ability to analyse alternative data sets can give important insights into the trajectory of COVID-19 as well as its impact on financial markets. According to Euromoney (9 April 2020), alternative data can tell the story of economic and financial market disruption in near real time; the catch is that asset managers need artificial intelligence to read it.
Real-time insight into how the pandemic and lockdowns are affecting populations, in social as well as financial terms, is important information. It stands in stark contrast to the usual backward-looking numbers, which can be out of date almost as soon as they arrive. In the case of COVID-19, alternative data sets revealed that US travel sectors were down by up to 98 per cent, at the same time as grocery stores were experiencing a 97 per cent rise in year-on-year spending by March, before the first death was recorded in the country.
COVID-19 is a different kind of global crisis. Its origin lies in health rather than in financial markets, and it has unfolded with astounding speed, stopping global economies in their tracks almost overnight. Financial crises, on the other hand, are in some senses similar and, unfortunately, occur regularly. This crisis has highlighted that, for science-based, quantitative investors, it is more important than ever to focus on data during a crisis and not to lose sight of research and process.
Why alternative data matters to investment
Regardless of investment strategy, active investment managers are all engaged in a hunt for alpha—or a return in excess of the market—which they hope to consistently produce for investors.
As quant-based, systematic investors, we do this by applying the quant and systematic tools and skill sets we have learned and honed over more than 30 years of investing to large data sets, including both traditional market data sets and the growing number of alternative data sets. These skill sets include traditional data analysis methods as well as newer approaches, including machine learning, artificial intelligence, natural language processing and neural networks, all of which help us analyse vast quantities of data. Financial markets evolve constantly, but history shows us that they do have a degree of predictability. The aim of analysing alternative data sets is to identify the patterns that are material to the financial markets, and to turn these into trading signals, and therefore new sources of alpha.
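As a rough illustration of what turning a pattern into a trading signal can mean, the Python sketch below compares the latest observation of an alternative data series with its own recent history using a z-score. This is a generic textbook approach, not a description of any particular manager's process. The function name `zscore_signal`, the threshold and the car-count figures are assumptions made up for the example.

```python
from statistics import mean, stdev


def zscore_signal(series: list[float], window: int, threshold: float = 1.0) -> int:
    """Compare the latest observation with the `window` observations before it.

    Returns +1 (unusually high), -1 (unusually low) or 0 (no signal).
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(series) < window + 1:
        raise ValueError("need at least window + 1 observations")
    history = series[-(window + 1):-1]  # the window just before the latest point
    latest = series[-1]
    spread = stdev(history)
    if spread == 0:
        return 0  # flat history: a z-score is not meaningful
    z = (latest - mean(history)) / spread
    if z > threshold:
        return 1
    if z < -threshold:
        return -1
    return 0


# Made-up weekly car counts for a hypothetical retailer's parking lots.
car_counts = [100, 102, 98, 101, 99, 120]
print(zscore_signal(car_counts, window=5))  # prints 1: latest count is unusually high
```

In practice a raw signal like this would still need to be tested against market data before anyone could call it a source of alpha; the sketch only shows the mechanical step from data series to signal.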
For quant-based alternative data strategies, the focus should always be on the data itself: how clean it is, what its statistical quality is, and how certain we are that the data and providers we use are the best and most reliable available. Ultimately, it is the quality of the data and of the investment process which results in good outcomes for investors.
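Two of the simplest questions about how clean a data set is can be sketched in Python: how much of it is missing, and which values sit far from the rest. The example below is a minimal sketch under stated assumptions, not a full data-quality framework. The function name `quality_report`, the cutoff of five median absolute deviations and the sample values are all hypothetical.

```python
from statistics import median
from typing import Optional


def quality_report(values: list[Optional[float]], mad_cutoff: float = 5.0) -> dict:
    """Report the share of missing values and the positions of likely outliers.

    A value is flagged when it lies more than `mad_cutoff` median absolute
    deviations (MAD) from the median of the non-missing values.
    """
    if not values:
        raise ValueError("values must not be empty")
    present = [v for v in values if v is not None]
    missing_fraction = 1 - len(present) / len(values)
    if not present:
        return {"missing_fraction": missing_fraction, "outlier_indices": []}
    centre = median(present)
    mad = median(abs(v - centre) for v in present)
    outliers = [
        i
        for i, v in enumerate(values)
        if v is not None and mad > 0 and abs(v - centre) > mad_cutoff * mad
    ]
    return {"missing_fraction": missing_fraction, "outlier_indices": outliers}


# Made-up daily readings from a hypothetical data provider; None marks a gap.
readings = [10.0, 11.0, None, 9.0, 10.0, 250.0, 10.0, None]
report = quality_report(readings)
print(report)  # {'missing_fraction': 0.25, 'outlier_indices': [5]}
```

The median-based rule is used here because a single extreme value cannot distort it in the way it would distort a mean and standard deviation; other choices are equally defensible.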