The real value of data requires a holistic view of the end-to-end data pipeline

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Ashok Srivastava on the emergence of machine learning and AI for enterprise applications.

In this episode of the Data Show, I spoke with Ashok Srivastava, senior vice president and chief data officer at Intuit. He has a strong science and engineering background, combined with years of applying machine learning and data science in industry. Prior to joining Intuit, he led the teams responsible for data and artificial intelligence products at Verizon. I wanted his perspective on a range of issues, including the role of the chief data officer, ethics in machine learning, and the emergence of AI technologies for enterprise products and applications.

Here are some highlights from our conversation:

Chief data officer

A chief data officer, in my opinion, is a person who thinks about the end-to-end process of obtaining data, data governance, and transforming that data for a useful purpose. His or her purview is relatively large. I view my purview at Intuit to be exactly that, thinking about the entire data pipeline, proper stewardship, proper governance principles, and proper application of data. I think that as the public learns more about the opportunities that can come from data, there’s a lot of excitement about the potential value that can be unlocked from it from the consumer standpoint, and also many businesses and scientific organizations are excited about the same thing. I think the CDO plays a role as a catalyst in making those things happen with the right principles applied.

I would say if you look back into history a little bit, you’ll find the need for the chief data officer started to come into play when people saw a huge amount of data coming in at high speeds with high variety and variability—but then also the opportunity to marry that data with real algorithms that can have a transformational property to them. While it’s true that CIOs, CTOs, and people who are in lines of business can and should think about this, it’s a complex enough process that I think it merits having a person and an organization think about that end-to-end pipeline.

Ethics

We’re actually right now in the process of launching a unified training program in data science that includes ethics as well as many other technical topics. I should say that I joined Intuit only about six months ago. They already had training programs happening worldwide in the area of data science and acquainting people with the principles necessary to use data properly as well as the technical aspects of doing it.
Continue reading “The real value of data requires a holistic view of the end-to-end data pipeline”

8 fintech trends for 2018

[A version of this post appears on the O’Reilly Radar.]

AI, blockchain, payment regionalization, and other fintech trends to watch.

2017 saw big changes, a lot of investment, and some regulatory challenges in fintech. What will 2018 bring? Here’s what we’ll be watching in the coming year.

1. AI will be implemented across the stack

AI is sweeping across all industry sectors, including financial services. AI touches customer interactions (voice services like Siri and dialog systems), fraud detection, trading, and risk management (machine learning), and is being used to automate many back-office tasks (robotic process automation). AI technologies are also giving rise to new fintech startups that use techniques like computer vision to unlock new datasets (e.g., aerial images).

2. New products will make advanced analytics easier

Talk to any vendor or startup in big data analytics or cloud computing and they probably have key customers in financial services. This means that many technology providers will create products tailored for finance (most likely products that comply with existing regulations), which lowers the barrier to using advanced analytics.
Continue reading “8 fintech trends for 2018”

Programming collective intelligence for financial trading

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Geoffrey Bradway on building a trading system that synthesizes many different models.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the Data Show, I spoke with Geoffrey Bradway, VP of engineering at Numerai, a new hedge fund that relies on contributions of external data scientists. The company hosts regular competitions where data scientists submit machine learning models for classification tasks. The most promising submissions are then added to an ensemble of models that the company uses to trade in real-world financial markets.

To minimize model redundancy, Numerai filters out entries that produce signals already well-covered by existing models in their ensemble. The company also plans to use Ethereum blockchain technology to develop an incentive system that rewards models that do well on live data (not ones that overfit and do well on historical data).
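The redundancy filter described above can be sketched as a correlation check. This is a minimal sketch, not Numerai’s actual method: it assumes each model is represented by a list of predictions on the same examples, that “well-covered” means an absolute Pearson correlation at or above a hypothetical threshold (0.9 here) with any model already in the ensemble, and that `pearson`, `is_redundant`, and `filter_submissions` are illustrative names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length prediction lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # treat a constant signal as uncorrelated
    return cov / (sd_x * sd_y)


def is_redundant(candidate, ensemble, threshold=0.9):
    """Hypothetical rule: a candidate is redundant if it is highly
    correlated (positively or negatively) with any existing model."""
    return any(abs(pearson(candidate, member)) >= threshold for member in ensemble)


def filter_submissions(submissions, ensemble, threshold=0.9):
    """Screen (name, predictions) pairs in order. Accepted models join a
    working copy of the ensemble, so later entries are checked against them."""
    members = list(ensemble)
    accepted = []
    for name, preds in submissions:
        if not is_redundant(preds, members, threshold):
            members.append(preds)
            accepted.append(name)
    return accepted


existing = [[0.10, 0.40, 0.35, 0.80]]
submissions = [
    ("copycat", [0.12, 0.41, 0.36, 0.79]),  # nearly identical signal
    ("novel", [0.50, 0.90, 0.10, 0.50]),    # largely independent signal
]
print(filter_submissions(submissions, existing))  # ['novel']
```

Using the absolute value means a model that is simply the mirror image of an existing one is also treated as redundant, since its signal carries the same information.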

Here are some highlights from our conversation:
Continue reading “Programming collective intelligence for financial trading”

Data science for humans and data science for machines

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show Podcast: Michael Li on the state of data engineering and data science training programs.


In this episode of the O’Reilly Data Show, I spoke with Michael Li, cofounder and CEO of the Data Incubator. We discussed the current state of data science and data engineering training programs, Apache Spark, quantitative finance, and the misunderstanding around the term “data science.”

Here are some highlights from our conversation:
Continue reading “Data science for humans and data science for machines”

Is 2016 the year you let robots manage your money?

[A version of this post appears on the O’Reilly Radar.]

The O’Reilly Data Show podcast: Vasant Dhar on the race to build “big data machines” in financial investing.


In this episode of the O’Reilly Data Show, I sat down with Vasant Dhar, a professor at the Stern School of Business and Center for Data Science at NYU, founder of SCT Capital Management, and editor-in-chief of the Big Data Journal (full disclosure: I’m a member of the editorial board). We talked about the early days of A.I. and data mining, and recent applications of data science to financial investing and other domains.

Dhar’s first steps in applying machine learning to finance

I joke with people, I say, ‘When I first started looking at finance, the only thing I knew was that prices go up and down.’ It was only when I actually went to Morgan Stanley and took time off from academia that I learned about finance and financial markets. … What I really did in that initial experiment is I took all the trades, I appended them with information about the state of the market at the time, and then I cranked it through a genetic algorithm and a tree induction algorithm. … When I took it to the meeting, it generated a lot of really interesting discussion. … Of course, it took several months before we actually finally found the reasons for why I was observing what I was observing.

Robots as decision makers

The general research question I really ask is when do computers make better decisions than humans? That’s really sort of the core question. … I’ve applied it to finance, but there are other areas. I’m involved in a project on education, and one might ask the same thing. When do computers make better teachers than humans? It’s an equally interesting question. … Should you trust your money to a robot? The flip side of that question is when do computers make better decisions than humans?

One of the things I did was to break up the investment landscape into three different types of holding periods. On the one hand, you have high-frequency trading, and on the other extreme, you have very long-term investing. In high-frequency trading, your holding periods are sort of minutes to a day. In very long-term investing, your holding periods are months to years, that Warren Buffett style of investing. Then there’s sort of a space in the middle, which is the part I find most interesting, where there’s a lot of action, which is sort of days to weeks holding period. … The strategy one uses for these different horizons tends to be very different. In the high-frequency trading space, for example, humans don’t really stand a chance against computers, there’s just so much information.




Network Science Dashboards

Network graphs can be used as primary visual objects, with conventional charts supplying detailed views

[A version of this post appears on the O’Reilly Data blog.]

With Network Science well on its way to becoming an established academic discipline, we’re beginning to see tools that leverage it. Applications that draw heavily from this discipline rely on visual representations and come with interfaces aimed at business users. For business analysts used to consuming bar and line charts, network visualizations take some getting used to. But with enough practice, and for the right set of problems, they are an effective visualization model.

In many domains, network graphs can be the primary visual objects, with conventional charts supplying detailed views. I recently got a preview of some dashboards built using Financial Network Analytics (FNA). In the example below, the primary visualization represents correlations among assets across different asset classes (1) (the accompanying charts provide detailed information for individual nodes):

Financial Network Analytics

Using the network graph as the centerpiece of a dashboard works well in this instance. And with FNA’s tools already being used by a variety of organizations and companies in the financial sector, I think “Network Science dashboards” will become more commonplace in financial services.
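A dashboard like this starts from a graph whose edges encode correlations between assets. As a minimal sketch, assuming one return series per asset and a hypothetical cutoff of an absolute correlation of at least 0.5 for drawing an edge (FNA’s actual construction is not described in this post), such a graph could be built as follows. The function names, asset names, and returns are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat series has no defined correlation
    return cov / (sd_x * sd_y)


def correlation_network(returns, threshold=0.5):
    """Link every pair of assets whose returns have |correlation| >= threshold.

    Returns an adjacency dict (asset -> set of neighbours) and an edge list
    of (asset_a, asset_b, correlation) tuples."""
    graph = {asset: set() for asset in returns}
    edges = []
    for a, b in combinations(sorted(returns), 2):
        rho = pearson(returns[a], returns[b])
        if abs(rho) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
            edges.append((a, b, rho))
    return graph, edges


returns = {  # made-up daily returns
    "EQ_US": [0.010, -0.020, 0.015, 0.005],
    "EQ_EU": [0.012, -0.018, 0.013, 0.004],
    "GOLD": [0.003, 0.001, 0.001, -0.005],
}
graph, edges = correlation_network(returns)
print(graph["GOLD"])                  # set(): no strong links
print([(a, b) for a, b, _ in edges])  # [('EQ_EU', 'EQ_US')]
```

The threshold controls how dense the picture gets: lowering it adds weaker links, which quickly makes the graph harder to read.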

Network Science dashboards only work to the extent that network graphs are effective, and network graphs get harder to navigate and interpret as the number of nodes and edges grows (2). One workaround is to aggregate nodes and visualize communities rather than individual objects. New ideas may also come to the rescue: the rise of networks and graphs is leading to better techniques for visualizing large networks.
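Aggregating nodes into communities amounts to collapsing each group into a single node and counting the edges between groups. The sketch below assumes the community assignment is already known, whether it comes from a community-detection algorithm (such as Louvain) or from an attribute like asset class; `aggregate_communities` and the toy graph are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_communities(graph, community_of):
    """Collapse an undirected graph into a community-level graph.

    graph: adjacency dict mapping node -> set of neighbours (node labels
    must be mutually comparable). community_of: node -> community label.
    Returns (sizes, weights): sizes maps community -> member count;
    weights maps a sorted community pair -> number of original edges
    between them (a pair like ('bonds', 'bonds') counts internal edges)."""
    sizes = Counter(community_of[node] for node in graph)
    weights = defaultdict(int)
    for u, neighbours in graph.items():
        for v in neighbours:
            if u < v:  # count each undirected edge once
                pair = tuple(sorted((community_of[u], community_of[v])))
                weights[pair] += 1
    return dict(sizes), dict(weights)


# Toy example: two tightly knit groups joined by a single edge (C-D).
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
community_of = {"A": "equities", "B": "equities", "C": "equities",
                "D": "bonds", "E": "bonds", "F": "bonds"}
sizes, weights = aggregate_communities(graph, community_of)
print(weights[("bonds", "equities")])  # 1
```

Here six nodes and seven edges reduce to two community nodes joined by one weighted link, and the per-node detail can move into the accompanying charts.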

This fits one of the themes we’re seeing in Strata: cognitive augmentation. The right combination of data/algorithm(s)/interface allows analysts to make smarter decisions much more efficiently. While much of the focus has been on data and algorithms, it’s good to see more emphasis placed on effective interfaces and visualizations.


(0) This post is based on a recent conversation with Kimmo Soramäki, founder of Financial Network Analytics.
(1) Kimmo is an experienced researcher and policy-maker who has consulted for and worked at several central banks. Thus FNA’s first applications are aimed at financial services.
(2) Traditional visual representations of large networks are pejoratively referred to as “hairballs”.

Financial analytics as a service

[A version of this post appears on the O’Reilly Strata blog.]

In relatively short order, Amazon’s internal computing services have grown into the world’s most successful cloud computing platform. Conceived in 2003 and launched in 2006, AWS grew quickly and is now the largest web hosting company in the world. With the recent addition of Kinesis (for stream processing), AWS continues to add services and features that make it an attractive platform for many enterprises.

A few other companies have followed a similar playbook: technology investments that benefit a firm’s core business are leased out to other companies, some of which may operate in the same industry. An important (but not well-known) example comes from finance: a widely used service that provides users with clean, curated data sets and sophisticated algorithms with which to analyze them. It turns out that the world’s largest asset manager makes its investment and risk management systems available to over 150 pension funds, banks, and other institutions. In addition to the $4 trillion managed by BlackRock, the company’s Aladdin Investment Management system is used to manage (1) $11 trillion in additional assets from external managers.

BlackRock: Aladdin

Just as AWS has been adopted by e-commerce companies, some of Aladdin’s users are BlackRock’s peers in the asset-management industry. In the case of Aladdin (2), asset managers have come to value its collection (3) of high-quality historical data and analytics (including Monte Carlo simulations and stress tests). In recent years, the amount of assets that rely on Aladdin grew by about $1 trillion per year. To put these numbers in context, the 6,000 computers that comprise Aladdin keep an eye on about “7% of the world’s $225 trillion of financial assets”. About 17,000 traders worldwide have access to Aladdin.

Continue reading “Financial analytics as a service”