Machine learning and the wisdom of the crowd

Research News

NSF CAREER awardee combines machine learning, economics to develop socially beneficial forecasts

April 8, 2016

After a consumer uses an e-commerce site such as Amazon, the site will often suggest other items the shopper might like. Such predictions arise from the data the site has collected about users and their buying decisions.

This technique, which uses computer algorithms to make forecasts, is known as machine learning. Machine learning is now a routine part of people’s lives, from the facial recognition Facebook uses for tagging photos and the speech dictation in Apple’s Siri to hospitals’ medical diagnosis decisions and many companies’ marketing and advertising strategies.

Jacob Abernethy, University of Michigan assistant professor of computer science, has been fascinated by an unlikely relationship between machine learning algorithms and market economy dynamics.

“Within the research on machine learning and statistics, the language of economics is conspicuously absent,” Abernethy says. “Researchers talk about parameter estimation, inference, optimization and prediction accuracy. In contrast, one rarely encounters terms such as marginal price, utility, equilibrium, risk aversion and such.”

With support from the National Science Foundation (NSF), the scientist is trying to bridge that gap by developing theoretical mathematical models that connect the two fields.

“This economics lens gives us a new way to understand and develop techniques for learning and prediction problems,” he says. “I am attempting to take established algorithmic concepts from statistics to see if these can be leveraged to study various aspects of financial markets. But connections can also be made in reverse, where tools for designing efficient markets can lead to new algorithms for synthesizing large datasets for complex learning tasks.”

Abernethy stresses he has no personal interest in predicting stock fluctuations or using these techniques as a way to make investments. Rather, he hopes to use these ideas to build new tools that potentially could provide huge societal benefits.

To do so, he has spent a great deal of time understanding what economists call “prediction markets,” which facilitate the buying and selling of betting contracts on future events.

“Such markets are of great interest to gamblers who want to bet on the outcome of, say, football matchups,” he says.

But prediction markets have also caught researchers’ attention in recent years. Increasingly, they see posted betting odds — set by supply and demand — as sources of highly accurate predictions about the probability of future events.

“Underlying this accuracy is that, while individual traders — gamblers — may not be well-informed or rational, in aggregate, the wisdom of crowds takes over,” Abernethy says.
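To make the mechanics concrete, here is a minimal Python sketch of how a market price can aggregate noisy opinions. It uses Hanson's logarithmic market scoring rule (LMSR), a well-known automated market maker from the prediction-market literature. The article does not say which mechanism Abernethy studies, so that choice is an assumption. The trader model (Gaussian noise around a true probability of 0.7, one share per trade) and all function names and parameters are likewise hypothetical.

```python
import math
import random


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b))."""
    m = max(q / b for q in quantities)  # subtract the max for numerical stability
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Instantaneous prices: a softmax of q / b, so they always sum to 1."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_market(true_prob=0.7, noise=0.15, n_traders=2000, b=20.0, seed=0):
    """Toy market (an assumption, not a model from the article).

    Each trader holds a noisy belief about the event and buys one share
    of YES if the current YES price looks too low, otherwise one share of NO.
    Returns the YES price after every trade and the total paid by traders.
    """
    rng = random.Random(seed)
    shares = [0.0, 0.0]  # outstanding shares of [YES, NO]
    yes_prices = []
    collected = 0.0
    for _ in range(n_traders):
        belief = min(0.99, max(0.01, rng.gauss(true_prob, noise)))
        price_yes = lmsr_prices(shares, b)[0]
        side = 0 if belief > price_yes else 1
        before = lmsr_cost(shares, b)
        shares[side] += 1.0
        # The trader pays the change in the cost function.
        collected += lmsr_cost(shares, b) - before
        yes_prices.append(lmsr_prices(shares, b)[0])
    return yes_prices, collected


if __name__ == "__main__":
    prices, collected = simulate_market()
    settled = sum(prices[-1000:]) / 1000
    print(f"opening YES price: {prices[0]:.3f}")
    print(f"average YES price over the last 1000 trades: {settled:.3f}")
```

Because each trade moves the price only slightly, no single opinion dominates. In this toy setup the quoted price tends to hover near the crowd's median belief, even though most individual beliefs are noticeably off.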

Currently, prediction markets mostly play a role in entertainment, such as forecasting the outcomes of sporting matches, Academy Award winners or elections. But Abernethy thinks the benefits of market mechanisms could extend further.

“We could imagine a market for drug discovery. We could have interested participants betting on which types of compounds will lead to a treatment for a particular cancer, or which parts of the genome are important in certain biological functions,” he says.

That reality is approaching quickly, as markets to predict weather outcomes have emerged, driven by the need for farmers and others to hedge against economic pressures caused by excessive rain or drought.

This market-oriented perspective has benefits within computer science as well: it could provide new insights for developing methods to decentralize data-driven tasks, Abernethy says.

Machine learning algorithms typically assume that all of the data is centralized on one machine, which can process it and produce predictions.

“But in the real world, data is massive and it is never stored on a single device, but rather distributed across a large cluster of machines or even across geographically distant data centers,” Abernethy says.

How can massive, distributed data be analyzed in a centralized fashion? Markets face the same challenge — but also suggest solutions.

“Despite the fact that information is widely dispersed, economic principles tell us we can expect markets to clear, and that prices will reflect all aggregate supply and demand,” he says.
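The underlying idea, that a few compact signals can stand in for widely dispersed information, can be illustrated with a small Python sketch. This is not the market-based approach Abernethy describes; it is a standard technique, assumed here purely for illustration, in which each machine reports a handful of summary statistics and a coordinator combines them into the same straight-line fit it would get from the pooled data. The synthetic data, the three "machines" and every name in the code are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics for fitting y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other):
        # Combining two machines' summaries is just element-wise addition.
        return Summary(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(points):
    """Computed locally on each machine; only five numbers leave it."""
    s = Summary()
    for x, y in points:
        s = s.merge(Summary(1, x, y, x * x, x * y))
    return s


def fit_line(s):
    """Ordinary least squares from the summary statistics alone."""
    slope = (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)
    intercept = (s.sy - slope * s.sx) / s.n
    return slope, intercept


def make_data(n=3000, seed=1):
    """Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
    return data


if __name__ == "__main__":
    data = make_data()
    shards = [data[i::3] for i in range(3)]  # pretend these live on 3 machines
    merged = Summary()
    for shard in shards:
        merged = merged.merge(summarize(shard))
    print("distributed fit:", fit_line(merged))
    print("centralized fit:", fit_line(summarize(data)))
```

The two fits agree up to floating-point rounding, even though the coordinator never sees a raw data point. Many learning tasks do not reduce to such tidy summaries, which is where richer aggregation mechanisms become interesting.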

Abernethy is conducting his research under an NSF Faculty Early Career Development (CAREER) award, which he received in 2015. The award supports junior faculty who exemplify the role of teacher-scholars through outstanding research, excellent education, and the integration of education and research within the context of the mission of their organization. NSF is funding his work with about $500,000 over five years.

He sees future applications in designing new techniques for crowdsourcing and labor decentralization via collaborative financial payment schemes. These would build on the success of efforts such as Amazon’s Mechanical Turk and the Netflix Prize, a competition sponsored by Netflix to improve the accuracy of its predictions of which movies subscribers would like, based on their viewing preferences.

Abernethy has also developed a new initiative, the Michigan Data Science Team (MDST), a team at the University of Michigan, Ann Arbor, that competes against professional and amateur data scientists from around the world in online prediction challenges.

“There already is a market for solving prediction challenges, and more and more companies are advertising their data science problems publicly in a competitive environment,” he says. “I want to improve competitions, make them more driven by market forces. For example: let’s say I am a company or organization with a prediction problem, and I want to pay people who will design an algorithm to solve this problem. There are experts out there, but how can I incentivize them to help me?”

Universities face many of the same problems, such as predicting how many students will enroll in a course given its description, or predicting the grade a student will earn in a course based on data that includes the student’s prior grade record and the teacher’s grading habits.

“I’d like to create a structured framework for students who want to get into machine learning, that will encourage them to become active in solving these challenges,” he says.

-- Marlene Cimons

-- Aaron Dubrow, NSF, 703-292-4489, adubrow@nsf.gov

A visual description of a mathematical function used to solve an optimization problem.
The Michigan Data Science Team, a competitive team at the University of Michigan, Ann Arbor.
The “random walk” method is often used to model real-world time series data.