How can we use AI to Predict the Stock Market? — Interview with Data Science Researcher Oscar Javier Hernandez

Article by Limarc Ambalina | June 06, 2019

Oscar Javier Hernandez is a published Ph.D. student at the University of British Columbia. With a background in theoretical physics and seven publications under his belt, Oscar has given numerous award-winning presentations and is now making the transition to specialize in data science. His recent machine learning research projects include time series forecasting using neural networks and a Twitter sentiment analysis of automotive brands, which was carried out as a consulting project for ApiThinking.

Our interview with Oscar focuses on his work regarding the correlation between the sentiment of Twitter posts and stock market fluctuations of automotive companies.

Lionbridge AI: How did you become interested in machine learning and what made you decide to transition from physics to data science?

Oscar: I have always been curious about AI and machine learning, particularly from a theoretical point of view. It is a powerful tool that can address questions that I’m very curious about. I would say that the transition to data science was very natural. As a physicist, my work has involved reasoning from first principles and writing high-performance computing algorithms. This is followed by an extremely detailed analysis of the data to make sure it is valid. This scientific process is the same that you find in data science; you must be absolutely sure about your analysis and results. As Richard Feynman once said, “the first principle is that you must not fool yourself – and you are the easiest person to fool.”

L: Coming from a theoretical physics background, you have a unique perspective in the field, especially now that you are transitioning to data science. In terms of global progress and development in machine learning, how far are we from creating an artificially intelligent machine? Using a 100-meter race as an example, if the finish line is a true AI, how far are we from it?

Oscar: I think in this respect we are quite far from the finish line. The popular algorithms that are used in machine learning are not intended to simulate the cognitive processes in conscious beings, but rather to find signals in noisy data using statistics. On the other hand, there are other projects such as OpenWorm that attempt to simulate the brains and the motor functions of a real organism, the roundworm. However, since this worm only has 302 neurons and humans have about 80-90 billion, it will take quite some time before we can simulate something with human-level intelligence. In terms of the original race, this amounts to a traveled distance of about 350 nanometers (roughly the wavelength of near-ultraviolet light) out of the total 100 meters of the race. A long way yet to go! But I’m hopeful things will accelerate over the upcoming years.
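
As a rough check of the race analogy, the sketch below scales the 100-meter track by the ratio of neuron counts. It assumes that progress is linear in the number of neurons simulated, and it takes 86 billion as an illustrative value inside the 80-90 billion range quoted above. Both the linear scaling and that specific value are assumptions made only for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope version of the race analogy.
# Assumption: "distance covered" scales linearly with neurons simulated.

ROUNDWORM_NEURONS = 302   # figure quoted in the interview
HUMAN_NEURONS = 86e9      # assumed value within the quoted 80-90 billion range
RACE_LENGTH_M = 100.0     # length of the race in meters


def distance_covered_m(simulated, target, race_length_m=RACE_LENGTH_M):
    """Return the distance covered, in meters, for a given neuron ratio."""
    return race_length_m * simulated / target


# Convert meters to nanometers (1 m = 1e9 nm) for readability.
distance_nm = distance_covered_m(ROUNDWORM_NEURONS, HUMAN_NEURONS) * 1e9
print(f"Distance covered: {distance_nm:.0f} nm of a {RACE_LENGTH_M:.0f} m race")
```

Because the distance is inversely proportional to the human neuron count, moving HUMAN_NEURONS anywhere within the quoted range changes the result only modestly.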


AI and the Stock Market

L: One of the most interesting things about machine learning is its seemingly endless applications. One of your projects was about how we might be able to use AI to predict stock market returns using Twitter sentiment analysis. Could you briefly explain the goals of that project?

Oscar: In the consulting project for ApiThinking, we were interested in understanding the role that Twitter users’ sentiments (positive or negative) about a company played in its stock returns. We hypothesized that more positive tweets would correlate with increased stock returns, and more negative tweets with decreased returns. To test this, we first needed to set up a server to collect tweets, then process the data and conduct the statistical analysis to reach our conclusions.

L: Why did you decide to focus your research on Twitter, rather than Facebook, Instagram, or any other social media outlet?

Oscar: The main reason we chose Twitter for this project was its great API, which can be easily accessed using Python. Instagram is also possible, but its data are images, which are more difficult to process and analyze. Facebook data is not easily accessible to third parties and I would need to develop a more sophisticated framework to scrape posted data. Twitter was definitely the easiest and cleanest solution for starting this project.

L: What did you learn through that project and how do you plan to use your findings for future projects?

Oscar: The main thing we learned is that the return of a stock may be slightly sensitive to the Twitter data, but it is not always clear ahead of time what that relation is. Sometimes the stock will be very sensitive to positive Twitter sentiment, but at other times it will be more sensitive to negative Twitter sentiment.

For example, in the case of Tesla, Elon Musk announced on August 7th, 2018 that he was taking the company private, which led to lots of speculation on Twitter. Many of the tweets were negative during this time. However, after the announcement the stock went way up, indicating that the negative comments were correlated with positive stock gains around this time.

This sentiment correlation may change depending on the type of announcements or news coverage in the media and how people respond to it, so it’s important to track this over time. But during short periods, if you have some indication about how people are responding on Twitter to company news and what the correlation of their tweets is to the stock around that time, this knowledge may give you some predictive power about the market.
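
One simple way to track a changing relationship like this is a rolling correlation between daily sentiment scores and daily returns. The sketch below is not the project's actual code. It assumes a Pearson correlation over a fixed window, the function names are hypothetical, and the numbers are made-up toy values chosen only to show a sign flip similar to the Tesla example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def rolling_correlation(sentiment, returns, window):
    """Correlation over each consecutive window of `window` days."""
    if len(sentiment) != len(returns):
        raise ValueError("series must have the same length")
    return [
        pearson(sentiment[end - window:end], returns[end - window:end])
        for end in range(window, len(sentiment) + 1)
    ]


# Made-up toy data: sentiment turns negative while returns stay positive.
toy_sentiment = [0.1, 0.3, 0.5, 0.2, -0.1, -0.4, -0.6, -0.2]
toy_returns = [0.01, 0.02, 0.03, 0.015, 0.02, 0.03, 0.04, 0.025]

correlations = rolling_correlation(toy_sentiment, toy_returns, window=4)
for day, corr in enumerate(correlations, start=4):
    print(f"window ending on day {day}: correlation = {corr:+.2f}")
```

In this toy series the earliest window shows a positive correlation and the latest a negative one, which is the kind of shift that would need to be monitored over time.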

L: What was the biggest problem or difficulty you faced throughout the project and your analysis of the data?

Oscar: One of the most challenging aspects of the work was setting up the server as a tweet collector and making sure it did not crash. In addition, the tweets required lots of processing before they were useful. Tweets with no usable information had to be removed, which took us some time. The model training was actually quite fast and took the least amount of time out of the whole project.
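
The interview does not say which rules were used to discard unusable tweets, so the following is only a minimal sketch of what such a filtering step might look like. The criteria (strip links and mentions, drop near-empty texts, drop case-insensitive duplicates), the function names, and the `min_words` threshold are all assumptions made for illustration.

```python
import re

# Assumed cleaning rules; the project's real criteria are not described.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweet(text):
    """Strip links and @mentions, then collapse runs of whitespace."""
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    return " ".join(text.split())


def filter_tweets(raw_tweets, min_words=3):
    """Keep cleaned tweets that have enough words and are not duplicates."""
    seen = set()
    kept = []
    for raw in raw_tweets:
        cleaned = clean_tweet(raw)
        key = cleaned.lower()  # treat case-only differences as duplicates
        if len(cleaned.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return kept


# Made-up example tweets.
sample = [
    "https://example.com/a",
    "@brand Loving the new car so far",
    "@someone loving the new car so far",
    "   ",
]
print(filter_tweets(sample))
```

Keeping the cleaning and the filtering in separate functions makes it easier to adjust the rules as new kinds of noise show up in the collected data.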

L: Speaking more generally now, what are some of the biggest challenges AI developers and researchers face which are unique to machine learning?

Oscar: Most of the algorithms that are used in the field require a large enough dataset so that the algorithm learns the correct information.

One of the biggest challenges facing AI and ML developers is access to high-quality data.

Without the data, there is no “data science”. Nowadays many companies guard data more carefully than their algorithms and it can be very expensive for other companies to gain access to it. In addition, governments and public institutions are being more careful about the data they release and so it is harder to access their data. This is one of the most important and unique challenges for machine learning.


Can we use AI to Predict the Stock Market?

L: Lastly, based on your results and the difficulties you faced throughout this project, do you think it is possible to use AI to predict stock market fluctuations?

Oscar: Yes, I think this approach is very promising; there have been published papers that have also found correlations using similar approaches. This method could be made more powerful by collecting data from other sources, such as news and other social media, and by improving the sentiment models. Aside from that, we are still collecting more data to strengthen the statistics. It would also be interesting to use the sentiment signals that we found in a trading algorithm in the future. With the permission of ApiThinking, the proof-of-concept code has been made publicly available on GitHub for anyone to play with and improve. I think there are still many surprises waiting to be discovered from this approach.

If you’re looking for more reading from experts in the machine learning field, check out the rest of our interviews here. The summary of Oscar’s work on the Twitter sentiment analysis and the stock market can be found on ApiThinking. Want to keep up to date on Oscar’s work? Follow the blog on his website.


As Oscar said, “Without data, there is no data science,” so if you’re looking for AI training data, get in touch with Lionbridge. At Lionbridge AI, we use our multilingual crowd of over 500,000 curated workers to create custom AI training data to match your project’s needs.

The Author
Limarc Ambalina

Limarc writes content for Lionbridge’s website as part of the marketing team. Born and raised in Canada, Limarc’s love of Japanese pop culture brought him to Japan in 2016 and living in Japan has been his dream come true. Apart from Lionbridge content, you can catch Limarc online writing about anime, video games, and other nerd culture.

