Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To hide future-dated posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum... I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum... I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum... I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum... I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

Understanding stock market instability via graph auto-encoders

Published in arXiv, 2022

We propose to use the edge reconstruction accuracy of a graph auto-encoder (GAE) as an indicator for how spatially homogeneous connections between assets are, which, based on financial network literature, we use as a proxy to infer market volatility.

Recommended citation: Gorduza, Dong, Zohren. (2022). "Understanding stock market instability via graph auto-encoders." arXiv:2212.04974. https://arxiv.org/abs/2212.04974

Shared Causes, Shared Risks? Evidence from Economic News

Published in COMPTEXT 2023: International Interdisciplinary Conference on the Quantitative and Computational Analysis of Text-, Image- and Video-as-Data, 2023

In financial markets, common sources of risk are often rendered and measured in the form of correlation. While there exists a substantial body of literature on the structural impact of correlation in financial markets, the fundamental intuition rests primarily on estimating the exposure to joint risk sources through the correlation matrix. However, price correlations alone fail to capture the entirety of the common risk environment within which firms operate. This paper argues that a more comprehensive understanding of joint risk exposure can be achieved by incorporating a causality network extracted from narrative parsing of news and financial filings in economic models.

Recommended citation: Gorduza, Ash. (2023). "Shared Causes, Shared Risks? Evidence from Economic News." COMPTEXT 2023.

Wisdom of the Crowds or Ignorance of the Masses? A data-driven guide to WSB

Published in Journal of Portfolio Management, 2024

A trite yet fundamental question in economics is: what causes large asset price fluctuations? A tenfold rise in the price of GameStop equity between the 22nd and 28th of January 2021 demonstrated that herding behaviour among retail investors is an important contributing factor. This paper presents a data-driven guide to the forum that started the hype, WallStreetBets (WSB). Our initial experiments decompose the forum using a large language topic model and network tools. The topic model describes the evolution of the forum over time and shows the persistence of certain topics (such as the market / S&P 500 discussion) and the sporadic interest in others, such as COVID or crude oil. Network analysis allows us to decompose the landscape of retail investors into clusters based on their posting and discussion habits; several large, correlated asset discussion clusters emerge, surrounded by smaller, niche ones. A second set of experiments assesses the impact that WSB discussions have had on the market. We show that forum activity has a Granger-causal relationship with the returns of several assets, some of which are now commonly classified as 'meme stocks', while others have gone under the radar. The paper extracts a set of short-term trade signals from posts and long-term (monthly and weekly) trade signals from forum dynamics, and considers their predictive power at different time horizons. In addition to the analysis, the paper presents the dataset, as well as an interactive dashboard, in order to promote further research.

Recommended citation: V. Semenova, D. Gorduza, W. Wildi, X. Dong, S. Zohren. (2024). "Wisdom of the Crowds or Ignorance of the Masses? A data-driven guide to WSB." Journal of Portfolio Management, 50(4). https://www.pm-research.com/content/iijpormgmt/50/4/88

Prudential Regulation Embedding Transformer (PRET): a domain-adapted model for prudential supervision

Published in ICAIF '24: 5th ACM International Conference on AI in Finance, 2024

Analysis of unstructured text is a key aspect of everyday financial supervision operations run by regulators worldwide. The emergence of transformer-based language models has opened the possibility of improving the financial supervision process by assisting supervisors with labour-intensive and time-consuming information retrieval tasks across a wide range of complex corpora, such as financial rules and regulations. This paper introduces Prudential Regulation Embedding Transformer (PRET), a novel domain-adapted transformer encoder model tailored for information retrieval on topics relating to financial regulations. To train this model, we address the scarcity of high-quality training datasets of financial regulation text with a dedicated pipeline that web-scrapes the Basel Framework and pre-processes it into a machine-readable format; each rule in the Basel Framework is then coupled with corresponding large language model (LLM)-generated text to form synthetic training pairs. We evaluate the performance of our model on this domain-specific information retrieval task against commonly used state-of-the-art (SOTA) models. We show that our proposed model outperforms existing benchmarks while being substantially cheaper to train than previous methods. We discuss the implications of our findings for the design of better regulatory technology models across jurisdictions.

Recommended citation: Dragos Gorduza, Adam Muhtar. (2024). "Prudential Regulation Embedding Transformer (PRET): a domain-adapted model for prudential supervision." ICAIF '24. https://openreview.net/pdf?id=zPgXjTnmfM

Extracting Alpha from Financial Analyst Networks

Published in ICAIF '24: 5th ACM International Conference on AI in Finance, 2024

We investigate the effectiveness of a momentum trading signal based on the coverage network of financial analysts. This signal builds on the key information-brokerage role that sell-side financial analysts play in modern stock markets. The baskets of stocks covered by each analyst can be used to construct a network between firms whose edge weights represent the number of analysts jointly covering both firms. Although the link between financial analyst coverage and the co-movement of firms’ stock prices has been investigated in the literature, little effort has been made to systematically learn the most effective combination of signals from firms covered jointly by analysts in order to benefit from any spillover effect. To fill this gap, we build a trading strategy which leverages the analyst coverage network using a graph attention network. More specifically, our model learns to aggregate information from individual firm features and signals from neighbouring firms in a node-level forecasting task. We develop a portfolio based on those predictions which we demonstrate to exhibit an annualized return of 29.44% and a Sharpe ratio of 4.06, substantially outperforming market baselines and existing graph machine learning based frameworks. We further investigate the performance and robustness of this strategy through extensive empirical analysis. Our paper represents one of the first attempts at using graph machine learning to extract actionable knowledge from the analyst coverage network for practical financial applications.

Recommended citation: Dragos Gorduza, Yaxuan Kong, Xiaowen Dong, Stefan Zohren. (2024). "Extracting Alpha from Financial Analyst Networks." ICAIF '24. https://camps.aptaracorp.com/ACM_PMS/PMS/ACM/ICAIF24/36/d49bbaf5-815e-11ef-ada9-16bb50361d1f/OUT/icaif24-36.html
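The analyst coverage network described in the abstract can be illustrated with a minimal sketch of the network-construction step only (not the graph attention model or the trading strategy). It assumes coverage data arrives as a mapping from each analyst to the set of tickers they cover; the function name, analyst identifiers, and tickers below are hypothetical. The weight of the edge between two firms is the number of analysts whose baskets contain both.

```python
from collections import Counter
from itertools import combinations


def coverage_network(coverage: dict[str, set[str]]) -> Counter:
    """Hypothetical helper: map each unordered firm pair to the number of
    analysts who cover both firms (the edge weight of the co-coverage network)."""
    weights: Counter = Counter()
    for firms in coverage.values():
        # Every unordered pair of firms in one analyst's basket gains one unit of weight.
        # Sorting makes each pair key canonical, e.g. ("AAPL", "MSFT") never ("MSFT", "AAPL").
        for a, b in combinations(sorted(firms), 2):
            weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    # Hypothetical analysts and tickers, for illustration only.
    coverage = {
        "analyst_1": {"AAPL", "MSFT", "GOOG"},
        "analyst_2": {"AAPL", "MSFT"},
        "analyst_3": {"MSFT", "GOOG"},
    }
    for (a, b), w in sorted(coverage_network(coverage).items()):
        print(a, b, w)
```

Equivalently, with a binary analyst-by-firm coverage matrix C, the off-diagonal entries of CᵀC give the same weights; the dictionary version simply avoids building a dense matrix when each analyst covers only a small basket of firms.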

Talks

Natural Language Processing in Finance Tutorial

Published:

This is a tutorial on Natural Language Processing (NLP) for finance in which participants were introduced to different Machine Learning approaches for financial tasks. The topics covered included a general introduction to applications of financial NLP, baseline models, and how larger transformer models can help with NLP tasks.

Natural Language Processing in Finance Tutorial

Published:

This is a tutorial on Natural Language Processing (NLP) for Social Data Science within the NLP-SODAS summer 2023 conference, in which participants were introduced to different Machine Learning approaches for financial and social science tasks. The topics covered included a general introduction to applications of financial NLP, baseline models, and how larger transformer models can help with NLP tasks.

Teaching

Fundamentals of Social Data Science in Python 2020

Teaching Assistantship, Oxford Internet Institute, Social Data Science Programme 2020-2021, 2020

This course is a four-week intensive primer to get people up to speed on programming in Python for data science. Note that Python is not the only programming language you will encounter in this course, let alone in this degree programme, but it is a great place to start. In week 4 we will compare Python and R (another very popular language in data science). The goal of this course is to get students acquainted with writing clean, reusable, documented code. Machine learning and big data tools will be secondary to this task and come in later modules.

Fundamentals of Social Data Science in Python 2021

Teaching Assistantship, Oxford Internet Institute, Social Data Science Programme 2021-2022, 2021

This course is a four-week intensive primer to get people up to speed on programming in Python for data science. Note that Python is not the only programming language you will encounter in this course, let alone in this degree programme, but it is a great place to start. In week 4 we will compare Python and R (another very popular language in data science). The goal of this course is to get students acquainted with writing clean, reusable, documented code. Machine learning and big data tools will be secondary to this task and come in later modules.

Introduction to Machine Learning 2021

Teaching Assistantship, Oxford Internet Institute, Social Data Science Programme 2021-2022, 2021

This course focuses on building the core understanding and implementation of Machine Learning methods for the social sciences.

Applied Machine Learning 2022

Teaching Assistantship, Oxford Internet Institute, Social Data Science Programme 2021-2022, 2022

This class covers advanced topics in Machine Learning for Social Data Science. It builds on the first term's Introduction to Machine Learning class, extending the mathematical foundations towards domains where we are uncertain about the right answer or best approach.

Applied Machine Learning 2023

Teaching Assistantship, Oxford Internet Institute, Social Data Science Programme 2022-2023, 2023

This class covers advanced topics in Machine Learning for Social Data Science. It builds on the first term's Introduction to Machine Learning class, extending the mathematical foundations towards domains where we are uncertain about the right answer or best approach.

Introduction to Natural Language Processing for the Social Sciences 2023

Teaching Assistantship, Oxford Internet Institute, Social Data Science Programme 2022-2023, 2023

Natural Language Processing teaching assistantship working with Janet Pierrehumbert at the University of Oxford's Oxford Internet Institute. This class covers supervised and unsupervised language models for use in social data science. We show students how to build models including Naive Bayes, LSTM … and apply them to a wide array of tasks in social data science, including hate speech detection and topic classification.