01 Sep 2020

Denoising a signal with HMM

Most signals I deal with are noisy, reflecting the noise in underlying prices, volume, vol of vol, etc. Traditional strategies built on such indicators typically either:

  • use the signal to scale into a position
    • such approaches have to deal with noise to avoid thrashing, i.e. adjusting the position up and down with the noise
  • consider specific levels of the signal to signify a state
    • for example: long {+1}, short {-1}, neutral {0}
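One way to map a noisy signal onto discrete states like these is Viterbi decoding of a hidden Markov model. Below is a minimal Python sketch. It assumes three states with fixed means of -1, 0 and +1, Gaussian noise with a shared (hypothetical) `sigma`, and a "sticky" transition matrix controlled by a hypothetical `stay` probability. In practice these parameters would be estimated (e.g. with Baum-Welch) rather than set by hand. The high self-transition probability is what penalizes thrashing: a brief excursion is only labeled as a new state if the evidence outweighs the cost of switching away and back.

```python
import numpy as np

def viterbi_denoise(signal, means=(-1.0, 0.0, 1.0), sigma=0.5, stay=0.95):
    """Most likely state sequence of a Gaussian HMM with fixed (hypothetical) parameters."""
    x = np.asarray(signal, dtype=float)
    mu = np.asarray(means, dtype=float)
    k, n = len(mu), len(x)
    # Sticky transitions: stay with probability `stay`, else switch uniformly to another state.
    trans = np.full((k, k), (1.0 - stay) / (k - 1))
    np.fill_diagonal(trans, stay)
    log_trans = np.log(trans)
    # Gaussian log-likelihood of each observation under each state (shared constant dropped).
    log_emit = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
    score = np.log(np.full(k, 1.0 / k)) + log_emit[0]  # uniform initial distribution
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans      # cand[i, j]: best path ending in i, then i -> j
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each state j
        score = cand[back[t], np.arange(k)] + log_emit[t]
    # Trace the best path backwards and map states to their levels.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return mu[path]
```

With `stay=0.95` and `sigma=0.5`, a single outlier inside a +1 regime is absorbed, while a sustained move to -1 produces one clean switch.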

Pictured below is an example...

15 Aug 2020

Feature Selection (3 / 3)

In the prior two posts, I investigated:

In this post I will evaluate feature importance as implemented by Random Forest and compare it to Information Geometric approaches. Here is an outline of what I would like to discuss:

  • similarities between Decision Trees and Information Geometric approaches for feature selection
  • some of the deficiencies of Decision Trees
  • some areas for improvements

As we will see, Random Forest’s approach to...
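For discrete features, the link between a tree split and information-theoretic measures can be made concrete. With the entropy criterion, the impurity reduction from splitting on a feature equals the empirical mutual information between that feature and the label.

Below is a minimal sketch that assumes discrete features and a multi-way split on each value. Note that scikit-learn trees make binary splits and default to the Gini criterion, a close cousin of entropy. Impurity-based Random Forest importance accumulates reductions like this one, weighted by node sample counts, over every node and tree.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    """Entropy reduction in `labels` from splitting on each value of a discrete `feature`.

    Equals the empirical mutual information I(feature; labels).
    """
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    h_cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        # weight each child's entropy by the fraction of samples reaching it
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond
```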

14 Aug 2020

Feature Selection (2 / 3)

As mentioned in the prior post, Feature Selection (1 / 3), of primary interest is understanding the contribution of each feature in \(\vec{x}\) to the outcome or class labeling function \(f(\vec{x})\). One way to examine this is to understand how the distributions:

  • \(p(x_f)\), the probability distribution of feature f (without regard to label)
  • \(p(x_f\, \vert\, f(\vec{x}) = y)\), the feature distribution conditional on class label

differ from each other. For a feature with no relationship to the outcome \(p(x_f)\)...
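A histogram-based sketch of this comparison follows. It assumes a continuous feature and a fixed number of shared bins, and `conditional_divergences` and `mutual_information` are hypothetical helpers.

For each class \(y\), the sketch computes the KL divergence \(D_{KL}(p(x_f \vert y) \,\Vert\, p(x_f))\), which is zero when the conditional and unconditional distributions coincide. Averaging these divergences weighted by \(p(y)\) gives the empirical mutual information between the feature and the label.

```python
import numpy as np

def conditional_divergences(x, y, bins=10):
    """KL(p(x | y) || p(x)) per class label, using shared histogram bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    edges = np.histogram_bin_edges(x, bins=bins)
    p_x, _ = np.histogram(x, bins=edges)
    p_x = p_x / p_x.sum()
    out = {}
    for label in np.unique(y):
        q, _ = np.histogram(x[y == label], bins=edges)
        q = q / q.sum()
        # where q > 0, p_x > 0 as well, since class samples are a subset of x
        mask = q > 0
        out[label.item()] = float(np.sum(q[mask] * np.log(q[mask] / p_x[mask])))
    return out

def mutual_information(x, y, bins=10):
    """I(x; y) in nats as the p(y)-weighted average of the per-class divergences."""
    y = np.asarray(y)
    d = conditional_divergences(x, y, bins)
    labels, counts = np.unique(y, return_counts=True)
    return float(sum(c / len(y) * d[l.item()] for l, c in zip(labels, counts)))
```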

13 Aug 2020

Feature Selection (1 / 3)

I am often confronted with the problem of trying to reduce a high-dimensional feature set to a smaller, more effective one. Reducing dimension is important for machine learning models because:

  • the volume of the “search space” grows exponentially with the dimension \(d\)
    • at a minimum rate of \(2^{d}\) for binary categorical variables, \(n^{d}\) for n-ary categoricals, and much faster for continuous variables
  • the joint distribution over high-dimensional empirical spaces tends to be sparse and ill-defined
    • empirical distributions require...
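A quick way to see the sparsity problem is to count how many cells of a discretized space a fixed sample actually touches. The sketch below assumes independent, uniformly distributed discrete features, which is a hypothetical best case. Real features are correlated and skewed, which changes how the cells are covered but not the exponential growth in the number of cells.

```python
import numpy as np

def occupied_fraction(n_samples, d, levels, seed=0):
    """Fraction of the levels**d cells hit by n_samples uniform discrete samples."""
    rng = np.random.default_rng(seed)
    codes = rng.integers(0, levels, size=(n_samples, d))
    occupied = len({tuple(row) for row in codes})  # distinct cells actually observed
    return occupied / levels ** d
```

With 10,000 samples, 2 binary features fill all 4 cells, while 20 binary features can touch at most 10,000 of the 1,048,576 cells (under 1%).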
03 Aug 2020

Buy / Sell Imbalance


It is fairly easy to recognize price momentum with price-based indicators ex-post or with lag. Price-based momentum signals tend to be slow to recognize the start and end of a price move, as there is a tradeoff between noise and lag [1] that can’t be defeated without future information (a consequence of basic signal-processing principles).

[1] For those interested, see impulse response and the relationship between response delay and degree of smoothing in a filter. Zero lag...
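To make the noise / lag tradeoff concrete, consider an exponential moving average \(y_t = \alpha x_t + (1-\alpha) y_{t-1}\), used here purely as an illustrative filter. For white-noise input it scales the variance by \(\alpha / (2 - \alpha)\). When tracking a linear ramp it settles at a lag of \((1-\alpha)/\alpha\) samples, so every reduction in noise is paid for in delay. A minimal Python sketch:

```python
import numpy as np

def ema(x, alpha):
    """Exponential moving average, initialized at the first observation."""
    y = np.empty(len(x))
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y

def ema_tradeoff(alpha):
    """(steady-state lag on a linear ramp in samples, output/input white-noise variance)."""
    lag = (1 - alpha) / alpha
    noise_ratio = alpha / (2 - alpha)
    return lag, noise_ratio
```

For example, \(\alpha = 0.5\) gives a lag of 1 sample and passes 1/3 of the noise variance. \(\alpha = 0.1\) cuts the noise variance to about 5% but lags by 9 samples.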

30 Jul 2020

Why ML → Finance is Hard (3 / 4)

Following on from the prior post, I want to discuss the problem of sample independence. Many machine learning models in finance deal with timeseries data, where samples used in training may be close together in time and therefore not independent of one another. There are very few features in finance that do not make use of lookback periods; most features evaluate prior windows:

  • almost all technical indicators (SMA being the most basic example)
  • distribution based signals
  • decomposition based...
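Even the most basic case shows how much overlap a lookback creates. Suppose a feature is a w-period rolling mean of i.i.d. inputs. Samples k steps apart share w - k of their w inputs, so their correlation is \((w-k)/w\) for \(k < w\) and zero beyond.

The sketch below checks this numerically. It is an illustration with synthetic noise, not market data.

```python
import numpy as np

def rolling_mean(x, w):
    """w-period trailing mean via cumulative sums (length len(x) - w + 1)."""
    c = np.cumsum(np.insert(np.asarray(x, dtype=float), 0, 0.0))
    return (c[w:] - c[:-w]) / w

def theoretical_autocorr(w, k):
    """Correlation of a rolling mean of i.i.d. inputs with itself k steps later."""
    return max(w - k, 0) / w
```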
27 Jul 2020

Why ML → Finance is Hard (2 / 4)

Following on from the prior post, I want to discuss the repercussions of the low signal / noise ratio and how it affects:

  • labeling / mis-labeling
  • patterns unsupported by features

How does this manifest, and what might we do to ameliorate the issues it poses?


Financial timeseries appear to have a very low signal to noise ratio, where the variance of the noise (its power) can be higher than the power of the underlying signal...
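One simple way to quantify the labeling problem uses a stylized model, not a claim about any particular market. Assume a return is a small drift \(\mu\) plus Gaussian noise with standard deviation \(\sigma\). Labeling by the sign of the realized return then gets the direction wrong with probability \(\Phi(-\lvert\mu\rvert/\sigma)\), where \(\Phi\) is the standard normal CDF.

```python
from math import erf, sqrt

def mislabel_probability(mu, sigma):
    """P(sign(mu + eps) != sign(mu)) for eps ~ N(0, sigma^2), i.e. Phi(-|mu| / sigma)."""
    z = abs(mu) / sigma
    return 0.5 * (1.0 + erf(-z / sqrt(2.0)))
```

At a signal-to-noise ratio of 0.05, roughly 48% of sign-based labels are wrong, barely better than a coin flip.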

27 Jul 2020

Why ML → Finance is Hard (1/4)

I have used machine learning in trading strategies over the past 10 years or so. However, ML has often played a relatively small role in the overall design and success of my strategies, due to issues particular to financial data sets. I tend to use ML in specific signals or strategy sub-problems where the data / problem setup have attributes that lead to a robust statistical solution. This is as opposed to the “Nirvana” scenario where fundamental...

11 Jul 2020

Labeling Momentum & Trends

There are times when I need to label a time series, identifying periods of momentum, trend, mean-reversion, etc. Directionally labeling a timeseries has a wide variety of applications:

  • labels can be used for supervised learning
  • analysis of microstructure around larger price moves
  • conditional analysis using label (pattern) sequences
  • testing online signals versus idealized ex-post labeled trend / momentum or MR targets

The Problem

The naive approach to labeling might just note the sign of individual returns in a series....
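A small sketch of the naive approach on a hypothetical zig-zag uptrend shows the problem. Labeling each return by its sign flips on nearly every bar, even though the series is clearly trending.

For contrast, the sketch also includes a simple ex-post alternative: the sign of the price change over a forward horizon, with a hypothetical `horizon` parameter. This is only one of many possible labeling schemes.

```python
import numpy as np

def naive_labels(prices):
    """+1 / -1 / 0 label per bar from the sign of each individual return."""
    return np.sign(np.diff(np.asarray(prices, dtype=float))).astype(int)

def forward_labels(prices, horizon):
    """Ex-post label per bar from the sign of the price change over the next `horizon` bars."""
    p = np.asarray(prices, dtype=float)
    return np.sign(p[horizon:] - p[:-horizon]).astype(int)

def n_flips(labels):
    """Number of label changes between consecutive bars."""
    return int(np.count_nonzero(np.diff(labels)))
```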

09 Dec 2017

Bitcoin Valuation Fundamentals

Bitcoin has entered the mainstream, though not in a way that is particularly useful. Many, including myself, are calling a bubble in Bitcoin. Historically, when the “mom and pops” and other non-professional investors get into a buying frenzy, it has been associated with the last stages of a bubble. Momentum from individual buying may persist for some time, potentially in cycles of buying dips, so I would not want to get short without a strong view on sentiment...

29 Oct 2017

Information In Volatility Structure [2]

In the prior post, Information In Volatility Structure [1], I applied the SABR model to fit noisy raw option price data: approximately 700 million prices across a 10 year history of 2700 stocks. The point was to examine the following hypotheses:

  • does supply / demand imbalance in the options market express itself as abnormal vol skew?
  • can abnormal vol skew point to forward market behavior?
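For reference, the standard way to turn SABR parameters into a smile is the Hagan et al. (2002) lognormal implied-vol approximation. Below is a minimal Python sketch of that formula; `sabr_lognormal_vol` is a hypothetical helper name. Fitting \(\alpha, \rho, \nu\) per expiry (typically with \(\beta\) fixed) would mean minimizing the squared error between this function and observed implied vols, which is not shown.

```python
import math

def sabr_lognormal_vol(f, k, t, alpha, beta, rho, nu):
    """Hagan et al. (2002) Black implied-vol approximation for SABR (requires rho < 1)."""
    one_b = 1.0 - beta
    log_fk = math.log(f / k)
    fk_pow = (f * k) ** (one_b / 2.0)          # (f k)^((1 - beta) / 2)
    # Expansion of the denominator in log-moneyness.
    denom = fk_pow * (1.0 + one_b**2 / 24.0 * log_fk**2 + one_b**4 / 1920.0 * log_fk**4)
    z = nu / alpha * fk_pow * log_fk
    if abs(z) < 1e-12:
        z_over_x = 1.0                         # limit at the money
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    # Time-dependent correction term.
    correction = 1.0 + (
        one_b**2 / 24.0 * alpha**2 / fk_pow**2
        + 0.25 * rho * beta * nu * alpha / fk_pow
        + (2.0 - 3.0 * rho**2) / 24.0 * nu**2
    ) * t
    return alpha / denom * z_over_x * correction
```

With \(\beta = 1\) and \(\rho < 0\), the approximation produces the familiar downward-sloping skew, with vols at low strikes exceeding those at high strikes.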

First Application

I started by observing both put/call skew and skew...

02 Oct 2017

Roll Your Own RoboAdvisor [1]

I have two pools of capital, one for active trading and another for long-term investment / lower-risk capital preservation. I go through phases of actively managing investment capital and then phases where I become too busy to do so properly. It would be convenient to hand off the management to one or more funds, invest and forget, but given the market uncertainties and what I know about Wall Street, trust is hard to come by. Indeed since the financial crisis, Hedge...

24 Sep 2017

Information In Volatility Structure [1]

I’ve developed signals based on the “spot” market, but had not really explored the options market as a source of information. In particular, I want to look at discrepancies in option demand / pricing that may relate to future returns or risk. In scenarios where a dislocation in price is expected, there may be more demand for calls than puts, or vice versa. Buying pressure on puts or calls will tend to impact the option price (and therefore implied vol), much...
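The link from option price to implied vol can be made explicit by inverting Black-Scholes. The minimal sketch below assumes European calls on a non-dividend-paying underlying, with a continuously compounded rate `r`. Because the call price is strictly increasing in volatility, simple bisection suffices. Extra buying pressure that lifts the price therefore shows up as a higher implied vol.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, t, r, vol):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * vol**2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def implied_vol(price, s, k, t, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on vol; valid because the call price is monotone increasing in vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```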

02 May 2015

Market-Making Portfolio & Hedging

With market making, we can try to stay neutral by skewing prices in such a way as to maintain a flat position. To the extent that the market can become one-sided (in momentum) or may bring large-size requests (if offering at different sizes), one’s portfolio may require explicit hedging.
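A minimal sketch of inventory skewing follows. It assumes a simple linear rule with hypothetical parameters `half_spread` and `skew_per_unit`. Both quotes are shifted against the current inventory. A long position therefore makes our offer more attractive and our bid less so, encouraging flow that brings the position back toward flat.

```python
def skewed_quotes(mid, half_spread, inventory, skew_per_unit):
    """(bid, ask) centered on a mid shifted against inventory by a hypothetical linear rule."""
    center = mid - skew_per_unit * inventory  # long inventory -> lower quotes, short -> higher
    return center - half_spread, center + half_spread
```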

In a live market-making scenario, we can decide how to hedge on a case-by-case basis, with a view on where it is cheapest to achieve the...

29 Mar 2015

Bitcoin, In its own Universe?

Investors are often looking for uncorrelated returns so as to better diversify. If one looks at world indices & equities, there is much less diversity between assets than there was a decade ago; indeed, cross-market correlations are remarkably high.

On the other hand, from a trading perspective, we generally want to be able to reduce risk by hedging or spreading against related assets. For example, in FX, when market making the G10 currencies, one typically offsets inventory risk with...
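When spreading against a related asset, the variance-minimizing hedge ratio is \(h = \mathrm{cov}(y, x) / \mathrm{var}(x)\), and the hedged position retains a fraction \(1 - \rho^2\) of the original variance. This is why highly correlated instruments make good hedges, and why an asset that is largely uncorrelated with others is hard to hedge. A minimal sketch on synthetic returns:

```python
import numpy as np

def min_variance_hedge(target, hedge):
    """Hedge ratio, correlation, and residual variance of target - h * hedge."""
    y = np.asarray(target, dtype=float)
    x = np.asarray(hedge, dtype=float)
    cov = np.cov(y, x)                         # 2x2 sample covariance (ddof=1)
    h = cov[0, 1] / cov[1, 1]
    rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    residual_var = cov[0, 0] * (1.0 - rho**2)  # variance left after hedging
    return h, rho, residual_var
```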

14 Mar 2015

Musings on HFT in Bitcoin

I have 4 Bitcoin L3 exchange feeds running smoothly out of a data center in California (which is slightly closer to Asian exchanges and Coinbase than the east coast). It took a bit of error handling and exponential back-off to handle the unreliability of connectivity with these exchanges, where connections can intermittently be overwhelmed (returning 502 / 503 errors, due to the poor choice of a REST-based API).
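A minimal sketch of retry with exponential back-off and jitter follows. It assumes the feed client raises an exception on a failed request. `ConnectionError` stands in here for whatever the client actually raises on a 502 / 503. `sleep` and `rng` are injectable so the behavior can be tested without waiting.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=6, base_delay=0.5, max_delay=30.0,
                       retry_on=(ConnectionError,), sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on `retry_on` errors with capped exponential back-off."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (0.5 + 0.5 * rng()))  # jitter: 50% to 100% of the nominal delay
```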

I am thinking of adding Bitstamp and Kraken to the mix, even though they are...

11 Feb 2015

Bitcoin: Needs Cross-Exchange "Prime Brokerage"

Ok, what I am going to say here is probably Bitcoin heresy, in that I am going to advocate more centralized clearing and management of assets with respect to exchange trading.

I want to be able to scale trading in bitcoin and execute across multiple exchanges. However, I have the following problems:

  • lack of trust in most of the bitcoin exchanges
    • security of the exchange against attackers
    • degree of trust in the ownership re: my assets on deposit
  • inability to...

10 Feb 2015

Bitcoin L3 Feeds: Status

I have now implemented 4 bitcoin exchange interfaces that produce a live L3 stream of orderbook updates + trades of the form:


Given the above, I can reconstitute the orderbook as it moves through time, and the same stream can be used to create BBO quotes and bars of different granularities. The status of the exchange implementations is:
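The actual message format is not shown here, so the following is a minimal sketch under an assumed one. Each order has an id, side, price and size, and events either add an order, cancel it, or reduce its size (e.g. on a fill). Maintaining per-order state plus aggregated price levels is enough to recover the BBO at any point in the stream. `L3Book` and its methods are hypothetical names.

```python
from collections import defaultdict

class L3Book:
    """Order-level book: per-order state plus aggregated size per price level."""

    def __init__(self):
        self.orders = {}  # order_id -> (side, price, size)
        self.levels = {"bid": defaultdict(float), "ask": defaultdict(float)}

    def add(self, order_id, side, price, size):
        self.orders[order_id] = (side, price, size)
        self.levels[side][price] += size

    def reduce(self, order_id, qty):
        """Partial or full fill: shrink the order, removing it when exhausted."""
        side, price, size = self.orders[order_id]
        qty = min(qty, size)
        self._level_sub(side, price, qty)
        if size - qty > 0:
            self.orders[order_id] = (side, price, size - qty)
        else:
            del self.orders[order_id]

    def cancel(self, order_id):
        side, price, size = self.orders.pop(order_id)
        self._level_sub(side, price, size)

    def _level_sub(self, side, price, qty):
        lvl = self.levels[side]
        lvl[price] -= qty
        if lvl[price] <= 1e-12:  # drop empty levels (tolerate float residue)
            del lvl[price]

    def bbo(self):
        """(best bid, best ask), with None for an empty side."""
        bids, asks = self.levels["bid"], self.levels["ask"]
        return (max(bids) if bids else None, min(asks) if asks else None)
```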


I am looking to run this on a remote machine (preferably Linux) and write to an efficient...

08 Feb 2015

Bitcoin Exchanges: State of the Market

In the previous post, I outlined my intention to put together high-quality L2/L3 feeds for the top 4-5 bitcoin exchanges, collect L3 data, and provide a consolidated live orderbook for trading. So far I have implemented OKCoin and have been experimenting with the others to determine their API capabilities.
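Once each exchange's book is reconstructed, a consolidated top of book is simply the highest bid and lowest ask across venues. The minimal sketch below uses hypothetical exchange names and assumes all books are already quoted in a common currency; BTC/CNY venues would need an FX conversion first. A bid at or above the best ask on another venue flags a crossed consolidated market.

```python
def consolidated_bbo(books):
    """books: {exchange: (best_bid, best_ask)}, with None for an empty side.

    Returns ((bid, exchange), (ask, exchange), crossed).
    """
    bids = [(bid, ex) for ex, (bid, _) in books.items() if bid is not None]
    asks = [(ask, ex) for ex, (_, ask) in books.items() if ask is not None]
    best_bid = max(bids) if bids else None
    best_ask = min(asks) if asks else None
    crossed = bool(best_bid and best_ask and best_bid[0] >= best_ask[0])
    return best_bid, best_ask, crossed
```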

With the exception of OKCoin, what I’ve found so far is not good. Here is a summary of the top-4 exchanges with respect to market data APIs (I also included Coinbase with the notion will...

28 Jan 2015

Consolidated Source of Data for Bitcoin

It seems like every other month there is a new bitcoin exchange. For the purposes of trading research & backtesting, it is important to have historical data across the most liquid exchanges. My minimal list is:

  1. BTC/USD
    1. bitfinex (15%)
    2. bitstamp (5%)
    3. coinbase (new, but likely to garner market share)
  2. BTC/CNY
    1. okcoin (28%)
    2. btcn (44%)

(percentage of volume sourced from http://bitcoincharts.com/charts/volumepie/). Each of these exchanges not only has a unique protocol but also unique...

13 Dec 2014

Thompson Sampling

I recently attended a talk by David Simchi-Levi of MIT, where he discussed an approach to online price discovery for inventory, aimed at maximizing some objective such as profit. The scenario was one where one could observe whether inventory was sold or not sold to each potential buyer in the marketplace, giving an empirical view of demand behavior as a function of price. The optimal price for selling the inventory is the one that maximizes price × liquidation probability.

When we have no knowledge about the...
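Below is a minimal sketch of Thompson sampling for this kind of price discovery. It assumes a small grid of candidate prices, a Bernoulli sold / not-sold outcome for each buyer, and independent Beta(1, 1) priors on the sale probability at each price. Each round:

- samples a plausible sale probability per price from its posterior,
- quotes the price with the highest sampled expected revenue,
- then updates that price's posterior with the outcome.

The `true_sale_prob` values exist only to simulate buyers and are hypothetical.

```python
import random

def thompson_pricing(prices, true_sale_prob, rounds=5000, seed=0):
    """Return how many times each price was quoted under Beta-Bernoulli Thompson sampling."""
    rng = random.Random(seed)
    wins = [1] * len(prices)    # Beta(1, 1) prior: 1 pseudo-sale ...
    losses = [1] * len(prices)  # ... and 1 pseudo-non-sale per price
    pulls = [0] * len(prices)
    for _ in range(rounds):
        # Sampled expected revenue = price x sampled sale probability.
        samples = [p * rng.betavariate(w, l) for p, w, l in zip(prices, wins, losses)]
        i = max(range(len(prices)), key=samples.__getitem__)
        pulls[i] += 1
        if rng.random() < true_sale_prob[i]:  # simulated buyer: sold / not sold
            wins[i] += 1
        else:
            losses[i] += 1
    return pulls
```

With sale probabilities of 0.9, 0.6 and 0.2 at prices 1, 2 and 3, expected revenue is highest at price 2 (1.2), and the sampler ends up quoting that price in most rounds.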

29 Sep 2012

Money Management

It has been almost a year since my last post. I have been far too busy getting a new trading desk up and running. I thought I would discuss money management, since I am revisiting it right now.


It is easy to think that the trading signal is the most important aspect of a trading strategy, but money management (and execution) can be even more important. Loosely defined, money management is a mechanism for position-level risk management. The mechanism attempts...
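One concrete example of a position-level mechanism is volatility targeting, though not necessarily the one this post goes on to describe. It sizes a position so that its expected volatility matches a fixed budget. The minimal sketch below assumes daily returns, a hypothetical 20-day lookback, and annualization by \(\sqrt{252}\).

```python
import numpy as np

def vol_target_size(returns, target_vol, capital, price, lookback=20, periods_per_year=252):
    """Units to hold so annualized position vol is roughly target_vol * capital."""
    recent = np.asarray(returns, dtype=float)[-lookback:]
    realized_vol = recent.std(ddof=1) * np.sqrt(periods_per_year)  # annualized estimate
    notional = capital * target_vol / realized_vol
    return notional / price
```

Doubling realized volatility halves the position, which scales risk down automatically as conditions become more volatile.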