
## Guest Post: Deconstructing Algos, Part 1

Sat, Jun 25, 2011

From zero hedge – on a long enough timeline, the survival rate for everyone drops to zero

*Submitted by The World Complex*

**Deconstructing algos, part 1**

The third part of the series on information theoretic methods of analysis for dynamic systems is taking longer than anticipated. Crunching the numbers is killing me. So I’ll take a break from it and look a little farther forward, at how we can use the methods I have been describing so far to forensically examine the algorithms used in various high-frequency trading events of the recent past.

As seen on Nanex and Zero Hedge, there has recently been a lot of strange, algorithmically driven behaviour in the pricing of natural gas and individual stock prices on very short time frames. In an earlier article I pointed out that the apparent simple chaos we observe in the natural gas price appeared to be an emergent property of at least two duelling algorithms.

In this series of articles we will begin analysis of the algorithms involved. Today’s discussion will mostly focus on framing the issues that must be addressed in order to study unknown algorithms on the basis of their time-varying outputs. Future articles will present results from the various analyses.

We begin by looking at the activity in the natural gas price on June 8, 2011:

Let us also consider the pricing action in CNTY on June 21, 2011:

In both of these examples (many more such examples exist) there are three time series of interest to us: the bid price, the ask price, and the prices of trades. Additional information that may also be of use includes volume, size of bids, size of asks, and so on. In principle both the bid and ask prices form continuous series which are prone to instantaneous changes. The actual trades form a discontinuous time series with observations at irregular intervals.

We don’t have access to the code involved in these algorithms. Nevertheless, we can learn something about the computational processes involved, within certain limitations. Unfortunately, just as is the case in studying time series recorded in rocks, we have to make some assumptions, and the validity of our assumptions goes a long way towards predicting the success of our endeavours.

Our first assumption is that the bid price and the ask price are being set by competing interests. This assumption is extremely important. It is possible that the bid and the ask are both being set by a single entity, or by two closely related entities who are using them to manipulate the natural gas price. We will go through the reasoning behind our assumption that there are competing interests involved in some detail below.

Secondly, we are approaching this problem assuming that prices are set and changed discontinuously in time rather than continuously in time. Subtleties of this assumption are discussed in the introduction of Bosi and Ragot (2010).

The methodologies we will explore are as follows:

Cross-correlation of the bid and ask series over selected windows. We choose limited time intervals rather than the entire record because we expect that each series will sometimes lead and sometimes follow. Peaks here will show whether one of the series leads or trails the other consistently or whether each one leads intermittently, which would support the idea that these are distinct dueling algorithms. It seems likely that the bid price will lead as both are declining, and the ask will lead as both are climbing. We should test this hypothesis.
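As a rough illustration of the windowed cross-correlation described above, the following Python sketch finds, for each window, the lag at which the correlation between the two series peaks. The function names `best_lag` and `windowed_lags`, the fixed non-overlapping windows, and the assumption that both series have already been sampled on a common, evenly spaced time grid are mine for the sake of the example, not a description of the analysis that will actually be run.

```python
from statistics import mean


def best_lag(bid, ask, max_lag):
    """Return the lag (in samples) at which corr(bid[t], ask[t + lag]) peaks.

    A positive result means the bid moves first and the ask follows;
    a negative result means the ask moves first.
    Assumes both series have equal length and a common time grid.
    """
    def corr(x, y):
        # Plain Pearson correlation; returns 0.0 for a flat series.
        mx, my = mean(x), mean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x)
               * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0

    n = len(bid)
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = bid[:n - lag], ask[lag:]      # pairs bid[t] with ask[t + lag]
        else:
            x, y = bid[-lag:], ask[:n + lag]     # pairs bid[t] with ask[t - |lag|]
        scores[lag] = corr(x, y)
    return max(scores, key=scores.get)


def windowed_lags(bid, ask, window, max_lag):
    """Best lag for each consecutive, non-overlapping window (an assumed scheme)."""
    return [best_lag(bid[s:s + window], ask[s:s + window], max_lag)
            for s in range(0, len(bid) - window + 1, window)]
```

A positive lag means the bid moved first; a run of windows in which the sign flips would be consistent with each series leading intermittently. In practice it is probably better to feed in price changes rather than price levels, since a shared trend in the levels tends to smear out the peak.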

One goal of this analysis will be to see if we can detect trigger points, where one stops following and begins leading. We will locate the times and see if the trigger can be identified, which is only likely if the trigger is some change in either price series, the price of a trade, or the volume of a trade. Unfortunately, many other triggers are possible, and it may not be possible to identify them if they are, for instance, a random number generator seeded by, say, the thousandths-of-a-second digit at the instant of some distant event like the first pitch of a Yankees game or when the secretary in the front office misspells ‘the’.

Phase space reconstruction: the relevant time series (bid prices, ask prices, trade prices) are each one-dimensional data sets. If the algorithms used can be visualized in higher-dimensional phase space, we may be able to reconstruct the overall architecture.
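One common way to carry out such a reconstruction is time-delay embedding, in which each point in the reconstructed space is built from lagged copies of a single series. The article does not say which method will be used, so treat the sketch below as an assumption; `delay_embed`, `dim`, and `tau` are illustrative names.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]).

    series: a one-dimensional sequence, e.g. bid prices on an even time grid
    dim:    number of coordinates in the reconstructed space (assumed known)
    tau:    delay between coordinates, in samples (assumed known)
    """
    span = (dim - 1) * tau  # samples consumed by one delay vector
    return [tuple(series[t + k * tau] for k in range(dim))
            for t in range(len(series) - span)]


# Hypothetical usage with made-up numbers, not real quotes:
points = delay_embed([4.80, 4.81, 4.79, 4.82, 4.80], dim=2, tau=1)
```

Plotting the resulting points in two or three dimensions gives the trajectory through phase space. Choosing `dim` and `tau` is its own problem and is not addressed here.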

The advantage of this approach is that in principle the dynamics of the system will be contained no matter which output of the model we use. We only have measurements of the bid price, but have no idea what other outputs are generated by the same algorithm, even if these unknown outputs are critical to the decision-making module of the algo. The reconstructed phase space

The difficulties here are that 1) the function may change from leader to follower so quickly that the resulting trajectory through phase space is too short to interpret; 2) there may be multiple players on both the bid and ask, meaning the reconstructed trajectory through phase space is an amalgamation of two or more different functions, the instant of joining of which may be impossible to determine; and 3) it may prove impossible to properly define windows for the data, again creating an amalgamation in phase space of two or more different functions.

Epsilon machine reconstruction: we will need to try to identify the actual “work” done by these programs. How do they decide on a price? How do they “decide” to drop or raise their offer? Do they change? How are we to recognize when an algorithm changes its behaviour when all we have to deal with is the output? Can we recognize when the structure of the computation involved in the decision-making part of the algorithm changes, given our extremely limited knowledge of that structure?

These questions may be addressed using the ε-machine reconstruction approach suggested by Crutchfield (1994).

The objective of this approach is to use an open-ended modeling scheme to describe the computational structure objectively, so that different practitioners working on the same data will come up with similar (hopefully identical) constructs. By encouraging a hierarchical architecture of undefined complexity, the method allows investigators to identify changes in behaviour of the system.

This particular approach is built around discrete computation, so is amenable to data which are discrete rather than continuous in time. We assume that the discrete outputs (the time series, or stream of values) are the result of a computational process which is knowable. The data have to be organized, and (this is the key) repeated *states* are identified. It is possible that these states will be identified from the reconstructed phase space portraits above; alternatively they may be defined by particular observations. These states may be identified as key strings of data, or may be recognized in complex functions by reconstructing the state space in a higher dimension. The ordering of the states is significant, as the state that appears first before another particular state is referred to as the *predictive state*, and the following state is the *successor state*.

The ε-machine is constructed by identifying all the predictive and successor states and calculating the probabilities of all of their observed relationships. If more than one ε-machine is inferred, the sequence of these first-order ε-machines can be used to build a higher-order ε-machine. Given sufficient data, you may construct ε-machines of arbitrary order.
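The tabulation step described above can be sketched as follows, assuming the data have already been reduced to a sequence of state labels. This only estimates the probability of each successor state given each predictive state; it is not a full ε-machine reconstruction, which also has to decide which histories count as the same state. The function name and the letter labels are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(successor state | predictive state) from an ordered sequence.

    states: state labels in the order observed (labels here are made up).
    Returns a nested dict: {predictive: {successor: probability}}.
    """
    counts = defaultdict(Counter)
    # Each adjacent pair is one observed predictive -> successor relationship.
    for predictive, successor in zip(states, states[1:]):
        counts[predictive][successor] += 1
    return {
        predictive: {s: c / sum(successors.values())
                     for s, c in successors.items()}
        for predictive, successors in counts.items()
    }


# Hypothetical labelled sequence: A -> B twice, B -> A twice, A -> C once.
table = transition_probabilities(list("ABABAC"))
```

With the made-up sequence above, state `B` is always followed by `A`, while `A` is followed by `B` two times out of three and by `C` once.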

Information theory: as seen in recent articles, information theory may be used to characterize the complexity of the ε-machine reconstruction and the probability density.
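As a minimal sketch of the kind of quantity involved, the Shannon entropy of a binned probability density can be computed as below. The grid binning, `bin_width`, and the function name are assumptions for illustration; the measures used in the earlier articles may well differ.

```python
from collections import Counter
from math import log2


def shannon_entropy(points, bin_width):
    """Entropy, in bits, of the occupancy of points binned on a regular grid.

    points:    iterable of coordinate tuples, e.g. reconstructed phase space points
    bin_width: edge length of each grid cell (an assumed, user-chosen value)
    """
    # Map each point to the grid cell that contains it and count occupancy.
    cells = Counter(tuple(int(c // bin_width) for c in p) for p in points)
    total = sum(cells.values())
    # H = -sum p_i * log2(p_i) over occupied cells only.
    return -sum((n / total) * log2(n / total) for n in cells.values())


# Made-up points: two fall in one cell and two in another, so H is 1 bit.
h = shannon_entropy([(0.1, 0.1), (0.2, 0.2), (1.5, 1.5), (1.6, 1.7)], 1.0)
```

A density concentrated in a few cells gives a low value and one spread evenly over many cells gives a high value, which is what makes a measure of this kind a candidate for comparing windows.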

The yet-to-be-completed third part of that series concerns methods of using information theory to find the optimum window length for creating a probability density plot of the reconstructed phase space. The subsequent parts of this series will concern themselves with the analyses described above on the nat gas and CNTY algos, as well as others as they are found.

Given the limitations of time and computing resources, I can’t guarantee a timeline. I regret that my speed of analysis is six or seven orders of magnitude slower than the incidents in real time.
