Building an Order Flow Research Pipeline: From Raw Tape to Tested Edges

What this project actually is

Most order flow content online stops at the chart. You get a footprint, a few arrows, and a story about who is in control. This project goes the other way: it treats order flow as data, builds the infrastructure to record it cleanly, and then asks the only question that matters for a trader. Does any of this survive contact with real trading costs and honest out-of-sample testing?

The full video tour walks through every piece below. The short version: a collector that records live market data around the clock, a visualizer that turns that data into footprint and microstructure charts, and a research hub where each idea becomes a tracked experiment with an equity curve and a statistical-power check.

What order flow is, briefly

Two things drive every price: the resting orders in the book, and the aggressive orders that cross the spread to hit them. Order flow is the record of that interaction. It tells you who was patient and who was in a hurry.

From that raw interaction you can extract a few families of ideas:

Imbalance and continuation. When one side is consistently more aggressive, the move often continues for a short horizon.
Absorption and reversal. When aggression hits a wall of passive orders and price stops moving, the move may be exhausting.
Liquidity and positioning. Open interest, funding, and liquidations show where leverage is building and where it gets flushed.

These are hypotheses, not rules. The project exists to test which ones hold.

The collector: recording the tape

You cannot research order flow you did not record. Exchanges stream it live and then it is gone. So the first piece is a collector running 24/7 on a small cloud server, recording from a major crypto venue across a handful of liquid markets.

It captures the building blocks of microstructure: executed trades with their aggressor side, order book updates, liquidations, and the open interest and funding signals that show leverage shifting. Everything is written to dated files so the dataset grows one clean day at a time.
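To make "dated files" concrete, here is a minimal sketch of how a collector might route each event to a per-day file. It is not the project's actual collector: the JSON Lines format, the `ts_ms` field, the stream names, and both function names are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def dated_path(root: Path, stream: str, ts_ms: int) -> Path:
    """Return root/stream/YYYY-MM-DD.jsonl for a UTC millisecond timestamp."""
    day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return root / stream / f"{day}.jsonl"


def append_event(root: Path, stream: str, event: dict) -> Path:
    """Append one event (hypothetical schema with a 'ts_ms' key) to its day file."""
    path = dated_path(root, stream, event["ts_ms"])
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, separators=(",", ":")) + "\n")
    return path
```

Keying the file on the event's own UTC timestamp, rather than on the wall clock at write time, keeps the day boundary consistent even if the writer lags behind the feed.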

At the time of the video this had accumulated roughly four weeks of continuous data. That sounds like a lot until you remember how few independent setups a month actually contains. More on that honesty problem below.

The visualizer: making the data readable

Raw order flow is millions of rows. The visualizer renders it the way a discretionary order flow trader would want to see it: footprint bars showing bid versus ask volume at each price, cumulative volume delta, funding, liquidation events, open interest, and a depth view of where size is resting.
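As a rough illustration of what sits behind a footprint bar and a cumulative volume delta line, the sketch below aggregates a list of trades. The trade tuple layout `(ts_ms, price, qty, aggressor)`, the function names, and the use of an integer tick index as the price key are assumptions, not the visualizer's real data model.

```python
from collections import defaultdict


def footprint(trades, bar_ms, tick):
    """Aggregate trades into {bar_start_ms: {tick_index: [bid_vol, ask_vol]}}.

    bid_vol is volume from sell aggressors (hitting the bid),
    ask_vol is volume from buy aggressors (lifting the ask).
    """
    bars = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for ts_ms, price, qty, aggressor in trades:
        bar_start = ts_ms - ts_ms % bar_ms
        level = round(price / tick)  # integer tick index avoids float keys
        side = 1 if aggressor == "buy" else 0
        bars[bar_start][level][side] += qty
    return bars


def cumulative_delta(trades):
    """Running sum of buy-aggressor volume minus sell-aggressor volume."""
    cvd, out = 0.0, []
    for _, _, qty, aggressor in trades:
        cvd += qty if aggressor == "buy" else -qty
        out.append(cvd)
    return out
```

Multiplying a tick index back by `tick` recovers the price level for display.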

This is not just for looks. Seeing the data is how you catch collection bugs, sanity-check a signal before you code it, and understand why an experiment behaved the way it did. A backtest you cannot visualize is a backtest you cannot trust.

The hub: turning ideas into honest experiments

The core of the project is a research hub. Every idea becomes a numbered experiment with the same scaffolding: a clear hypothesis, an out-of-sample test, an equity curve, and a verdict. The hub shows a leaderboard of what passed, what was killed, and what is still a lead, plus robustness and statistical-power views so a good-looking curve cannot hide a weak sample.

The discipline is deliberate. Anyone can find a curve that goes up on the data they trained on. The whole point of the hub is to make it hard to fool yourself.

Example one: order flow imbalance

The first family tested was imbalance. The intuition is simple and well documented: measure the net pressure arriving in the book over a short window, and lean in the direction of that pressure.
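One widely published way to measure that net pressure uses changes in the best bid and ask between consecutive book snapshots. The sketch below follows that textbook best-level definition. It is not the feature used in this project, and the snapshot layout `(bid_px, bid_sz, ask_px, ask_sz)` and the function names are assumptions.

```python
def ofi_increment(prev, curr):
    """Best-level order flow imbalance between two consecutive quote snapshots.

    Each snapshot is (bid_px, bid_sz, ask_px, ask_sz).
    Positive values mean net buying pressure, negative values net selling pressure.
    """
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = curr
    # Bid side: size added at or above the old bid counts as buying pressure,
    # size that disappears at or below the old bid counts against it.
    bid_flow = (qb1 if pb1 >= pb0 else 0.0) - (qb0 if pb1 <= pb0 else 0.0)
    # Ask side: the mirror image, counted as selling pressure.
    ask_flow = (qa1 if pa1 <= pa0 else 0.0) - (qa0 if pa1 >= pa0 else 0.0)
    return bid_flow - ask_flow


def windowed_ofi(snapshots, window):
    """Trailing sum of OFI increments over the last `window` updates."""
    incs = [ofi_increment(a, b) for a, b in zip(snapshots, snapshots[1:])]
    return [sum(incs[max(0, i - window + 1): i + 1]) for i in range(len(incs))]
```

A trailing window keeps the feature causal: each value depends only on snapshots that had already arrived.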

The signal is real. There is a clear short-horizon relationship between imbalance and the next move. The problem shows up the moment you add costs. Trading every flicker of imbalance churns through fees and spread faster than the edge accumulates. A more selective, windowed version of the signal got much closer to surviving costs and became a genuine lead, but on the current sample its statistical power is still too low to call it a tradable edge. A promising direction, not a finished product.
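The cost problem is simple arithmetic. The sketch below nets a per-trade gross edge against round-trip costs. The function name, the parameters, and every number in the usage note are hypothetical and are not measurements from this project.

```python
def net_edge_bps(gross_edge_bps, fee_bps_per_side, half_spread_bps,
                 slippage_bps_per_side=0.0):
    """Expected edge per round trip after costs, in basis points.

    Costs are paid twice: once to enter and once to exit.
    """
    round_trip_cost = 2 * (fee_bps_per_side + half_spread_bps + slippage_bps_per_side)
    return gross_edge_bps - round_trip_cost
```

With a made-up 2 bps fee and 0.5 bps half-spread per side, a signal that earns 3 bps gross loses 2 bps per trade, while a more selective signal that earns 8 bps gross keeps 3 bps. Trading less often only helps if the trades that remain carry a larger edge.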

Example two: footprint absorption

The second example is the absorption idea from discretionary order flow trading, written down as a testable rule. When aggressive orders pile into a level and price refuses to advance, treat it as possible exhaustion and look for a reversal.
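To show what "written down as a testable rule" can look like, here is a deliberately generic sketch. It is not the rule tested in this project: the per-bar inputs, the function name, and both thresholds are placeholders chosen only to make the idea concrete.

```python
def absorption_signal(buy_vol, sell_vol, open_px, close_px, tick,
                      min_delta_ratio=0.6, max_progress_ticks=1):
    """Return -1 (fade buyers), +1 (fade sellers), or 0 (no signal) for one bar.

    A bar qualifies when aggression is one-sided (|delta| / volume is large)
    but price made little progress in the direction of that aggression.
    Thresholds are illustrative placeholders, not tuned values.
    """
    volume = buy_vol + sell_vol
    if volume <= 0:
        return 0
    delta = buy_vol - sell_vol
    if abs(delta) / volume < min_delta_ratio:
        return 0  # aggression is not one-sided enough
    direction = 1 if delta > 0 else -1
    progress_ticks = direction * (close_px - open_px) / tick
    if progress_ticks <= max_progress_ticks:
        return -direction  # aggressors were absorbed: lean against them
    return 0
```

Writing the rule as a pure function of one bar's data makes it straightforward to backtest and to audit for look-ahead.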

This one produced the most encouraging out-of-sample behavior in the project so far, with a clean look-ahead audit confirming it was not quietly peeking at future data. But it carries the same caveat: too few independent trades over four weeks to be statistically confident. It is the strongest lead on the board and the clearest argument for collecting more data before risking real size. The video walks through its equity curve in detail.
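The post does not describe how the look-ahead audit was run. One common approach, sketched here under that caveat, is a truncation test: recompute the feature on data cut off at time t and confirm the value at t matches the full-sample value. All function names below are hypothetical.

```python
def has_lookahead(feature_fn, series):
    """Return True if feature_fn's value at index t changes when later data is removed."""
    full = feature_fn(series)
    for t in range(len(series)):
        truncated = feature_fn(series[: t + 1])
        if truncated[t] != full[t]:
            return True
    return False


def trailing_mean(xs, n=3):
    """Causal example: mean of the last n values up to and including index i."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def centered_mean(xs):
    """Leaky example: averages each value with the NEXT one."""
    return [(xs[i] + xs[min(i + 1, len(xs) - 1)]) / 2 for i in range(len(xs))]
```

Running `has_lookahead` on `trailing_mean` should come back clean, while `centered_mean` is flagged because its value at t depends on the observation at t + 1.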

I am deliberately keeping the exact features, thresholds, and tuning out of this post. The point here is the method, not a recipe to copy.

Why I report leads, not edges

The most important habit in this project is refusing to promote a lead to an edge too early. A strong return over a short window with few trades is exactly what randomness looks like. The hub uses a deflated performance measure that penalizes both short samples and the number of variations tried, so a lucky curve gets discounted automatically.
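For readers who want to see the mechanics, below is a minimal sketch of the deflated Sharpe ratio as published by Bailey and López de Prado. The hub's own implementation is not shown in this post, so treat the function names and the parameters (`sr_variance`, the variance of Sharpe estimates across the variations tried, and the per-period Sharpe `sr`) as assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()
EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sr_variance):
    """Expected best Sharpe among n_trials variations that have no real edge."""
    if n_trials < 2:
        return 0.0
    a = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr, n_obs, n_trials, sr_variance, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the luck-adjusted benchmark.

    sr is the per-period (not annualized) Sharpe over n_obs return observations;
    skew and kurt describe the return distribution (kurt=3.0 for normal returns).
    """
    benchmark = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    z = (sr - benchmark) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)
```

Both penalties are visible in the formula: more variations tried raises the benchmark the observed Sharpe must beat, and fewer observations shrinks the `sqrt(n_obs - 1)` factor that turns the gap into confidence.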

That is also why none of this is investment advice or a signal to trade. It is a research log. Trading is hard, costs are real, and most signals that look great on a month of data do not repeat. Saying so plainly is the difference between research and marketing.

What is left to do

Three things, in order:

Collect more data. The single biggest lever right now is time. More weeks mean more independent setups, which is the only way a promising lead earns enough statistical power to be trusted.
Combine signals as confluence. The most interesting next step is using order flow features not as standalone strategies but as filters on top of structural setups, where they tend to add the most.
Paper trade, then live in small size. A lead that survives more data graduates to forward testing on live markets before any meaningful capital is involved.

If you are doing your own order flow or machine learning research and want a second set of eyes on whether your edge is real or just well-fit, that is exactly the kind of work I do. The video below is the full tour of everything above.

Frequently asked questions

Can you actually predict price from order flow data?

Order flow carries real short-horizon information. Imbalance between buy-side and sell-side pressure has a measurable relationship with the very next move. The hard part is not finding a signal; it is finding one that still pays after fees, spread, and slippage, and that holds up out of sample rather than fitting noise. Most raw order flow signals are real but too small to survive trading costs.

What is order flow imbalance (OFI)?

Order flow imbalance measures the net pressure arriving in the order book over a short window: how much size is being added or pulled on the bid side versus the ask side, plus the aggressor side of executed trades. A strong positive imbalance means buyers are pushing harder than sellers right now. It is one of the most studied microstructure features because it is fast, intuitive, and has a documented short-horizon relationship with price.

What is footprint absorption?

Absorption happens when one side keeps sending aggressive orders into a price level but the price refuses to move, because passive limit orders on the other side keep filling them. On a footprint chart you see heavy volume at a level with little price progress. The idea being tested is that visible absorption can mark exhaustion of a move and precede a reversal. It is a hypothesis to validate, not a guarantee.

Why does statistical power matter more than the headline return?

A backtest can show a strong return over a short window purely by luck, especially with few trades. The deflated Sharpe ratio adjusts the result for how many variations were tried and how short the sample is. A promising return with low statistical power is a lead worth collecting more data on, not a finished edge worth trading real size. Treating leads as edges is how most retail strategies blow up.
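A back-of-the-envelope way to see the sample-size problem is the standard normal-approximation power calculation for a mean. The sketch below assumes independent, identically distributed trades and a one-sided test. The function name and the default `alpha` and `power` values are illustrative choices, not settings from the hub.

```python
import math
from statistics import NormalDist


def trades_needed(sr_per_trade, alpha=0.05, power=0.8):
    """Approximate number of independent trades needed to detect a per-trade
    Sharpe ratio (mean return / standard deviation) with a one-sided test.

    Normal approximation under i.i.d. trades; a rough guide, not an exact bound.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / sr_per_trade) ** 2)
```

Because the requirement scales with the inverse square of the per-trade Sharpe, a modest edge needs several hundred independent trades, and halving the edge roughly quadruples that number.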

Worried your backtest is fitting noise?

Use our free Strategy Overfitting Score to estimate how much of your backtested edge is likely real versus a product of too many tries on too little data. Or book a free 30-minute diagnostic to talk through your research process.
