• Focus: Extending the Almgren-Chriss model, under its linear price impact assumption, with a reinforcement learning (RL) approach.

  • Method: Q-learning is used to optimize the trading proportion of stocks that must be liquidated or traded within a fixed period.

  • Dataset: South African stocks.

  • Performance Measurement:

    • Comparison of the proposed algorithm's improvements over the traditional Almgren-Chriss model.
    • Evaluation of execution risk through the standard deviation of the model's implementation shortfall.
  • Findings:

    • The proposed algorithm reduces trading costs.
    • The standard deviation increases, indicating higher execution risk.
    • Minimizing trading costs involves accepting a certain level of execution risk.
  • Price Impact in Stock Trading:

    • Temporary Price Impact: Transient price changes.
    • Permanent Price Impact: Affects prices until liquidation is complete.
  • Price and Trading Representation:

    • Stock prices are modeled as a discrete arithmetic random walk.

    • Each price step combines a volatility term and a permanent impact term that depends on the trading rate.

    • The actual execution price additionally reflects temporary impact through the function h.
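
For reference, the price dynamics described in these bullets can be written in the standard Almgren-Chriss notation. The symbols are taken from the original Almgren-Chriss formulation and are an assumption here, since the notes do not spell them out: S_k is the price after step k, n_k the shares sold in step k, τ the step length, σ the volatility, ξ_k an independent zero-mean, unit-variance draw, g the permanent impact function and h the temporary impact function.

```latex
% Arithmetic random walk with permanent impact g
S_k = S_{k-1} + \sigma \tau^{1/2} \xi_k - \tau \, g\!\left(\frac{n_k}{\tau}\right)

% Actual (execution) price with temporary impact h
\tilde{S}_k = S_{k-1} - h\!\left(\frac{n_k}{\tau}\right)

% Linear impact assumption
g(v) = \gamma v, \qquad h(v) = \epsilon \, \operatorname{sgn}(v) + \eta v
```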

  • Capture of Trajectory:

    • Sum of actual prices multiplied by the number of shares sold at each step.
    • Used to calculate total trading cost (implementation shortfall).
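
The two bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names and the sample prices and share counts are hypothetical, and implementation shortfall is taken in the usual Almgren-Chriss sense (initial market value of the position minus the capture).

```python
def capture(exec_prices, shares_sold):
    """Sum of the actual execution price times the shares sold at each step."""
    return sum(price * shares for price, shares in zip(exec_prices, shares_sold))


def implementation_shortfall(initial_price, exec_prices, shares_sold):
    """Initial market value of the position minus the capture of the trajectory."""
    total_shares = sum(shares_sold)
    return initial_price * total_shares - capture(exec_prices, shares_sold)


# Hypothetical example: sell 1,000 shares in three steps, starting from a price of 100.
example_prices = [99.9, 99.8, 99.7]
example_shares = [400, 300, 300]
example_is = implementation_shortfall(100.0, example_prices, example_shares)
```

A positive value means the trajectory captured less than the position's initial market value, which is the cost the liquidation strategy tries to keep small.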
  • Objective Function:

    • Aims to reduce execution cost and variance.
    • Balances speed of execution and price movement risk.
    • Solution involves hyperbolic sine and cosine functions.
    • Solution decreases with parameter k.
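
As a rough sketch of the hyperbolic solution, the snippet below computes the standard Almgren-Chriss optimal holdings and trade list for the objective E + λV under linear impact. The function name, argument names, and parameter values are illustrative assumptions, not values from the study; the formulas follow the original Almgren-Chriss derivation.

```python
import math


def ac_trajectory(total_shares, horizon, n_steps, risk_aversion, sigma, eta, gamma=0.0):
    """Return (holdings, trades) for the Almgren-Chriss optimal strategy.

    holdings[j] is the position remaining at time t_j = j * tau,
    trades[j-1] is the number of shares sold during step j.
    """
    tau = horizon / n_steps
    eta_tilde = eta - 0.5 * gamma * tau
    kappa_tilde_sq = risk_aversion * sigma ** 2 / eta_tilde
    # kappa solves (2 / tau^2) * (cosh(kappa * tau) - 1) = kappa_tilde^2
    kappa = math.acosh(0.5 * kappa_tilde_sq * tau ** 2 + 1.0) / tau

    if kappa < 1e-12:
        # Risk-neutral limit: the hyperbolic solution degenerates to a straight line.
        holdings = [total_shares * (1.0 - j / n_steps) for j in range(n_steps + 1)]
    else:
        denom = math.sinh(kappa * horizon)
        holdings = [
            total_shares * math.sinh(kappa * (horizon - j * tau)) / denom
            for j in range(n_steps + 1)
        ]
    trades = [holdings[j - 1] - holdings[j] for j in range(1, n_steps + 1)]
    return holdings, trades


# Illustrative parameters only (permanent impact gamma left at zero).
holdings, trades = ac_trajectory(
    total_shares=1_000_000, horizon=5.0, n_steps=5,
    risk_aversion=2e-6, sigma=0.95, eta=2.5e-6,
)
```

With positive risk aversion the trade list is front-loaded, so holdings fall quickly at first; with zero risk aversion the sketch falls back to equal-sized trades.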
  • Reinforcement Learning Approach:

    • Q-learning is used to learn the optimal action-value function, from which the optimal value function and policy follow.
    • At each step the agent selects an action based on the current state, observes the reward, and updates the corresponding Q-value.
    • Finite horizon MDP defined to reflect stock market characteristics.
    • Novel transformation of the N-step non-stationary MDP into an infinite-horizon MDP.
    • The transformation adds an artificial, reward-free absorbing state.
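
The idea of an absorbing state can be illustrated with a small tabular Q-learning sketch. Everything here is a toy assumption rather than the paper's setup: the state is just the step index, the three actions and the quadratic `toy_reward` are hypothetical, and exploration is purely random. The point is only that the last step transitions to a reward-free absorbing state whose value is fixed at zero, so the usual infinite-horizon update rule applies unchanged.

```python
import random
from collections import defaultdict

ABSORBING = "absorbing"      # artificial, reward-free terminal state
ACTIONS = [0.5, 1.0, 1.5]    # hypothetical trade-proportion choices
N_STEPS = 4                  # length of the finite horizon


def toy_reward(action):
    """Hypothetical reward: negative cost, with the lowest cost at action 1.0."""
    return -(action - 1.0) ** 2


def q_update(q, state, action, reward, next_state, alpha=0.1, discount=1.0):
    """Standard Q-learning update; the absorbing state contributes zero value."""
    if next_state == ABSORBING:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + discount * best_next - q[(state, action)])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        for step in range(N_STEPS):
            action = rng.choice(ACTIONS)  # pure exploration (Q-learning is off-policy)
            next_state = step + 1 if step + 1 < N_STEPS else ABSORBING
            q_update(q, step, action, toy_reward(action), next_state)
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STEPS)]
```

Because the time step is part of the state, a policy that differs from step to step is still a stationary function of the state.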
  • Implementation of Optimal Liquidation:

    • Observable attributes represented as vectors.
    • The agent trades more when the spread is narrow and volume is high.
    • Action involves determining trade proportion.
    • Reward function based on Almgren-Chriss' implementation shortfall.
    • The reward accounts for splitting the order into child orders and for how far each child order travels through the order book.
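
How far a child order travels through the book can be sketched as below. This is an assumed, simplified model for illustration only: the book is a hypothetical list of (price, size) bid levels, the child order is a market sell that consumes levels from the best price down, and the per-order cost is measured against a hypothetical arrival price in the spirit of implementation shortfall.

```python
def walk_bid_book(bids, quantity):
    """Proceeds from selling `quantity` shares against bid levels.

    bids: list of (price, size) tuples sorted from the best (highest) bid down.
    """
    remaining = quantity
    proceeds = 0.0
    for price, size in bids:
        fill = min(remaining, size)
        proceeds += fill * price
        remaining -= fill
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough depth in the book to fill the order")
    return proceeds


def child_order_shortfall(arrival_price, bids, quantity):
    """Shortfall of one child order relative to the arrival price."""
    return arrival_price * quantity - walk_bid_book(bids, quantity)


# Hypothetical book and child orders of two different sizes.
example_bids = [(100.0, 200), (99.5, 300), (99.0, 500)]
small_cost = child_order_shortfall(100.0, example_bids, 200)  # fills at the best bid only
large_cost = child_order_shortfall(100.0, example_bids, 400)  # reaches the second level
```

A larger trade proportion walks deeper into the book and pays a higher shortfall per share, which is the trade-off the agent's action has to weigh.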
  • Model Assumptions and Results:

    • The limit order book is assumed to be resilient to temporary price impact.
    • Permanent impact parameter set to zero.
    • Parameters are set for the Almgren-Chriss (AC) model, and the Q-matrix is updated to minimize implementation shortfall (IS).
    • Improvement in IS, especially over shorter periods (e.g., 10.3% over 4 days).
    • Results may be biased because permanent price impact is omitted.
    • Significant improvements during high-volume times.
  • Conclusion:

    • RL model shows overall improvement over Almgren-Chriss model.
    • Further research needed to include variance of execution in reward function for statistical verification.