Paper Review on Optimal Execution of Portfolio Transactions by Almgren and Chriss

Focus: Enhancing the Almgren-Chriss model with a linear price impact assumption using a reinforcement learning (RL) approach.
Method: Q-learning is used to optimize the trading proportion of stocks that must be liquidated or traded within a fixed period.
Dataset: South African stocks.
Performance Measurement:
- Comparison of the proposed algorithm's improvements over the traditional Almgren-Chriss model.
- Evaluation of execution risk through the model's standard deviation.
Findings:
- The proposed algorithm reduces trading costs.
- The standard deviation increases, indicating higher execution risk.
- Minimizing trading costs involves accepting a certain level of execution risk.
Price Impact in Stock Trading:
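- Temporary Price Impact: Transient price changes from short-lived supply/demand imbalances; they affect only the trade that causes them.
- Permanent Price Impact: Shifts the price level for at least the remainder of the liquidation.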
Price and Trading Representation:
- Stock prices are modeled as a discrete arithmetic random walk.
- Price adjusted by volatility and permanent impact based on trading rate.
- Actual execution price is the previous price less the temporary impact, given by the function h of the trading rate.
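The price dynamics above can be sketched in a few lines. This is a minimal illustration assuming the linear impact forms from Almgren and Chriss, g(v) = gamma*v for permanent impact and h(v) = epsilon*sign(n) + eta*v for temporary impact. The function name simulate_prices and all argument names are hypothetical, chosen here for illustration only.

```python
import numpy as np

def simulate_prices(trades, s0, sigma, tau, gamma, eta, epsilon=0.0, seed=0):
    """Simulate one price path for a given sell schedule.

    trades[k] is the number of shares sold in interval k (n_k).
    Returns (mid, executed): the mid-price after each interval and the
    per-share price received in each interval.
    """
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)
    rate = trades / tau                       # trading rate v_k = n_k / tau
    xi = rng.standard_normal(len(trades))     # i.i.d. standard normal shocks
    mid = np.empty(len(trades))
    executed = np.empty(len(trades))
    prev = s0
    for k in range(len(trades)):
        # Temporary impact (assumed linear h) lowers only this trade's price.
        executed[k] = prev - (epsilon * np.sign(trades[k]) + eta * rate[k])
        # Random walk step plus permanent impact (assumed linear g) moves the mid.
        mid[k] = prev + sigma * np.sqrt(tau) * xi[k] - tau * gamma * rate[k]
        prev = mid[k]
    return mid, executed
```

Setting sigma to zero isolates the two impact terms, which makes the difference between the mid-price path and the executed prices easy to inspect.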
Capture of Trajectory:
- Sum of actual prices multiplied by the number of shares sold at each step.
- Used to calculate total trading cost (implementation shortfall).
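As a rough sketch of the calculation, implementation shortfall for a sell program is the value of the position at the initial price minus the proceeds actually captured. The helper name implementation_shortfall is hypothetical.

```python
def implementation_shortfall(trades, executed_prices, s0):
    """Initial market value of the position minus the captured proceeds.

    trades[k] is the number of shares sold at step k and executed_prices[k]
    the per-share price received at that step; s0 is the initial price.
    """
    captured = sum(n * p for n, p in zip(trades, executed_prices))
    return sum(trades) * s0 - captured
```

A positive value means the program sold, on average, below the initial price.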
Objective Function:
- Minimizes expected execution cost plus a risk-aversion multiple (lambda) of its variance.
- Balances speed of execution and price movement risk.
- Solution involves hyperbolic sine and cosine functions.
- Remaining holdings decay faster as the urgency parameter k (kappa) increases.
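The closed-form solution can be sketched as follows, assuming linear impact. Holdings at time t_j are X * sinh(kappa * (T - t_j)) / sinh(kappa * T), where kappa (the parameter these notes call k) solves (2 / tau^2) * (cosh(kappa * tau) - 1) = lambda * sigma^2 / (eta - gamma * tau / 2). The function name ac_trajectory and its argument names are illustrative assumptions.

```python
import numpy as np

def ac_trajectory(X, T, N, sigma, eta, gamma, lam):
    """Holdings and per-period trades for the Almgren-Chriss schedule.

    X: initial shares, T: horizon, N: number of periods,
    lam: risk aversion. Returns (holdings, trades) with
    len(holdings) == N + 1 and len(trades) == N.
    """
    tau = T / N
    eta_tilde = eta - 0.5 * gamma * tau              # adjusted temporary impact
    kappa_tilde_sq = lam * sigma**2 / eta_tilde
    kappa = np.arccosh(0.5 * kappa_tilde_sq * tau**2 + 1.0) / tau
    t = np.linspace(0.0, T, N + 1)
    if kappa < 1e-12:
        # Risk-neutral limit: sell at a constant rate.
        holdings = X * (1.0 - t / T)
    else:
        holdings = X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)
    trades = -np.diff(holdings)                      # shares sold in each period
    return holdings, trades
```

With lam set to zero the schedule is a straight line; raising lam front-loads the selling, which is the cost-versus-risk trade-off the objective function describes.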
Reinforcement Learning Approach:
- Q-learning used to find optimal value function, policy, and action-value function.
- At each step the agent chooses an action from the current state, observes the reward, and updates the corresponding Q-value.
- Finite horizon MDP defined to reflect stock market characteristics.
- Novel transformation of the N-step non-stationary MDP into an infinite-horizon MDP.
- Artificial reward-free absorbing state added.
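A toy sketch of the tabular Q-learning update with an absorbing terminal state follows. It is not the paper's implementation. It assumes elapsed time is the only state variable, and it uses a made-up reward under which action 1 is free and every other action costs 1. Because the absorbing state is never updated, its Q-values stay at zero, so the last real period bootstraps from a value of zero.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, discount=1.0):
    """One tabular Q-learning step; Q is indexed as Q[state, action]."""
    target = r + discount * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# States 0..N-1 are elapsed periods; state N is the artificial absorbing
# state, reached after the final trade and never left or rewarded.
N, n_actions = 4, 3
Q = np.zeros((N + 1, n_actions))
rng = np.random.default_rng(0)
for episode in range(2000):
    for t in range(N):
        a = int(rng.integers(n_actions))   # purely exploratory policy
        r = -float(a != 1)                 # hypothetical reward (assumption)
        q_update(Q, t, a, r, t + 1)

greedy_policy = Q[:N].argmax(axis=1)       # best action per elapsed period
```

Putting elapsed time in the state is what lets a single stationary Q-table represent a policy that changes over the trading horizon.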
Implementation of Optimal Liquidation:
- Observable attributes represented as vectors.
- Agent increases trading when the spread is narrow and volume is high.
- Action involves determining trade proportion.
- Reward function based on Almgren-Chriss' implementation shortfall.
- Accounts for splitting the order into child orders and for how far each child order travels through the order book.
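One way to picture the action and reward is sketched below. It rests on two assumptions not spelled out in these notes: the action is a proportion applied to the Almgren-Chriss volume for the period, and "order book travel" means a market sell order consuming successive bid levels. The names sell_into_book and step_reward are hypothetical.

```python
def sell_into_book(bids, quantity):
    """Fill a market sell order against bid levels, best price first.

    bids: list of (price, size) tuples sorted from best to worst bid.
    Returns (filled, proceeds).
    """
    remaining = quantity
    proceeds = 0.0
    for price, size in bids:
        if remaining <= 0:
            break
        take = min(size, remaining)        # shares filled at this level
        proceeds += take * price
        remaining -= take
    return quantity - remaining, proceeds

def step_reward(ac_volume, proportion, bids, arrival_price):
    """Negative shortfall of one child order versus the arrival price."""
    target = proportion * ac_volume        # action scales the AC volume
    filled, proceeds = sell_into_book(bids, target)
    return -(filled * arrival_price - proceeds)
```

For example, with bids of 50 shares at 100.0 and 100 shares at 99.0, selling 80 shares fills 30 of them one level down, so the reward against an arrival price of 100.0 is -30.0.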
Model Assumptions and Results:
- The limit order book is assumed to be resilient to temporary price impact.
- Permanent impact parameter set to zero.
- Parameters are set for the AC model, and the Q-matrix is updated to minimize implementation shortfall (IS).
- Improvement in IS, especially over shorter periods (e.g., 10.3% over 4 days).
- Results may be biased because permanent price impact is not modeled.
- Significant improvements during high-volume times.
Conclusion:
- RL model shows overall improvement over Almgren-Chriss model.
- Further research is needed to include execution variance in the reward function and to verify the results statistically.