Model Explanation
Architecture: This is a hybrid system combining deep learning (an LSTM return predictor) with deep reinforcement learning (a PPO trading agent).
LSTM (Deep Learning): The LSTM model predicts short-term market returns from sequential price data and technical indicators. Training converged smoothly, with both training and validation loss stabilizing around 0.0025, indicating good generalization and minimal overfitting.
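The exact feature set and window length are not given here; the sketch below shows only the sequence-windowing step that typically prepares inputs for such an LSTM. The feature layout (return in column 0, an indicator in column 1) and the 30-day window are illustrative assumptions.

```python
# Minimal sketch of the sequence-windowing step that feeds the LSTM.
# Feature names and window length are illustrative assumptions, not
# taken from the actual pipeline.

def make_windows(features, window=30):
    """Slice per-day feature vectors into overlapping LSTM input
    windows; the target is the next day's return (assumed column 0)."""
    windows, targets = [], []
    for i in range(len(features) - window):
        windows.append(features[i:i + window])   # shape: (window, n_features)
        targets.append(features[i + window][0])  # next-day return as label
    return windows, targets

# Toy data: [return, rsi] per day, 40 days
data = [[0.001 * i, 50 + i] for i in range(40)]
X, y = make_windows(data, window=30)
```

Each window of 30 consecutive days becomes one training sample, and the model learns to map that sequence to the following day's return.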
PPO (Reinforcement Learning): The PPO agent uses the LSTM's predictions as part of its state and learns optimal trading actions (Buy / Sell / Hold). Training showed low KL divergence and stable entropy, reflecting controlled and stable learning behavior in a stochastic market.
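The document does not specify the agent's exact observation layout, so the sketch below is a hedged illustration of how the LSTM's predicted return might be combined with portfolio context into a state vector, alongside the discrete Buy/Sell/Hold action space. The `greedy_policy` stub acts on the prediction's sign purely for illustration; it is not the trained PPO policy.

```python
# Hedged sketch: possible state layout for the PPO agent, combining
# the LSTM's predicted return with portfolio context. The actual
# observation space used in training is not specified here.

BUY, SELL, HOLD = 0, 1, 2  # discrete action space

def build_state(lstm_pred, position, cash_frac, recent_returns):
    """Flatten the observation into one vector:
    [predicted return, current position, cash fraction, last-k returns]."""
    return [lstm_pred, position, cash_frac] + list(recent_returns)

def greedy_policy(state):
    """Stand-in for the learned policy: act on the sign of the LSTM
    prediction (illustration only, not the trained PPO agent)."""
    pred = state[0]
    if pred > 0.001:
        return BUY
    if pred < -0.001:
        return SELL
    return HOLD

s = build_state(0.004, position=1.0, cash_frac=0.2,
                recent_returns=[0.01, -0.002])
action = greedy_policy(s)  # positive predicted return -> BUY
```

In the real system, PPO would learn this mapping from state to action by maximizing expected portfolio reward rather than following a fixed rule.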
Training Outcome: Over 300,000 timesteps, the PPO agent achieved an approximate +44% portfolio return during training.
Key Idea: The LSTM predicts what may happen next; PPO decides what action to take. This separation of prediction and decision-making creates a robust and realistic trading framework.