This paper presents a systematic exploration of deep reinforcement learning (RL) for portfolio optimization and compares several agent architectures, such as DQN, DDPG, PPO, and SAC. We evaluate the agents’ performance across multiple market signals, including OHLC price data and technical indicators, under different rebalancing frequencies and historical window lengths. The study uses six major financial indices and a risk-free asset as the core instruments. Our results show that CNN-based feature extractors, particularly with longer lookback periods, significantly outperform MLP models and provide superior risk-adjusted returns. DQN and DDPG agents consistently surpass market benchmarks, such as the S&P 500, in annualized returns. However, continuous rebalancing incurs higher transaction costs and slippage, making periodic rebalancing a more efficient approach to managing risk. This research offers insights into the adaptability of RL agents to dynamic market conditions and proposes a robust framework for future work in financial machine learning.
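The trade-off between rebalancing frequency and transaction costs can be illustrated with a minimal sketch. It is not the environment used in this study: the function name `simulate_rebalancing`, the proportional `cost_rate`, and the fixed target weights are all assumptions made for illustration, and slippage is not modeled. The sketch lets holdings drift with per-period returns and charges a cost proportional to the amount traded at each rebalance.

```python
from typing import Sequence, Tuple


def simulate_rebalancing(
    returns: Sequence[Sequence[float]],
    target_weights: Sequence[float],
    rebalance_every: int,
    cost_rate: float = 0.001,  # hypothetical proportional cost per unit traded
) -> Tuple[float, float]:
    """Return (final portfolio value, total transaction costs paid).

    `returns` holds one row per period and one simple return per asset.
    The portfolio starts at value 1.0, allocated at the target weights.
    """
    if rebalance_every < 1:
        raise ValueError("rebalance_every must be at least 1")

    holdings = [w for w in target_weights]  # initial value of 1.0 split by weight
    total_cost = 0.0

    for t, period_returns in enumerate(returns, start=1):
        # Holdings drift with the market between rebalances.
        holdings = [h * (1.0 + r) for h, r in zip(holdings, period_returns)]

        if t % rebalance_every == 0:
            value = sum(holdings)
            targets = [value * w for w in target_weights]
            # Turnover: total absolute amount bought and sold.
            traded = sum(abs(x - h) for x, h in zip(targets, holdings))
            cost = cost_rate * traded
            total_cost += cost
            # Deduct the cost, then reset to the target weights.
            holdings = [(value - cost) * w for w in target_weights]

    return sum(holdings), total_cost


# Illustrative input: one oscillating risky asset and one flat asset.
oscillating = [[0.02, 0.0] if t % 2 == 0 else [-0.02, 0.0] for t in range(10)]
value_every_period, cost_every_period = simulate_rebalancing(oscillating, [0.5, 0.5], 1)
value_once, cost_once = simulate_rebalancing(oscillating, [0.5, 0.5], 10)
print(cost_every_period, cost_once)
```

On this oscillating path, rebalancing every period repeatedly trades against moves that largely reverse, so it pays more in costs than a single rebalance at the end. The comparison depends on the return path; in a steadily trending market the gap can narrow or even reverse.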