We propose a general framework for risk-averse reinforcement learning in algorithmic trading. We test our approach in an experiment on 1.5 years of millisecond-resolution limit order data from NASDAQ, covering the period around the 2010 flash crash. The results show that our algorithm...