# 1 Overview

### What is RL Trading?

• RL Trading is a side project of mine. It started when my idol, Elon Musk, began promoting Dogecoin.
• As a physicist and programmer, I had simulations and statistical analysis in mind from the first day I stepped into the realm of investing.
• At the beginning, I wanted to see whether reinforcement learning (RL) is good for trading. Unfortunately, it performed much worse than the algorithms I designed myself. This reminds me of my work on quantum optimal control, where classical methods whose workings we understand exactly perform much better than RL (see "Deep Reinforcement Learning Doesn't Work Yet").
• However, I would still like to keep the name RL Trader. You may read it as Riemann-Lagrange Trader, a tribute to my love of physics.

### Backtesting

• BTC trend for the backtesting period (I chose this period on purpose, because anybody can make money during a big bull market):
• 2021-04-30 to 2022-06-12, 5-minute resolution
• I trade 327 different cryptocurrencies on Binance Spot (trading fee 0.1% + 0.1%) without filtering out the "shitty" coins. This is to see how robust the algorithm is against "shitty" coins. Of course, I could achieve much better results by filtering out the bad coins, but no one can guarantee that "good" coins like Bitcoin won't one day collapse like LUNA.
• I also tried shorting on Binance Futures (137 "good" pairs, trading fee 0.02% + 0.04%) with leverage, but the results suggest that Spot is more profitable and less risky.
• The above translates to 32,467,184 steps for each simulation.
• Backtesting result:
• 0.01 means 1%
• 5 slots, no leverage. For example, if I buy BTC and the price goes up 20%, I make 20% / 5 = 4%.
• Relationship between cumsum and cumprod:
\begin{align*}
& \text{let } p \text{ be the average profit, which is small} \\
& \text{then cumsum } S_n \approx \sum_{k=1}^n p = n p \\
& \text{and cumprod } P_n \approx \prod_{k=1}^n (1+p) = (1+p)^n = \left(1 + \frac{1}{1/p}\right)^{(1/p)(np)} \approx e^{np} = e^{S_n} \\
& \text{using } \lim_{x\to \infty} \left(1 + \frac{1}{x}\right)^x = e
\end{align*}
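As a quick numerical check of the approximation above, here is a minimal sketch assuming a constant per-trade profit. The values `p = 0.001` and `n = 1000` and the helper name `compare_cumsum_cumprod` are made up for illustration and are not taken from the backtest.

```python
import numpy as np

def compare_cumsum_cumprod(returns):
    """Return (cumsum, cumprod) curves for a sequence of per-trade returns."""
    returns = np.asarray(returns, dtype=float)
    cumsum = np.cumsum(returns)            # S_n: additive equity curve
    cumprod = np.cumprod(1.0 + returns)    # P_n: compounded equity curve
    return cumsum, cumprod

# Hypothetical inputs: 1000 trades, each earning 0.1% (0.001 means 0.1%).
p, n = 0.001, 1000
cumsum, cumprod = compare_cumsum_cumprod(np.full(n, p))

# Relative gap between P_n and exp(S_n); it shrinks as p gets smaller.
rel_err = abs(cumprod[-1] / np.exp(cumsum[-1]) - 1.0)
print(f"S_n = {cumsum[-1]:.4f}, P_n = {cumprod[-1]:.4f}, "
      f"exp(S_n) = {np.exp(cumsum[-1]):.4f}, relative gap = {rel_err:.2e}")
```

The gap grows with larger `p`, so the approximation should only be trusted when the average profit per step really is small.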
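• drawdown_cumsum and drawdown_cumprod are defined as:

    import numpy as np
    from skopt.space import Integer, Real  # assumption: search-space types from scikit-optimize

    def drawdown_cumsum(cumsum):
        # Additive drawdown: distance below the running maximum of cumsum.
        max_cumsum = -1
        drawdown = np.zeros_like(cumsum)
        for i, s in enumerate(cumsum):
            max_cumsum = max(max_cumsum, s)
            drawdown[i] = s - max_cumsum
        return drawdown

    def drawdown_cumprod(cumprod):
        # Relative drawdown: fractional loss from the running maximum of cumprod.
        max_cumprod = 0
        drawdown = np.zeros_like(cumprod)
        for i, p in enumerate(cumprod):
            max_cumprod = max(max_cumprod, p)
            drawdown[i] = p / max_cumprod - 1
        return drawdown

    class RandomTrader:
        # Each entry is [default value, search range].
        x = {
            'slots': [5, Integer(3, 10)],
            'take_profit': [0.01, Real(0.01, 0.1)],
            'stop_loss': [-0.2, Real(-0.5, -0.1)],
            'timeout': ,  # value missing in the source
        }

        def make_signals(s, df):
            # Random baseline: buy on about 1% of the bars, sell on about 1%.
            r = np.random.rand(len(df['close']))
            df['buy'] = r > 0.99
            df['sell'] = r < 0.01
            return df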
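To illustrate what the two drawdown definitions above compute, here is a small sketch on made-up returns. It uses vectorized equivalents built on `np.maximum.accumulate`, which is a substitution of mine and not the author's loop code. The names `drawdown_cumsum_vec` and `drawdown_cumprod_vec` and the sample returns are hypothetical.

```python
import numpy as np

def drawdown_cumsum_vec(cumsum):
    """Additive drawdown; mirrors the loop version, whose running max starts at -1."""
    cumsum = np.asarray(cumsum, dtype=float)
    running_max = np.maximum(np.maximum.accumulate(cumsum), -1.0)
    return cumsum - running_max

def drawdown_cumprod_vec(cumprod):
    """Relative drawdown; assumes cumprod is strictly positive."""
    cumprod = np.asarray(cumprod, dtype=float)
    running_max = np.maximum.accumulate(cumprod)
    return cumprod / running_max - 1.0

# Hypothetical per-trade returns (0.01 means 1%).
returns = np.array([0.02, -0.01, 0.03, -0.05, 0.01])
dd_sum = drawdown_cumsum_vec(np.cumsum(returns))
dd_prod = drawdown_cumprod_vec(np.cumprod(1.0 + returns))

print("cumsum drawdown: ", np.round(dd_sum, 4))
print("cumprod drawdown:", np.round(dd_prod, 4))
print("max drawdown (cumsum, cumprod):", dd_sum.min(), dd_prod.min())
```

Both curves are zero at every new peak and negative otherwise. The cumsum version measures the loss in additive return units, while the cumprod version measures it as a fraction of the peak equity.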