**A brief video introduction to the MOOC**

**Course Curriculum**

- Welcome to Chapter 1! (3:39)
- OpenAI Gym Installation (12:43)
- Jupyter Installation (9:32)
- Setting Up a RL Problem (12:54)
- Exercise: Set Up the MountainCar-v0 Environment
- The Agent and Its Environment (15:37)
- Exercise: Investigate Observations in the MountainCar-v0 Environment
- Actions (18:21)
- Exercise: Actions in the MountainCar Environment
- Rewards (19:09)
- Exercise: Investigate Rewards in the MountainCar Environment
- Goals and Corresponding Reward Functions (16:03)
- Episodes (23:55)
- Exercise: What are the Terminal States in MountainCar?
- Exercise: Calculate Average Total Reward Per Episode
- See You in Chapter 2! (3:29)
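To give a flavor of the Chapter 1 exercises, here is a minimal sketch of the average-total-reward-per-episode calculation. It is not the course's solution; the hand-made reward lists stand in for real Gym rollouts:

```python
def average_total_reward(episode_rewards):
    """Average the total reward over episodes, where each episode
    is a list of per-step rewards."""
    totals = [sum(episode) for episode in episode_rewards]
    return sum(totals) / len(totals)

# Three MountainCar-style episodes: the reward is -1 per step,
# so shorter episodes score higher.
episodes = [[-1] * 200, [-1] * 150, [-1] * 100]
average_total_reward(episodes)  # -150.0
```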

- Markov Decision Process (22:45)
- RL vs. Other Forms of ML (Supervised/Unsupervised Learning) (11:38)
- Policy (20:34)
- Exercise: Implement the Sampling Function for the Epsilon Pole Direction Policy
- Model Based vs. Model Free Learning (6:57)
- Modifying Gym Environments With Wrappers (56:40)
- Exercise: Modify the CartPole-v0 Environment to Return Rounded Observations
- Value Function (20:50)
- Calculating Value Function Samples (25:56)
- Exercise: Calculate Value Samples for Pole Direction Policy
- Discounted Reward Sum (34:33)
- Exercise: Calculate Value Samples Using Discounted Reward Sum
- Action Value Function (20:35)
- Calculating Q Values (28:06)
- Exercise: Calculate Average Values of States Over Many Episodes
- Bellman Expectation Equation for the Value Function (21:09)
- Exercise: Verify the Bellman Expectation Equation for the Q-Value Function
- Comparing Policies (14:22)
- Policy Improvement (25:22)
- Greedy Policy Improvement in CartPole-v0 (22:51)
- Exercise: Implement Greedy Policy Sampling Function with Random Tie Breaking
- Exercise: Implementing Sampling Functions for Any Environment with Discrete Actions
- Optimal Policy (4:46)
- Exploration vs. Exploitation (20:27)
- Exercise: Plot Growth of State-Action Pairs in Exploration Mode
- Epsilon Greedy Policy (15:40)
- Iterative Epsilon Greedy Policy Improvement (19:06)
- Exercise: Implement an Exponential Schedule for Epsilon
- Important Announcement
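For a taste of the epsilon-schedule exercise listed above, one common form of exponential decay looks like the following sketch (the parameter names and default values are illustrative, not the course's):

```python
import math

def epsilon_schedule(step, eps_start=1.0, eps_end=0.01, decay_rate=1e-4):
    """Exponentially decay epsilon from eps_start toward eps_end."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay_rate * step)

epsilon_schedule(0)        # 1.0: explore heavily at the start
epsilon_schedule(100_000)  # near eps_end: mostly exploit
```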

**What's inside?**

**Implement famous Deep Reinforcement Learning algorithms**

We will start with classical techniques like *SARSA* and end by implementing famous Deep Reinforcement Learning algorithms like *Deep Q Network* and *Proximal Policy Optimization*.
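To give a sense of the starting point, the core of tabular SARSA is a one-line update of the action-value table. A minimal sketch, with illustrative names and hyperparameters:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Move Q(s, a) toward the TD target r + gamma * Q(s', a')."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

Q = defaultdict(float)  # action-value table, zero-initialised
sarsa_update(Q, s=0, a=1, r=1.0, s_next=0, a_next=1)
# Q[(0, 1)] is now 0.1 * (1.0 + 0.99 * 0 - 0) = 0.1
```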

**Perfect for beginners**

The course is perfect for beginners who are just starting out with Reinforcement Learning. Prerequisites are kept to a minimum. I only assume that you know high school math (probability, calculus), object-oriented Python programming, and a bit of NumPy. No prior Machine Learning or Deep Learning knowledge is needed.

**Learn while coding**

- You will learn Reinforcement Learning while your hands are on the keyboard.
- *Implementation details* of complex algorithms are covered end-to-end, so that you learn the theory and can translate it into powerful RL agents.

**Solve challenging practical projects**

By the end of the course, you will have solved 5 *OpenAI Gym* environments. You will train bots to walk, play games, and much more.

**Practice with coding exercises**

Watched the videos but want to confirm that you learned something? The course is peppered with coding exercises in the form of Jupyter Notebooks so that you can test your knowledge of Reinforcement Learning tools and concepts.

**Best practices included**

You will learn how to

- write modular and extensible PEP8 compatible Python code
- make your RL experiments reproducible
- log/monitor training and testing sessions
- compare performance with different hyperparameters

These best practices will increase your chances of success in solving problems and communicating results.
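As one example of reproducibility, every source of randomness should be seeded before an experiment. A minimal sketch using only the standard library (a real RL setup would also seed NumPy, the deep learning framework, and the Gym environment itself):

```python
import random

def set_seeds(seed):
    """Seed the randomness sources the experiment touches.
    Extend this to NumPy, your DL framework, and the environment."""
    random.seed(seed)

set_seeds(42)
run_a = [random.random() for _ in range(3)]

set_seeds(42)
run_b = [random.random() for _ in range(3)]

# Identical seeds give identical random draws, so runs can be repeated.
assert run_a == run_b
```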

**Use powerful Python frameworks and packages**

You will learn to use *Python*, *Keras*, *OpenAI Gym* and *Google Cloud* to implement Reinforcement Learning algorithms.

- This RL toolchain lets you avoid boring boilerplate code, so you can build RL bots fast.
- Yet it is powerful and customizable enough to solve almost any Reinforcement Learning problem.

**Projects in this course**

**CartPole-v0**

Balance a pole on a cart.

**MountainCar-v0**

Drive up a big hill with an underpowered car.

**LunarLander-v2**

Land a spacecraft on its landing pad.

**BipedalWalker-v2**

Train a bipedal robot to walk.

**Pong-v0**

Maximize the score in the game Pong, using screen images as input.

**About the instructor**

Hi, I am Dibya. I am a Senior Python Programmer and Python community leader based in the beautiful city of Munich, Germany. I wear many hats.

- Test Automation Engineer in the car industry
- Reinforcement learning Engineer in the finance sector
- Web Backend Developer
- Guitarist in an alternative rock band

I love the Python programming language for its simplicity and its vibrant community.

- I co-lead PyMunich, a Python community with more than 3000 enthusiastic Python developers.
- I also love teaching. Besides this course, I created a course on Unit Testing for Data Scientists at DataCamp.

I am fascinated by Reinforcement Learning and how much it resembles how we ourselves learn. Deep Reinforcement Learning is a young field and I was also a beginner in this topic not so long ago.

Believe me, I know how it feels to be a beginner and the struggles that beginners go through in such an academic subject. I hope to talk at your level, cover difficult topics with the necessary depth and patience, and always root the lectures in practical examples and code, so that you also feel that Reinforcement Learning is easy, fun and exciting!