This post organizes some ideas around advanced reinforcement learning and maps out its topology. Once you understand the investment needed for current ML solutions, it becomes apparent how important this is.

The problem with so many deployed ML solutions is fragility and data consumption. These problems are well known and a variety of solutions have been explored. The obvious and intuitive thought I had, and apparently everyone else had as well, is that acting stochastically and backpropagating through carefully balanced datasets is an intermediate solution. The next step is to model the physics and predict outcomes that will lead to success.

The end goal is sample efficiency, in which our system can get things right the first time. Sample efficiency IMO includes not only less data but also fewer annotations, less “tuning”, and less contrived reward functions. Your deployed system needs to predict what will happen, how certain it is that those predictions are correct, and what information is missing that would improve those predictions.
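To make "predict what will happen and how certain it is" concrete, here is a minimal sketch of one common approach: fit a small bootstrap ensemble of dynamics models and treat their disagreement as a rough uncertainty signal. Everything here is an assumption for illustration: the toy `true_step` dynamics, the linear least-squares models, and the helper names `fit_ensemble` and `predict` are hypothetical and not taken from any of the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(state, action):
    # Hypothetical toy dynamics, used only to generate training data.
    return 0.9 * state + 0.5 * action

# Collect a small dataset of (state, action) -> next_state transitions.
n = 200
states = rng.uniform(-1.0, 1.0, size=n)
actions = rng.uniform(-1.0, 1.0, size=n)
next_states = true_step(states, actions) + rng.normal(0.0, 0.05, size=n)
X = np.column_stack([states, actions, np.ones(n)])  # last column is a bias term

def fit_ensemble(X, y, n_models=5):
    """Fit each linear model on a different bootstrap resample of the data."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        models.append(w)
    return np.stack(models)  # shape: (n_models, 3)

def predict(models, state, action):
    """Return the ensemble mean and the spread (std) across members."""
    x = np.array([state, action, 1.0])
    preds = models @ x
    return preds.mean(), preds.std()

models = fit_ensemble(X, next_states)
mean_in, std_in = predict(models, 0.0, 0.0)      # inside the training range
mean_out, std_out = predict(models, 10.0, 10.0)  # far outside the training range
print("in-distribution:     mean=%.3f spread=%.4f" % (mean_in, std_in))
print("out-of-distribution: mean=%.3f spread=%.4f" % (mean_out, std_out))
```

The spread grows for queries far from the training data, which is the kind of signal a system could use to decide where it is missing information. Real model-based RL methods use much richer models than this, so treat it only as an illustration of the idea.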

To start, I’m going to list some interesting reads with a quick summary of each:

Dyna Framework – 1993. This is an early paper that digs into these issues.

PILCO – This is a 2011 low-data, model-based RL paper that uses probability distributions to make one-step predictions.

Learning Neural Network Policies – 2013. Builds on PILCO for RL tasks with a low-dimensional latent space and a small sample size.

Deep PILCO – 2016

I2A – The start of higher dimension model based RL

Deep Reinforcement Learning in a Handful of Trials – 2018. Haven’t even skimmed it yet.

ME-TRPO – 2018. It’s TRPO with a model-ensemble physics simulator. Model ensembling is a technique used in other areas, applied here to building a physics model.

MB-MPO – 2018. Model-based meta-policy optimization. Uses a model ensemble with a focus on asymptotic performance.

Learning Accurate Long-Term Dynamics – 2020. This paper focuses on moving away from iterative one-step models to long-term predictions. Plus their YouTube channel.

Dynamic planning networks – 2019

Dream to Control and previous work – 2020. Here is a link to this guy’s blog. Toronto and Google research on model-based RL that reduces high-dimensional inputs to a latent space and uses a mix of networks to make certain predictions. (Links: newest; example of older research.)
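The common thread in the reads above is learning a model of the environment and then using it to get more out of each real sample. As a rough illustration, here is a minimal tabular Dyna-Q-style sketch: after every real step it records the transition in a learned model and replays a few simulated transitions from that model. The chain environment, the hyperparameters, and the names `env_step` and `dyna_q` are all hypothetical and chosen for brevity; this is not the exact algorithm from any paper listed here.

```python
import random

N_STATES = 6       # hypothetical chain: states 0..5, goal at state 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def env_step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rnd = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (state, action) -> (next_state, reward, done)

    def update(s, a, r, s2, done):
        # Standard one-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rnd.random() < eps:
                a = rnd.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rnd.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = env_step(s, a)
            update(s, a, r, s2, done)        # learn from the real transition
            model[(s, a)] = (s2, r, done)    # remember it in the model
            for _ in range(planning_steps):  # learn from simulated transitions
                ps, pa = rnd.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q

q = dyna_q()
for s in range(N_STATES - 1):
    print(s, "right" if q[(s, 1)] > q[(s, 0)] else "left")
```

Setting `planning_steps=0` turns this into plain Q-learning, which makes it easy to compare how many real episodes each variant needs. The papers above replace the lookup-table model with learned, often probabilistic or ensembled, dynamics models.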

Here are some cool, up-to-date blogs and video collections:

Berkeley has a blog and an RL course on YouTube. (Links: videos; blog.)

This guy’s blog is good. It is the same one linked in the posts above.
