Advanced RL

This post organizes some ideas about, and the topology of, advanced reinforcement learning. Once you understand the investment that current ML solutions require, it becomes apparent how important this is.

The problem with so many deployed ML solutions is fragility and data consumption. These problems are well known, and a variety of solutions have been explored. The obvious and intuitive thought I had, and it appears everyone else had as well, is that acting stochastically and backpropagating through carefully balanced datasets is an intermediate solution. The next step is to model the physics and predict which outcomes will lead to success.

The end goal is sample efficiency, in which our system can get things right the first time. Sample efficiency, IMO, includes not only less data but also fewer annotations, less “tuning”, and less contrived reward functions. Your deployed system needs to predict what will happen, how certain it is that those predictions are correct, and what information is missing that would improve those predictions.

To start, I’m going to list some interesting reads, each with a quick summary.

Dyna Framework – 1993. This is an early paper that digs into these issues.

https://www.researchgate.net/publication/2342119_Efficient_Learning_and_Planning_Within_the_Dyna_Framework

PILCO – This is a 2011 low-data, model-based RL paper that uses probability distributions to make one-step predictions.

https://www.researchgate.net/publication/221345233_PILCO_A_Model-Based_and_Data-Efficient_Approach_to_Policy_Search

Learning Neural Network Policies – 2013. Builds on PILCO for RL tasks with a small latent space and a small sample size.

https://people.eecs.berkeley.edu/~svlevine/papers/mfcgps.pdf

Deep PILCO – 2016

http://mlg.eng.cam.ac.uk/yarin/PDFs/DeepPILCO.pdf

I2A – The start of higher-dimensional model-based RL.

https://arxiv.org/pdf/1707.06203v1.pdf

Deep Reinforcement Learning in a Handful of Trials – 2018. Haven’t even skimmed it yet.

https://arxiv.org/abs/1805.12114

ME-TRPO – 2018. It’s TRPO with a model-ensemble physics simulator. Model ensembling is a technique used in other areas, applied here to building a physics model.

https://arxiv.org/abs/1802.10592
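To make the model ensemble idea a bit more concrete, here is a minimal sketch. It is not the paper's architecture: the 1-D linear "physics", the noise level, and every function name are made up for illustration, and bootstrap resampling is swapped in as a simple way to get ensemble members that differ. The point is only that the spread across members gives a rough "how certain am I" signal alongside the prediction.

```python
import random
import statistics

def true_step(s, u):
    # Hypothetical 1-D "physics": next state is a linear function of state and action.
    return 0.8 * s + u

def collect(n, rng):
    # Sample random states and actions and record (s, u, s_next) with sensor noise.
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        data.append((s, u, true_step(s, u) + rng.gauss(0.0, 0.05)))
    return data

def fit_gain(data):
    # Least-squares estimate of a in s_next = a * s + u
    # (assumption: the effect of the action is already known).
    num = sum(s * (s_next - u) for s, u, s_next in data)
    den = sum(s * s for s, _, _ in data)
    return num / den

def fit_ensemble(data, k, rng):
    # Each member is fit on its own bootstrap resample, so the members differ slightly.
    return [fit_gain([rng.choice(data) for _ in data]) for _ in range(k)]

def predict(ensemble, s, u):
    # Mean prediction plus the spread across members as a crude uncertainty estimate.
    preds = [a * s + u for a in ensemble]
    return statistics.mean(preds), statistics.pstdev(preds)

rng = random.Random(0)
data = collect(30, rng)
ensemble = fit_ensemble(data, k=5, rng=rng)

mean_near, spread_near = predict(ensemble, 0.5, 0.0)   # inside the training range
mean_far, spread_far = predict(ensemble, 10.0, 0.0)    # far outside it
print(mean_near, spread_near)
print(mean_far, spread_far)
```

The members agree closely on states like the ones they were trained on and disagree much more when asked to extrapolate, which is the kind of signal a policy optimizer could use to avoid trusting the model too far from its data.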

MB-MPO – 2018. Model-based meta-policy optimization. Uses a model ensemble with a focus on asymptotic performance.

https://arxiv.org/pdf/1809.05214.pdf

Learning Accurate Long-term Dynamics – 2020 paper that focuses on moving away from iterative one-step models toward long-term predictions. Their YouTube channel is linked as well.

https://arxiv.org/pdf/2012.09156.pdf

https://www.youtube.com/channel/UCfC6OEA7QQi5fvu46PVxc4A/videos
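A tiny sketch of why iterating a one-step model is a problem for long horizons. This is not from the paper; the dynamics and the 2% model error are hypothetical numbers picked to keep the arithmetic obvious. Each prediction is fed back in as the next input, so a small per-step error compounds.

```python
def rollout(gain, s0, horizon):
    # Iterate a one-step model s_next = gain * s, feeding each prediction back in.
    states = [s0]
    for _ in range(horizon):
        states.append(gain * states[-1])
    return states

TRUE_GAIN = 1.0      # hypothetical true dynamics: the state stays put
MODEL_GAIN = 1.02    # hypothetical learned model, off by 2% per step

truth = rollout(TRUE_GAIN, 1.0, 50)
model = rollout(MODEL_GAIN, 1.0, 50)
errors = [abs(m - t) for m, t in zip(model, truth)]

print(errors[1])    # error after one step
print(errors[50])   # error after fifty recursive steps
```

The one-step error looks harmless, but after fifty recursive steps it is more than fifty times larger. A model that predicts the long-term state directly does not have to feed its own mistakes back into itself.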

Dynamic Planning Networks – 2019

https://arxiv.org/pdf/1812.11240.pdf

Dream to Control and previous work – 2020. Here is a link to this guy’s blog. Toronto and Google research on model-based RL that reduces high-dimensional inputs to a latent space and uses a mix of networks to make certain predictions.

https://danijar.com/project/dreamer/ <- newest

https://arxiv.org/pdf/1811.04551.pdf <-example of older research

Here are some cool, up-to-date blogs and video collections:

Berkeley has a blog and an RL course on YouTube.

https://people.eecs.berkeley.edu/~pabbeel/cs287-fa19/ <-videos

https://bair.berkeley.edu/blog/ <- blog

This guy’s blog is good. Same author as the Dreamer links above.

https://danijar.com/