
This post organizes some ideas about the landscape of advanced reinforcement learning. Once you understand the investment that current ML solutions require, it becomes apparent how important this is.
The problem with so many deployed ML solutions is fragility and data consumption. These problems are well known and a variety of solutions have been explored. The obvious and intuitive conclusion that I reached, and apparently everyone else did as well, is that acting stochastically and backpropagating through carefully balanced datasets is an intermediate solution. The next step is to model the physics and predict which outcomes will lead to success.
The end goal is sample efficiency, in which our system can get things right the first time. Sample efficiency IMO includes not only less data but also fewer annotations, less “tuning”, and less contrived reward functions. Your deployed system needs to predict what will happen, how certain it is that those predictions are correct, and what information is missing that would help those predictions.
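To make "predict what will happen and how certain you are" concrete, here is a minimal sketch of one common way to get both numbers: fit several dynamics models on bootstrap resamples of the same data and treat their disagreement as uncertainty. Everything in it is an assumption for illustration: the toy 1-D linear system, the linear least-squares models, and the helper names `fit_ensemble` and `predict`. It is not the method of any specific paper.

```python
import numpy as np

def fit_ensemble(states, actions, next_states, n_models=10, seed=0):
    """Fit n_models linear models, each on a bootstrap resample of the data."""
    rng = np.random.default_rng(seed)
    # Features: state, action, and a constant bias term.
    X = np.column_stack([states, actions, np.ones(len(states))])
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        w, *_ = np.linalg.lstsq(X[idx], next_states[idx], rcond=None)
        weights.append(w)
    return np.stack(weights)  # shape (n_models, 3)

def predict(weights, state, action):
    """Return the ensemble mean and the disagreement (std) for one query."""
    x = np.array([state, action, 1.0])
    preds = weights @ x
    return preds.mean(), preds.std()

# Toy data (assumed): next_state = 0.9 * state + 0.5 * action + noise,
# with states and actions only ever observed in [-1, 1].
rng = np.random.default_rng(1)
states = rng.uniform(-1.0, 1.0, 200)
actions = rng.uniform(-1.0, 1.0, 200)
next_states = 0.9 * states + 0.5 * actions + rng.normal(0.0, 0.05, 200)

weights = fit_ensemble(states, actions, next_states)
mean_in, std_in = predict(weights, 0.5, 0.5)    # inside the training range
mean_out, std_out = predict(weights, 10.0, 0.0)  # far outside it
print(f"in-range:     mean={mean_in:.3f} std={std_in:.4f}")
print(f"out-of-range: mean={mean_out:.3f} std={std_out:.4f}")
```

The disagreement is small where the models saw data and grows for the query far outside it, which is the signal a planner can use to avoid trusting the model where it is only guessing.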
To start, I’m going to list some interesting reads with a quick summary of each.
Dyna Framework – 1993. This is an early paper that digs into these issues.
PILCO – 2011. A low-data, model-based RL paper that uses probability distributions to make one-step predictions.
Learning Neural Network Policies – 2013. Builds on PILCO for low-dimensional, low-sample-size RL tasks.
https://people.eecs.berkeley.edu/~svlevine/papers/mfcgps.pdf
Deep PILCO – 2016
http://mlg.eng.cam.ac.uk/yarin/PDFs/DeepPILCO.pdf
I2A (Imagination-Augmented Agents) – The start of higher-dimensional model-based RL.
https://arxiv.org/pdf/1707.06203v1.pdf
Deep Reinforcement Learning in a Handful of Trials – 2018. I haven’t even skimmed it yet.
https://arxiv.org/abs/1805.12114
ME-TRPO – 2018. It’s TRPO with a model-ensemble physics simulator. Model ensembling is a technique used in other areas, applied here to building a physics model.
https://arxiv.org/abs/1802.10592
MB-MPO – 2018. Model-based meta-policy optimization. Uses a model ensemble with a focus on asymptotic performance.
https://arxiv.org/pdf/1809.05214.pdf
Learning Accurate Long-term Dynamics – 2020. A paper that focuses on moving away from iterative one-step models to long-term predictions. Their YouTube channel is linked as well.
https://arxiv.org/pdf/2012.09156.pdf
https://www.youtube.com/channel/UCfC6OEA7QQi5fvu46PVxc4A/videos
Dynamic planning networks – 2019
https://arxiv.org/pdf/1812.11240.pdf
Dream to Control and earlier work – 2020. Here is a link to the author’s blog. Toronto and Google research on model-based RL that reduces high-dimensional observations to a latent space and uses a mix of networks to make predictions.
https://danijar.com/project/dreamer/ <- newest
https://arxiv.org/pdf/1811.04551.pdf <-example of older research
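Several of the reads above contrast iterated one-step models with long-term prediction. A rough sketch of why that matters: when a one-step model is fed its own outputs, a small per-step error compounds over the horizon. The scalar system, both coefficients, and the helper name `rollout` below are assumptions picked for illustration and are not taken from any of the papers.

```python
import numpy as np

def rollout(a, x0, horizon):
    """Iterate the one-step model x_{t+1} = a * x_t and return the trajectory."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(a * xs[-1])
    return np.array(xs)

true_a = 0.95     # assumed "real" dynamics coefficient
learned_a = 0.97  # assumed learned coefficient, off by 0.02
horizon = 20

true_traj = rollout(true_a, 1.0, horizon)
model_traj = rollout(learned_a, 1.0, horizon)
errors = np.abs(model_traj - true_traj)

print(f"error after 1 step:   {errors[1]:.3f}")
print(f"error after {horizon} steps: {errors[horizon]:.3f}")
```

With these numbers the prediction error after 20 steps is several times larger than the one-step error, even though the model looks nearly perfect when judged one step at a time.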
Here are some cool, up-to-date blogs and video collections:
Berkeley has a blog and an RL course on YouTube.
https://people.eecs.berkeley.edu/~pabbeel/cs287-fa19/ <-videos
https://bair.berkeley.edu/blog/ <- blog
This guy’s blog is good. It is the same one as in the posts above.