Dynamic Treatment Regimes
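Dynamic treatment regimes (DTRs) arise in situations where each patient receives a sequence of treatments at different time points (stages), and each treatment decision may depend on the patient's history up to that point. This situation is common in the treatment of chronic diseases such as HIV and of rapidly evolving conditions such as sepsis. The goal is to find the "best" (optimal) treatment regime, that is, the sequence of decision rules that maximizes the expected outcome.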
Semi-supervised DTRs with an estimation-based approach:
In practice, precise information on the treatment outcome can be scarce, while a large amount of outcome-related information is often cheaply available. This gives rise to the so-called semi-supervised setting. We construct semi-supervised DTRs using linear Q-learning and rigorously establish that good-quality additional information on the outcome translates into increased stability of the estimators in large samples. For more information, see our paper "Semi-Supervised Off Policy Reinforcement Learning".
Semi-supervised DTR setting in a nutshell
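The semi-supervised estimator builds on linear Q-learning. As a point of reference, the sketch below shows only the standard, fully supervised two-stage linear Q-learning backbone (backward induction with least squares); it does not use unlabeled data and is not the semi-supervised estimator from the paper. The helper names (`ols`, `linear_q_learning`), the -1/+1 treatment coding, the working model Q_t(h, a) = h'beta_t + a h'psi_t and the simulated data are all illustrative assumptions.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def linear_q_learning(h1, a1, h2, a2, y):
    """Two-stage linear Q-learning with treatments coded as -1/+1.

    Working model at stage t: Q_t(h, a) = h @ beta_t + a * (h @ psi_t),
    so the estimated optimal rule is d_t(h) = sign(h @ psi_t).
    """
    # Stage 2: regress the observed outcome on [H2, A2 * H2].
    p2 = h2.shape[1]
    coef2 = ols(np.hstack([h2, a2[:, None] * h2]), y)
    beta2, psi2 = coef2[:p2], coef2[p2:]
    # Pseudo-outcome: fitted stage-2 Q-value under the best stage-2 action,
    # max_a [h @ beta2 + a * (h @ psi2)] = h @ beta2 + |h @ psi2|.
    y_tilde = h2 @ beta2 + np.abs(h2 @ psi2)
    # Stage 1: regress the pseudo-outcome on [H1, A1 * H1].
    p1 = h1.shape[1]
    coef1 = ols(np.hstack([h1, a1[:, None] * h1]), y_tilde)
    psi1 = coef1[p1:]
    return psi1, psi2


# Hypothetical simulated data with randomized treatments.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
a1 = rng.choice([-1.0, 1.0], size=n)
x2 = rng.normal(size=n)
a2 = rng.choice([-1.0, 1.0], size=n)
y = 1.0 + a1 * x1 + a2 * x2 + 0.5 * rng.normal(size=n)

h1 = np.column_stack([np.ones(n), x1])                   # stage-1 history
h2 = np.column_stack([np.ones(n), x1, a1, a1 * x1, x2])  # stage-2 history
psi1, psi2 = linear_q_learning(h1, a1, h2, a2, y)
print(np.round(psi1, 2), np.round(psi2, 2))
```

In this simulation the true treatment contrasts are psi1 = (0, 1) and psi2 = (0, 0, 0, 0, 1), so the estimates should land close to those values. In a semi-supervised setting, the outcome y would be observed only on a small labeled subset, and the extra outcome-related information on the remaining patients would be brought in to stabilize these regression steps.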
Fisher consistent surrogate loss for identifying the optimal DTR:
Identification of the optimal DTR can be framed as a classification problem, which gives rise to the recently popularized direct search methods. As a rule of thumb, classification methods replace the zero-one loss with a well-behaved, preferably convex, surrogate loss whenever the replacement can be justified, that is, whenever the surrogate is Fisher consistent. We show that a wide class of convex surrogates fails to be Fisher consistent in the DTR classification setting. We then construct a class of smooth non-convex Fisher consistent surrogates, which lend themselves to stochastic gradient descent, enabling fast and scalable optimization, and which also come with sharp statistical guarantees. For more information, see our paper "Finding the Optimal Dynamic Treatment Regime Using Smooth Fisher Consistent Surrogate Loss".
Bivariate zero-one loss function (left) and some examples of smooth Fisher consistent surrogates (right) for the simultaneous DTR optimization problem.
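To make the classification view concrete, here is a minimal sketch of smoothed direct search for a two-stage regime. It maximizes an inverse-probability-weighted value estimate in which the product of indicators 1{A1 f1 > 0} 1{A2 f2 > 0} is replaced by the smooth product sigmoid(A1 f1) sigmoid(A2 f2), using minibatch stochastic gradient ascent (equivalently, descent on the negative value). The sigmoid product is an illustrative choice only: it is not claimed to belong to the surrogate class constructed in the paper, and no Fisher consistency claim is made for it. Linear scores f_t = h_t'b_t, known randomization probabilities, -1/+1 treatment coding, the function names and the simulated data are all assumptions.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def fit_smooth_surrogate(h1, a1, h2, a2, y, pi1=0.5, pi2=0.5,
                         lr=0.1, steps=3000, batch=64, seed=0):
    """Stochastic gradient ascent on a smoothed IPW value estimate.

    The indicator 1{A1 f1 > 0} * 1{A2 f2 > 0} is replaced by the smooth
    product sigmoid(A1 f1) * sigmoid(A2 f2), with linear scores
    f1 = h1 @ b1 and f2 = h2 @ b2. Treatments are coded -1/+1 and the
    propensities pi1, pi2 are assumed known (e.g. a randomized trial).
    """
    rng = np.random.default_rng(seed)
    b1 = np.zeros(h1.shape[1])
    b2 = np.zeros(h2.shape[1])
    w = y / (pi1 * pi2)  # inverse-probability weights
    n = len(y)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        s1 = sigmoid(a1[idx] * (h1[idx] @ b1))
        s2 = sigmoid(a2[idx] * (h2[idx] @ b2))
        wi = w[idx]
        # d sigmoid(u) / du = sigmoid(u) * (1 - sigmoid(u))
        g1 = ((wi * s1 * (1 - s1) * s2 * a1[idx])[:, None] * h1[idx]).mean(axis=0)
        g2 = ((wi * s1 * s2 * (1 - s2) * a2[idx])[:, None] * h2[idx]).mean(axis=0)
        b1 += lr * g1  # ascent: we maximize the smoothed value
        b2 += lr * g2
    return b1, b2


def simulate(n, rng):
    """Hypothetical data: treatment +1 is best at stage t exactly when x_t > 0."""
    x1 = rng.normal(size=n)
    a1 = rng.choice([-1.0, 1.0], size=n)
    x2 = rng.normal(size=n)
    a2 = rng.choice([-1.0, 1.0], size=n)
    y = 1.0 + a1 * x1 + a2 * x2 + 0.5 * rng.normal(size=n)
    h1 = np.column_stack([np.ones(n), x1])          # stage-1 history
    h2 = np.column_stack([np.ones(n), x1, a1, x2])  # stage-2 history
    return h1, a1, h2, a2, y


rng = np.random.default_rng(1)
b1, b2 = fit_smooth_surrogate(*simulate(5000, rng))

# Agreement of the learned rules sign(f_t) with the true optimal rules on new data.
h1_new, _, h2_new, _, _ = simulate(2000, rng)
agree1 = np.mean(np.sign(h1_new @ b1) == np.sign(h1_new[:, 1]))
agree2 = np.mean(np.sign(h2_new @ b2) == np.sign(h2_new[:, 3]))
print(agree1, agree2)
```

Because the smooth objective is differentiable, each update touches only a small minibatch, which is what makes this kind of direct search scale to large samples; the zero-one objective itself offers no usable gradient.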