
Authors: Aaron Wilson, Alan Fern, Prasad Tadepalli
We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state, and the expert indicates which trajectory is preferred. The agent's goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem, we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection.
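To make the querying loop described above more concrete, here is a minimal toy sketch in Python. It is not the authors' model or either of their selection methods. It assumes, purely for illustration, that each policy is a single number `theta` on a grid, that each trajectory is summarized by one scalar, that the expert prefers the trajectory closer to the target parameter, and that preferences follow a logistic likelihood with a hypothetical `temperature`. Query selection uses a simple uncertainty heuristic (pick the pair whose predicted answer is closest to 50/50), swapped in as a stand-in for active selection. All function names are hypothetical.

```python
import math


def preference_prob(traj_a, traj_b, theta, temperature=0.5):
    """Assumed likelihood: P(expert prefers traj_a over traj_b | target theta)."""
    d_a = abs(traj_a - theta)  # distance of trajectory a from the target
    d_b = abs(traj_b - theta)  # distance of trajectory b from the target
    return 1.0 / (1.0 + math.exp(-(d_b - d_a) / temperature))


def update_posterior(posterior, hypotheses, traj_a, traj_b, a_preferred):
    """Bayes update of a discrete posterior after one answered query."""
    weights = []
    for p, theta in zip(posterior, hypotheses):
        like = preference_prob(traj_a, traj_b, theta)
        weights.append(p * (like if a_preferred else 1.0 - like))
    total = sum(weights)
    return [w / total for w in weights]


def select_query(posterior, hypotheses, candidate_pairs):
    """Uncertainty heuristic: pick the pair whose predicted answer is nearest 0.5."""
    def predictive(pair):
        return sum(
            p * preference_prob(pair[0], pair[1], theta)
            for p, theta in zip(posterior, hypotheses)
        )
    return min(candidate_pairs, key=lambda pair: abs(predictive(pair) - 0.5))


def run_demo(target=0.7, n_queries=10):
    """Simulate a noiseless expert and return (MAP estimate, posterior)."""
    hypotheses = [i / 10 for i in range(11)]
    posterior = [1.0 / len(hypotheses)] * len(hypotheses)
    candidates = [(a / 10, b / 10) for a in range(11) for b in range(a + 1, 11)]
    for _ in range(n_queries):
        traj_a, traj_b = select_query(posterior, hypotheses, candidates)
        # Simulated expert: prefers the trajectory strictly closer to the target.
        a_preferred = abs(traj_a - target) < abs(traj_b - target)
        posterior = update_posterior(
            posterior, hypotheses, traj_a, traj_b, a_preferred
        )
    best = max(range(len(hypotheses)), key=lambda i: posterior[i])
    return hypotheses[best], posterior
```

Calling `run_demo()` returns the most probable grid value and the final posterior. Replacing `select_query` with a uniformly random choice from `candidates` gives a baseline for comparing active and random selection in this toy setting.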
Item Details
Update: last updated 12/09/2012, 09:38 PM
