Summaries - Office of Research & Innovation
Back Testing Multiple Credit/Blame Assignment Methods for Learning
|Division||Research & Sponsored Programs|
|Department||NPS Naval Research Program|
|Investigator(s)||Rowe, Neil C.|
|Sponsor||NPS Naval Research Program (Navy)|
The recent success of the Libratus poker-playing program, which uses "regret" minimization, suggests that this approach has some advantages over traditional methods of credit and blame assignment in reinforcement learning. Regret focuses on the difference between the choice taken and the choice missed, and thus provides a double weighting compared to a traditional weighting based on the difference in evaluation values.
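To make the idea of regret concrete, the following is a minimal sketch of regret matching in self-play on a small two-player zero-sum matrix game. It is an illustration under stated assumptions, not the method used by Libratus or by the proposed software: the payoff matrix and the names `current_strategy`, `regret_matching`, and `PAYOFF` are hypothetical. Each player accumulates, for every action, how much better that action would have done than the mix actually played, and then plays actions in proportion to their positive accumulated regret.

```python
def current_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No action has positive regret yet: fall back to a uniform mix.
    return [1.0 / len(cum_regret)] * len(cum_regret)


def regret_matching(payoff, iterations=20000):
    """Self-play on a zero-sum matrix game; returns the average strategies.

    payoff[i][j] is the row player's payoff (hypothetical numbers);
    the column player receives the negative of it.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_total, col_total = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        row = current_strategy(row_regret)
        col = current_strategy(col_regret)
        # Expected value of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(p * v for p, v in zip(row, row_values))
        col_ev = sum(p * v for p, v in zip(col, col_values))
        # Regret: value of the choice missed minus value of the choice taken.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_total[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_total[j] += col[j]
    return ([t / iterations for t in row_total],
            [t / iterations for t in col_total])


# Hypothetical 2x2 game whose equilibrium has both players choosing
# their first action with probability 0.4.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]
row_avg, col_avg = regret_matching(PAYOFF)
```

In two-player zero-sum games the average strategies, rather than the final ones, are what approach an equilibrium, which is why the sketch returns `row_avg` and `col_avg`.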
We propose to explore this idea for realistic mission-planning scenarios where adversaries counterplan in response to our plans. We will build game-theoretic models for both ourselves and the adversary and play a large number of games with random variations in parameters. These games will take advantage of our software for building plans using top-down goal-directed reasoning. Some games will use traditional reinforcement learning and some will use regret minimization. We will compare the performance of the two approaches statistically.
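One way such a statistical comparison could be set up is sketched below. The summary does not say which test will be used, so Welch's unequal-variance t-test is an assumption chosen for illustration, and the function name `welch_t` and the sample scores are hypothetical. The function takes the per-game scores obtained under each learning method and returns the t statistic and the approximate degrees of freedom.

```python
import math
import statistics


def welch_t(scores_a, scores_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(scores_a), len(scores_b)
    # Per-sample variance of the mean (sample variance divided by sample size).
    vm_a = statistics.variance(scores_a) / n_a
    vm_b = statistics.variance(scores_b) / n_b
    t = (statistics.mean(scores_a) - statistics.mean(scores_b)) / math.sqrt(vm_a + vm_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vm_a + vm_b) ** 2 / (vm_a ** 2 / (n_a - 1) + vm_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical game scores, for illustration only (not experimental results).
traditional_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
regret_scores = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(traditional_scores, regret_scores)
```

The resulting statistic would then be compared against a t distribution with `dof` degrees of freedom to judge whether the difference between methods is significant.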
The application area for testing will be planning for cyberwarfare. Cyberwarfare provides many untried options, so it makes a good testbed. Our previous work (in Introduction to Cyberdeception, Springer, 2016) built software for planning deceptions and analyzed decision trees to find the most effective plans. The planning machinery can generate realistic attack plans such as rootkit installation. We can assign probabilities and costs to the various attacker responses to deceptions used against those attack plans, and use these to plan the best series of deceptions against them. We will apply regret minimization to the examples we have already developed and see whether it improves on a classic approach.
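The decision-tree analysis described above can be sketched as an expected-value search. This is a simplified illustration, not the software from the cited book: the tree layout, the deception names, and all probabilities and benefit values are hypothetical. A decision node offers the defender a set of deceptions; each deception leads to a list of attacker responses with probabilities; a leaf carries the defender's net benefit.

```python
def evaluate(node):
    """Return (expected defender benefit, best deception at this node or None)."""
    if "value" in node:
        # Leaf: the net benefit to the defender is already known.
        return node["value"], None
    best_value, best_choice = float("-inf"), None
    for deception, responses in node["choices"].items():
        # Average over attacker responses, each weighted by its probability.
        expected = sum(prob * evaluate(child)[0] for prob, child in responses)
        if expected > best_value:
            best_value, best_choice = expected, deception
    return best_value, best_choice


# Hypothetical tree: each deception maps to (probability, resulting node) pairs.
TREE = {"choices": {
    "fake_error_message": [(0.5, {"value": 4.0}), (0.5, {"value": 0.0})],
    "delay_response": [
        (0.25, {"value": 8.0}),
        (0.75, {"choices": {
            "fake_file_listing": [(0.5, {"value": 2.0}), (0.5, {"value": 0.0})],
            "do_nothing": [(1.0, {"value": 0.0})],
        }}),
    ],
}}

best_value, first_deception = evaluate(TREE)
```

Calling `evaluate` again on the node reached after each attacker response gives the next deception to use, so a full series of deceptions is read off by walking down the tree.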
Deliverables include software for implementing regret minimization for games and a report summarizing our experimental results.
|Publications||Publications, theses (not shown) and data repositories will be added to the portal record when information is available in FAIRS and brought back to the portal|