ABSTRACT (maximum 200 words)
We consider a scenario in which a single Red wishes to shoot at a collection of Blue targets, one at a time, to maximise some measure of return obtained from Blues killed before Red's own (possible) demise. Such a situation arises in various military contexts such as the conduct of air defence by Red in the face of Blue SEAD (suppression of enemy air defences). A class of decision processes called multi-armed bandits has been previously deployed to develop optimal policies for Red in which she attaches a calibrating (Gittins) index to each Blue target and optimally shoots next at the Blue with largest index value. The current paper seeks to elucidate how a range of developments of index theory are able to accommodate features of such problems which are of practical military import. Such features include levels of risk to Red which are policy dependent, Red having imperfect information about the Blues she faces, an evolving population of Blue targets and the possibility of Red disengagement. The paper concludes with a numerical study which both compares the performance of (optimal) index policies to a range of competitors and also demonstrates the value to Red of (optimal) disengagement.
NUMBER OF PAGES 27
SUBJECT TERMS multi-armed bandits, Gittins indices, suppression of enemy air defense
PRICE CODE
SECURITY CLASSIFICATION OF REPORT Unclassified
SECURITY CLASSIFICATION OF THIS PAGE Unclassified
SECURITY CLASSIFICATION OF ABSTRACT Unclassified
LIMITATION OF ABSTRACT UL
Index policies for shooting problems
Abstract
We consider a scenario in which a single Red wishes to shoot at a collection of Blue targets, one at a time, to maximise some measure of return obtained from Blues killed before Red's own (possible) demise. Such a situation arises in various military contexts such as the conduct of air defence by Red in the face of Blue SEAD (suppression of enemy air defences). A class of decision processes called multi-armed bandits has been previously deployed to develop optimal policies for Red in which she attaches a calibrating (Gittins) index to each Blue target and optimally shoots next at the Blue with largest index value. The current paper seeks to elucidate how a range of developments of index theory are able to accommodate features of such problems which are of practical military import. Such features include levels of risk to Red which are policy dependent, Red having imperfect information about the Blues she faces, an evolving population of Blue targets and the possibility of Red disengagement. The paper concludes with a numerical study which both c...