Policy Gradient

Definition

The objective of a Reinforcement Learning Policy Gradient agent is to maximize the “expected” reward when following a policy

References

Policy Gradients in a Nutshell. Towards Data Science. Link.

json

D3FEND^™

A knowledge graph of cybersecurity countermeasures

1.4.0

Use of the MITRE D3FEND™ Knowledge Graph and website is subject to the Terms of Use. Use of the MITRE D3FEND website is subject to the MITRE D3FEND Privacy Policy. MITRE D3FEND is funded by the National Security Agency (NSA) Cybersecurity Directorate and managed by the National Security Engineering Center (NSEC) which is operated by The MITRE Corporation. MITRE D3FEND; and the MITRE D3FEND logo are trademarks of The MITRE Corporation. This software was produced for the U. S. Government under Basic Contract No. W56KGU-18-D-0004, and is subject to the Rights in Noncommercial Computer Software and Noncommercial Computer Software Documentation Clause 252.227-7014 (FEB 2012)
© 2025 The MITRE Corporation.
Approved for Public Release; Distribution Unlimited #20-2338 and #23-1207.

UI: 0.23.1