CMA_MOMAB
Upper confidence bound (UCB) is a successful multiarmed bandit algorithm for regret minimization. The covariance matrix adaptation (CMA) for Pareto UCB (CMA-PUCB) algorithm considers stochastic reward vectors with correlated objectives. Using a variant of the Bernstein inequality for matrices, we upper bound the cumulative pseudoregret of pulling suboptimal arms for the CMA-PUCB algorithm by a quantity that is logarithmic in the number of arms K, objectives D, and samples n: O(ln(nDK) ∑i (||Σi||²/Δi)), where Δi is the regret of pulling the suboptimal arm i. For unknown covariance matrices between objectives Σi, we upper bound the approximation of the covariance matrix in terms of the number of samples by O(n ln(nDK) + ln²(nDK) ∑i (1/Δi)). Simulations on a three-objective stochastic environment show the applicability of our method.
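To make the Pareto UCB idea concrete, the following is a minimal Python sketch of a plain Pareto UCB loop. It is not the CMA-PUCB algorithm from the paper and not the code in this submission: it ignores the covariance matrices Σi entirely, and the scalar exploration bonus sqrt(2 ln t / n_i) is a UCB1-style assumption chosen for illustration. The names `dominates`, `pareto_front`, `pareto_ucb`, and `toy_pull`, as well as the Bernoulli toy environment with independent objectives, are hypothetical.

```python
import math
import random


def dominates(u, v):
    """True if u Pareto-dominates v: no worse in every objective, better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Positions of the vectors that no other vector dominates."""
    return [i for i, u in enumerate(vectors)
            if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)]


def pareto_ucb(pull, K, D, horizon, seed=0):
    """Plain Pareto UCB sketch (assumed UCB1-style bonus, no covariance term)."""
    rng = random.Random(seed)
    counts = [0] * K
    means = [[0.0] * D for _ in range(K)]
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Same scalar bonus added to every objective of an arm (assumption).
            bonus = [math.sqrt(2.0 * math.log(t) / counts[i]) for i in range(K)]
            ucb = [[m + bonus[i] for m in means[i]] for i in range(K)]
            # Pull uniformly at random among arms on the Pareto front of the indices.
            arm = rng.choice(pareto_front(ucb))
        reward = pull(arm, rng)
        counts[arm] += 1
        for d in range(D):
            # Incremental update of the empirical mean reward vector.
            means[arm][d] += (reward[d] - means[arm][d]) / counts[arm]
    return counts, means


def toy_pull(arm, rng):
    """Hypothetical three-objective Bernoulli environment; arm 2 is dominated."""
    probs = [(0.8, 0.2, 0.5), (0.2, 0.8, 0.5), (0.1, 0.1, 0.1)]
    return [1.0 if rng.random() < p else 0.0 for p in probs[arm]]


counts, means = pareto_ucb(toy_pull, K=3, D=3, horizon=2000)
print(counts)
```

In this toy setup arms 0 and 1 are mutually non-dominated and arm 2 is dominated by both, so the printed pull counts indicate how often the sketch spends samples on the suboptimal arm. A covariance-aware variant would replace the scalar bonus with a term that depends on an estimate of Σi, which is the part this sketch leaves out.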
Cite As
Drugan, Madalina. “Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits.” IEEE Transactions on Neural Networks and Learning Systems, Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 1–10, doi:10.1109/tnnls.2018.2885123.
Platform Compatibility
Windows macOS Linux
Categories
- MATLAB > Mathematics > Sparse Matrices
Version | Published | Release Notes
---|---|---
1.0.3 | | Contains a Readme file
1.0.2 | | Comparison with uniform sampling
1.0.1 | | A bug was detected
1.0.0 | |