MURAL - Maynooth University Research Archive Library



    A Least-Squares Temporal Difference based method for solving resource allocation problems


    Forootani, Ali and Tipaldi, Massimo and Zarch, Majid Ghaniee and Liuzza, Davide and Glielmo, Luigi (2020) A Least-Squares Temporal Difference based method for solving resource allocation problems. IFAC Journal of Systems and Control, 13. p. 100106. ISSN 2468-6018



    Abstract

    Value function approximation plays a central role in Approximate Dynamic Programming (ADP) in overcoming the so-called curse of dimensionality associated with real stochastic processes. In this regard, we propose a novel Least-Squares Temporal Difference (LSTD) based method: the "Multi-trajectory Greedy LSTD" (MG-LSTD). It is an exploration-enhanced recursive LSTD algorithm with the policy improvement step embedded within the LSTD iterations. It uses multi-trajectory Monte Carlo simulations to enhance exploration of the system state space. The method is applied to resource allocation problems modeled within a constrained Stochastic Dynamic Programming (SDP) framework. In particular, such problems are formulated as a set of parallel Birth–Death Processes (BDPs). Several operational scenarios are defined and solved to show the effectiveness of the proposed approach. Finally, we provide experimental evidence on the convergence properties of the MG-LSTD algorithm as a function of its key parameters. © 2020 Elsevier Ltd. All rights reserved.
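
To make the ingredients named in the abstract concrete, the following is a minimal sketch of generic recursive LSTD(0) policy evaluation on a single birth-death chain, with many short trajectories restarted from every state to improve exploration. It is not the authors' MG-LSTD algorithm: the greedy policy improvement step, the constraints, and the parallel-BDP structure are all omitted. The chain parameters (`p_birth`, `p_death`, `n_max`), the cost function, the one-hot features, and the regularisation constant `delta` are assumptions chosen only for illustration.

```python
import random


def cost(s):
    """Per-stage cost; assumed here to equal the state index."""
    return float(s)


def step(s, n_max, p_birth, p_death, rng):
    """Sample the next state of a birth-death chain on {0, ..., n_max}."""
    u = rng.random()
    if u < p_birth:
        return min(s + 1, n_max)  # birth, blocked at the upper boundary
    if u < p_birth + p_death:
        return max(s - 1, 0)      # death, blocked at the lower boundary
    return s


def features(s, k):
    """One-hot feature vector (an assumption; any feature map could be used)."""
    return [1.0 if i == s else 0.0 for i in range(k)]


def exact_values(n_max, p_birth, p_death, gamma, sweeps=2000):
    """Reference solution: iterate the Bellman equation of the fixed chain."""
    v = [0.0] * (n_max + 1)
    stay = 1.0 - p_birth - p_death
    for _ in range(sweeps):
        v = [cost(s) + gamma * (p_birth * v[min(s + 1, n_max)]
                                + p_death * v[max(s - 1, 0)]
                                + stay * v[s])
             for s in range(n_max + 1)]
    return v


def recursive_lstd(n_max=5, p_birth=0.3, p_death=0.4, gamma=0.9,
                   runs_per_start=100, horizon=40, delta=1e-3, seed=0):
    """Recursive LSTD(0): keep C = A^{-1} up to date via Sherman-Morrison."""
    rng = random.Random(seed)
    k = n_max + 1
    # A starts as delta * I, so its inverse C starts as (1 / delta) * I.
    C = [[1.0 / delta if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for start in range(k):                 # restart from every state
        for _ in range(runs_per_start):    # several trajectories per start
            s = start
            for _ in range(horizon):
                s_next = step(s, n_max, p_birth, p_death, rng)
                phi = features(s, k)
                phi_next = features(s_next, k)
                v = [phi[i] - gamma * phi_next[i] for i in range(k)]
                # Rank-one update A <- A + phi v^T, applied directly to C.
                c_phi = [sum(C[i][j] * phi[j] for j in range(k))
                         for i in range(k)]
                v_c = [sum(v[i] * C[i][j] for i in range(k))
                       for j in range(k)]
                denom = 1.0 + sum(v[i] * c_phi[i] for i in range(k))
                for i in range(k):
                    for j in range(k):
                        C[i][j] -= c_phi[i] * v_c[j] / denom
                for i in range(k):
                    b[i] += phi[i] * cost(s)
                s = s_next
    # Weight vector theta = A^{-1} b; with one-hot features theta[s] ~ V(s).
    return [sum(C[i][j] * b[j] for j in range(k)) for i in range(k)]


if __name__ == "__main__":
    theta = recursive_lstd()
    reference = exact_values(5, 0.3, 0.4, 0.9)
    for s, (est, ref) in enumerate(zip(theta, reference)):
        print(f"state {s}: LSTD {est:.2f}  reference {ref:.2f}")
```

With one-hot features the estimate approaches the reference values as more transitions are sampled. A lower-dimensional feature map would instead give a projected approximation, which is the setting where value function approximation pays off for large state spaces.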

    Item Type: Article
    Additional Information: Cite as: Ali Forootani, Massimo Tipaldi, Majid Ghaniee Zarch, Davide Liuzza, Luigi Glielmo, A Least-Squares Temporal Difference based method for solving resource allocation problems, IFAC Journal of Systems and Control, Volume 13, 2020, 100106, ISSN 2468-6018, https://doi.org/10.1016/j.ifacsc.2020.100106.
    Keywords: Least-squares temporal difference; Approximate dynamic programming; Monte Carlo simulations; Markov decision process; Birth–death process
    Academic Unit: Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 16105
    Identification Number: https://doi.org/10.1016/j.ifacsc.2020.100106
    Depositing User: Ali Forootani
    Date Deposited: 15 Jun 2022 11:54
    Journal or Publication Title: IFAC Journal of Systems and Control
    Publisher: Elsevier
    Refereed: Yes
    URI:
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
