Appendix

Pseudocode of the algorithms

Fig. 12 Algorithm for policy iteration (image: _images/policy_iteration.png)
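
The figure itself is not reproduced in this text version, so the sketch below illustrates the same idea: tabular policy iteration, alternating policy evaluation and greedy policy improvement until the policy stops changing. The transition-model convention P[s][a] = [(prob, next_state, reward, done), ...], the discount gamma, and the tolerance theta are illustrative assumptions, not taken from the figure.

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Tabular policy iteration. Assumes P[s][a] = [(prob, next_state, reward, done), ...]."""
    policy = np.zeros(n_states, dtype=int)   # start from an arbitrary (all-zeros) policy
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: Bellman expectation backups until the value change is small.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = sum(p * (r + gamma * V[s2] * (not done))
                            for p, s2, r, done in P[s][policy[s]])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to the evaluated value function.
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != policy[s]:
                stable = False
            policy[s] = best
        if stable:                            # no state changed its action: done
            return policy, V
```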

Fig. 13 Algorithm for value iteration (image: _images/value_iteration.png)
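
Again as a stand-in for the figure, a minimal sketch of tabular value iteration: repeated Bellman optimality backups on V, followed by greedy policy extraction. It reuses the same assumed model convention P[s][a] = [(prob, next_state, reward, done), ...] as the policy iteration sketch above.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Tabular value iteration with the same P[s][a] model convention as above."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: take the max over actions.
            q = [sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])
                 for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged value function.
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2] * (not done))
                           for p, s2, r, done in P[s][a])
                       for a in range(n_actions)]))
        for s in range(n_states)
    ])
    return policy, V
```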

Fig. 14 Algorithm for the on-policy MC method (image: _images/MC_method_on-policy.png)
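
Below is a hedged sketch of what this figure describes: on-policy first-visit Monte Carlo control with an epsilon-greedy policy over a tabular action-value function. The Gym-style environment interface (env.reset() returning an integer state, env.step(action) returning (state, reward, done, info)) and the hyperparameters episodes, gamma, and epsilon are assumptions for illustration, not part of the original figure.

```python
import numpy as np
from collections import defaultdict

def mc_control_on_policy(env, n_states, n_actions, episodes=5000,
                         gamma=0.99, epsilon=0.1):
    """On-policy first-visit MC control with an epsilon-greedy behaviour policy."""
    Q = np.zeros((n_states, n_actions))
    returns_count = defaultdict(int)              # visit counts for incremental averaging
    for _ in range(episodes):
        # Generate one episode with the current epsilon-greedy policy.
        episode, state, done = [], env.reset(), False
        while not done:
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            episode.append((state, action, reward))
            state = next_state
        # Work backwards, updating Q at the first visit of each (state, action) pair.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            first_visit = (s, a) not in [(x, y) for x, y, _ in episode[:t]]
            if first_visit:
                returns_count[(s, a)] += 1
                Q[s, a] += (G - Q[s, a]) / returns_count[(s, a)]   # incremental mean of returns
    return Q
```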

Fig. 15 Algorithm for TD(0) (image: _images/TD.png)
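
Finally, a minimal sketch of tabular TD(0) policy evaluation corresponding to this last figure: after every transition, V(s) is moved a step of size alpha toward the bootstrapped target r + gamma * V(s'). The environment and policy interfaces are the same illustrative assumptions as in the Monte Carlo sketch above.

```python
import numpy as np

def td0_prediction(env, policy, n_states, episodes=5000, alpha=0.05, gamma=0.99):
    """Tabular TD(0) evaluation of a fixed policy.

    Assumes env.reset() -> state, env.step(a) -> (state, reward, done, info),
    and policy(state) -> action; these interfaces are illustrative assumptions.
    """
    V = np.zeros(n_states)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s').
            target = reward + gamma * V[next_state] * (not done)
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```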

ADBB17

Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866, 2017.

AKAM+21

Ahmad Taher Azar, Anis Koubaa, Nada Ali Mohamed, Habiba A Ibrahim, Zahra Fathy Ibrahim, Muhammad Kazim, Adel Ammar, Bilel Benjdira, Alaa M Khamis, Ibrahim A Hameed, and others. Drone deep reinforcement learning: a review. Electronics, 10(9):999, 2021.

BI99

Leemon C Baird III. Reinforcement learning through gradient descent. Technical Report, Carnegie Mellon University, Department of Computer Science, Pittsburgh, PA, 1999.

BHB+20

André Barreto, Shaobo Hou, Diana Borsa, David Silver, and Doina Precup. Fast reinforcement learning with generalized policy updates. Proceedings of the National Academy of Sciences, 117(48):30079–30087, 2020.

BFH17

Qianwen Bi, Michael Finke, and Sandra J Huston. Financial software use and retirement savings. Journal of Financial Counseling and Planning, 28(1):107–128, 2017.

BDTS20

Rachel Qianwen Bi, Lukas R Dean, Jingpeng Tang, and Hyrum L Smith. Limitations of retirement planning software: examining variance between inputs and outputs. Journal of Financial Service Professionals, 2020.

Blo18

Daniel Alexandre Bloch. Machine learning: models and algorithms. Machine Learning: Models And Algorithms, Quantitative Analytics, 2018.

BS11

Kenneth Bruhn and Mogens Steffensen. Household consumption, investment and life insurance. Insurance: Mathematics and Economics, 48(3):315–325, 2011.

CL20

Shou Chen and Guangbing Li. Time-inconsistent preferences, consumption, investment and life insurance decisions. Applied Economics Letters, 27(5):392–399, 2020.

DPMSNR14

Albert De-Paz, Jesus Marin-Solano, Jorge Navas, and Oriol Roch. Consumption, investment and life insurance strategies with heterogeneous discounting. Insurance: Mathematics and Economics, 54:66–75, 2014.

DH20

Matthew Dixon and Igor Halperin. G-learner and GIRL: goal based wealth management with reinforcement learning. arXiv preprint arXiv:2002.10990, 2020.

DHB20

Matthew F Dixon, Igor Halperin, and Paul Bilokon. Machine Learning in Finance: From Theory to Practice. Springer International Publishing AG, Cham, 2020. ISBN 9783030410674.

Dol10

Victor Dolk. Survey reinforcement learning. Eindhoven University of Technology, 2010.

DMBE18

Taft Dorman, Barry S Mulholland, Qianwen Bi, and Harold Evensky. The efficacy of publicly-available retirement planning tools. Available at SSRN 2732927, 2018.

EWC21

Maria K Eckstein, Linda Wilbrecht, and Anne GE Collins. What do reinforcement learning models measure? Interpreting model parameters in cognition and neuroscience. Current Opinion in Behavioral Sciences, 41:128–137, 2021.

FPT15

Roy Fox, Ari Pakman, and Naftali Tishby. Taming the noise in reinforcement learning via soft updates. arXiv preprint arXiv:1512.08562, 2015.

FrancoisLHI+18

Vincent François-Lavet, Peter Henderson, Riashat Islam, Marc G Bellemare, and Joelle Pineau. An introduction to deep reinforcement learning. arXiv preprint arXiv:1811.12560, 2018.

GP13

Matthieu Geist and Olivier Pietquin. Algorithmic survey of parametric value function approximation. IEEE Transactions on Neural Networks and Learning Systems, 24(6):845–867, 2013.

GulerLP19

Batuhan Güler, Alexis Laignelet, and Panos Parpas. Towards robust and stable deep learning algorithms for forward backward stochastic differential equations. arXiv preprint arXiv:1910.11623, 2019.

Ham18

Ahmad Hammoudeh. A concise introduction to reinforcement learning. 2018.

HJ+20

Jiequn Han, Arnulf Jentzen, and others. Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning. arXiv preprint arXiv:2008.13333, 2020.

HJW17

Jiequn Han, Arnulf Jentzen, and E Weinan. Overcoming the curse of dimensionality: solving high-dimensional partial differential equations using deep learning. arXiv preprint arXiv:1707.02568, pages 1–13, 2017.

HJW18

Jiequn Han, Arnulf Jentzen, and E Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.

Her11

Hal E Hershfield. Future self-continuity: how conceptions of the future self transform intertemporal choice. Annals of the New York Academy of Sciences, 1235:30, 2011.

KSL19

Sung-Kyun Kim, Oren Salzman, and Maxim Likhachev. POMHDP: search-based belief space planning using multiple heuristics. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 29, pages 734–744, 2019.

KC21

Yeo Jin Kim and Min Chi. Time-aware Q-networks: resolving temporal irregularity for deep reinforcement learning. arXiv preprint arXiv:2105.02580, 2021.

KS15

Morten Tolver Kronborg and Mogens Steffensen. Optimal consumption, investment and life insurance with surrender option guarantee. Scandinavian Actuarial Journal, 2015(1):59–87, 2015.

Leu94

Siu Fai Leung. Uncertain lifetime, the theory of the consumer, and the life cycle hypothesis. 1994.

Lev18

Sergey Levine. Reinforcement learning and control as probabilistic inference: tutorial and review. arXiv preprint arXiv:1805.00909, 2018.

MVHS14

Ashique Rupam Mahmood, Hado Van Hasselt, and Richard S Sutton. Weighted importance sampling for off-policy learning with linear function approximation. In NIPS, pages 3014–3022, 2014.

Mer69

Robert C Merton. Lifetime portfolio selection under uncertainty: the continuous-time case. The Review of Economics and Statistics, pages 247–257, 1969.

Mer75

Robert C Merton. Optimum consumption and portfolio rules in a continuous-time model. In Stochastic Optimization Models in Finance, pages 621–661. Elsevier, 1975.

MBJ20a

Thomas M Moerland, Joost Broekens, and Catholijn M Jonker. A framework for reinforcement learning and planning. arXiv preprint arXiv:2006.15009, 2020.

MBJ20b

Thomas M Moerland, Joost Broekens, and Catholijn M Jonker. Model-based reinforcement learning: a survey. arXiv preprint arXiv:2006.16712, 2020.

MJ20

Amit Kumar Mondal and N Jamali. A survey of reinforcement learning techniques: strategies, recent development, and future directions. arXiv preprint arXiv:2001.06921, 2020.

NRC20

Muddasar Naeem, S Tahir H Rizvi, and Antonio Coronato. A gentle introduction to reinforcement learning and its application in different fields. IEEE Access, 2020.

NZKN19

Farzad Niroui, Kaicheng Zhang, Zendai Kashino, and Goldie Nejat. Deep reinforcement learning robot for search and rescue applications: exploration in unknown cluttered environments. IEEE Robotics and Automation Letters, 4(2):610–617, 2019. doi:10.1109/LRA.2019.2891991.

PRD96

Elena Pashenkova, Irina Rish, and Rina Dechter. Value iteration and policy iteration algorithms for Markov decision problem. In AAAI’96: Workshop on Structural Issues in Planning and Temporal Reasoning. Citeseer, 1996.

PVW11

James M Poterba, Steven F Venti, and David A Wise. Were they prepared for retirement? Financial status at advanced ages in the HRS and AHEAD cohorts. In Investigations in the Economics of Aging, pages 21–69. University of Chicago Press, 2011.

Rai18

Maziar Raissi. Forward-backward stochastic neural networks: deep learning of high-dimensional partial differential equations. arXiv preprint arXiv:1804.07010, 2018.

Ric75

Scott F Richard. Optimal consumption, portfolio and life insurance rules for an uncertain lived individual in a continuous time model. Journal of Financial Economics, 2(2):187–203, 1975.

RMM18

Lev Rozonoer, Boris Mirkin, and Ilya Muchnik. Braverman Readings in Machine Learning. Key Ideas from Inception to Current State: International Conference Commemorating the 40th Anniversary of Emmanuil Braverman's Decease, Boston, MA, Invited Talks. Springer International Publishing, Cham, 2018.

San21

Nimish Sanghi. Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym. Apress, Berkeley, CA, 2021. ISBN 1484268083.

SW16

Yang Shen and Jiaqin Wei. Optimal investment-consumption-insurance with random parameters. Scandinavian Actuarial Journal, 2016(1):37–62, 2016.

SBLL19

Joohyun Shin, Thomas A Badgwell, Kuang-Hung Liu, and Jay H Lee. Reinforcement learning–overview of recent progress and implications for process control. Computers & Chemical Engineering, 127:282–294, 2019.

SB18

Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

VOW12

Martijn Van Otterlo and Marco Wiering. Reinforcement learning and Markov decision processes. In Reinforcement Learning, pages 3–42. Springer, 2012.

WZL09

Fei-Yue Wang, Huaguang Zhang, and Derong Liu. Adaptive dynamic programming: an introduction. IEEE Computational Intelligence Magazine, 4(2):39–47, 2009.

WZZ19

Haoran Wang, Thaleia Zariphopoulou, and Xun Yu Zhou. Exploration versus exploitation in reinforcement learning: a stochastic control approach. Available at SSRN 3316387, 2019.

WCJW20

Jiaqin Wei, Xiang Cheng, Zhuo Jin, and Hao Wang. Optimal consumption–investment and life-insurance purchase strategy for couples with correlated lifetimes. Insurance: Mathematics and Economics, 91:244–256, 2020.

WHJ17

E Weinan, Jiequn Han, and Arnulf Jentzen. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380, 2017.

Yaa65

Menahem E Yaari. Uncertain lifetime, life insurance, and the theory of the consumer. The Review of Economic Studies, 32(2):137–150, 1965.

YLL+19

Niko Yasui, Sungsu Lim, Cam Linke, Adam White, and Martha White. An empirical and conceptual categorization of value-based exploration methods. ICML Exploration in Reinforcement Learning Workshop, 2019.

Ye06

Jinchun Ye. Optimal life insurance purchase, consumption and portfolio under an uncertain life. University of Illinois at Chicago, 2006.