A Malliavin-based Monte-Carlo Approach for Numerical Solution of Stochastic Control Problems: Experiences from Merton's Problem
The problem of choosing optimal investment and consumption strategies has been widely studied. In continuous-time theory, the pioneering work of Merton (1969) is a standard reference. Merton studied a continuous-time economy with constant investment opportunities. Since then, his problem has been extended in many ways to capture empirically observed investment and consumption behavior. As more realism is incorporated into a model, the problem becomes harder to solve. Analytical solutions can be found only rarely, and only for problems with special structure. Problems that lack analytical solutions must be solved numerically, yet many realistic problems are difficult to solve even numerically because of their dimensionality.
The purpose of this paper is to present a numerical procedure for solving high-dimensional stochastic control problems arising in the study of optimal portfolio choice. For expositional reasons we develop the algorithm in one dimension, but the mathematical results it relies on generalize to a multi-dimensional setting.
The starting point of the algorithm is an initial guess of the agent's investment and consumption strategies at all times and wealth levels. Given this guess, the wealth process can be simulated up to the agent's investment horizon. We exploit the dynamic programming principle to break the problem into a series of smaller one-period problems, which are solved recursively backwards. Specifically, we derive first-order conditions relating the optimal controls to the value function in the next period. Starting from the final date, we solve these first-order conditions numerically along every simulated path, stepping backwards in time. The resulting investment and consumption strategies are used to update the simulated wealth paths, and the procedure is repeated until it converges.
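To make the forward-simulation step concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single risky asset following a geometric Brownian motion, controls expressed as fractions of current wealth, and a plain Euler discretization. The function name `simulate_wealth`, the callables `pi_guess` and `c_guess`, and all parameter values are hypothetical and chosen only for illustration. The backward step that solves the first-order conditions is not shown.

```python
import math
import random


def simulate_wealth(w0, horizon, n_steps, n_paths, pi_guess, c_guess,
                    r=0.02, mu=0.08, sigma=0.2, seed=0):
    """Simulate wealth paths under guessed controls (illustrative sketch).

    Assumed dynamics (not taken from the paper):
        dW = W * (r + pi * (mu - r) - c) dt + W * pi * sigma dB
    where pi(t, w) is the fraction of wealth held in the risky asset and
    c(t, w) is consumption per unit time as a fraction of wealth.
    """
    rng = random.Random(seed)          # seeded for reproducible paths
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        w = w0
        path = [w]
        for step in range(n_steps):
            t = step * dt
            pi = pi_guess(t, w)        # guessed investment strategy
            c = c_guess(t, w)          # guessed consumption strategy
            z = rng.gauss(0.0, 1.0)    # standard normal shock
            # Euler step: drift term plus diffusion term
            w = w + w * (r + pi * (mu - r) - c) * dt + w * pi * sigma * sqrt_dt * z
            path.append(w)
        paths.append(path)
    return paths


# Example: a constant initial guess at all times and wealth levels.
paths = simulate_wealth(
    w0=1.0, horizon=1.0, n_steps=50, n_paths=1000,
    pi_guess=lambda t, w: 0.5,
    c_guess=lambda t, w: 0.05,
)
```

In an iteration of the kind described above, the controls obtained from the backward pass would replace `pi_guess` and `c_guess`, and the paths would be simulated again.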
We analyze the numerical properties of the algorithm by testing it on Merton's optimal portfolio choice problem. The solution to Merton's problem is known explicitly and can therefore serve as a benchmark. Our results indicate that convergence can be obtained both for the initial controls and for the distribution of their future values. Since we intend to apply the algorithm in a multi-dimensional setting, we also consider the complications that might arise there. In most cases, however, the added state variables will be exogenous, non-controllable processes, which do not complicate the optimization routine in the proposed algorithm. Computer storage could become a problem, but this should be manageable with careful programming.
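As an illustration of how such a benchmark comparison might look, the sketch below uses the well-known closed-form risky-asset fraction for Merton's problem, (mu - r) / (gamma * sigma^2). It assumes constant relative risk aversion gamma and a single risky asset with constant drift mu and volatility sigma; the excerpt above does not state the utility specification, so this is an assumption. The function names and the numerical estimates are hypothetical and are not results from the paper.

```python
def merton_risky_fraction(mu, r, sigma, gamma):
    """Closed-form optimal fraction of wealth in the risky asset.

    Assumes CRRA utility with relative risk aversion gamma and constant
    investment opportunities (drift mu, volatility sigma, risk-free rate r).
    """
    if sigma <= 0 or gamma <= 0:
        raise ValueError("sigma and gamma must be positive")
    return (mu - r) / (gamma * sigma ** 2)


def max_abs_error(estimates, benchmark):
    """Largest absolute deviation of numerical estimates from the benchmark."""
    return max(abs(x - benchmark) for x in estimates)


# Hypothetical estimates of the initial control from successive iterations.
benchmark = merton_risky_fraction(mu=0.08, r=0.02, sigma=0.2, gamma=3.0)
illustrative_estimates = [0.48, 0.51]
error = max_abs_error(illustrative_estimates, benchmark)
```

Under these assumptions the closed-form fraction depends neither on time nor on wealth, which makes it a convenient reference for the initial controls as well as for the distribution of their future values.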