# Some Observations for Portfolio Management Applications of Modern Machine Learning Methods

## Article information

Int. J. Fuzzy Log. Intell. Syst., Vol. 16, No. 1, pp. 44-51, March 2016
Publication date (electronic): March 31, 2016
doi: https://doi.org/10.5391/IJFIS.2016.16.1.44
1Department of Control & Instrumentation Engineering, Korea University, Sejong City 339-700, Korea
2Department of Mathematics, Korea University, Seoul 136-713, Korea
Correspondence to: Jooyoung Park (parkj@korea.ac.kr)
Received: March 1, 2016; Revised: March 15, 2016; Accepted: March 24, 2016.

## Abstract

Artificial intelligence has recently risen to the level of a leading information technology that will significantly influence many aspects of our future lifestyles. In particular, in the fields of machine learning for classification and decision-making, there have been many research efforts toward solving, via data-driven approaches, the estimation and control problems that arise in various kinds of portfolio management problems. These modern data-driven approaches, which seek solutions based on relevant empirical data rather than mathematical analyses, are particularly useful in practical application domains. In this paper, we consider some applications of modern data-driven machine learning methods to portfolio management problems. More precisely, we apply a simplified version of the sparse Gaussian process (GP) classification method to classify users' sensitivity to financial risk, and then present two portfolio management issues in which the GP classification results can be useful. Experimental results show that the GP applications handle the simulated data sets well.

## 1. Introduction

Artificial intelligence has recently risen to the level of a leading information technology that will significantly influence many aspects of our future lifestyles. In particular, in the fields of machine learning for classification and decision-making, there have been many research efforts toward solving, via data-driven approaches, the estimation and control problems that arise in various kinds of portfolio management problems. These modern data-driven approaches, which seek solutions based on relevant empirical data rather than mathematical analyses, are particularly useful in practical application domains.

In this paper, we consider the problem of applying kernel methods, together with some other optimization methods, to portfolio management. As is well known, kernel methods have attracted great interest in the areas of pattern classification, function approximation, and anomaly detection [1–9], and Gaussian processes have recently played an important role in the field of machine learning as a tool for probabilistic kernel methods [10]. We apply a simplified version of the sparse Gaussian process (GP) classification method, which is a direct result of two recent remarkable Gaussian process papers [25, 26], to risk sensitivity classification in financial portfolio management. Since portfolio management problems are optimal decision-making problems that rely on actual empirical data, theoretical and practical solutions can be formulated via many recent advances in machine learning and control: the traditional mean-variance efficient portfolio problem [11]; index tracking portfolio formulation [12–15]; risk-adjusted expected return maximizing strategies [16–18]; trend following strategies [19–23]; long-short trading strategies (including the pairs trading strategy) [20, 24]; etc. In this paper, we also raise two important portfolio management issues in which the GP classification results can be useful.

This paper is organized as follows: In Section 2, we briefly describe relevant GP preliminaries. Section 3 presents the application of a simplified version of the sparse GP classification method to risk sensitivity classification, together with its possible uses in portfolio management. Finally, in Section 4, we present our concluding remarks.

## 2. Preliminaries

Probabilistic kernel methods, which include Gaussian processes, have recently attracted great interest in the areas of pattern classification, function approximation, and anomaly detection. In this section, we briefly describe some preliminaries on Gaussian processes, which play an important role in our portfolio management applications. For more details on Gaussian processes, please refer to, e.g., [10]. A Gaussian process, $\{f(\mathbf{x})\}$, is an indexed family of random variables with index $\mathbf{x} \in \mathbb{R}^d$ such that for any finite set of indices $\mathbf{x}_1, \cdots, \mathbf{x}_N$, the random variables $f(\mathbf{x}_1), \cdots, f(\mathbf{x}_N)$ are jointly Gaussian. Gaussian processes can be characterized by their mean functions and covariance (or kernel) functions, which are defined as follows, respectively:

(1) $m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})]$,
(2) $k(\mathbf{x}, \mathbf{x}') = \mathbb{E}[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))]$.

Gaussian processes with mean function m(x) and covariance function k(x, x′) are often denoted by

(3) $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$.

One can see that, with the so-called kernel trick, Bayesian linear models defined on the feature space can be viewed as Gaussian processes. More specifically, suppose that $f(\mathbf{x})$ is described by $\varphi(\mathbf{x})^T \mathbf{w}$, where the prior distribution of the random vector $\mathbf{w}$ is $\mathcal{N}(\mathbf{0}, \Sigma_w)$. Here, $\varphi(\mathbf{x})$ is the feature vector, which is the result of mapping the input vector $\mathbf{x}$ into the (possibly high-dimensional) feature space $F$. Note that in this situation, the expectation of $f(\mathbf{x})$ is

(4) $\mathbb{E}[f(\mathbf{x})] = \mathbb{E}[\varphi(\mathbf{x})^T \mathbf{w}] = \varphi(\mathbf{x})^T \mathbb{E}[\mathbf{w}] = 0$,

and the covariance between f(x) and f(x′) satisfies

(5) $\mathbb{E}[f(\mathbf{x}) f(\mathbf{x}')] = \mathbb{E}[\varphi(\mathbf{x})^T \mathbf{w}\mathbf{w}^T \varphi(\mathbf{x}')] = \varphi(\mathbf{x})^T \mathbb{E}[\mathbf{w}\mathbf{w}^T] \varphi(\mathbf{x}') = \varphi(\mathbf{x})^T \Sigma_w \varphi(\mathbf{x}')$.

Thus defining the kernel function, k, by the kernel trick

(6) $k(\mathbf{x}, \mathbf{x}') = \varphi(\mathbf{x})^T \Sigma_w \varphi(\mathbf{x}')$

enables us to compute the covariance, Cov[f(x), f(x′)], directly on the input space using the kernel, i.e., Cov[f(x), f(x′)] = k(x, x′). Therefore, k can be conveniently interpreted as both a kernel function (in the sense of kernel methods) and a covariance function. In general, the mean function, m(x), is assumed to be the zero function, which can be done without loss of generality. The task of obtaining the predictive distribution for any test point in the Gaussian process framework can be summarized as follows: Consider the training data set $\mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^N$, where $X = \{\mathbf{x}_n \in \mathbb{R}^d\}_{n=1}^N$ is the set of input values of the training data, and $\mathbf{y} = \{y_n \in \mathbb{R}\}_{n=1}^N$ is the set of the corresponding target values. Since the Gaussian process $f(\mathbf{x})$ has the zero mean function and the kernel function $k(\mathbf{x}, \mathbf{x}')$, the joint distribution of the random vector $\mathbf{f} = [f(\mathbf{x}_1), \cdots, f(\mathbf{x}_N)]^T$ can be written as

(7) $p(\mathbf{f} \mid X) = \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K(X))$.

Here, by $\mathcal{N}(\mathbf{f} \mid \mathbf{m}, V)$, we mean the multivariate Gaussian distribution with mean vector $\mathbf{m}$ and covariance matrix $V$. Also, $K(X)$ is an $N \times N$ matrix whose $(i, j)$-th element is $k(\mathbf{x}_i, \mathbf{x}_j)$. For notational convenience, we often use $K$ instead of $K(X)$. In this paper, we consider the following squared exponential (SE) kernel, which is one of the most widely used choices in the kernel method community:

(8) $k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\!\left[-\frac{1}{2l^2}(\mathbf{x}_i - \mathbf{x}_j)^T(\mathbf{x}_i - \mathbf{x}_j)\right]$.

Here $\sigma_f$ and $l$, which characterize the shape of the kernel function, are called hyper-parameters, and the vector consisting of the hyper-parameters is denoted by $\theta$. In Gaussian process regression, the disturbance that occurs in the process of data observation is also taken into account, and it is characterized by means of a Gaussian noise model:

(9) $p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma_n^2 I)$.

Hence, by combining $p(\mathbf{f} \mid X)$ and $p(\mathbf{y} \mid \mathbf{f})$, one can obtain the following marginal likelihood for the regression problem: $p(\mathbf{y} \mid X) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K + \sigma_n^2 I)$. Also, the log marginal likelihood for the whole training data set $\mathcal{D}$ can be written as follows:

(10) $\log p(\mathbf{y} \mid X) = -\frac{1}{2}\mathbf{y}^T (K + \sigma_n^2 I)^{-1}\mathbf{y} - \frac{1}{2}\log\left|K + \sigma_n^2 I\right| - \frac{N}{2}\log(2\pi)$.
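As a concrete illustration, the log marginal likelihood (10) for the SE kernel (8) can be evaluated with a Cholesky factorization and then maximized over the hyper-parameters. The sketch below uses a coarse grid search in place of the gradient-based maximization typically used in practice; the function and variable names are ours, not the paper's.

```python
import numpy as np

def log_marginal_likelihood(X, y, sigma_f, ell, sigma_n):
    """Eq. (10): log p(y|X) under the SE kernel of Eq. (8)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = sigma_f ** 2 * np.exp(-0.5 * sq / ell ** 2)
    A = K + sigma_n ** 2 * np.eye(len(y))
    L = np.linalg.cholesky(A)                 # A = L L^T, so log|A| = 2 sum log L_ii
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sigma_n^2 I)^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# Coarse grid search over theta = (sigma_f, l, sigma_n) in place of the
# gradient-based maximization normally used in practice.
grid = [(sf, l, sn) for sf in (0.5, 1.0, 2.0)
                    for l in (0.3, 1.0, 3.0)
                    for sn in (0.05, 0.1, 0.5)]
best = max(grid, key=lambda t: log_marginal_likelihood(X, y, *t))
```

The Cholesky route avoids forming the explicit inverse in (10) and gives the log-determinant for free from the factor's diagonal.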

Finding the optimal hyper-parameter vector can be achieved by maximizing the above log marginal likelihood with respect to $\theta$. Also, the predictive distribution of the output $y_*$ for the test input point $\mathbf{x}_*$ can be obtained by applying the conditional density formula for multivariate Gaussian distributions [10], i.e.,

(11) $p(y_* \mid \mathbf{x}_*, \mathcal{D}) = \mathcal{N}\!\left(y_* \mid \mathbf{k}_*^T (K + \sigma_n^2 I)^{-1}\mathbf{y},\; k_{**} - \mathbf{k}_*^T (K + \sigma_n^2 I)^{-1}\mathbf{k}_* + \sigma_n^2\right)$.

Here, $\mathbf{k}_*$ and $k_{**}$ are used for notational convenience; they denote the following, respectively:

(12) $\mathbf{k}_* = [k(\mathbf{x}_1, \mathbf{x}_*), \cdots, k(\mathbf{x}_N, \mathbf{x}_*)]^T$,
(13) $k_{**} = k(\mathbf{x}_*, \mathbf{x}_*)$.

Finally, note that the point estimate $\mathbf{k}_*^T (K + \sigma_n^2 I)^{-1}\mathbf{y}$, which is the mean of $y_*$, can be further written as

(14) $\hat{y}_* = \sum_{i=1}^{N} \alpha_i k(\mathbf{x}_i, \mathbf{x}_*)$,

where $\boldsymbol{\alpha} = [\alpha_1, \cdots, \alpha_N]^T = (K + \sigma_n^2 I)^{-1}\mathbf{y}$, and that (14) can be viewed as a result of the representer theorem [1–3] of kernel methods.
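The predictive distribution (11) and the representer form (14) can be sketched together as follows, assuming the SE kernel (8), for which $k_{**} = \sigma_f^2$; the helper name `gp_predict` is our own illustrative choice.

```python
import numpy as np

def gp_predict(X, y, x_star, sigma_f=1.0, ell=1.0, sigma_n=0.1):
    """Predictive mean and variance of Eq. (11) for a single test point x*.
    The SE kernel of Eq. (8) is assumed, for which k** = sigma_f^2."""
    def k(A, B):
        sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return sigma_f ** 2 * np.exp(-0.5 * sq / ell ** 2)

    Kn = k(X, X) + sigma_n ** 2 * np.eye(len(X))   # K + sigma_n^2 I
    k_star = k(X, x_star[None, :])[:, 0]           # Eq. (12)
    alpha = np.linalg.solve(Kn, y)                 # weights alpha of Eq. (14)
    mean = k_star @ alpha                          # Eq. (14)
    var = sigma_f ** 2 - k_star @ np.linalg.solve(Kn, k_star) + sigma_n ** 2
    return mean, var

rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0])
m, v = gp_predict(X, y, np.array([0.5]))           # mean should track sin(0.5)
```

Note that the mean is exactly the kernel expansion (14): once $\boldsymbol{\alpha}$ is computed, predicting at a new point costs only one kernel vector.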

## 3. Applications

In this section, we present some observations for portfolio management applications of Gaussian processes, the natural evolution strategy, and Hamilton-Jacobi-Bellman (HJB) equations. Our observations consist of two parts. In the first part, we consider the applicability of a simplified sparse Gaussian process classification (GPC) method, which is a direct result of two recent remarkable Gaussian process papers [25, 26], to the task of classifying individuals' sensitivity to financial risk. The derivation of the simplified sparse GPC method can be summarized as follows: We consider the input data set $X = \{\mathbf{x}_n\}_{n=1}^N$ together with the target data set $Y = \{y_n\}_{n=1}^N$, where $\mathbf{x}_n \in \mathbb{R}^d$ and $y_n \in \{1, \cdots, C\}$. Note that the $n$-th observation, $y_n$, is a categorical variable that can be transformed into the one-hot-encoding format. Also, note that for the observation $y_n$, one can use a multinomial distribution whose probabilities are defined by a softmax with intensities $\mathbf{f}_n = (f_{n1}, \cdots, f_{nC})$. The $k$-th intensity of $\mathbf{f}_n$, $f_{nk}$, is the output of the Gaussian process $F^k(\mathbf{x}_n)$. To achieve a sparse representation, the so-called inducing points, $Z \in \mathbb{R}^{M \times d}$, are introduced. Note that in classification problems the marginal log-likelihood is not tractable, contrary to the case of Gaussian process regression in Section 2. Hence, we need to rely on a variational approximation. With $q(\mathbf{f}, U) = q(U)\,p(\mathbf{f} \mid X, U)$ and Jensen's inequality [10], we have

(15) $\log p(Y) = \log \int p(U)\,p(\mathbf{f} \mid X, U)\,p(Y \mid \mathbf{f})\,d\mathbf{f}\,dU \ge \int q(U)\,p(\mathbf{f} \mid X, U)\log\dfrac{p(U)\,p(\mathbf{f} \mid X, U)\,p(Y \mid \mathbf{f})}{q(U)\,p(\mathbf{f} \mid X, U)}\,d\mathbf{f}\,dU = -\mathrm{KL}[q(U)\,\|\,p(U)] + \sum_{n=1}^{N}\int q(U)\,p(\mathbf{f}_n \mid \mathbf{x}_n, U)\log p(y_n \mid \mathbf{f}_n)\,d\mathbf{f}_n\,dU,$

where KL stands for the Kullback-Leibler divergence [10]. Note that with $q(U) = \mathcal{N}(U \mid \mathbf{m}, S)$, we have

(16) $p(\mathbf{f}_n \mid \mathbf{x}_n, U) = \prod_{k=1}^{C} \mathcal{N}(f_{nk} \mid \mathbf{a}_n^T \mathbf{u}_k, b_n)$,

where $\mathbf{a}_n = K_{MM}^{-1} K_{Mn}$ and $b_n = K_{nn} - K_{nM} K_{MM}^{-1} K_{Mn}$. In this paper, we consider the class of diagonal covariance matrices for $S$ and call the resultant GPC a simplified sparse Gaussian process classification. Since the integration of $\log p(y_n \mid \mathbf{f}_n)$ on the right-hand side of (15) is not tractable, we rely on the sampling-based approximation of [27, 28]. In this paper, we propose to use the simplified sparse GPC as a framework for classifying users' sensitivity to financial risk. The categorical target variable in the framework describes the sensitivity level (e.g., very sensitive to risk, moderately sensitive to risk, only a little sensitive to risk, etc.). The questions for providing inputs along this line may include the following kinds [32–34]:

1. What is your current age and planned age of retirement?

2. Your annual before-tax income is $_______.

3. Your future income until your retirement will be _____.

1. Increasing

2. The same

3. Decreasing

4. Unpredictable

4. Total value of your cash and other liquid securities is $_______.

5. Your investment horizon is _______ years.

6. Your primary investment objective is _____.

1. Investing for comfortable retirement

2. General investing for wealth accumulation

3. Securing an emergency fund

4. Saving for a specific purpose (for example college education for kids)

7. Your tolerance for risk taking when investing is _____.

1. Defensive - You accept lower returns to protect your initial investment.

2. Moderate - You want balance between the stability and long-term return.

3. Assertive - You are prepared to accept higher volatility to accumulate assets over long term.

8. If your entire investment portfolio lost 10% of its value in a month during a market decline, what would you do?

1. Liquidate all the investment

2. Sell half of the portfolio

3. Keep the portfolio

4. Invest more

9. What return do you expect to achieve from your investments?

1. Return without losing original money

2. 3–6% per annum

3. 7–10% per annum

4. 11–15% per annum

5. Over 15% per annum
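Before such questionnaire responses can be fed to the GPC, they must be encoded as input vectors $\mathbf{x}_n \in \mathbb{R}^d$. The paper does not specify an encoding, so the sketch below is purely illustrative: numeric fields are rescaled and categorical answers are one-hot encoded, with all field names and scalings being our own assumptions.

```python
import numpy as np

# One respondent's answers to the questionnaire above.  The field names,
# codes, and scalings are our own illustrative assumptions.
answers = {
    "age": 35, "retirement_age": 65,
    "income": 60_000.0, "liquid_assets": 20_000.0, "horizon_years": 10,
    "income_outlook": 1,          # 1=increasing, ..., 4=unpredictable
    "objective": 2,               # 1-4, as in question 6
    "risk_tolerance": 3,          # 1=defensive, 2=moderate, 3=assertive
    "reaction_to_10pct_loss": 4,  # 1-4, as in question 8
    "expected_return": 3,         # 1-5, as in question 9
}

def encode(a):
    """Rescale numeric fields and one-hot encode categorical answers."""
    numeric = [a["age"] / 100.0,
               (a["retirement_age"] - a["age"]) / 50.0,
               np.log10(1.0 + a["income"]) / 6.0,
               np.log10(1.0 + a["liquid_assets"]) / 6.0,
               a["horizon_years"] / 50.0]

    def onehot(code, n):
        v = np.zeros(n)
        v[code - 1] = 1.0
        return v

    cats = np.concatenate([onehot(a["income_outlook"], 4),
                           onehot(a["objective"], 4),
                           onehot(a["risk_tolerance"], 3),
                           onehot(a["reaction_to_10pct_loss"], 4),
                           onehot(a["expected_return"], 5)])
    return np.concatenate([numeric, cats])

x_n = encode(answers)   # input vector for the GPC
```

Keeping all entries on a comparable scale matters for the SE kernel (8), whose single length-scale $l$ is shared across input dimensions.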

In order to evaluate the validity and strengths of the simplified sparse GPC method, we performed experiments on simulated data (see Figs. 1–9). From the classification results, one can see that the simplified sparse GPC works well with a relatively small number of inducing points. Also, Figs. 2–5 show that the GPC can achieve sparsity somewhat more efficiently than the standard SVM approach (for which we used fitcsvm of MATLAB).
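The inducing-point quantities $\mathbf{a}_n$ and $b_n$ of Eq. (16), which are what make this sparsity possible, can be computed from the kernel matrices as sketched below. The SE kernel is assumed, and the helper name `sparse_conditional` is our own.

```python
import numpy as np

def sparse_conditional(Xn, Z, sigma_f=1.0, ell=1.0):
    """a_n = K_MM^{-1} K_Mn and b_n = K_nn - K_nM K_MM^{-1} K_Mn of Eq. (16),
    computed for all N training inputs at once (SE kernel assumed)."""
    def k(A, B):
        sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return sigma_f ** 2 * np.exp(-0.5 * sq / ell ** 2)

    K_MM = k(Z, Z) + 1e-8 * np.eye(len(Z))       # jitter for numerical stability
    K_Mn = k(Z, Xn)                              # M x N
    A = np.linalg.solve(K_MM, K_Mn)              # column n is a_n
    b = sigma_f ** 2 - np.sum(K_Mn * A, axis=0)  # K_nn = sigma_f^2 for SE kernel
    return A, b

rng = np.random.default_rng(2)
Xn = rng.standard_normal((50, 2))
Z = Xn[:5].copy()        # 5 inducing points placed at the first training inputs
A, b = sparse_conditional(Xn, Z)
```

Since all training-point dependence flows through the $M$ inducing points, the dominant factorization cost is $O(M^3)$ rather than $O(N^3)$, which is why a handful of inducing points can suffice.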

Figure 1. Training data considered for binary classification.

Figure 2. Classification results of the simplified sparse GPC (with 20 inducing points).

Figure 3. Classification results of the simplified sparse GPC (with 10 inducing points).

Figure 4. Classification results of the simplified sparse GPC (with 5 inducing points).

Figure 5. Classification results of SVM (with 392 support vectors).

Figure 6. Training data considered for multi-class classification.

Figure 7. Classification results of the simplified sparse GPC (with 20 inducing points).

Figure 8. Classification results of the simplified sparse GPC (with 10 inducing points).

Figure 9. Classification results of the simplified sparse GPC (with 5 inducing points).

In the second part of our observations, we present two portfolio management issues that can utilize the simplified sparse GPC method. The issues covered along this line are the trend-following problem [19, 20, 29] and the portfolio optimization problem [30]. In the first issue, we consider an exponential natural evolution strategy (NES) [31] based solution for finding an efficient trend following strategy (for details, please refer to [20, 29]), and propose the strategy of using the transaction cost, K, as a tuning parameter that can vary according to the GPC results (Fig. 10). In the second issue, we consider an HJB equation based portfolio optimization problem [30], in which a user has the choice of investing in the stock market or saving in a bank account, the stock market is modelled as a geometric Brownian motion, and the dynamics of the factor and the volatility are also modelled with appropriate stochastic differential equations (for details, please refer to [30]); here we propose the strategy of using the coefficient of risk aversion, γ, as a tuning parameter that can vary according to the GPC results (Fig. 11). We expect that in our future works, these two issues will ultimately lead to a set of fundamental building blocks for efficient personal financial planning packages.
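The coupling proposed above can be as simple as a lookup from the predicted risk-sensitivity class to the two tuning parameters; the class labels and numeric values below are purely illustrative assumptions, not values from the paper.

```python
# Hypothetical lookup from the GPC risk-sensitivity class to the tuning
# parameters of the two issues: the transaction cost K of the NES-based
# trend-following strategy and the risk-aversion coefficient gamma of the
# HJB-based portfolio optimization.  Labels and values are illustrative only.
RISK_CLASS_TO_PARAMS = {
    "very_sensitive":       {"K": 0.010, "gamma": 5.0},
    "moderately_sensitive": {"K": 0.005, "gamma": 2.0},
    "little_sensitive":     {"K": 0.002, "gamma": 0.5},
}

def tuned_params(gpc_class):
    """Return the (K, gamma) setting selected by the classifier's output."""
    return RISK_CLASS_TO_PARAMS[gpc_class]

params = tuned_params("moderately_sensitive")
```

A more sensitive user gets a higher effective transaction cost (fewer trades) and a higher risk-aversion coefficient (a more conservative allocation).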

Figure 10. Conceptual diagram of the first issue.

Figure 11. Conceptual diagram of the second issue.

## 4. Conclusion

Modern data-driven machine learning approaches, which seek solutions based on relevant empirical data rather than mathematical analyses, are particularly useful in practical application domains. In this paper, we applied a simplified version of the sparse Gaussian process (GP) classification method to two portfolio management issues (NES-based trend following and HJB-based portfolio optimization). Experimental results showed the applicability of the simplified sparse GPC on simulated data sets. For future work, we are planning to consider more extensive simulation studies, which will show the strengths and weaknesses of the proposed idea, and applications of our methods to an integrated package that can deal with personal financial planning problems.

## Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0021188).

## References

1. Shawe-Taylor, J., and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press; 2004. 10.1017/CBO9780511809682.
2. Cristianini, N., and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press; 2000. 10.1017/CBO9780511801389.
3. Schölkopf, B., and A.J. Smola. Learning with Kernels. MIT Press; 2002.
4. Müller, K.-R., S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks. 12(2):181–201. 2001. 10.1109/72.914517.
5. Kwok, J.T.. The evidence framework applied to support vector machines. IEEE Transactions on Neural Networks. 11(5):1162–1173. 2000. 10.1109/72.870047.
6. Schölkopf, B., J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation. 13:1443–1471. 2001. 10.1162/089976601750264965.
7. Schölkopf, B., A. Smola, R. Williamson, and P.L. Bartlett. New support vector algorithms. Neural Computation. 12(5):1207–1245. 2000. 10.1162/089976600300015565.
8. Park, J., J. Kim, H. Lee, and D. Park. One-class support vector learning and linear matrix inequalities. International Journal of Fuzzy Logic and Intelligent Systems. 3(1):100–104. June. 2003. 10.5391/IJFIS.2003.3.1.100.
9. Park, J., and D. Kang. A Modified approach to density-induced support vector data description. International Journal of Fuzzy Logic and Intelligent Systems. 7(1):1–6. March. 2007. 10.5391/IJFIS.2007.7.1.001.
10. Rasmussen, C.E., and C.K.I. Williams. Gaussian Processes for Machine Learning. The MIT Press; 2006.
11. Markowitz, H.M.. Portfolio Selection: Efficient Diversification of Investments. Cowles Foundation Monograph, No. 16. John Wiley and Sons; 1959.
12. Primbs, J.A., and C.H. Sung. A stochastic receding horizon control approach to constrained index tracking. Asia-Pacific Financial Markets. 15:3–24. 2008. 10.1007/s10690-008-9073-1.
13. Beasley, J.E., N. Meade, and T-J Chang. An evolutionary heuristic for the index tracking problem. European Journal of Operational Research. 148(3):621–643. 2003. 10.1016/S0377-2217(02)00425-3.
14. Jeurissen, R., and J. van den Berg. Index tracking using a hybrid genetic algorithm. In : Proceedings of 2005 ICSC Congress on Computational Intelligence Methods and Applications; 2005;
15. Park, J., D. Yang, and K. Park. Approximate dynamic programming-based dynamic portfolio optimization for constrained index tracking. International Journal of Fuzzy Logic and Intelligent Systems. 13(1):19–28. 2013. 10.5391/IJFIS.2013.13.1.19.
16. Boyd, S., M. Mueller, B. O’Donoghue, and Y. Wang. Performance bounds and suboptimal policies for multi-period investment. Foundations and Trends in Optimization. 1(1):1–69. 2014. 10.1561/2400000001.
17. Primbs, J.. Portfolio optimization applications of stochastic receding horizon control. In : Proceedings of the 2007 American Control Conference. pp. 1811–1816. 2007. 10.1109/ACC.2007.4282251.
18. Park, J., J. Jeong, and K. Park. An investigation on dynamic portfolio selection problems utilizing stochastic receding horizon approach. Journal of Korean Institute of Intelligent Systems. 22(3):386–393. 2012. 10.5391/JKIIS.2012.22.3.386.
19. Dai, M., Q. Zhang, and Q.J. Zhu. Trend following trading under a regime switching model. SIAM Journal on Financial Mathematics. 1:780–810. 2010. 10.1137/090770552.
20. Park, J., D. Yang, and K. Park. Investigations on dynamic trading strategy utilizing stochastic optimal control and machine learning. Journal of Korean Institute of Intelligent Systems. 23(4):348–353. 2013. 10.5391/JKIIS.2013.23.4.348.
21. Dai, M., Q. Zhang, and Q.J. Zhu. Optimal trend following trading rules. July. 19. 2011. Available at SSRN: http://ssrn.com/abstract=1762118 or http://dx.doi.org/10.2139/ssrn.1762118.
22. Kong, H.T., Q. Zhang, and G.G. Yin. A trend-following strategy: Conditions for optimality. Automatica. 47(4):661–667. 2011. 10.1016/j.automatica.2011.01.039.
23. Yu, J., and Q. Zhang. Optimal trend-following trading rules under a three-state regime switching model. Mathematical Control and Related Fields. 2(1):81–100. 2012. 10.3934/mcrf.2012.2.81.
24. Mudchanatongsuk, S., J.A. Primbs, and W. Wong. Optimal pairs trading: A stochastic control approach. In : Proceedings of 2008 American Control Conference; 2008;
25. Hensman, J., AGdeG Matthews, and Z. Ghahramani. Scalable variational Gaussian process classification. arXiv preprint arXiv:1411.2005. 2014.
26. Gal, Y., Y. Chen, and Z. Ghahramani. Latent Gaussian processes for distribution estimation of multivariate categorical data. arXiv preprint arXiv:1503.02182. 2015.
27. Kingma, D.P., and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. 2013.
28. Rezende, D.J., S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082. 2014.
29. Park, J., J. Lim, W. Lee, S. Ji, K. Sung, and K. Park. Modern probabilistic machine learning and control methods for portfolio optimization. International Journal of Fuzzy Logic and Intelligent Systems. 14(2):73–83. 2014. 10.5391/IJFIS.2014.14.2.73.
30. Peyrl, H., F. Herzog, and H.P. Geering. Numerical solution of the Hamilton-Jacobi-Bellman equation for stochastic optimal control problems. In : Proceedings of 2005 WSEAS Int. Conf. on Dynamical Systems and Control. pp. 489–497. 2005.
31. Glasmachers, T., T. Schaul, Y. Sun, D. Wierstra, and J. Schmidhuber. Exponential natural evolution strategies. In : Genetic and Evolutionary Computation Conference (GECCO); 2010;
32. Investment Summary. Retrieved March 18, 2016, from http://fount.co/.
33. Betterment Investing Made Better. Retrieved March 18, 2016, from https://www.betterment.com/.
34. Investment Management, Online Financial Advisor Wealthfront. Retrieved March 18, 2016, from https://www.wealthfront.com/.

## Biography

Jooyoung Park received his BS in Electrical Engineering from Seoul National University in 1983 and his PhD in Electrical and Computer Engineering from the University of Texas at Austin in 1992. He joined Korea University in 1993, where he is currently a professor at the Department of Control and Instrumentation Engineering. His recent research interests are in the areas of machine learning, control theory, and financial engineering.

E-mail: parkj@korea.ac.kr

Seongman Heo received his BS in Control and Instrumentation Engineering from Korea University in 2015. Currently, he is a graduate student (master course) at Korea University majoring in Control and Instrumentation Engineering. His research areas include control theory, machine learning and deep learning.

E-mail: hsm0099@korea.ac.kr

Taehwan Kim received his BS in Control and Instrumentation Engineering from Korea University in 2015. Currently, he is a graduate student (master course) at Korea University majoring in Control and Instrumentation Engineering. His research areas include control theory, deep learning and reinforcement learning.

E-mail: kteaw0110@korea.ac.kr

Jeongho Park received his BS in Control and Instrumentation Engineering from Korea University in 2016. Currently, he is a graduate student (master course) at Korea University majoring in Control and Instrumentation Engineering. His research areas include control theory, machine learning and pattern recognition.

E-mail: seanpark0107@korea.ac.kr

Jaein Kim received his BS in Mathematics & Information from Gachon University in 2012. Currently, he is a graduate student at Korea University majoring in Mathematics. His research areas include applied mathematics, machine learning and pattern recognition.

E-mail: kkjin85@korea.ac.kr

Kyungwook Park received his BBA and MBA from Seoul National University and his PhD in Finance from the University of Texas at Austin in 1993. He joined Korea University in 1994, where he is currently a professor at the School of Business Administration. His recent research interests are in the areas of derivatives and hedging, control-theory-based assets and derivatives management, and cost of capital estimation with derivative pricing.

E-mail: pkw@korea.ac.kr
