1. Introduction
Recently, artificial intelligence has emerged as one of the leading information technologies, and it is expected to influence many aspects of our future lifestyles. In particular, in the field of machine learning for classification and decision-making, there have been many research efforts to solve the estimation and control problems arising in various kinds of portfolio management via data-driven approaches. Note that these modern data-driven approaches, which seek solutions based on relevant empirical data rather than on mathematical analyses alone, are particularly useful in practical application domains.
In this paper, we consider the problem of applying kernel methods, together with some other optimization methods, to portfolio management. As is well known, kernel methods have attracted great interest in the areas of pattern classification, function approximation, and anomaly detection [1–9], and recently Gaussian processes have played an important role in the field of machine learning as a tool for probabilistic kernel methods [10]. We apply a simplified version of the sparse Gaussian process (GP) classification method, which is a direct result of two recent remarkable Gaussian process papers [25,26], for performing risk sensitivity classification in financial portfolio management. Since portfolio management problems are optimal decision-making problems that rely on actual empirical data, theoretical and practical solutions can be formulated via many recent machine learning and control advancements: the traditional mean-variance efficient portfolio problem [11]; the index tracking portfolio formulation [12–15]; the risk-adjusted expected return maximizing strategy [16–18]; the trend following strategy [19–23]; the long-short trading strategy (including the pairs trading strategy) [20,24], etc. In this paper, we also raise two important portfolio management issues in which the GP classification results can be useful.
This paper is organized as follows: In Section 2, we briefly describe relevant GP preliminaries. In Section 3, we apply a simplified version of the sparse Gaussian process (GP) classification method to risk sensitivity classification and discuss its possible applications to portfolio management issues. Finally, in Section 4, we present our concluding remarks.
2. Preliminaries
Probabilistic kernel methods, which include Gaussian processes, have recently attracted great interest in the areas of pattern classification, function approximation, and anomaly detection. In this section, we briefly describe some preliminaries on Gaussian processes, which play an important role in our portfolio management applications. For more details on Gaussian processes, please refer to, e.g., [10]. A Gaussian process, {f(x)}, is an indexed family of random variables with index x ∈ R^{d} such that for any finite set of indices, x_{1}, · · ·, x_{N}, the random variables f(x_{1}), · · ·, f(x_{N}) are jointly Gaussian. Gaussian processes can be characterized by their mean functions and covariance (or kernel) functions, which are defined as follows, respectively:

$m(x)=E[f(x)],\phantom{\rule{1em}{0ex}}k(x,{x}^{\prime})=E[(f(x)-m(x))(f({x}^{\prime})-m({x}^{\prime}))].$
Gaussian processes with mean function m(x) and covariance function k(x, x′) are often denoted by

$f(x)\sim GP(m(x),k(x,{x}^{\prime})).$
One can see that with the so-called kernel trick, Bayesian linear models defined on the feature space can be viewed as Gaussian processes. More specifically, suppose that f(x) is described by φ(x)^{T}w, where the prior distribution of the random vector w is N(0, Σ_{w}). Here, φ(x) is the feature vector, which is the result of mapping the input vector x into the (possibly high-dimensional) feature space F. Note that in this situation, the expectation of f(x) is

$E[f(x)]=\varphi {(x)}^{T}E[w]=0,$

and the covariance between f(x) and f(x′) satisfies

$Cov[f(x),f({x}^{\prime})]=\varphi {(x)}^{T}{\Sigma}_{w}\varphi ({x}^{\prime}).$

Thus defining the kernel function, k, by the kernel trick

$k(x,{x}^{\prime})=\varphi {(x)}^{T}{\Sigma}_{w}\varphi ({x}^{\prime})$

enables us to compute the covariance, Cov[f(x), f(x′)], directly on the input space using the kernel, i.e., Cov[f(x), f(x′)] = k(x, x′). Therefore, k can be conveniently interpreted as both a kernel function (in the sense of kernel methods) and a covariance function. In general, the mean function, m(x), is assumed to be the zero function, and this assumption can be made without loss of generality. The task of obtaining the predictive distribution for any test point in the Gaussian process framework can be summarized as follows: Consider the training data set
$D={\{({x}_{n},{y}_{n})\}}_{n=1}^{N}$, where $X={\{{x}_{n}\in {R}^{d}\}}_{n=1}^{N}$ is the set of the input values of the training data, and $y={\{{y}_{n}\in R\}}_{n=1}^{N}$ is the set of the corresponding target values. Since the Gaussian process f(x) has the zero mean function and the kernel function, k(x, x′), the joint distribution of the random vector f = [f(x_{1}), · · ·, f(x_{N})]^{T} can be written as

$p(f\mid X)=N(f\mid 0,K(X)).$

Here by N(f ∣ m, V), we mean the multivariate Gaussian distribution with mean vector m and covariance matrix V. Also, K(X) is an N × N matrix, whose (i, j)th element is k(x_{i}, x_{j}). For notational convenience, we often use K instead of K(X). In this paper, we consider the following squared exponential (SE) kernel, which is one of the most widely used choices in the kernel method community:

$k(x,{x}^{\prime})={\sigma}_{f}^{2}\exp (-{\Vert x-{x}^{\prime}\Vert}^{2}/(2{l}^{2})).$
Here σ_{f} and l, which characterize the shape of the kernel function, are called hyperparameters, and the vector consisting of the hyperparameters is denoted as θ. In Gaussian process regression, the disturbance which occurs in the process of data observation is taken into account too, and it is characterized by means of a Gaussian noise model:

$y=f(x)+\epsilon ,\phantom{\rule{1em}{0ex}}\epsilon \sim N(0,{\sigma}_{n}^{2}).$
Hence, by combining p(f ∣ X) and p(y ∣ f), one can obtain the following marginal likelihood for the regression problem: $p(y\mid X)=N(y\mid 0,K+{\sigma}_{n}^{2}I)$. Also, the log marginal likelihood for the whole training data D can be written as follows:

$\log p(y\mid X)=-\frac{1}{2}{y}^{T}{(K+{\sigma}_{n}^{2}I)}^{-1}y-\frac{1}{2}\log \mid K+{\sigma}_{n}^{2}I\mid -\frac{N}{2}\log 2\pi .$
Finding the optimal hyperparameter vector can be achieved by maximizing the above log marginal likelihood function with respect to θ. Also, the predictive distribution of the output y_{*} for the test input point x_{*} can be obtained by applying the conditional density formula for multivariate Gaussian distributions [10], i.e.,

$p({y}_{*}\mid {x}_{*},D)=N({y}_{*}\mid {k}_{*}^{T}{(K+{\sigma}_{n}^{2}I)}^{-1}y,\phantom{\rule{0.5em}{0ex}}{k}_{**}+{\sigma}_{n}^{2}-{k}_{*}^{T}{(K+{\sigma}_{n}^{2}I)}^{-1}{k}_{*}).$

Here, k_{*} and k_{**} are used for notational convenience, and they mean the following, respectively:

${k}_{*}={[k({x}_{1},{x}_{*}),\cdots ,k({x}_{N},{x}_{*})]}^{T},\phantom{\rule{1em}{0ex}}{k}_{**}=k({x}_{*},{x}_{*}).$
Finally, note that the point estimate ${k}_{*}^{T}{(K+{\sigma}_{n}^{2}I)}^{-1}y$, which is the mean of y_{*}, can be further written as

${k}_{*}^{T}{(K+{\sigma}_{n}^{2}I)}^{-1}y={\sum}_{n=1}^{N}{\alpha}_{n}k({x}_{n},{x}_{*}),$

where $\alpha ={[{\alpha}_{1},\cdots ,{\alpha}_{N}]}^{T}={(K+{\sigma}_{n}^{2}I)}^{-1}y$, and that this expression can be viewed as a result of the representer theorem [1–3] of the kernel methods.
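The regression pipeline of this section (SE kernel, log marginal likelihood, predictive distribution, and the representer-theorem weights α) can be sketched numerically as follows. This is a minimal illustration in our own notation, not code from the paper; the hyperparameters σ_f, l, σ_n are fixed rather than optimized:

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, ell=1.0):
    """SE kernel matrix: K[i, j] = sigma_f^2 exp(-||a_i - b_j||^2 / (2 ell^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2 * ell**2))

def gp_regression(X, y, Xstar, sigma_f=1.0, ell=1.0, sigma_n=0.1):
    """Predictive mean/variance at Xstar, plus the log marginal likelihood."""
    N = len(X)
    K = se_kernel(X, X, sigma_f, ell) + sigma_n**2 * np.eye(N)   # K + sigma_n^2 I
    alpha = np.linalg.solve(K, y)              # representer-theorem weights alpha
    Ks = se_kernel(X, Xstar, sigma_f, ell)     # columns are the k_* vectors
    mean = Ks.T @ alpha                        # sum_n alpha_n k(x_n, x_*)
    # k_** = sigma_f^2 for the SE kernel; predictive variance of y_*
    var = sigma_f**2 + sigma_n**2 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    logml = (-0.5 * y @ alpha
             - 0.5 * np.linalg.slogdet(K)[1]
             - 0.5 * N * np.log(2 * np.pi))
    return mean, var, logml
```

For example, fitting noisy samples of a sine curve and predicting back at the training inputs recovers the curve closely, with strictly positive predictive variance.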
3. Applications
In this section, we present some observations on portfolio management applications of Gaussian processes, the natural evolution strategy, and Hamilton-Jacobi-Bellman (HJB) equations. Our observations consist of two parts. In the first part, we consider the applicability of a simplified sparse Gaussian process classification (GPC) method, which is a direct result of two recent remarkable Gaussian process papers [25,26], to the task of classifying individuals’ sensitivity with respect to financial risk. The derivation of the simplified sparse GPC method can be summarized as follows: We consider the input data set $X={\{{x}_{n}\}}_{n=1}^{N}$ together with the target data set $Y={\{{y}_{n}\}}_{n=1}^{N}$, where x_{n} ∈ R^{d} and y_{n} ∈ {1, · · ·, C}. Note that the nth observation, y_{n}, is a categorical variable that can be transformed into the one-hot encoding format. Also, note that for the observation y_{n}, one can use a multinomial distribution whose probabilities are defined by a softmax having intensities f_{n} = (f_{n1}, · · ·, f_{nC}). The kth intensity of f_{n}, f_{nk}, is the output of the Gaussian process F_{k}(x_{n}). To achieve a sparse representation, the so-called inducing points, Z ∈ R^{M×d}, are introduced. Note that in classification problems, the marginal log-likelihood is not tractable, contrary to the case of the Gaussian process regression of Section 2. Hence, we need to rely on a variational approximation. With q(f, U) = q(U)p(f ∣ X, U) and Jensen’s inequality [10], we have

$\log p(y\mid X)\ge {\sum}_{n=1}^{N}{E}_{q({f}_{n})}[\log p({y}_{n}\mid {f}_{n})]-KL(q(U)\Vert p(U)),$
where KL stands for the Kullback-Leibler divergence [10]. Note that with q(U) = N(U ∣ m, S), we have

$q({f}_{n})=N({f}_{n}\mid {a}_{n}^{T}m,\phantom{\rule{0.5em}{0ex}}{b}_{n}+{a}_{n}^{T}S{a}_{n}),$

where ${a}_{n}={K}_{MM}^{-1}{K}_{Mn}$ and ${b}_{n}={K}_{nn}-{K}_{nM}{K}_{MM}^{-1}{K}_{Mn}$. In this paper, we consider the class of diagonal covariance matrices for S and call the resultant GPC a simplified sparse Gaussian process classification. Since the integral of log p(y_{n} ∣ f_{n}) on the right-hand side of the above bound is not tractable, we rely on the sampling-based approximation of [27,28]. In this paper, we propose to use the simplified sparse GPC as a framework for classifying users’ sensitivity with respect to financial risk. The categorical target variable in the framework describes the sensitivity level (e.g., very sensitive to risk, moderately sensitive to risk, only a little sensitive to risk, etc.). The questions for providing the corresponding inputs may include the following kinds [32–35]:
What is your current age and planned age of retirement?
Your annual before-tax income is $ _______.
Your future income until your retirement will be ____.
– Increasing
– The same
– Decreasing
– Unpredictable
Total value of your cash and other liquid securities is $ _______.
Your investment horizon is _______ years.
Your primary investment objective is _____.
– Investing for comfortable retirement
– General investing for wealth accumulation
– Securing an emergency fund
– Saving for a specific purpose (for example, college education for kids)
Your tolerance for risk taking when investing is _____.
– Defensive: You accept lower returns to protect your initial investment.
– Moderate: You want a balance between stability and long-term return.
– Assertive: You are prepared to accept higher volatility to accumulate assets over the long term.
If your entire investment portfolio lost 10% of its value in a month during a market decline, what would you do?
– Liquidate all the investment
– Sell half of the portfolio
– Keep the portfolio
– Invest more
What return do you expect to achieve from your investments?
– Return without losing original money
– 3–6% per annum
– 7–10% per annum
– 11–15% per annum
– Over 15% per annum
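Before classification, the questionnaire answers must be turned into numeric input vectors x_n. One possible encoding is sketched below; the field names, the ordinal orderings of the multiple-choice answers, and the function itself are our own illustrative assumptions, not part of the questionnaires in [32–35]:

```python
def encode_responses(age, retire_age, income, liquid_assets, horizon_years,
                     income_trend, objective, tolerance, reaction, target_return):
    """Map questionnaire answers to a numeric input vector x_n (one possible encoding).

    Multiple-choice answers are mapped to ordinal codes; the orderings below
    (e.g., least to most risk-tolerant) are our own assumption.
    """
    trends = ["Decreasing", "Unpredictable", "The same", "Increasing"]
    objectives = ["Securing an emergency fund", "Saving for a specific purpose",
                  "Investing for comfortable retirement",
                  "General investing for wealth accumulation"]
    tolerances = ["Defensive", "Moderate", "Assertive"]
    reactions = ["Liquidate all the investment", "Sell half of the portfolio",
                 "Keep the portfolio", "Invest more"]
    returns = ["Return without losing original money", "3–6% per annum",
               "7–10% per annum", "11–15% per annum", "Over 15% per annum"]
    return [age, retire_age - age, income, liquid_assets, horizon_years,
            trends.index(income_trend), objectives.index(objective),
            tolerances.index(tolerance), reactions.index(reaction),
            returns.index(target_return)]
```

The continuous fields (income, assets, horizon) would typically be standardized before being fed to the SE kernel, since a common lengthscale l is shared across input dimensions.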
In order to evaluate the validity and strengths of the simplified sparse GPC method, we performed experiments on simulated data (see Figs. 1–9). From the classification results, one can see that the simplified sparse GPC works well with a relatively small number of inducing points. Also, Figs. 2–5 show that the GPC can achieve sparsity somewhat more efficiently than the standard SVM approach (for which we used fitcsvm of MATLAB).
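The core computations of the simplified sparse GPC used in these experiments, namely the marginal q(f_n) for one output process F_k and the sampling-based estimate of the expected log-likelihood, can be sketched as follows. The code and variable names are our own (chosen to match a_n, b_n in the text), and the SE kernel of Section 2 is assumed, for which K_nn = σ_f²:

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, ell=1.0):
    """SE kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2 * ell**2))

def q_fn_marginal(x_n, Z, m, S_diag, sigma_f=1.0, ell=1.0):
    """Marginal q(f_n) = N(a_n^T m, b_n + a_n^T S a_n) with diagonal S."""
    Kmm = se_kernel(Z, Z, sigma_f, ell) + 1e-8 * np.eye(len(Z))  # jitter for stability
    Kmn = se_kernel(Z, x_n[None, :], sigma_f, ell).ravel()
    a_n = np.linalg.solve(Kmm, Kmn)            # a_n = K_MM^{-1} K_Mn
    b_n = sigma_f**2 - Kmn @ a_n               # b_n = K_nn - K_nM K_MM^{-1} K_Mn
    # diagonal S: a_n^T S a_n reduces to sum_i a_i^2 S_ii
    return a_n @ m, b_n + a_n**2 @ S_diag

def expected_log_softmax(y_n, mus, vars_, num_samples=2000, rng=None):
    """Monte Carlo estimate of E_q[log softmax(f_n)_{y_n}], q(f_n) = N(mus, diag(vars_))."""
    rng = np.random.default_rng(0) if rng is None else rng
    f = mus + np.sqrt(vars_) * rng.standard_normal((num_samples, len(mus)))
    # numerically stable log-sum-exp over the C class intensities
    lse = np.log(np.exp(f - f.max(1, keepdims=True)).sum(1)) + f.max(1)
    return (f[:, y_n] - lse).mean()
```

When x_n coincides with an inducing point and S is zero, q(f_n) collapses onto the corresponding entry of m with vanishing variance, which is a quick sanity check on the a_n, b_n formulas.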
In the second part of our observations, we present two portfolio management issues that can utilize the simplified sparse GPC method. The issues covered here are the trend-following problem [19,20,29] and the portfolio optimization problem [30]. For the first issue, we consider an exponential natural evolution strategy (NES) [31] based solution for finding an efficient trend following strategy (for details, please refer to [20,29]), and propose the strategy of using the transaction cost, K, as a tuning parameter that can vary according to the GPC results (Fig. 10). For the second issue, we consider an HJB equation based portfolio optimization problem [30], in which a user has the choice of investing in the stock market or saving in a bank account, the stock market is modelled as a geometric Brownian motion, and the dynamics of the factor and the volatility are also modelled with appropriate stochastic differential equations (for details, please refer to [30]); here we propose the strategy of using the coefficient of risk aversion, γ, as a tuning parameter that can vary according to the GPC results (Fig. 11). We expect that in our future works, these two issues will ultimately lead to a set of fundamental building blocks for efficient personal financial planning packages.