ASYMPTOTIC NORMALITY OF CONDITIONAL DENSITY AND CONDITIONAL MODE IN THE FUNCTIONAL SINGLE INDEX MODEL

The main objective of this paper is to investigate the nonparametric estimation of the conditional density of a scalar response variable Y given an explanatory variable X taking values in a Hilbert space, when the observations are independent and identically distributed (i.i.d.) and linked through a single functional index structure. First, a kernel-type estimator of the conditional density function (cond-df) is introduced. Then a central limit theorem (CLT) is derived for this estimator, which establishes the asymptotic normality of the kernel estimate under the single-index structure. As an application, the conditional mode in the functional single-index model is presented, and the asymptotic (1 − ξ) confidence interval of the conditional mode function is given for 0 < ξ < 1. A simulation study is also presented to illustrate the validity and finite-sample performance of the considered estimator. Finally, the estimation of the functional index via the pseudo-maximum likelihood method is discussed.


Introduction
The statistical analysis of functional variables has grown considerably over the last two decades. In fact, important innovations in measuring devices have emerged, making it possible to monitor many phenomena continuously, such as stock market indices, pollution, climatological variables and satellite images.
Thus a new branch of statistics, called functional statistics, has been developed to treat observations as functional random elements.
The first results on the conditional models were obtained by (Ferraty, Laksaci, and Vieu, 2006). They established the almost complete convergence rate of the kernel estimators for the conditional distribution function, the conditional density and its derivatives, the conditional mode and the conditional quantiles.
As a conditional nonparametric model, regression was one of the first predictive analysis tools. Quantile regression is a common way to describe the dependence structure between a response variable Y and a covariate X. Unlike the regression function (defined as the conditional mean), which relies only on the central tendency of the data, the conditional mode function allows the analyst to estimate the functional dependence between variables for all portions of the conditional density of the response variable. Moreover, whereas the standard approach based on functional conditional mean prediction is sensitive to outliers, functional conditional mode prediction is robust and can therefore be seen as a reasonable alternative to the regression function.
The conditional model estimator has been widely used to estimate some characteristic features of the data set, such as the conditional mode, the conditional median, and the conditional quantiles. Many authors are interested in the estimation of the conditional mode of a scalar response given a functional covariate. Ferraty, Laksaci and Vieu (2006) introduced nonparametric kernel-type estimators of some characteristics of the conditional cumulative distribution function and of the successive derivatives of the conditional density, and established some asymptotic properties, with particular attention to the conditional mode and conditional quantiles. An application to a chemometrical data set from the food industry was also presented. The uniform strong consistency with rates and the asymptotic normality of the kernel conditional mode estimator were obtained by Ezzahrioui and Ould-Saïd (2008) in the i.i.d. case.
In the case of censoring, Ould-Saïd and Cai (2005) established the strong uniform convergence (with rate) of the kernel conditional mode estimator for i.i.d. random variables, while Ould-Saïd (2006) constructed a kernel estimator of the conditional quantile and established its strong uniform convergence rate. Next, Khardani, Lemdani, and Ould-Saïd (2010) obtained strong consistency with rate and asymptotic normality of the conditional mode estimator; Khardani, Lemdani, and Ould-Saïd (2011) established strong consistency with rate of the conditional mode for the censored dependent case, while Khardani, Lemdani, and Ould-Saïd (2014) presented its asymptotic normality.
For infinite dimensional purpose, the study used the terminology functional nonparametric, where the term functional refers to the infinite dimensionality of the data, and where nonparametric refers to the infinite dimensionality of the model. Such functional nonparametric statistics is also called doubly infinite dimensional (see , for more details). Conditional density function estimation is one of the crucial problems in non-parametric statistics, see (De Gooijer and Zerom, 2003).  established the asymptotic normality of the conditional density estimator and the conditional mode estimator for the α-mixing dependence functional time series data. (Ling, Li, and Yang, 2014) investigated the pointwise almost complete consistency and the uniform almost complete convergence of the kernel estimation with a rate for the conditional density in the setting of the α-mixing functional data. Attaoui (2014) investigated the nonparametric estimation of the conditional density of a scalar response variable given a random variable taking values in separable Hilbert space. The author established under general conditions the uniform almost complete convergence rates and the asymptotic normality of the conditional density kernel estimator, when the variables satisfy the strong mixing dependency, based on the single-index structure.
The single-index models have been used and studied in both the statistical and the econometric literature, and are very popular in the economics community as they address two important concerns. The first is dimension reduction, since this type of model makes it possible to circumvent the curse of dimensionality. The second is related to the interpretability of the index (parameter) introduced in these models. The statistical study of these models, in the context of vectorial explanatory random variables, was initiated by Härdle and Marron (1985). Hristache, Juditsky, and Spokoiny (2001) provided both new theoretical and bibliographic elements. Several authors have worked on functional single-index models, e.g. (Attaoui and Boudiaf, 2014; Aït-Saidi, Ferraty, Kassa, and Vieu, 2008; Belabbaci, Rabhi, and Soltani, 2015; Ferraty, Peuch, and Vieu, 2003).
These models attracted the attention of many researchers, such as Aït-Saidi, Ferraty and Kassa (2005). Bouchentouf, Djebbouri, Rabhi, and Sabri (2014) established a nonparametric estimation of some characteristics of the conditional cumulative distribution function and the successive derivatives of the conditional density of a scalar response variable Y given a Hilbertian random variable X when the observations are linked with a single-index structure. Attaoui, Laksaci, and Ould-Saïd (2011) studied the functional single-index model via its conditional density kernel estimator, and established its pointwise and uniform almost complete convergence rates, and their results were extended to the dependent case by Attaoui (2014). Furthermore,  obtained the asymptotic normality of the conditional density estimator and the conditional mode estimator for the α-mixing dependence functional time series data.
The single-index models are becoming increasingly important and popular, and have attracted considerable attention in the last few years because of their importance in several areas of science such as econometrics, biostatistics and medicine. The single-index approach is used most extensively in econometrics. This kind of modelling has been studied extensively in the multivariate case, for example in (Härdle, Hall, and Ichimura, 1993; Hristache, Juditsky, and Spokoiny, 2001). Based on the regression function, Delecroix, Härdle, and Hristache (2003) studied the estimation of the single index and established some asymptotic properties. The literature is much more limited in the case where the explanatory variable is functional (that is, a curve). The first asymptotic properties in the fixed functional single-index model were obtained by , who established the almost complete convergence, in the i.i.d. case, of the link regression function of this model. Their results were extended to the dependent case by Aït-Saidi, Ferraty, and Kassa (2005). Aït-Saidi, Ferraty, Kassa, and Vieu (2008) studied the case where the functional single index is unknown, and proposed an estimator of this parameter based on the cross-validation procedure.
The main contribution of this work is to generalize the result of Ezzahrioui and Ould-Saïd (2008) to the case where a functional parameter θ is present in the model. The results can be used to construct prediction intervals, for instance for the maximum electricity demand (or need), or for chemometrical data coming from the food industry.
This study establishes the asymptotic normality of the estimators of the conditional density function and of the conditional mode of a scalar random response, given a functional covariate, when the data are sampled from an i.i.d. process with a single-index structure.
The paper is organized as follows. The model and some basic assumptions are presented in Section 2. Section 3 shows the main results, and the proofs of some lemmas and of the main result. In Section 4 an application of the conditional mode in functional single-index model is presented. Finally, Section 5 illustrates those asymptotic properties through some simulations.

The model and some basic assumptions
Throughout the paper, C, $C_0$ and/or $C_{\theta,x}$ denote generic constants in $\mathbb{R}_+^*$. Let $(X_i, Y_i)_{i=1,\dots,n}$ be a sequence of independent pairs with the same distribution as (X, Y), where Y is a real-valued random variable and X is a functional random variable (frv) which takes its values in a separable real Hilbert space $\mathcal{H}$ with the norm $\|\cdot\|$ generated by an inner product $\langle\cdot,\cdot\rangle$.
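For a fixed $\theta$ in $\mathcal{H}$, let $F(\theta, y, x)$ be the conditional cumulative distribution function (cond-cdf) of Y given $\langle\theta, X\rangle = \langle\theta, x\rangle$, specifically:
$$\forall y \in \mathbb{R}, \quad F(\theta, y, x) = \mathbb{P}\left(Y \le y \mid \langle\theta, X\rangle = \langle\theta, x\rangle\right).$$
The authors implicitly assume the existence of a regular version of the conditional distribution and that it is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$. The aim is to build nonparametric estimates of several functions related to the conditional density of Y given $\langle X, \theta\rangle = \langle x, \theta\rangle$. Let $f(\theta, y, x)$ be the conditional density of Y given $\langle X, \theta\rangle = \langle x, \theta\rangle$, for $x \in \mathcal{H}$.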
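In the following, the authors denote by $f(\theta,\cdot,x)$ the conditional density of Y given $\langle x, \theta\rangle$ and define the kernel estimator $\hat f(\theta,\cdot,x)$ of $f(\theta,\cdot,x)$ by:
$$\hat f(\theta, y, x) = \frac{h_H^{-1}\sum_{i=1}^{n} K\left(h_K^{-1}|\langle x - X_i, \theta\rangle|\right) H\left(h_H^{-1}(y - Y_i)\right)}{\sum_{i=1}^{n} K\left(h_K^{-1}|\langle x - X_i, \theta\rangle|\right)}, \quad y \in \mathbb{R},$$
with the convention 0/0 = 0, where K and H are kernel functions and $h_K := h_{n,K}$ (respectively $h_H := h_{n,H}$) is a sequence of bandwidths that decrease to zero as n goes to infinity.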
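To make the definition concrete, the following is a minimal Python sketch of a kernel conditional density estimator of this type. It is not the authors' implementation. It assumes the usual ratio form, that is, a weighted average of $h_H^{-1} H((y - Y_i)/h_H)$ with weights $K(|\langle x - X_i, \theta\rangle|/h_K)$, curves discretized on a common equispaced grid with step `dt` (so that the inner product is approximated by a Riemann sum), the quadratic kernel on [0, 1] for K, and, as a further assumption, the Epanechnikov density on [-1, 1] for H. All function names are hypothetical.

```python
def inner(u, v, dt):
    """Riemann-sum approximation of the inner product <u, v> on an equispaced grid."""
    return sum(a * b for a, b in zip(u, v)) * dt


def quad_kernel(u):
    """Quadratic kernel K(u) = 1.5 * (1 - u^2) supported on [0, 1]."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def epan_density(u):
    """Epanechnikov density on [-1, 1]; an assumed choice for the kernel H."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def cond_density(theta, y, x, X, Y, h_K, h_H, dt):
    """Kernel estimate of the conditional density of Y at y given <theta, x>.

    X is a list of discretized curves, Y the list of scalar responses,
    h_K and h_H are the two bandwidths.
    """
    num = 0.0
    den = 0.0
    for X_i, Y_i in zip(X, Y):
        diff = [a - b for a, b in zip(x, X_i)]
        # Weight of observation i: kernel of the projected distance |<x - X_i, theta>|.
        w = quad_kernel(abs(inner(diff, theta, dt)) / h_K)
        num += w * epan_density((y - Y_i) / h_H) / h_H
        den += w
    # Convention 0/0 = 0 when no curve falls in the kernel window.
    return num / den if den > 0.0 else 0.0
```

Only the one-dimensional projections $\langle x - X_i, \theta\rangle$ enter the weights, which is where the dimension reduction of the single-index structure shows up in practice.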
Let $N_x$ be a fixed neighbourhood of x in $\mathcal{H}$ and let $S_{\mathbb{R}}$ be a fixed compact subset of $\mathbb{R}$. Now, consider the following basic assumptions that are necessary to establish the main result of this paper.
The conditional density f(θ,y,x) satisfies the Hölder condition, that is:

Main result
In this section the asymptotic normality of the estimator $\hat f(\theta,\cdot,x)$ in the single functional index model is established. Proof. In order to establish the asymptotic normality of $\hat f(\theta, y, x)$, further notations and definitions are needed. First, consider the following decomposition where and It follows that, Then, the proof of Theorem 3.1 can be deduced from the following lemmas.

Proof.
Using the definition of conditional variance As for $J_1$, by the property of conditional expectation and by changing variables, one obtains as $n \to \infty$ On the other hand, by applying (H2) and (H3) Moreover, by changing variables one obtains: the last equality is due to the fact that H is a probability density, and thus:

Therefore, one obtains
This yields the proof of Corollary 4.1.
Finally, in order to show the asymptotic (1 − ξ) confidence interval of $M_\theta(x)$, one needs to consider the estimator of $\nu^2(\theta, x)$ as follows: Thus, the following corollary is obtained. where $\tau_{\xi/2}$ is the upper ξ/2 quantile of the standard normal distribution N(0,1).
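As a small illustration of how the critical value $\tau_{\xi/2}$ enters an asymptotic confidence interval, the sketch below computes the upper ξ/2 quantile of N(0,1) with the Python standard library and builds a symmetric interval around a point estimate. The argument `std_err` is a hypothetical input: it stands for a plug-in estimate of the asymptotic standard deviation already divided by the normalizing rate of the corollary, and this sketch does not compute it.

```python
from statistics import NormalDist


def upper_normal_quantile(xi):
    """Return tau_{xi/2}, the point with P(Z > tau) = xi / 2 for Z ~ N(0, 1)."""
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - xi / 2.0)


def symmetric_interval(center, std_err, xi):
    """Asymptotic (1 - xi) interval center +/- tau_{xi/2} * std_err.

    std_err is assumed to be supplied by the user (hypothetical plug-in value).
    """
    tau = upper_normal_quantile(xi)
    return (center - tau * std_err, center + tau * std_err)
```

For ξ = 0.05 the critical value is approximately 1.96, the familiar two-sided 95% value.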

Simulation study
To study the behaviour of the conditional mode estimator, two simulation examples were considered. In the first one, the authors compared the FSIM (functional single-index model) with the NPFDA (nonparametric functional data analysis) approach. In the second, the distribution of the regression model being known and usual, the authors examined the behaviour of the conditional density estimator with respect to this distribution. A natural way to assess the behaviour of a conditional density estimator is to compute its mean square error. This part of the paper therefore compares the conditional density estimation in the FSIM, which is the authors' model, with the conditional density estimation in the NPFDA defined in (5.1).

(5.1)
where the conditional mode $M(x)$ is estimated by the random variable $\hat M(x)$, with
$$M(x) = \arg\sup_{y \in \mathbb{R}} f(x|y) \quad \text{and} \quad \hat M(x) = \arg\sup_{y \in \mathbb{R}} \hat f(x|y).$$
Therefore one has to compare their respective conditional density estimators by computing and comparing their respective mean square errors for some values of the scalar response Y.
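In practice the arg sup defining a conditional mode estimator is usually approximated numerically. The sketch below, a hedged illustration rather than the authors' procedure, maximizes an estimated conditional density over a finite grid of response values. The callable `density` and the grid `y_grid` are assumptions supplied by the user; any conditional density estimate with the covariate already fixed can be plugged in.

```python
def conditional_mode(density, y_grid):
    """Return the point of y_grid at which `density` is largest.

    density: callable y -> estimated conditional density at y (covariate fixed).
    y_grid:  non-empty list of candidate response values.
    """
    if not y_grid:
        raise ValueError("y_grid must contain at least one point")
    best_y = y_grid[0]
    best_val = density(best_y)
    for y in y_grid[1:]:
        val = density(y)
        if val > best_val:  # keep the first maximizer in case of ties
            best_y, best_val = y, val
    return best_y
```

The accuracy of the returned mode is limited by the grid step, so the grid should be fine relative to the bandwidth used in the response direction.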
In the following, the purpose is to assess the performance, in terms of prediction, of $\hat M_\theta(x)$ and $\hat M(x)$. For each given predictor $(X_j)_{j \in J}$ in the testing subsample, the authors were interested in the prediction of the response variable $(Y_j)_{j \in J}$ via the single functional index conditional mode $\hat M_\theta(x)$ and the fully nonparametric conditional mode $\hat M(x)$, so as to compare the finite-sample behaviour of the estimators. As the assessment tool, the authors considered the mean square error (MSE) defined as follows:
$$\mathrm{MSE} = \frac{1}{\#J} \sum_{j \in J} \left(Y_j - \hat Y_j\right)^2,$$
where $\hat Y_j$ is a predictor of $Y_j$ obtained either semi-parametrically by $\hat M_\theta(x)$ or nonparametrically via $\hat M(x)$.
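The assessment criterion can be computed in a few lines. The sketch below assumes the usual definition of the MSE as the average of the squared prediction errors over the testing subsample; the function name is hypothetical.

```python
def mean_square_error(y_true, y_pred):
    """Average squared difference between observed and predicted responses."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
```

Calling the function once with the FSIM predictions and once with the NPFDA predictions on the same testing subsample gives the two quantities being compared.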
Furthermore, some tuning parameters had to be specified. The kernel K(·) was chosen to be the quadratic function defined as $K(u) = \frac{3}{2}(1 - u^2)\mathbb{1}_{[0,1]}(u)$ and the cumu- The semi-metric d(·,·) is specified according to the choice of the functional space $\mathcal{H}$ discussed in the scenarios below. It is well known that one of the crucial parameters in semi-parametric models is the smoothing parameter, which is involved in defining the shape of the link function between the response and the covariate.
Using the result given in Theorem 4.1, the variance of this estimator is obtained as .
The idea is to choose the parameters hK and hH so that the variance is minimal. Since the variance (CV) depends on several unknown parameters that must be estimated, the calculus becomes tedious. Thus, by replacing the unknown parameters by their respective estimators 1 ( , ), 2 ( , ), ̂( ), ̂, and ∅ , (ℎ ) one obtains . Now in order to simplify the implementation of the methodology, the authors took the bandwidths hH ∼ hK = h, where h is chosen by the cross-validation method on the k-nearest neighbours (see Ferraty and Vieu, 2006, p. 102).

Simulation 1: the case of smooth curves
Let us consider the following regression model, where the covariate is a curve and the response is a scalar:
$$Y_i = R(X_i) + \varepsilon_i, \quad i = 1, \dots, n,$$
where $(\varepsilon_i)$ is a sequence of i.i.d. random variables normally distributed with a variance equal to 0.1.
The functional covariate X is assumed to be a diffusion process defined on [0, 1] and generated by the following equation: where W, b and d are independent and normally distributed, with respective distributions $\mathcal{N}(0, 1)$, $\mathcal{N}(0, 0.03)$ and $\mathcal{N}(0, 0.05)$. The variables a and c follow the Bernoulli distribution B(0.5). Figure 1 shows a sample of 200 curves representing a realization of the functional random variable X.
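Taking into account the smoothness of the curves $X_i(t)$ (see Figure 1), the authors chose the distance deriv1 (the semi-metric based on the first derivatives of the curves) as semi-metric in $\mathcal{H}$:
$$d(x_1, x_2) = \left(\int_0^1 \left(x_1'(t) - x_2'(t)\right)^2 dt\right)^{1/2}, \quad x_1, x_2 \in \mathcal{H}.$$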
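A semi-metric based on first derivatives can be approximated on discretized curves as in the following sketch. It assumes curves observed on a common equispaced grid with step `dt` and uses finite differences for the derivatives, which is a simplification chosen here for illustration (spline-based derivatives are a common alternative); the function names are hypothetical.

```python
import math


def first_derivative(curve, dt):
    """Finite-difference derivative: central inside, one-sided at the two ends."""
    n = len(curve)
    if n < 2:
        raise ValueError("a curve needs at least two grid points")
    deriv = []
    for k in range(n):
        if k == 0:
            deriv.append((curve[1] - curve[0]) / dt)
        elif k == n - 1:
            deriv.append((curve[-1] - curve[-2]) / dt)
        else:
            deriv.append((curve[k + 1] - curve[k - 1]) / (2.0 * dt))
    return deriv


def semimetric_deriv1(x1, x2, dt):
    """L2 distance between the first derivatives of two discretized curves."""
    d1 = first_derivative(x1, dt)
    d2 = first_derivative(x2, dt)
    # Riemann-sum approximation of the integral of the squared difference.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)) * dt)
```

Two curves that differ only by a vertical shift are at distance zero, which is why this is a semi-metric rather than a metric.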
Then, the study considered a nonlinear regression function defined as
Given $X = x$, $Y \hookrightarrow \mathcal{N}(R(x), 0.2)$, and thus the conditional median, the conditional mode and the conditional mean functions coincide and are equal to R(x), for any fixed x. The computation of the estimator is based on the observed data $(X_i, Y_i)_{i=1,\dots,n}$ and on the single index θ, which is unknown and has to be estimated. In practice this parameter can be selected by the cross-validation approach (see Aït-Saidi et al., 2008). Alternatively, one can select the real-valued function θ(t) among the eigenfunctions of the covariance operator $\mathbb{E}\left[(X' - \mathbb{E}X')\langle X', \cdot\rangle_{\mathcal{H}}\right]$, where X(t) is a diffusion process defined on a real interval [a, b] and $X'(t)$ is its first derivative (see Attaoui and Ling, 2016). Hence, for the chosen training sample ℒ, the principal component analysis (PCA) method is applied: the eigenvectors of the covariance operator, estimated by its empirical covariance operator, give the best approximation of the functional parameter θ. Now, let θ* denote the first eigenfunction, corresponding to the largest eigenvalue of the empirical covariance operator, which replaces θ during the simulation step.
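The PCA step can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' code: the input curves (for instance the discretized first derivatives of the observed curves) share a common equispaced grid, the covariance operator is replaced by the empirical covariance matrix of the discretized curves, and the leading eigenvector is obtained by plain power iteration. All function names are hypothetical.

```python
import math


def empirical_covariance(curves):
    """Empirical covariance matrix of discretized curves (rows = curves)."""
    n = len(curves)
    p = len(curves[0])
    mean = [sum(c[k] for c in curves) / n for k in range(p)]
    centered = [[c[k] - mean[k] for k in range(p)] for c in curves]
    return [[sum(row[a] * row[b] for row in centered) / n for b in range(p)]
            for a in range(p)]


def leading_eigenvector(matrix, n_iter=500):
    """Power iteration for the eigenvector of the largest eigenvalue.

    The start vector is non-constant to reduce the risk of starting
    orthogonal to the leading eigenvector; this is a heuristic, not a guarantee.
    """
    p = len(matrix)
    v = [1.0 + k for k in range(p)]
    norm = math.sqrt(sum(t * t for t in v))
    v = [t / norm for t in v]
    for _ in range(n_iter):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break  # matrix annihilates v; return the current vector
        v = [t / norm for t in w]
    return v
```

The returned vector is a discretized version of the first eigenfunction, normalized in the Euclidean sense; on an equispaced grid it differs from an $L^2$-normalized eigenfunction only by a constant factor.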
In the following graphs, the empirical covariance operator for ℒ = {1, … , 200} gives the discretization of the first eigenfunction (represented by a continuous curve), of the first twenty eigenfunctions, and of all the eigenfunctions (see Figure 2, Figure 3 and Figure 4).
In the simulation part, the sample of 200 curves was divided into two parts. The first, from 1 to 125, was used as the training sample, and the second, from 126 to 200, served for the prediction.
Source: own calculations based on (Attaoui and Ling, 2016).
The following steps were taken:
Step 1. Simulate the response variables.
Finally, the authors presented the results by plotting the predicted values versus the true values and computed the sum of squared residuals (SSR) defined by (5.2).
One can see that the sum of squared residuals (SSR) of the functional single-index model (FSIM) is smaller than that of the nonparametric functional data analysis (NPFDA) method. This is confirmed by the following graphs, which compare the conditional mode obtained by FSIM with the conditional mode obtained by NPFDA (Figure 1). Thus the estimator is acceptable. As was intuitively expected, the mean square errors of the FSIM estimator are smaller than those of NPFDA. Therefore, the FSIM model produces more accurate estimates than the NPFDA model on all the criteria.
Step 2. For each i in the training sample, calculate the estimator: ̂= * ( ).
Step 4. For each j in the test sample $\mathcal{I} = \{126, \dots, 200\}$, define the confidence bands by
One obtains the following figure, which completes the study of the asymptotic confidence bands. For the purpose of making a decision, the authors chose another example (5.1) in which the distribution of the model is known and usual. ,4).
In this study, as the curves are rough (see Figure 7) the study used the semi-metric pca.  From the obtained results presented in Table 1, one can confirm that the FSIM estimator of conditional mode is better than that of NPFDA. It gives a smaller mean square error, hence it allows for a more accurate estimation.
After the calculation of the errors, one finds for this method an error SSR = 0.091. The NPFDA method gives an error SSR = 0.1181, while the real error (knowing that ↪ ( |< , >| 150 ,4)) is equal to = $1.672 \times 10^{-29}$ SSR = 1.938. This confirms once again that this estimator performs better than the one in the NPFDA case. Therefore, in the context of i.i.d. data, this estimator is preferable.

Conclusions
This paper focused on the nonparametric estimation of the conditional mode in the single functional index model for independent data. Both the asymptotic normality and a confidence interval of the resulting estimator were derived. The proofs are based on a combination of existing techniques. The study's prime aim was to improve the performance of the single-index model for the conditional mode with a scalar response variable conditioned by a functional Hilbertian regressor under the independence assumption. In a series of simulations, this model outperforms the nonparametric functional estimator. The contribution of this study concerns the estimation of the conditional density function as well as the estimation of the regression for complete data in a functional framework. The first approach is used for the estimation of the conditional mode. The nonparametric aspect is properly exploited in the first two sections through the given hypotheses. The proposed estimators are consistent and asymptotically normally distributed under appropriate conditions. Note that this approach is more significant in the presence of a simple single functional index. The dimensionality of the model appears in the bias part, while the dimensionality of the functional space of the explanatory variable appears in the dispersion part. The estimation and forecast accuracies of the FSIM and NPFDA models were then evaluated and compared, and the empirical analysis showed that the considered estimator has good finite-sample behaviour for prediction and provides improved estimation and prediction accuracy compared to the NPFDA estimator.
In addition, in order to explore the effectiveness of the authors' method in real situations, one can apply this approach to hourly electricity demand data as well as to spectrometric data. Another real example is forecasting the daily peak in electricity demand, as the accurate prediction of daily peak load demand is very important for decisions made in the energy sector. In fact, short-term load forecasts enable effective load shifting between transmission substations, scheduling of the startup times of peak stations, load flow analysis and power system security studies. Other real data applications (maximum ozone concentration, peak electricity demand) can be highlighted, as they present several attractive features of a functional prediction context, with an unknown scale parameter estimator.
Research in the nonparametric field remains an open matter which will be the subject of several future studies in order to improve and highlight the results obtained in this work. One possible extension of this study is the estimation of conditional models with a MAR (missing at random) response, in both the independent and the dependent case; another type of dependency, such as quasi-association, could also be considered.
Further directions include developing the asymptotic properties of a k-nearest-neighbours kernel estimator and generalizing the results obtained by using other families of semi-metrics in order to improve the prediction performance of the estimators. In all these settings, the choice of the smoothing window remains important.