Single Functional Index Quantile Regression for Functional Data with Missing Data at Random

Econometrics. Ekonometria. Advances in Applied Data Analysis


Introduction
The Single Index Model (SIM) is a financial modelling technique used to analyse the risk and return of a portfolio. It assumes that the returns of individual assets can be explained by their exposure to a common factor or market index. When dealing with missing data in the SIM framework, the data are assumed to be missing at random (MAR). This means that the probability that a value is missing may depend on the observed data but not on the missing values themselves.
To deal with missing data in the SIM, there are several possible approaches one can consider:
1. Complete-case analysis involves excluding any observations with missing data from the analysis. Although this is the simplest approach, it can result in the loss of valuable information if there is a considerable amount of missing data.
2. Imputation methods estimate missing values based on the observed data. Various imputation techniques are available, such as mean imputation, regression imputation, and multiple imputation. These methods aim to replace missing values with plausible estimates to preserve the integrity of the analysis.
3. Maximum likelihood estimation involves estimating the model parameters using the likelihood function, which considers both the observed data and the missing-data mechanism. By maximising the likelihood, one obtains the parameter estimates that are most consistent with the observed data, taking into account the assumed mechanism for the missing data.
4. Multiple imputation is a sophisticated imputation technique that generates multiple plausible values for each missing data point. It involves creating multiple imputed datasets, estimating the model parameters for each dataset, and then combining the results using appropriate rules. Multiple imputation can provide more reliable estimates and standard errors than single imputation methods.
The choice of approach depends on the specifics of the data, the extent of missingness, and the assumptions one is willing to make. It is always advisable to consider the nature of the data carefully and to consult domain experts when handling missing data in the SIM or any other modelling framework.
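The difference between complete-case analysis and imputation under MAR can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the linear model, the logistic missingness probability, and the function names `simulate_mar`, `complete_case_mean` and `regression_imputed_mean` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_mar(n=5000, seed=0):
    """Simulate (x, y, delta) where y is missing at random given x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    # MAR: the probability of observing y depends on x only, never on y itself.
    p_obs = 1.0 / (1.0 + np.exp(-2.0 * x))
    delta = rng.random(n) < p_obs
    return x, y, delta

def complete_case_mean(y, delta):
    """Mean of y over the observed cases only."""
    return y[delta].mean()

def regression_imputed_mean(x, y, delta):
    """Fit y ~ a + b*x on the observed cases, impute the missing y, then average."""
    b, a = np.polyfit(x[delta], y[delta], deg=1)  # highest degree first
    y_filled = np.where(delta, y, a + b * x)
    return y_filled.mean()

x, y, delta = simulate_mar()
full_mean = y.mean()  # available only because the data are simulated
cc_mean = complete_case_mean(y, delta)
imp_mean = regression_imputed_mean(x, y, delta)
print(f"full: {full_mean:.3f}  complete-case: {cc_mean:.3f}  imputed: {imp_mean:.3f}")
```

In this assumed design the observed cases over-represent large values of x, so the complete-case mean drifts away from the full-sample mean, whereas regression imputation exploits the fact that, under MAR, the relationship between y and x is the same among observed and missing cases.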
Ongoing research focuses on the asymptotic properties of semi-parametric estimators of the conditional quantile for functional data within the Single Index Model (SIM) when data are missing at random (MAR); specific results may depend on the particular assumptions and estimation methods employed. This study provides a general overview of some relevant concepts and approaches in this context. In the SIM framework, functional data refers to observations that are functions rather than scalar values. The goal is to estimate the conditional quantile of a scalar response given a functional predictor that acts through a single functional index.
To establish the asymptotic properties of the semi-parametric estimators of the conditional quantile of functional data in the SIM with data missing at random, various theoretical conditions need to be satisfied. These conditions often involve assumptions about the functional data, the missing data mechanism, and the model specification. Commonly studied properties include consistency and efficiency. Specific results in this area may depend on the assumptions and estimation techniques employed in each study. Therefore, it is important to refer to the research articles that focus on the specific estimation method and the relevant assumptions to obtain more detailed and precise asymptotic properties of the estimators.
Note: The field of functional data analysis is evolving rapidly, so recent publications on functional data analysis, missing data, and the single index model should be consulted for the most up-to-date results in this area.
In non-parametric statistics, one of the most common problems is forecasting. Regression is often employed as the primary tool for addressing it. However, regression is inadequate when the conditional density is asymmetrical or multimodal. In such cases, the conditional quantile provides a better prediction of the impact of the explanatory variable $X$ on the variable of interest $Y$. When the explanatory variable is infinite-dimensional or of a functional nature, only a limited literature investigates the statistical properties of functional nonparametric regression models for missing data. Ferraty, Sued, and Vieu (2013) introduced a method for estimating the mean of a scalar response from an independent and identically distributed (i.i.d.) functional sample. This method considers cases where the explanatory variables are observed for each individual, whereas some responses are missing at random (MAR). This work extended the results of Cheng (1994) to the situation where the explanatory variables are functional.
As far as is known, the statistical literature has not yet explored the estimation of the nonparametric conditional distribution in the context of a functional single index structure, which incorporates missing data and stationary processes with functional nature. This study investigates the estimation of conditional quantiles under the missing at random (MAR) assumption. The objective is to develop a functional approach that can effectively handle MAR samples in nonparametric problems, specifically in the context of conditional quantile estimation. Thus, the authors establish the asymptotic properties of the estimator under certain mild conditions, focusing on a model where the response variable is missing. In addition to the infinite-dimensional nature of the data, the study intentionally avoids the strong mixing condition and its variants as measures of dependency, as they involve complex probabilistic calculations. Therefore, within this framework, the independence of the variables is assumed. To the best of the authors' knowledge, the statistical literature does not currently provide any studies on the estimation of conditional quantiles that incorporate censored data, independent data, and functional data with a single index structure. This work extends the results of Ling, Liang and Vieu (2015), Ling, Liu and Vieu (2016) and Rabhi, Kadiri and Mekki (2021) to the functional single index model.
The estimation of conditional quantiles, specifically the conditional median function, has attracted considerable interest in the statistical community due to its theoretical and practical implications.
It serves as a compelling alternative predictor to the conditional mean due to its robustness in handling outliers (see Chaudhuri et al., 1997).
Many researchers have shown great interest in the estimation of the conditional mode of a scalar response with a functional covariate. The nonparametric estimator of the conditional quantile, defined as the inverse of the conditional distribution function, was introduced for dependent data by Ferraty, Rabhi, and Vieu (2005). Under an α-mixing assumption, Ezzahrioui and Ould-Saïd (2008) established the asymptotic normality of the kernel conditional quantile estimator. Ould-Saïd and Cai (2005) demonstrated the uniform strong consistency, with rates, of the kernel estimator of the conditional mode function in the censored case. For the estimation of conditional quantiles, see the work of Lemdani, Ould-Saïd, and Poulin (2009). Several other authors have studied the estimation of conditional models in the presence of censored or truncated observations; see, for instance, Liang and de Uña-Alvarez (2010), Rabhi, Kadiri and Mekki (2021), Rabhi, Kadiri and Akkal (2021), Hamri, Mekki, Rabhi and Kadiri (2022), Ould-Saïd and Djabrane (2011), and Ould-Saïd and Tatachak (2011).
The study by Aït-Saidi, Ferraty, Kassa and Vieu (2008) focused on using SFIM (Single Functional Index Models) to estimate the regression operator. The authors proposed a cross-validation procedure to estimate both the unknown link function and the unknown functional index. Attaoui and Boudiaf (2014) and Attaoui and Ling (2016) were interested in the estimation of the conditional density and the conditional cumulative distribution function, respectively, using SFIM. Their studies assumed a strong mixing condition for the data. Kadiri, Rabhi and Bouchentouf (2018) examined the asymptotic properties of a kernel-type estimator of conditional quantiles in the context of right-censored response data sampled from a strong mixing process.
The remainder of the paper is structured as follows. Section 2 introduces the non-parametric estimator of the functional conditional model when the data are missing at random (MAR). Section 3 outlines useful assumptions for the theoretical analysis. Section 4 establishes the pointwise almost complete convergence and the uniform almost complete convergence of the kernel estimator for the models considered, along with the corresponding convergence rates.

The functional nonparametric framework
Let us consider a random pair $(X, Y)$ where $Y$ takes values in ℝ and $X$ takes values in an infinite-dimensional Hilbertian space ℋ with scalar product ⟨·,·⟩. It is assumed that the pairs of the statistical sample $(X_i, Y_i)_{i=1,\dots,n}$ have the same distribution as $(X, Y)$ and are independent and identically distributed.
Throughout, the existence of a regular version of the conditional distribution of $Y$ given $\langle\theta, X\rangle = \langle\theta, x\rangle$ is implicitly assumed.
In this infinite-dimensional context, the study employs the term "functional nonparametric", where "functional" signifies the infinite dimensionality of the data, and "nonparametric" refers to the infinite dimensionality of the model. This type of statistical approach, known as doubly infinite dimensional, is also referred to as functional nonparametric statistics. For more details, refer to Ferraty and Vieu (2003). Furthermore, the term "operational statistics" is also used, since the target object to be estimated, the cond-cdf $F(\theta, \cdot, x)$, can be viewed as a nonlinear operator.

The estimators
In the case of complete data, the kernel estimator $\widehat{F}(\theta, \cdot, x)$ of $F(\theta, \cdot, x)$ is built from a kernel function $K$, a cumulative distribution function $H$, and sequences of positive real numbers $h_K$ (resp. $h_H$). It is worth noting that Roussas (1969) introduced related estimates based on similar concepts, but specifically when X is real. Moreover, Samanta (1989) produced an earlier asymptotic study on the subject.
The uniqueness of such an estimator is assured when $H$ is an increasing continuous function. This approach has been extensively employed in situations where the variable $X$ has a finite dimension.
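For reference, a standard form of this kind of kernel estimator in the single functional index literature is sketched below, together with the quantile estimator obtained by inversion. The notation is an assumption made for illustration ($F$ for the cond-cdf, $\theta$ for the functional index, $t_\theta(\gamma, x)$ for the conditional quantile of order $\gamma$) and may differ from the authors' exact display.

```latex
\widehat{F}(\theta, y, x)
  = \frac{\sum_{i=1}^{n} K\!\left(h_K^{-1}\,\lvert\langle \theta, x - X_i\rangle\rvert\right)
          H\!\left(h_H^{-1}(y - Y_i)\right)}
         {\sum_{i=1}^{n} K\!\left(h_K^{-1}\,\lvert\langle \theta, x - X_i\rangle\rvert\right)},
\qquad
\widehat{t}_\theta(\gamma, x)
  = \inf\bigl\{\, y \in \mathbb{R} : \widehat{F}(\theta, y, x) \ge \gamma \,\bigr\}.
```

Under this form, $y \mapsto \widehat{F}(\theta, y, x)$ inherits the monotonicity and continuity of $H$, which is why a continuous increasing $H$ makes the solution of $\widehat{F}(\theta, y, x) = \gamma$ unique.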
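In the case of incomplete data with responses missing at random, the observations consist of $(X_i, Y_i, \delta_i)_{1 \le i \le n}$, where $X_i$ is completely observed, while $\delta_i = 1$ if $Y_i$ is observed and $\delta_i = 0$ otherwise. The Bernoulli random variable $\delta$ is such that the probability of observing the response, conditionally on $\langle\theta, X\rangle = \langle\theta, x\rangle$ and on $Y$, does not depend on $Y$; this probability is denoted $P(\theta, x)$ and is a functional operator that depends only on $X$. Thus, the estimator $\widetilde{F}(\theta, y, x)$ of $F(\theta, y, x)$ in the single index model with MAR responses is built from the observed responses only.
A natural estimator of the conditional quantile is then obtained by inverting this estimated cond-cdf.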
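The construction described in this subsection can be sketched numerically. The code below is a minimal illustration, not the authors' implementation: it assumes the usual kernel form in which each term is weighted by the observation indicator, treats the index `theta` as known, and uses an illustrative kernel, an illustrative cdf, and simulated curves. All function names (`inner`, `kernel_K`, `cdf_H`, `cond_cdf_mar`, `cond_quantile_mar`) and all numerical settings are hypothetical.

```python
import numpy as np

def inner(u, v, grid):
    """Trapezoidal approximation of the L2 inner product <u, v> on `grid`."""
    w = u * v
    return np.sum(0.5 * (w[..., 1:] + w[..., :-1]) * np.diff(grid), axis=-1)

def kernel_K(u):
    """Illustrative kernel, bounded and strictly positive on [0, 1]."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.0 - 0.5 * u, 0.0)

def cdf_H(t):
    """Illustrative smooth cdf (integrated Epanechnikov density on [-1, 1])."""
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + 0.75 * t - 0.25 * t ** 3

def cond_cdf_mar(y, x, theta, X, Y, delta, grid, h_K, h_H):
    """Kernel estimate of the cond-cdf at (theta, y, x) from MAR responses."""
    obs = np.asarray(delta, dtype=bool)
    d = np.abs(inner(x - X, theta, grid)) / h_K   # |<theta, x - X_i>| / h_K
    w = obs * kernel_K(d)                         # zero weight when Y_i is missing
    if w.sum() == 0.0:
        raise ValueError("no observed response in the neighbourhood of x")
    Y_safe = np.where(obs, Y, 0.0)                # placeholder, always weighted by 0
    return float(np.sum(w * cdf_H((y - Y_safe) / h_H)) / w.sum())

def cond_quantile_mar(gamma, x, theta, X, Y, delta, grid, h_K, h_H, n_iter=60):
    """Approximate inf{y : estimated cond-cdf >= gamma} by bisection."""
    obs = np.asarray(delta, dtype=bool)
    lo, hi = Y[obs].min() - h_H, Y[obs].max() + h_H   # estimate is 0 at lo, 1 at hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if cond_cdf_mar(mid, x, theta, X, Y, delta, grid, h_K, h_H) >= gamma:
            hi = mid
        else:
            lo = mid
    return hi

# Simulated example (all settings are assumptions made for illustration).
rng = np.random.default_rng(1)
n, grid = 1000, np.linspace(0.0, 1.0, 101)
theta = np.sqrt(2.0) * np.sin(2.0 * np.pi * grid)   # unit-norm index, assumed known
phi = np.sqrt(2.0) * np.cos(2.0 * np.pi * grid)     # orthogonal nuisance direction
a, b = rng.normal(size=n), rng.normal(size=n)
X = a[:, None] * theta + b[:, None] * phi           # so that <theta, X_i> = a_i
Y = a + 0.3 * rng.normal(size=n)
delta = rng.random(n) < 1.0 / (1.0 + np.exp(-(1.0 + a)))  # depends on X only (MAR)
Y[~delta] = np.nan                                  # missing responses are never read

x0 = 0.5 * theta                                    # curve with <theta, x0> = 0.5
F_lo = cond_cdf_mar(0.0, x0, theta, X, Y, delta, grid, 0.2, 0.1)
F_hi = cond_cdf_mar(1.0, x0, theta, X, Y, delta, grid, 0.2, 0.1)
q_med = cond_quantile_mar(0.5, x0, theta, X, Y, delta, grid, 0.2, 0.1)
print(F_lo, F_hi, q_med)
```

In this simulated design the conditional median of the response given an index value of 0.5 equals 0.5, so `q_med` should land near that value. In practice the index is unknown and the bandwidths must be selected from the data, which this sketch does not address.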

Assumptions on the functional variable
Let $N_x$ denote a fixed neighbourhood of $x$ in ℋ, and let $B_\theta(x, h)$ denote the ball centred at $x$ with radius $h$.

The nonparametric model
In nonparametric estimation, it is assumed that the cond-cdf $F(\theta, \cdot, x)$ satisfies specific smoothness constraints, stated in terms of two positive numbers $b_1$ and $b_2$.

Asymptotic study
The aim of this section is to apply these concepts to the context of a functional explanatory variable, and to develop a kernel-type estimator of the conditional distribution function $F(\theta, y, x)$ adapted to MAR response samples. The goal is to demonstrate the almost complete convergence (see footnote 1) of the kernel estimator $\widetilde{F}(\theta, y, x)$ when the response variable is missing at random. The results are accompanied by convergence rates. In what follows, $C$ and $C'$ denote generic strictly positive real constants, while $h_K$ (resp. $h_H$) denotes a sequence that converges to 0 as $n$ increases.

Pointwise almost complete convergence
In addition to the assumptions presented in Section 2.4, supplementary conditions are required. These conditions, which concern the parameters of the estimator, i.e. $K$, $H$, $h_K$ and $h_H$, are not excessively restrictive. Indeed, on the one hand, they are inherent to the estimation problem of $F(\theta, y, x)$, and on the other hand, they correspond to the assumptions typically employed in the context of non-functional variables. Specifically, the following conditions are introduced to ensure the performance of the estimator $\widetilde{F}(\theta, \cdot, x)$:
(H5)-(ii) The support of $H^{(1)}$ is compact and $\forall l \ge j$, $H^{(l)}$ exists and is bounded.
(H6) The function $H$, when restricted to the set $\{y \in \mathbb{R},\ H(y) \in (0,1)\}$, is strictly increasing.
(H7) $K$ is a bounded positive function on the interval $[0,1]$: $\forall u \in [0,1]$, $0 < K(u)$.
(H8) $P(\theta, x)$ is continuous in a neighbourhood of $x$, such that $0 < P(\theta, x) < 1$.
1 Recall that a sequence $(S_n)_{n \in \mathbb{N}}$ of random variables is said to converge almost completely to a variable $S$ if, for any $\varepsilon > 0$, $\sum_{n} \mathbb{P}(|S_n - S| > \varepsilon) < \infty$. This mode of convergence implies both almost sure convergence and convergence in probability (see e.g. Bosq and Lecoutre, 1987).
• Comments on the hypotheses:
1. Assumption (H1) plays an important role in the methodology. It is known as the "concentration property" in infinite-dimensional spaces.
2. (H2) and (H4) are used to control the regularity of the functional space of the model.
3. (H6) ensures the existence of the conditional quantile; its uniqueness is ensured by (H5).
4. Assumptions (H5) and (H7) are classical in functional estimation in finite or infinite-dimensional spaces; they are technical and make it possible to give an explicit asymptotic variance.
5. Assumptions (H1)-(H4) and (H7) are commonly used for conditional distribution estimation in the single functional index model; they were employed in the i.i.d. case by Kadiri, Rabhi and Bouchentouf (2018).
6. (H8) is an assumption on the missing at random mechanism; it is a technical condition that keeps the proofs of the main results concise.
Lemma 3.1. Suppose that hypotheses (H1)-(H2), (H5)-(i) and (H8) are satisfied, then
Integrating by parts and using the fact that $H$ is a cdf, along with a double conditioning with respect to $X_1$, one readily obtains:
and, under (H3), (H5)-(i) and (H8), one obtains:
Finally, the proof is complete.
As the first and the third terms can be treated similarly, let us focus on the first term. By (H5)-(i), which implies in particular that $H$ is Hölder continuous with order one, this can be expressed as follows:
Using $\mathbb{E}[\widetilde{F}_D(\theta, x)] = P(\theta, x)$, (H5)-(i) and lim
Therefore, for $n$ large enough
Following similar arguments, one can write
Applying Bernstein's exponential inequality to:
Firstly, it follows from the fact that the kernel Γ is bounded and $\delta \le 1$; therefore, by selecting a sufficiently large value for $\varepsilon_0$, the proof can be concluded by applying the Borel-Cantelli lemma. This allows for an easy deduction of the result.
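For readers less familiar with this step, one standard form of Bernstein's exponential inequality is recalled below. The generic variables $Z_i$ and the constants $M$ and $\sigma^2$ are notation introduced here for illustration only; the exact version applied by the authors may differ.

```latex
% Z_1, ..., Z_n independent and centred, with |Z_i| <= M almost surely
% and E[Z_i^2] <= sigma^2 for every i.
\mathbb{P}\!\left( \left| \frac{1}{n} \sum_{i=1}^{n} Z_i \right| > \varepsilon \right)
  \le 2 \exp\!\left( - \frac{n\,\varepsilon^{2}}{2\left(\sigma^{2} + M\varepsilon/3\right)} \right),
  \qquad \varepsilon > 0 .
```

When $\varepsilon$ is taken as a large enough multiple of the target rate, a bound of this type is typically of order $n^{-c}$ with $c > 1$, hence summable in $n$, which is what the definition of almost complete convergence requires.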
All the calculations performed earlier using the variables $\Pi_i(\theta, x)$ remain applicable when considering the variables $\Delta_i(\theta, x)$, and one obtains
Concerning the second part, where
The proof of Theorem 3.1 is completed by employing inequality (3.1) along with Lemma 3.1, Lemma 3.2, and Lemma 3.3.
Certainly, it can be established that , where
where $\Xi_i(\theta, x)$ =
Proof. The demonstration relies on the Taylor expansion of $\widetilde{F}(\theta, \cdot, x)$ around the conditional quantile and the application of (H9):
Combining the first part of Lemma 3.3 with Lemmas 3.4-3.5 yields the desired result.

Uniform almost complete convergence and rate of convergence
In this section the authors establish the uniform versions of Theorem 3.1, Proposition 3.1 and Theorem 3.2, which are standard extensions of the pointwise results. Achieving these results requires more intricate technical developments than those presented earlier. To keep this section clear, additional tools and certain topological conditions are needed (see Hamri et al., 2022). First, owing to the compactness of the sets $S_{\mathcal{H}}$ and $\Theta_{\mathcal{H}}$, it is possible to cover each of them with a finite number of balls. Let $d_n^{S_{\mathcal{H}}}$ and $d_n^{\Theta_{\mathcal{H}}}$ denote the minimal numbers of open balls with radius $r_n$ in ℋ that are required to cover $S_{\mathcal{H}}$ and $\Theta_{\mathcal{H}}$, respectively; the centres of these balls are points $x_k$ (resp. $t_j$) $\in$ ℋ.

Conditional distribution estimation
The objective of this part is to demonstrate almost complete uniform convergence. In order to extend the previously obtained results, it is essential to introduce a topological framework for the functional space of the observations and the functional character of the model. The asymptotic results make use of the topological properties of the functional space of the observations. It is worth mentioning that all the convergence rates rely on the concentration of the probability measure of the functional variable on small balls, as well as on the concept of Kolmogorov's entropy, which quantifies the number of balls required to cover a given set. To achieve this objective, the authors introduce the following conditions:
(U2) The kernel $K$ satisfies both (H3) and the Lipschitz condition $|K(x) - K(y)| \le C\,\|x - y\|$.
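The role of covering numbers can be illustrated numerically. The sketch below is not part of the authors' method: it greedily builds a cover of a finite sample of discretised curves by open balls of a given radius and reports the logarithm of the number of balls used, which upper-bounds the entropy of that finite sample at that radius. The function names `l2_dist` and `greedy_cover`, the L2 distance on a grid, and the simulated curves are all assumptions made for illustration.

```python
import numpy as np

def l2_dist(curves, centre, grid):
    """Approximate L2 distance between each discretised curve and `centre`."""
    sq = (curves - centre) ** 2
    integral = np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * np.diff(grid), axis=1)
    return np.sqrt(integral)

def greedy_cover(curves, grid, radius):
    """Return indices of centres such that open balls of `radius` cover all curves."""
    uncovered = np.ones(len(curves), dtype=bool)
    centres = []
    while uncovered.any():
        idx = int(np.flatnonzero(uncovered)[0])   # first curve not yet covered
        centres.append(idx)
        # A curve stays uncovered only if it lies outside the new open ball.
        uncovered &= l2_dist(curves, curves[idx], grid) >= radius
    return centres

# Simulated curves spanned by two orthonormal functions (illustrative choice).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
e1 = np.sqrt(2.0) * np.sin(2.0 * np.pi * grid)
e2 = np.sqrt(2.0) * np.cos(2.0 * np.pi * grid)
coef = rng.uniform(-1.0, 1.0, size=(200, 2))
curves = coef[:, :1] * e1 + coef[:, 1:] * e2

for r in (0.25, 3.0):
    centres = greedy_cover(curves, grid, r)
    print(f"radius {r}: {len(centres)} balls, log-count {np.log(len(centres)):.3f}")
```

A greedy cover is generally not minimal, so the reported count is only an upper bound on the covering number of the sample. The sketch is meant to show how the number of balls grows as the radius shrinks, which is the quantity that enters uniform convergence rates.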

Conclusion
This paper focused on the nonparametric estimation of the conditional distribution function and the conditional quantile in the single functional index model for independent data, when the variable of interest is subject to randomly missing data. The setting involves both a (semi-parametric) single index model structure and a censoring process on the variables. Both the almost complete convergence and the uniform almost complete convergence of the proposed estimators were established. The proofs rely on standard assumptions in Functional Data Analysis (FDA).