UNIFORM IN BANDWIDTH OF THE CONDITIONAL DISTRIBUTION FUNCTION WITH FUNCTIONAL EXPLANATORY VARIABLE: THE CASE OF SPATIAL DATA WITH THE K NEAREST NEIGHBOUR METHOD

Abstract: In this paper, the author introduces a new estimator of the conditional distribution function (cdf) when the covariates are functional. This estimator combines two procedures: the k nearest neighbour method and spatial functional estimation.


Introduction
Statistical problems related to modelling and the study of spatial data have recently been of great interest in the field of statistics.
The importance of this research topic is driven by the growing number of concrete problems in which data are collected in a spatial order. Such problems arise in many fields, such as epidemiology, econometrics, environmental and earth sciences, agronomy and imaging. In nonparametric statistics, the modelling of spatial data is relatively recent compared with the parametric case. The first results were obtained by Tran (1990), and more recent references include Dabo-Niang and Yao (2007), Carbon, Francq and Tran (2007) and Li and Tran (2009).
The objective of this work was to study the nonparametric estimation of the cdf when the observations are spatially dependent and the covariate takes values in a semi-metric space of possibly infinite dimension, using the k nearest neighbour (kNN) method. The estimation of the cdf plays a very important role in statistics: it is used in risk analysis and in the study of survival phenomena in many fields, such as medicine, geophysics and reliability.
For spatially dependent data, nonparametric approaches have also been investigated. For example, Li and Tran (2007) obtained, in a spatial context, the asymptotic normality of a kernel estimator of the cdf. Lu and Chen (2004) and Biau and Cadre (2004) investigated the nonparametric spatial regression problem, using Nadaraya-Watson weights to construct a kernel estimator and establishing its weak convergence and asymptotic distribution. Carbon, Francq and Tran (2007) investigated the nonparametric autoregression model in a prediction context on random fields. Spatial nonparametric regression was also examined by Li and Tran (2009), who established the asymptotic normality of the constructed estimator, while Attouch, Laksaci and Messabihi (2015) studied nonparametric relative error regression for spatial random variables. For recent advances and references in nonparametric geographical data analysis, the author refers to El Machkouri and Stoica (2010), Robinson (2011) and Dabo-Niang, Ould-Abdi and Diop (2014).
The main contribution of this work was to generalise the results of Ferraty, Rabhi and Vieu (2008) and of Laksaci and Mechab (2010) to spatially dependent observations. Under rather general mixing conditions, this study established the almost complete convergence (with rate) of the kNN estimator of the cdf of a real random variable conditioned on a functional explanatory variable. Note that, like all asymptotic properties in functional nonparametric statistics, this result is linked to the concentration of the probability measure of the explanatory variable and to the regularity of the functional space of the model.
Nonparametric k nearest neighbour (kNN) smoothing approaches have attracted a lot of interest in the statistical literature for analysing multivariate data because of their flexibility and efficiency. Owing to these attractive features, the functional kNN smoothing approach has received growing attention over the last few years. Gyorfi (2002) gives a thorough analysis of kNN estimators in the finite-dimensional setting. Work in this area was initiated by Cover (1968), and a large number of articles are now available in various estimation contexts, including regression, discrimination, density and mode estimation, and cluster analysis; see also Collomb (1981), Devroye and Wagner (1982), Li and Tran (2007), Moore and Yackel (1977), Devroye, Gyorfi, Krzyzak and Lugosi (1994), Beirlant and Biau (2018), Laloe (2008), Burba, Ferraty and Vieu (2009), Tran, Wehrens and Buydens (2006), Lian (2011), Attouch and Bouabsa (2013), Attouch, Bouabsa and Chiker el Mezouar (2018) and Kudraszow and Vieu (2013). For the most recent advances and references, this study cites Kara, Laksaci and Vieu (2017), Almanjahie, Aissiri, Laksaci and Chiker el Mezouar (2020) and Bouabsa (2021).
Note that in spatial statistics one distinguishes two types of asymptotics (see Cressie (1991)): extensive asymptotics and intensive asymptotics. The former deals with the case where the number of observations grows with the size of the observation domain; in practice, it is used when observations are collected by separate measuring stations. Examples of functional data suited to this framework are found in economics (consumption curves of a product in different cities), the environment (concentration curves of a polluting gas in different regions) and agronomy (rainfall curves in different localities). Intensive asymptotics concern observations that become denser in a fixed bounded domain, as is the case, for example, in prospecting or in radiographic analysis.
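To make the distinction concrete, here is a minimal Python sketch, an illustration rather than part of the paper, of the two regimes on a two-dimensional lattice. The helpers `extensive_sites` and `intensive_sites` are hypothetical names: the first places an $n \times n$ grid with unit spacing (the domain grows with $n$), the second places it inside the fixed square $(0,1]^2$ (the spacing shrinks as $1/n$).

```python
import numpy as np


def extensive_sites(n: int) -> np.ndarray:
    """Increasing-domain layout: n x n lattice with unit spacing on [1, n]^2."""
    g = np.arange(1, n + 1, dtype=float)
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])


def intensive_sites(n: int) -> np.ndarray:
    """Infill layout: n x n lattice inside the fixed domain (0, 1]^2, spacing 1/n."""
    g = np.arange(1, n + 1, dtype=float) / n
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])


for n in (5, 10, 20):
    ext, inten = extensive_sites(n), intensive_sites(n)
    # Side length of the occupied domain and spacing between neighbouring sites.
    print(f"n={n}: extensive side {ext[:, 0].max() - ext[:, 0].min():.2f}, "
          f"spacing {ext[1, 0] - ext[0, 0]:.2f}; intensive side "
          f"{inten[:, 0].max() - inten[:, 0].min():.2f}, "
          f"spacing {inten[1, 0] - inten[0, 0]:.3f}")
```

In the extensive layout the side length grows like $n$ while the spacing stays at 1; in the intensive layout the side length stays below 1 while the spacing tends to 0.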
The main objective of this study was to construct an estimator of the cdf by linking the functional estimation approach with the spatial setting through kNN estimation. Recall that the difficulty of this problem comes from the fact that, in the kNN method, the bandwidth is a random variable. More precisely, the bandwidth is defined in terms of the distances between the functional random variables. This allows the estimator to exploit both the topological and the spectral components of the data. In nonparametric functional data analysis (NFDA), kNN estimation of the cdf with spatial data is new.
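The random bandwidth can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the curves are discretised on a common equally spaced grid and that the semi-metric is the $L^2$ distance approximated by a Riemann sum; the helpers `l2_semimetric` and `knn_bandwidth` are hypothetical. The bandwidth at a fixed curve $x$ is taken to be the distance from $x$ to its $k$-th nearest curve in the sample, so it changes from one sample to another.

```python
import numpy as np


def l2_semimetric(x: np.ndarray, curves: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Approximate L2 distance between curve x and each row of `curves`.

    All curves are assumed to be sampled on the same equally spaced grid t,
    and the integral is approximated by a simple Riemann sum.
    """
    dt = t[1] - t[0]
    return np.sqrt(np.sum((curves - x) ** 2, axis=1) * dt)


def knn_bandwidth(x: np.ndarray, curves: np.ndarray, t: np.ndarray, k: int) -> float:
    """Random kNN bandwidth: distance from x to its k-th nearest curve.

    With no ties, the closed ball B(x, h) then contains exactly k curves.
    """
    d = l2_semimetric(x, curves, t)
    return float(np.sort(d)[k - 1])


# Two independent samples give two different bandwidths for the same x and k.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
x = np.sin(2 * np.pi * t)
for _ in range(2):
    amp = rng.uniform(0.5, 1.5, size=50)             # hypothetical random amplitudes
    curves = amp[:, None] * np.sin(2 * np.pi * t)    # hypothetical functional sample
    h = knn_bandwidth(x, curves, t, k=10)
    inside = int(np.sum(l2_semimetric(x, curves, t) <= h))
    print(f"H_k(x) = {h:.4f}, curves in B(x, H_k(x)): {inside}")
```

Unlike a fixed kernel bandwidth, the number of curves in the ball is controlled (it equals $k$ when there are no ties), while its radius is random.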
Section 2 presents the kNN estimator for the spatial model. Section 3 states the assumptions and establishes the almost complete convergence of this estimator. Section 4 contains the proofs of these results.

The model and its estimator with the kNN method
Let $N$ be a natural number in $\mathbb N^*$. Consider the random field $Z = (Z_{\mathbf i}, \mathbf i \in \mathbb N^N)$, where $Z_{\mathbf i} = (X_{\mathbf i}, Y_{\mathbf i})$ takes values in $\mathcal F \times \mathbb R$ and $(\mathcal F, d)$ is a semi-metric space of possibly infinite dimension. In this context, each $X_{\mathbf i}$, $\mathbf i \in \mathbb N^N$, can be a functional random variable. It should be noted that, for about ten years, the statistical community has been concerned with developing models and methods adapted to functional data. While the first studies in this direction focused mainly on linear models (see Bosq (2000); Ramsay and Silverman (2006)), more recent developments (see Ferraty and Vieu (2006)) propose nonparametric models suited to this type of data.
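Next, fix a point $x$ in $\mathcal F$ (respectively, a compact set $S \subset \mathbb R$) and assume that the spatial observations $(X_{\mathbf i}, Y_{\mathbf i})_{\mathbf i \in \mathbb N^N}$ have the same distribution as $Z = (X, Y)$ and that the conditional probability of $Y$ given $X = x$ admits a regular version. The functional parameter studied in this article with the kNN method is the conditional distribution function of $Y$ given $X = x$, denoted $F^x$: for all $y \in \mathbb R$, $F^x(y) = \mathbb P(Y \le y \mid X = x)$.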
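The explicit formula of the estimator does not appear in this section. As an illustration only, the Python sketch below assumes a form in the spirit of the kNN kernel regression estimator of Burba, Ferraty and Vieu (2009), with the response replaced by the indicator $\mathbb 1\{Y_{\mathbf i} \le y\}$: a ratio of kernel weights $K(d(x, X_{\mathbf i}) / H_k(x))$, where $H_k(x)$ is the distance from $x$ to its $k$-th nearest curve. The quadratic kernel, the $L^2$ semi-metric, the function name `knn_cond_cdf` and the simulated data are all assumptions; the spatial sites are simply flattened into one array, since in this sketch the spatial dependence only matters for the asymptotic analysis, not for the computation.

```python
import numpy as np


def quadratic_kernel(u: np.ndarray) -> np.ndarray:
    """Quadratic kernel K(u) = 1.5 (1 - u^2) on [0, 1], zero elsewhere."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.5 * (1.0 - u ** 2), 0.0)


def knn_cond_cdf(x, X, Y, t, y, k):
    """Hypothetical kNN kernel estimate of F^x(y) = P(Y <= y | X = x).

    x : (m,) target curve on the grid t
    X : (n_sites, m) curves observed at the sites, flattened into one axis
    Y : (n_sites,) real responses at the same sites
    y : scalar or array of evaluation points
    k : number of neighbours (use k >= 2: with this kernel the k-th
        neighbour sits at u = 1 and gets weight 0)
    """
    dt = t[1] - t[0]
    d = np.sqrt(np.sum((X - x) ** 2, axis=1) * dt)    # assumed L2 semi-metric
    h = np.sort(d)[k - 1]                              # random kNN bandwidth H_k(x)
    w = quadratic_kernel(d / h)                        # kernel weights
    y = np.atleast_1d(np.asarray(y, dtype=float))
    indicators = Y[:, None] <= y[None, :]              # 1{Y_i <= y}
    return (w[:, None] * indicators).sum(axis=0) / w.sum()


# Hypothetical data on a 20 x 20 grid of sites (independent here, for simplicity).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 51)
a = rng.uniform(0.0, 1.0, size=400)
X = a[:, None] * np.cos(2 * np.pi * t)
Y = a + 0.1 * rng.standard_normal(400)
x = 0.5 * np.cos(2 * np.pi * t)
print(knn_cond_cdf(x, X, Y, t, y=np.linspace(0.0, 1.0, 5), k=40))
```

Because the weights are nonnegative and normalised, the estimate is a nondecreasing step function of $y$ with values in $[0, 1]$.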
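The aim of this work was to study the almost complete convergence of the estimator $\widehat F^x$ to $F^x$ when the functional random field $(Z_{\mathbf i})_{\mathbf i \in \mathbb N^N}$ satisfies the following mixing condition: for all finite sets $E, E' \subset \mathbb N^N$,
$$\alpha\big(\mathcal B(E), \mathcal B(E')\big) = \sup_{A \in \mathcal B(E),\, B \in \mathcal B(E')} \big|\mathbb P(A \cap B) - \mathbb P(A)\,\mathbb P(B)\big| \le \psi\big(\mathrm{Card}(E), \mathrm{Card}(E')\big)\, \varphi\big(\mathrm{dist}(E, E')\big), \qquad (1)$$
where $\mathcal B(E)$ (respectively $\mathcal B(E')$) is the σ-field generated by $(Z_{\mathbf i}, \mathbf i \in E)$ (respectively $(Z_{\mathbf i}, \mathbf i \in E')$), $\mathrm{Card}(E)$ (respectively $\mathrm{Card}(E')$) is the cardinality of $E$ (respectively $E'$), $\mathrm{dist}(E, E')$ is the Euclidean distance between $E$ and $E'$, $\varphi(t) \to 0$ as $t \to \infty$, and $\psi : \mathbb N^2 \to \mathbb R^+$ is a symmetric function, nondecreasing in each of its two variables, satisfying one of the following conditions:
$$\psi(n, m) \le C \min(n, m), \qquad n, m \in \mathbb N,$$
or
$$\psi(n, m) \le C (n + m + 1)^{\tilde\beta}, \qquad n, m \in \mathbb N,$$
for some $\tilde\beta \ge 1$ and some $C > 0$. These conditions were used by Tran (1990) and are satisfied by many spatial models (see Guyon, 1987). Recall that when (1) holds with $\psi \equiv 1$, the random field $Z = (X, Y)$ is said to be strongly mixing (see Doukhan (1994) for further information on mixing properties and examples).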
In addition, we assume that the process satisfies the following mixing condition: Note that conditions (2) and (3) are identical to the mixing conditions used by Carbon et al. (2007) and Tran (1990).

Asymptotic properties
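In what follows, $C$ and/or $C'$ denote generic strictly positive constants. Recall that, in this spatial context, $\mathbf n \to \infty$ means that $\min_{1 \le \ell \le N} n_\ell \to \infty$ and that, for all $1 \le j, \ell \le N$, $\infty > C > |n_j / n_\ell|$ for some constant $C$. We introduce the following hypotheses, in which $B(x, h) = \{x' \in \mathcal F : d(x, x') \le h\}$ denotes the closed ball centered at $x$ with radius $h$.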

Remarks on the hypotheses
This research links the work of Kara, Laksaci and Vieu (2017) with that of Laksaci and Mechab (2010); consequently, several of the assumptions are the same as in these studies.

Proof
The proofs are based, respectively, on the following decompositions, where $\mathbf 1 = (1, \dots, 1)$ is the spatial index whose components are all equal to 1.
The proof follows from the lemmas below.
One has to show that there exists $\eta_0 > 0$ such that
Following Dony and Einmahl (2009, p. 321), the proof relies on Bernstein's inequality for empirical processes, with $h_{n,j} = 2^j a_n$ and $L(n) = \{j : h_{n,j} \le 2 b_0\}$. To handle the difference, the argument uses ideas similar to those of Carbon, Tran and Wu (1997).
Consider the spatial block decomposition of Tran (1990) of the variables $\Theta_{\mathbf i}$, defined, for a fixed integer $p_n$, as follows. Even if $n_\ell$ is not exactly equal to $2 p_n r_\ell$, the remaining variables can be grouped in a block $\Delta(\mathbf n, x, 2^N + 1)$ (this does not change the proof; see Biau and Cadre (2004)). Now, by equation (6), it suffices to deal with the case $i = 1$. To this end, we renumber the variables $(\Delta(1, \mathbf n, x, \mathbf j);\ \mathbf j \in \mathcal H)$ and apply Lemma 4.1 of Carbon, Tran and Wu (1997) to the renumbered variables, denoted $W_1, \dots, W_M$, where $M = \prod_{\ell=1}^{N} r_\ell = 2^{-N} \hat{\mathbf n}\, p_n^{-N} \le \hat{\mathbf n}\, p_n^{-N}$. Note that for each $k$ there is some $\mathbf j$ in $\mathcal H$ such that $W_k = \Delta(1, \mathbf n, x, \mathbf j) = \sum_{\mathbf i \in I(1, \mathbf n, x, \mathbf j)} \Theta_{\mathbf i}$, where $I(1, \mathbf n, x, \mathbf j) = \{\mathbf i : 2 j_\ell p_n + 1 \le i_\ell \le 2 j_\ell p_n + p_n;\ \ell = 1, \dots, N\}$. The distance between any two of these sets is greater than $p_n$, and each set contains $p_n^N$ elements.
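The combinatorics of the blocking argument can be checked on a small example. The Python sketch below is an illustration under stated assumptions, not the author's code: it takes $N = 2$, assumes $n_\ell = 2 p_n r_\ell$ exactly (so no remainder block is needed), and indexes the $2^N$ block types by a hypothetical vector `eps` in $\{0,1\}^N$, with `eps = (0, ..., 0)` giving the index sets of the first block type described above. It checks that there are $M = \prod_\ell r_\ell$ such sets, each with $p_n^N$ sites, and that distinct sets are more than $p_n$ apart.

```python
import itertools

import numpy as np


def tran_blocks(n, p, eps):
    """Index sets of one block type in Tran's (1990) decomposition of the
    rectangle {1..n_1} x ... x {1..n_N}, assuming each n_l = 2 * p * r_l.

    eps in {0,1}^N selects the type; eps = (0, ..., 0) gives the sets
    {i : 2 j_l p + 1 <= i_l <= 2 j_l p + p, l = 1..N}.
    """
    r = [nl // (2 * p) for nl in n]
    blocks = []
    for j in itertools.product(*(range(rl) for rl in r)):
        ranges = [range(2 * jl * p + el * p + 1, 2 * jl * p + (el + 1) * p + 1)
                  for jl, el in zip(j, eps)]
        blocks.append(np.array(list(itertools.product(*ranges))))
    return blocks


def min_block_distance(blocks):
    """Smallest Euclidean distance between sites of two different blocks."""
    best = np.inf
    for a, b in itertools.combinations(blocks, 2):
        diff = a[:, None, :] - b[None, :, :]
        best = min(best, np.sqrt((diff ** 2).sum(axis=-1)).min())
    return best


n, p = (8, 12), 2                      # N = 2, r = (2, 3), n_hat = 96
type1 = tran_blocks(n, p, (0, 0))
print(len(type1), {len(b) for b in type1}, min_block_distance(type1))
# expected: 6 blocks (M = r_1 r_2), 4 sites each (p^N), minimal distance 3.0 (p + 1)
```

The other block types, obtained with the remaining values of `eps`, fill the gaps between these sets, so that together the $2^N$ types partition the rectangle of $\hat{\mathbf n}$ sites.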

Proof
Since all the random variables are identically distributed, integrating by parts and then applying a change of variable, one can see that ℘ = (√ log ( ) ).
The proof uses the same technique as that of Lemma 4.1, the only difference being the family of variables Θ to which it is applied. This finally leads to the required bound. Then, for $\mathbf n$ large enough, there exists a constant $C' > 0$ such that [ ^] ⩾ C′ for all ∈ ( , ).
Choosing the threshold equal to $C'/2$, one obtains: