MODELING AND FORECASTING OF MONTHLY GLOBAL PRICE OF BANANAS USING SEASONAL ARIMA AND MULTILAYER PERCEPTRON NEURAL NETWORK

: The primary purpose of this study was to pursue the analysis of the time series data and to demonstrate the role of time series model in the predicting process using long-term records of the monthly global price of bananas from January 1990 to November 2020. Following the Box-Jenkins methodology, ARIMA(4,1,2)(1,0,1)[12] with the drift model was selected to be the best fit model for the time series, according to the lowest AIC value in this study. Empirically, the results revealed that the MLP neural network model performed better compared to ARIMA(4,1,2)(1,0,1)[12] with the drift model at its smaller MSE value. Hence, the MLP neural network model can provide useful information important in the decision-making process related to the impact of the change of the future global price of bananas. Understanding the past global price of bananas is important for the analyses of current and future changes of global price of bananas. In order to sustain these observations, research programs utilizing the resulting data should be able to improve significantly our understanding and narrow projections of the future global price of bananas.


Introduction
Like COVID-19, Panama Disease, also known as Tropical Race 4, has affected the global banana industry. Recently, Panama Disease has suddenly accelerated, spreading from Asia to Australia, the Middle East, Africa and Latin America, where the majority of the bananas are shipped from globally. Bananas are an economically important crop for many developing countries. Currently, the disease is present in more than 20 countries, prompting fears of a banana pandemic and shortages of the world's favorite fruit (Altendorf, 2019).
Bananas are the world's most popular fruit with different tastes, sizes and colours, and the fourth most important food crop after wheat, rice, and maize in terms of production, and is the world's favourite fruit in terms of consumption quantity (Ruiz et al., 2017). Bananas are among the most traded fruits in the world. In 2017, 22.7 million tons of bananas, excluding plantains, were traded, representing almost 20% of global production that year. The value of this trade was worth $11 billion, which is higher than the export value of any other exported fruit (Voora, Larrea, and Bermudez, 2020).
Asia is the largest banana-producing region, while India and China were the two leading banana-producing countries. Asia-Pacific led the market with a 61% share of global consumption, while India was the world's leading producer of banana accounting for nearly 25.7% of the total output. The Philippines consolidated its position as the second largest exporter of bananas behind Ecuador (Voora et al., 2020).
Latin America and the Caribbean is the largest exporting region, responsible for approximately 80% of global exports. In 2018, global banana exports were estimated at 23.3 million tons, with Ecuador being the largest exporter of bananas accounting for 24.7% of global exports. Belgium, Costa Rica and Columbia were the other top banana exporters in the world, whereas the United States was the leading importer of bananas with an 18% share in world imports (Voora et al., 2020).
The retail value of the banana sector was estimated to be worth between $20 billion and $25 billion in 2016 (Voora et al., 2020). The Market Reports World (2019) forecasted that the global banana sector would experience a compound annual growth of 1.21% in consumption from 2019 to 2024, reaching a global consumption volume of 136 million tons by 2025, compared to 116.2 million tons in 2017.
According to the U.S. Bureau of Labor Statistics, the average consumer price of bananas (all fresh traditional first quality organic and non-organic yellow bananas) was $0.53 (Per Lb. in U.S. City Average) with the standard deviation of $0.06 (Minimum: $0.40, Maximum: $0.64, and Median: $0.51). Average price is to measure the price level in a particular month, not to measure price change over time. Furthermore, the average consumer price for bananas was 39.91% higher in 2020 compared to 1990 (a $13.29 difference in value). Between 1990 and 2020, bananas costing $33.30 in 1990 would cost $46.59 in 2020 for an equivalent purchase.

23
Compared to the overall inflation rate of 2.30% during this same period, inflation for bananas was lower.
Given this understanding of market demand and supply of bananas globally, this study would test one simple research question, namely what is the trend of the global price of bananas in the near future? Moreover, a research hypothesis was set that there is a consistent increase in the global price of bananas. Hence, the primary purpose of this study was to pursue the analysis of time series data and to demonstrate the role of the time series model in the predicting process using long-term records of the monthly global price of bananas from January 1990 to November 2020. This paper is organized as follows. The second section briefly reviews the time series applications in the global banana market and the availability of neural networks for time series modeling and forecasting. The third section shows the data source in terms of the monthly global price of bananas. The fourth section presents the methodological approach. The fifth section demonstrates the empirical results using the seasonal ARIMA and multilayer perceptron neural network. The last section provides concluding remarks, and further discussion.

Literature review
Time series forecasting is the use of a model to predict future values based on previously observed values, which is one of the most applied data science techniques in business, used extensively in finance, supply chain management, production and inventory planning. It has a well-established theoretical grounding in statistics, and is important because there are so many prediction problems that involve a time component. Several articles used time series techniques to forecast banana production (Eyduran et al., 2020;Hamjah, 2014;Hossain et al., 2016), while quite a few studies focused on banana price forecasting (Fatin, Titik, and Mulyatno, 2020;Omar, Dewan, and Hoq, 2014).
Neural networks have become one of the most popular trends in machine learning for time series modeling and forecasting. Recently, there has been increasing interest in using neural networks to model and forecast banana harvest yields (Rathod, Mishra, and Singh, 2017;Rathod and Mishra, 2018;Rebortera and Fajardo, 2019). Despite the importance of banana demand and supply in the global markets, there is a lack of studies available in the technical literature on global price forecasting schemes.

Data
The long-term records of the monthly global price of bananas (units: U.S. dollars per metric ton, not seasonally adjusted) from January 1990 to November 2020, is available to the public from the International Monetary

Methodology
For time series analysis, the Box-Jenkins methodology, introduced by Box and Jenkins in 1970, refers to a systematic approach of identifying, fitting, checking, and forecasting the ARIMA model (Young, 1977), which can be applied to find the best fit to provide an adequate description to the time series. The Box-Jenkins methodology can be used as the process for estimating the seasonal ARIMA (SARIMA) model in this study based on its autocorrelation function (ACF) and partial autocorrelation function (PACF) as a means of determining the stationarity of the univariate time series and the lag lengths of the time series.
In statistics, stationarity is a property of a stochastic process. Intuitively, stationarity means that the statistical properties (i.e. mean and variance) of the time series do not change over time. Thus, the Box-Jenkins methodology starts with the assumption that the time series should be as on a stationary status. In the Box-Jenkins methodology there are four important steps, including identification, estimation, diagnostic checking and forecasting, that should be taken in consideration when applying it (Box et al., 2016). The identification step is applied to achieve stationarity and to build a suitable model using ACFs, PACFs, and to differencing. If the time series is non-stationary, it needs to be stationarized through differencing.
In the estimation step, plots and summary statistics can be used to identify trends, and autoregressive and/or moving average elements to gain an idea of the amount of differencing and the size of the lag that will be required for model identification. The next estimation step is to use a fitting procedure to find the coefficients of the model. In order to find good parameters for the model, Akaike's Information Criterion (AIC) or Bayesian Information Criterion (BIC) can be employed to determine the orders of the time series. Good models are obtained by minimizing the AIC or BIC value.
The diagnostic checking step is primarily to use the plots and statistical tests of the residual errors to determine the model fitting, and to evaluate the fit model in the context of the available data and check for areas where the model may be improved. The process is repeated until a desirable level of fit is achieved. There are many accuracy metrics applied after model identification and estimation helping in choosing the best fit model. The commonly used accuracy metrics to judge forecasts include: Mean Square Error ( The last step is the forecasting stage that can be applied in the forecasting process after checking the model in the previous steps. In this study, R 4.0.2 for Windows, an open source for statistical computing and graphics supported by the R Foundation for Statistical Computing, was used as the tool to estimate the model parameters to fit the SARIMA model and the multilayer perceptron (MLP) neural network model, as well to achieve the purpose of this study.   Looking at the time series plot of the data, the ACF is useful for identifying stationary or non-stationary time series. For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Hence, the ACF (Figure 4) of the time series, the seasonally adjusted monthly global price of bananas from January 1990 to November 2020, showed strong positive correlations at up to 26 lags that never decay to zero. Meanwhile, the test statistic of the Augmented Dickey-Fuller Test was Dickey-Fuller = -3.0039 with lag order = 7, and the p-value of the test was 0.1532. This suggested that the time series was non-stationary and need to be differenced. This was also illustrated by the single spike at the first lag, followed by small apparently random values after the first lag for the PACF ( Figure 5). Typically, the ACF and PACF shown in Figures 4 and 5 were a time series that has autocorrelation at the first lag only. Since the PACF was cut off after the first lag, it seems that the time series followed the autoregressive (AR) process. In the time series analysis, differencing can be used to transform a non-stationary time series into a stationary one. When both trend and seasonality are present, one may need to apply both a non-seasonal first difference and a seasonal difference. The first difference of a time series is the time series of changes from one period to the next. Note that the graph of the first difference of the time series looked approximately stationary ( Figure 6). According to the Augmented Dickey-Fuller Test, Dickey--Fuller = -10.218 with lag order = 7, and the p-value of the test was smaller than 0.01. It rejected the null hypothesis that is non-stationary, and suggested that the first difference of the time series was stationary.

28
Yeong Nain Chi, Orson Chi  The ACF of the first differenced time series in Figure 7 showed a significant positive spike at the first lag followed by correlations that were statistically significant. The corresponding PACF of the first differenced time series in Figure 8 presented most likely a gradual decrease, and negative after the first few lags. Since the ACF was cut off after the first lag and the PACF decreased gradually, a reasonable conclusion was that the first differenced time series followed the moving average (MA) process.  Seasonal differencing can remove seasonal trends and can also remove a seasonal random walk type of non-stationarity. Seasonal differencing is defined as the difference between a value and a value with a lag that is a multiple of seasonality. In this case, seasonality = 12 months per year is the span of the periodic seasonal behaviour. Figure 9 shows the graph of the 12 th difference of the time series, which looked approximately stationary. Meanwhile, the test statistic of the Augmented Dickey-Fuller Test was Dickey-Fuller = -32.05 with lag order = 7, and the p-value of the test was smaller than 0.01. It rejected the null hypothesis that is non-stationary, and suggested that the 12 th first difference of the time series was stationary.  Figure 10 showed that ACF demonstrated most likely a steady decay after the first few lags and bounced between being positive and negative statistically significant. Meanwhile, Figure 11 showed what PACF usually looks like a steady negative decay in the partial correlations toward zero. This is consistent with the general recommendation that if the autocorrelation at the first seasonal lag is positive, one should use an autoregressive (AR) process vs. a moving average (MA) process.

Estimation of the SARIMA model parameters
SARIMA is an extension of Autoregressive Integrated Moving Average (ARIMA) that explicitly supports the univariate time series with a seasonal component. One shorthand notation for the SARIMA model is ARIMA(p,d,q)(P,D,Q)[m], where p = the order of the autoregressive process (the number of lagged terms), d = the number of differences required to make the time series stationary, q = the order of the moving average process (the number of lagged terms), P = the order of the seasonal autoregressive process, D = the number of seasonal differences applied to the time series, Q = the order of the seasonal moving average process, and m = the seasonality of the model, i.e. the number of time steps for a single seasonal period.
In backshift notation B, "backshift operator" or "lag operator", is a useful notational device when working with time series lags: By t = y t−1 (means "back up by one time unit") and B k y t = y t−k (means "backshift k times"). Thus, the ARIMA(p,d,q) (P,D,Q)[m] model can be expressed in backshift notation as: where: , c is a constant, and e t is the residual error (i.e. white noise) (Montgomery, Jennings, and Kulahci, 2008).

Diagnostic checking of the SARIMA model
A common task when building a forecasting model is to check that the residuals satisfy some assumptions that they are uncorrelated, normally distributed, etc. The top figure of Figure 12 showed that the residuals from the ARIMA(4,1,2)(1,0,1) [12] with the drift model did not violate the assumption of constant location and scale. The bottom right figure of Figure 12 showed that the residual histogram did not reveal a series deviation from normality in this case. The ACF plot of the residuals (the bottom left figure of Figure 12) also showed that all sample autocorrelations were within the threshold limits, indicating that the residuals appeared to be random. The Ljung-Box Q-test (Ljung and Box, 1978) is a diagnostic tool used to test the lack of fit of a time series model. In this case, the test statistic of the Ljung--Box Q-test was Q = 14.768 with 15 degrees of freedom, and the p-value of the test was 0.4682 with model degrees of freedom: 9 and total lags used: 24, indicating that the residuals were random and that the model provided an adequate fit to the time series. This, combined with the Ljung-Box Q-test statistic, suggested that the ARIMA(4,1,2)(1,0,1)[12] with the drift model appropriately modelled the dynamics for this time series.

Using the SARIMA model in the forecasting process
The forecasted values of monthly global price of bananas are shown in Table 3. Forecasts from ARIMA(4,1,2)(1,0,1)[12] with the drift model are shown in Figure 13, presenting a linear upward trend projected to the future. The next figure (Figure 14) illustrated that the black line represented the visuals of the monthly global price of bananas dataset without forecasting and the red line represented the visuals of the monthly global price of bananas dataset with forecasted values. The forecasting process with ARIMA(4,1,2)(1,0,1)[12] with the drift model indicated a good fit of the model for forecasting at a constant increasing rate in the short term.

Multilayer Perceptron (MLP) neural network model
The Multilayer Perceptron (MLP) neural network is one of the most popular machine learning methods, which is able to carry out prediction tasks in a more reliable manner. MLP is a system of simple interconnected neurons, or nodes, which is a model representing a nonlinear mapping between an input vector and an output vector (Gardner and Dorling, 1998). The MLP neural network consists of three layers of nodes: an input layer, a hidden layer and an output layer ( Figure 15). Except for the input nodes, each node is a neuron that uses a nonlinear activation function. The MLP neural network is trained for multi-step-ahead prediction, on testing as well as on training data sets for short-term prediction. Thus, the optimum neural network model is proposed for the short-term prediction of the time series. The function "mlp()" from the "nnfor" package in R 4.0.2 for Windows automatically generates ensembles of networks, the training of which starts with different random initial weights.
Training process was categorized into two main steps: the first is to select the best architecture of networks, and the second is to integrate MLP neural networks with optimizers. The output indicated that the resulting MLP neural network model ( Figure 15) has 5 hidden nodes, it was trained 20 times and the different forecasts were combined using the median operator. The results were compared with reference to the MSE (mean square error) = 992.9927 against ARIMA(4,1,2)(1,0,1)[12] with the drift model, MSE = 4527.31832.  Figure 16 showed that the residual histogram from the MLP neural network model did not reveal a series deviation from normality. At the same time, the ACF plot of the residuals (the bottom left figure of Figure 16) from the MLP neural network model showed that all sample autocorrelations were within the threshold limits, indicating that the residuals appeared to be random.   The MLP neural network forecasting the values of the monthly global price of bananas was presented in Table 4. Forecasts from the MLP neural network model are also shown in Figure 17, with a linear upward trend projected into the future. The forecasting process with the MLP neural network model indicated a good fit of the model for forecasting at a constant increasing pattern in the short term, seasonally.

Discussion and conclusion
Bananas rank as a leading crop in world agricultural production and trade. Assuming normal weather conditions and no further spread of banana plant diseases, bananas have seen rapidly increasing production and trade volumes in response to fast population growth in producing countries, as well as an expanding global import demand. Obviously, the future is more intensive regarding knowledge, and more understanding of the natural complexities of living systems. To bring together a wide variety of perspectives and concepts requires holistic solutions that involve working across disciplines, principles and methods to support interdisciplinarity and transdisciplinarity, to explore and formalize systems concepts, and to develop systemic methods for learning and change.
Despite the importance of banana demand and supply in the global markets, there is a lack of studies available in the technical literature on global price forecasting schemes. In this study, the best choice time series model was ARIMA(4,1,2)(1,0,1) [12] with the drift model as its lowest AIC values among other models. It was noted that this ARIMA(4,1,2)(1,0,1)[12] with the drift model provided evidence that the future monthly global price of bananas would increase over time.
Prediction is a kind of dynamic filtering, in which the past values of one or more time series are used to predict future values. The MLP neural network is one of the most popular machine learning methods, which is able to carry out prediction tasks in a more reliable manner. With time series data, lagged values of the time series can be used as inputs to a neural network, the MLP neural network was applied to time series prediction using its past values of a univariate time series in this study.
Empirically, the results revealed that the MLP neural network model performed better compared to ARIMA(4,1,2)(1,0,1)[12] with the drift model at its smaller MSE value. Hence, the MLP neural network model can provide useful information important in the decision-making process related to the impact of the changes of the future global price of bananas. Understanding the past global price of bananas is important for the analyses of the changes of the current and future global price of bananas changes.
In order to sustain these observations, research programs utilizing the resulting data should be able to improve significantly our understanding and narrow projections of the future global price of bananas. Therefore, checking the validity of the market price of bananas in the case of certain countries could be one of the main limitations of this study. Furthermore, the nonlinear autoregressive exogenous (NARX) neural network can also be used to understand not only past information of the same time series (the monthly global price of bananas), but also current and past information of the externally determined time series that influences the time series of interest (i.e. the production of bananas, climate change considerations, consumer behaviour expectation, etc.).