Analysis of the Effects of Cell Temperature on the Predictability of the Solar Photovoltaic Power Production

The intermittency of power sources such as solar energy makes predicting the next day's supply a complex problem. Several approaches have been developed to predict power production using Machine Learning (ML) methods, such as Artificial Neural Networks (ANNs). In this work, we use weather variables, namely ambient temperature, solar irradiation, and wind speed, collected from the weather station of a Photovoltaic (PV) system located in Amman, Jordan. The objective is to replace the ambient temperature with the more realistic PV cell temperature in order to achieve better prediction results. To this aim, ten physics-based models have been investigated to estimate the cell temperature, and these models have been validated against measured PV cell temperatures by computing the Root Mean Square Error (RMSE). The model with the lowest RMSE has then been adopted in training a data-driven prediction model. The proposed prediction model, an ANN, is compared with a well-known benchmark model from the literature, Multiple Linear Regression (MLR). The results, obtained using standard performance metrics, demonstrate the importance of considering the cell temperature when predicting the PV power output.


INTRODUCTION
Jordan is a nation in the heart of the Middle East, bordered by Palestine, Iraq, Syria, and Saudi Arabia, and it shares a water border with Egypt. Unlike most of its neighboring nations, Jordan does not have enough crude oil to sustain itself and relies heavily on imported crude oil to meet its consumption. As a result, Jordan imports oil at a huge cost, amounting to more than 10% of its total GDP (Department of Statistics 2017; Jaber et al., 2004; Ministry of Energy and Mineral Resources (MEMR) 2017; Ministry of Planning and International Cooperation 2015; National Electric Power Company (NEPCO) 2018).
In order for Jordan to meet its growing energy demand, alternative means of generating energy have been investigated. To predict the power generated by PV panels as accurately as possible, two main types of approaches have been used, which can generally be categorized into physics-based and data-driven approaches (Al-Dahidi et al., 2018; Das et al., 2018; Ernst et al., 2009; Moslehi et al., 2018). Physics-based models use mathematical equations that relate the collected weather variables (e.g., ambient temperature, irradiation, wind speed) to the PV power output. Data-driven models, on the other hand, rely on Machine Learning (ML) algorithms and do not require any physics-based model. Instead, they exploit historical data collected from sensors or a weather station to learn the relationship between the weather variables and the power output.
In this work, only data-driven methods are analyzed. Most previous works have used two parameters in their analysis: irradiation and ambient temperature. For example, Fernandez-Jimenez et al. (2012) presented a short-term forecasting method consisting of three modules: two Numerical Weather Prediction (NWP) models and an Artificial Neural Network (ANN)-based model. The first two forecast the weather variables used by the third module, whose output is the hourly power of the PV plant over a forecast horizon of 1 to 39 h. Liu et al. (2017) proposed a Back-Propagation (BP) Neural Network (NN) to predict the power output up to 24 h ahead. Zhong et al. (2018) employed both General Regression and BP NNs and compared them, with BP giving more favorable results. Liu et al. (2019) established a Weight Varying Ensemble forecasting model that improved short-term power prediction. In (Mellit, 2009), a Recurrent NN (RNN) was used for forecasting the generation of a PV power system. Ding et al. adopted an ANN-based approach with an improved BP learning algorithm to overcome the shortcomings of the standard one. Chow et al. employed an ANN to mimic the nonlinear correlation between meteorological factors and power output, and showed that the short-term prediction performance is commensurate with the real-time prediction performance when ahead solar angles are taken into account. Oudjana et al. adopted an NN for one-week-ahead prediction using weather variables. Shi et al. proposed a PV power output forecasting approach based on weather classification and Support Vector Machines (SVMs). Kazem and Yousif (2017) used Generalized Feedforward Networks (GFF), MultiLayer Perceptron (MLP), Self-Organizing Feature Maps (SOFM), and SVM to predict the power produced and compared the results.
Al-Dahidi et al. proposed using an Extreme Learning Machine (ELM) for its faster computational speed and better generalization capability, and compared its performance with that of the traditional BP-ANN from the literature.
Other commonly used weather variables, in addition to those mentioned above, are relative humidity and wind speed. For example, Lin et al. proposed a hybrid prediction model combining improved K-means clustering, Grey Relational Analysis (GRA), and an Elman NN (Hybrid improved K-means-GRA-Elman, HKGE) for forecasting the PV power output. The model was implemented using multiple meteorological conditions and historical PV output data.
Although irradiation and ambient temperature have been the main weather variables, the following works substituted the ambient temperature with the cell temperature. Ba et al. implemented a statistical approach based on the Weibull probability distribution function and obtained an accurate relationship between the power output, the irradiation, and the cells' back temperature; the calculated power output was compared with the measured one, yielding a high correlation coefficient. Bouzerdoum et al. combined two models, the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) method and the SVM, and the hybrid model showed better prediction results. In (Paulescu et al., 2017), two advanced models for predicting the power output of PV cells were analyzed: a black-box Takagi-Sugeno fuzzy model and a physically inspired, semiparametric statistical model (Generalized Additive Model, GAM) based on smoothing splines. In (Baharin et al., 2016), Support Vector Regression (SVR) and a nonlinear autoregressive ANN were used and compared with a benchmark persistence model. In (Yu and Chang 2011), an NN trained with the BP algorithm was implemented. Al-Bashir et al. employed a multivariate linear regression model to forecast the power output. Moslehi et al. examined various data collection and modelling scenarios for predicting PV power production; in particular, they studied the effect of exploiting measured (or calculated) cell temperatures on the predictability of the PV power production.
So far, the module temperature has been underutilized, and few efforts have been made to incorporate it into data-driven prediction models. In this work, the cell temperature is estimated using ten physics-based models, and each estimate is correlated with the power output to identify the most promising models. The estimates are then validated against measured cell temperatures, and their Root Mean Square Errors (RMSEs) are compared to choose the best model. Finally, this model is used in developing the Multiple Linear Regression (MLR) and ANN models for PV power production prediction, and their performances are evaluated.
The performance of the prediction models is verified with respect to two standard metrics, namely the RMSE and the Coefficient of Determination (R²).
The remainder of this paper is organized as follows. Section 2 states the PV power production prediction problem. Section 3 presents the ASU solar PV system case study. Section 4 describes the methodology proposed for investigating the effect of incorporating the cell temperature instead of the ambient temperature. Section 5 discusses the obtained results. Finally, some conclusions are drawn in Section 6.

PROBLEM STATEMENT
Let us consider the availability of the weather data ($W$) and the corresponding power productions ($P$) of a solar PV system for $Y$ years. The former is assumed to combine the hourly values of three main variables: the global solar radiation ($I_{rr}$), the ambient temperature ($T_{amb}$), and the wind speed ($v$). In this work, the ambient temperature is substituted with the PV cell temperature ($T_{cell}$) estimated by physics-based models. The resulting data will then be used to build prediction models, and the built models need to be evaluated to verify the effectiveness of such a substitution.
The proposed prediction model is an Artificial Neural Network (ANN), whose prediction capability is compared with that of the well-known MLR benchmark from the literature.

CASE STUDY
This Section presents the 264 kWp grid-connected solar PV system of the Applied Science Private University (ASU). ASU is a private university located in Amman (the capital of Jordan) at latitude 32°2'24.0324" N and longitude 35°54'1.4328" E. The PV panels are installed atop the Faculty of Engineering building (Figure 2). The PV array is oriented towards the southeast at an angle of 36° and has a tilt angle of 11°. The inverters connected to the PV panels are of the SMA SUNNY TRIPOWER type: thirteen 17,000 W inverters and one 10,000 W inverter. The solar panels are Yingli Solar YL 245P-29b-PC modules with a polycrystalline structure (Applied Science Private University, 2019).
The weather station on the ASU campus, located about 171 m from the Faculty of Engineering, records the weather conditions experienced by the PV system as 45 different variables (e.g., solar radiation, ambient temperatures, wind speeds). Hourly values of these variables are available for the past Y ≈ 3.5 years (May 16, 2015 to December 31, 2018), while the inverters connected to the PV panels recorded the corresponding power output delivered by the system (Applied Science Private University (ASU) 2019).
Some of the available weather variables have been excluded from the analysis, either because their behaviour is constant during the Y ≈ 3.5-year study period (e.g., precipitation amounts) or because they are irrelevant to the delivered PV power (e.g., soil surface and subsoil (−10 cm) temperatures). The global solar radiation, the ambient temperature at 1 m, and the wind speed at 10 m have been retained for building the prediction models, as they have the largest influence on the solar PV power production. In addition to these weather variables, the time effect, expressed as the hour of the day and the day number of the year, has also been considered while building the prediction models, since these represent the diurnal cyclic and the seasonal effects, respectively (Al-Dahidi et al., 2019).
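The hour-of-day and day-number inputs can be derived directly from each record's time stamp. A minimal Python sketch, assuming the time stamps are parsed into `datetime` objects (the helper name `time_features` is illustrative, not from the original work):

```python
from datetime import datetime


def time_features(ts: datetime) -> tuple:
    """Return (hour of day, day number of the year) for one hourly record."""
    hour = ts.hour                  # diurnal cycle: 0..23
    day = ts.timetuple().tm_yday    # seasonal cycle: 1..365 (366 in leap years)
    return hour, day


# Example: the first day of the study period, 13:00
print(time_features(datetime(2015, 5, 16, 13, 0)))  # (13, 136)
```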
All the considered hourly weather variables, together with the time stamp and the corresponding power productions, are stored in the dataset matrix X, which is used in Section 5 to calculate the cell temperatures, validate the calculated cell temperatures, build the MLR and ANN prediction models, and compare their performances.
The overall 30229 input-output patterns (inputs: weather variables and time stamp; output: power production) are divided into (i) a training dataset ($X_{train}$) containing $N_{train}$ = 15115 patterns (i.e., 50%) randomly selected from the overall dataset matrix, (ii) a validation dataset ($X_{valid}$) containing $N_{valid}$ = 7557 patterns (i.e., 25%) randomly selected from the remaining patterns, and (iii) a test dataset ($X_{test}$) containing the remaining $N_{test}$ = 7557 patterns (i.e., 25%).
The three datasets are used, respectively, to build the prediction models, to optimize the models' architectures, and to test the predictability of the two prediction models and compare it when the ambient temperature is replaced with the best obtained cell temperature.
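A minimal NumPy sketch of such a random 50/25/25 split is shown below. It assumes the patterns are simply indexed 0 to N−1; rounding the validation and test sizes down reproduces the counts quoted above for N = 30229. The function name and seed are illustrative.

```python
import numpy as np


def split_50_25_25(n_patterns: int, seed: int = 0):
    """Randomly partition pattern indices into training (~50%),
    validation (25%) and test (25%) subsets without overlap."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_patterns)
    n_valid = n_patterns // 4                 # 25%, rounded down
    n_test = n_patterns // 4                  # 25%, rounded down
    n_train = n_patterns - n_valid - n_test   # remainder, ~50%
    train = perm[:n_train]
    valid = perm[n_train:n_train + n_valid]
    test = perm[n_train + n_valid:]
    return train, valid, test


train_idx, valid_idx, test_idx = split_50_25_25(30229)
print(len(train_idx), len(valid_idx), len(test_idx))  # 15115 7557 7557
```

Fixing the seed makes the partition reproducible; a different seed gives a different random partition of the same sizes.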

METHODOLOGY
In this Section, we describe the methodology proposed for predicting the solar power production of the ASU PV system. The methodology is structured in three phases, sketched in Figure 3: calculating the ASU cell temperatures using different physics-based models and validating the calculated values (Phase I, Section 4.1), building two different prediction models (Phase II, Section 4.2), and evaluating the built prediction models (Phase III, Section 4.3). The physics-based models characterize the relationship between the cell temperature, the relevant weather variables (such as global solar radiation, wind speed, wind direction, and ambient temperature), and other characteristics that depend on the PV cell technology under study (polycrystalline silicon (p-Si) in our case study).

Phase I: Calculating and Validating the Cell Temperatures
The physics-based models adopted in this work are summarized below. For more details on these PV cell temperature models, the interested reader may refer to (HOMER Pro. 2019; Schwingshackl et al., 2013).
• Standard PV cell temperature model: this is the simplest physics-based model for estimating the PV cell temperature (Markvart, 2000). It calculates the cell temperature ($T_{cell,1}$) as a function of the ambient temperature ($T_{amb}$), the solar radiation ($I_{rr}$), and other PV technology dependent characteristics (Eq. (1)):

$$T_{cell,1} = T_{amb} + \frac{T_{cell,NOCT} - T_{amb,NOCT}}{I_{NOCT}}\, I_{rr} \qquad (1)$$

where $T_{cell,NOCT}$ is the Nominal Operating Cell Temperature, which depends on the PV technology under study and is taken at a solar radiation $I_{NOCT}$ = 800 W/m², an ambient temperature $T_{amb,NOCT}$ = 20°C, and a wind speed $v$ = 1 m/s. This model is denoted as Model 1.
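The standard model reduces to a one-line function, T_cell,1 = T_amb + (T_cell,NOCT − T_amb,NOCT) · I_rr / I_NOCT. The sketch below assumes a NOCT of 45°C purely for illustration; the value actually used for the ASU modules is the one reported in Table 1.

```python
def t_cell_standard(t_amb, i_rr, t_noct, i_noct=800.0, t_amb_noct=20.0):
    """Model 1: standard NOCT-based cell temperature [°C].

    t_amb  : ambient temperature [°C]
    i_rr   : solar radiation on the module [W/m^2]
    t_noct : Nominal Operating Cell Temperature of the module [°C]
    """
    return t_amb + (t_noct - t_amb_noct) / i_noct * i_rr


# Illustrative NOCT of 45 °C (placeholder, not the Table 1 value)
print(t_cell_standard(t_amb=25.0, i_rr=800.0, t_noct=45.0))  # 50.0
```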
where $\eta_{STC}$ and $\beta_{STC}$ are the efficiency and the temperature coefficient of maximum power under Standard Test Conditions (STC), respectively, i.e., a solar radiation of 1000 W/m², a cell temperature of 25°C, and an air mass of 1.5. $\tau$ and $\alpha$ are the transmittance of the cover system and the absorption coefficient of the PV cells [%], respectively. $h_{w,NOCT}$ is the wind convection heat transfer coefficient at the wind speed of the NOCT conditions, and $h_w(v)$ is the wind convection heat transfer coefficient at wind speed $v$, where $v_f$ is the wind speed measured at 10 m above the ground and $v_w$ is the wind speed measured close to the PV module. $v_w$ can be obtained from $v_f$ through $v_w = 0.68\,v_f - 0.5$ (Loveday and Taki 1996; Schwingshackl et al., 2013). Other formulations of $h_w(v)$ have been defined in (Sharples and Charlesworth, 1998) for wind directions perpendicular and parallel to the PV module's surface (Eq. (5) and Eq. (6), respectively). The cell temperatures $T_{cell,4}$ and $T_{cell,5}$ obtained using these two formulations of the wind convection heat transfer coefficient are hereafter denoted as Model 4 and Model 5.
• Kurtz PV cell temperature model (Kurtz et al., 2009): this model is denoted as Model 6.
• Koehl PV cell temperature model: this model calculates the cell temperature ($T_{cell,7}$, hereafter denoted as Model 7) as a function of $I_{rr}$, $T_{amb}$, the local wind speed ($v_w$), and PV cell technology dependent constants ($U_0$, $U_1$) (Koehl et al., 2011):

$$T_{cell,7} = T_{amb} + \frac{I_{rr}}{U_0 + U_1\, v_w}$$

• Mattei PV cell temperature model: Mattei et al. (2006) estimated the cell temperature using the heat exchange coefficient $U_{PV}(v)$ for the total surface of the PV module. Two different formulations of $U_{PV}(v)$ defined in (Mattei et al., 2006) are adopted in this work; the cell temperatures $T_{cell,8}$ and $T_{cell,9}$ obtained using these two formulations are hereafter denoted as Model 8 and Model 9, respectively.
• Homer PV cell temperature model: apart from the above-mentioned equations, another equation, taken from (Duffie and Beckman, 1991; HOMER Pro. 2019), was used to determine the cell temperature ($T_{cell,10}$), hereafter denoted as Model 10.

The investigated models are then used to estimate the cell temperature of the ASU PV system, and the estimates are correlated with the PV output power to determine the most promising model to be used later in the analysis (Section 4.2). The numerical values and the application results are fully reported in Section 5.
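Two of the models above can be sketched in Python under stated assumptions. The Koehl model is written in the common form T_cell,7 = T_amb + I_rr / (U_0 + U_1·v_w), with v_w obtained from the 10 m wind speed through v_w = 0.68 v_f − 0.5 (clipping negative values to zero is our assumption). For Model 10, the sketch assumes the HOMER Pro formulation derived from Duffie and Beckman's energy balance, with `beta_stc` a negative temperature coefficient of maximum power in 1/°C. All numeric parameter values in the example are placeholders, not the Table 1 values.

```python
def local_wind_speed(v_f):
    """Near-module wind speed v_w [m/s] from the 10 m wind speed v_f [m/s]:
    v_w = 0.68 v_f - 0.5, clipped at zero (the clipping is an assumption)."""
    return max(0.68 * v_f - 0.5, 0.0)


def t_cell_koehl(t_amb, i_rr, v_w, u0, u1):
    """Model 7 (Koehl form): T_cell = T_amb + I_rr / (U0 + U1 * v_w)."""
    return t_amb + i_rr / (u0 + u1 * v_w)


def t_cell_homer(t_amb, i_rr, t_noct, eta_stc, beta_stc, tau_alpha,
                 i_noct=800.0, t_amb_noct=20.0, t_stc=25.0):
    """Model 10 sketch (assumed HOMER / Duffie-Beckman energy balance).
    beta_stc: temperature coefficient of maximum power, negative [1/°C]."""
    k = (t_noct - t_amb_noct) * i_rr / i_noct
    num = t_amb + k * (1.0 - eta_stc * (1.0 - beta_stc * t_stc) / tau_alpha)
    den = 1.0 + k * beta_stc * eta_stc / tau_alpha
    return num / den


v_w = local_wind_speed(3.0)
# Placeholder parameters, for illustration only
print(round(t_cell_koehl(30.0, 1000.0, v_w, u0=25.0, u1=6.84), 2))
print(round(t_cell_homer(30.0, 1000.0, t_noct=46.0, eta_stc=0.15,
                         beta_stc=-0.0045, tau_alpha=0.9), 2))
```

Setting `eta_stc = 0` in `t_cell_homer` recovers the standard NOCT model, which is a quick sanity check on the formulation.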

Validating the obtained cell temperatures
A field trip to ASU was carried out to validate the former findings and find the model that best represents the real PV cell temperature. A K-type infrared sensor was first calibrated, and cell temperature readings were then taken at five-minute intervals for two hours. Because of the large number of PV modules available, the modules were selected at random, and the temperature of each module was measured at its top and bottom to obtain an average. For each interval, two modules were measured and their average was taken. The 24 results obtained were then compared with the values estimated by the physics-based models by calculating the RMSE (Eq. (13)):

$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(T_{cell,k}^{meas} - T_{cell,k}^{est}\right)^2} \qquad (13)$$

where $N = 24$ is the number of measurements, and $T_{cell,k}^{meas}$ and $T_{cell,k}^{est}$ are the k-th measured and estimated cell temperatures, respectively. The lowest RMSE value identifies the most realistic of the ten investigated physics-based models, whose estimated cell temperature is hereafter denoted as $T_{cell,best}$.
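Selecting T_cell,best amounts to computing the RMSE of each model's estimates against the measurements and taking the minimum. A plain-Python sketch; the measurement and estimate values below are made up for illustration and are not the ASU field data.

```python
import math


def rmse(measured, estimated):
    """Root mean square error between two equal-length sequences."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    sq = [(m - e) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(sq) / len(sq))


def best_model(measured, estimates_by_model):
    """Return (model name, RMSE) of the model with the lowest RMSE."""
    scores = {name: rmse(measured, est) for name, est in estimates_by_model.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


measured = [48.0, 50.0, 52.0]            # hypothetical readings [°C]
estimates = {                            # hypothetical model outputs [°C]
    "Model 1": [47.0, 51.0, 52.0],
    "Model 10": [45.0, 50.0, 55.0],
}
print(best_model(measured, estimates))   # Model 1 has the lower RMSE here
```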

Phase II: Building the Prediction Models
Two different prediction models, MLR (Abuella and Chowdhury, 2015) and ANNs (Hornik et al., 1989; Rumelhart et al., 1986), are developed here and later evaluated in terms of their prediction performance on the ASU PV power production, in order to study the influence of the cell temperature on the solar PV power production prediction.
The ambient temperature ($T_{amb}$) is replaced with the best obtained cell temperature ($T_{cell,best}$), and the resulting overall dataset X′ is used to build the prediction models.
Using the data directly is problematic because of missing values; therefore, the data are pre-processed as follows:
2. Missing data were also found for $T_{amb}$, $I_{rr}$, and the power productions, due to malfunctioning measurement sensors at the weather station and failures in the inverters. These values have been excluded from the analysis;
3. Finally, before the data can be properly utilized, the values of the time stamp, irradiation, temperature (whether ambient or cell), wind speed, and power are normalized to the range [0, 1] using Eq. (14):

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (14)$$

where $X$, $X_{max}$, and $X_{min}$ are the actual, maximum, and minimum values of the variable to be normalized.
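The exclusion of incomplete patterns and the min-max scaling can be sketched with NumPy as follows, assuming the data sit in a 2-D array with one row per hourly pattern and NaN marking missing values (both assumptions about the data layout).

```python
import numpy as np


def drop_missing(data):
    """Remove every row (pattern) that contains at least one NaN."""
    return data[~np.isnan(data).any(axis=1)]


def min_max_normalize(data):
    """Scale each column to [0, 1] via (X - X_min) / (X_max - X_min).
    Also returns the per-column minima and maxima so that predictions
    can be mapped back to physical units."""
    x_min = data.min(axis=0)
    x_max = data.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    return (data - x_min) / span, x_min, x_max


raw = np.array([[0.0, 10.0], [np.nan, 20.0], [5.0, 30.0], [10.0, 20.0]])
clean = drop_missing(raw)
scaled, x_min, x_max = min_max_normalize(clean)
print(scaled)
```

The sketch computes X_min and X_max on whatever array it receives; computing them on the training patterns only and reusing them for the validation and test patterns is a common precaution, although the text does not specify which choice was made.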
It is worth mentioning that the data patterns of the early morning and late evening of each day (i.e., when the power values are zero) have been used to train the prediction models, but they have been excluded from the evaluation of the prediction models' effectiveness (Section 4.3), because the PV system owner is not interested in predicting the power output during the early morning or night, when there is no solar irradiance. The two prediction models adopted in this work are presented below (Sections 4.2.1 and 4.2.2).

MLR
The MLR models the relationship between the inputs (independent variables), i.e., the time stamp and weather variables, and the output (dependent variable), i.e., the PV power, by fitting a linear model, as per Eq. (15). Each combination of input values is thus mapped to one output value. In the least-squares method, the best fit is obtained by minimizing the sum of the squares of the vertical deviations of each data point from the fitted model.

$$P = a_0 + a_1\, hr + a_2\, d + a_3\, I_{rr} + a_4\, T_{cell}\ (\text{or } T_{amb}) + \epsilon \qquad (15)$$
where $P$ is the hourly PV power production, $hr$ and $d$ are the hour and day number time stamp parameters counted from the beginning of each year, $I_{rr}$ and $T_{cell}$ (or $T_{amb}$) are the hourly global solar radiation and the cell (or ambient) temperature, $a_0, a_1, a_2, a_3, a_4$ are the regression coefficients, and $\epsilon$ is the mismatch between the actual (true) and the predicted hourly PV power production of the PV system.
Minitab (Minitab LLC. 2013) is used to determine the optimal relation between the inputs and the output by estimating the regression model intercept and the coefficients associated with each variable (Eq. (15)). The best regression model is then used to predict the hourly PV power production of the test dataset ($X_{test}$) from the hourly input values. The obtained results are compared with the predictions of the ANN prediction model.
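Minitab performs the least-squares fit internally; the equivalent computation can be sketched with NumPy's `np.linalg.lstsq`, assuming the five-coefficient form P = a0 + a1·hr + a2·d + a3·I_rr + a4·T + ε. The synthetic data and coefficients below are hypothetical and only check that the fit recovers them.

```python
import numpy as np


def design_matrix(hr, d, i_rr, temp):
    """Columns: intercept, hour, day number, irradiation, temperature."""
    hr = np.asarray(hr, dtype=float)
    return np.column_stack([np.ones_like(hr), hr, d, i_rr, temp])


def fit_mlr(hr, d, i_rr, temp, power):
    """Least-squares estimate of (a0, a1, a2, a3, a4)."""
    X = design_matrix(hr, d, i_rr, temp)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(power, dtype=float), rcond=None)
    return coeffs


def predict_mlr(coeffs, hr, d, i_rr, temp):
    """Hourly power predictions from the fitted coefficients."""
    return design_matrix(hr, d, i_rr, temp) @ coeffs


# Synthetic, noise-free, normalized data: the fit should recover a_true
rng = np.random.default_rng(0)
n = 500
hr, d = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
i_rr, temp = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
a_true = np.array([0.05, 0.01, -0.02, 0.9, -0.3])   # hypothetical a0..a4
power = design_matrix(hr, d, i_rr, temp) @ a_true
coeffs = fit_mlr(hr, d, i_rr, temp, power)
print(np.round(coeffs, 3))
```

Dropping or adding a column in `design_matrix` (e.g., removing the time stamp or adding the wind speed v) changes the regression form accordingly.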

ANNs
A brief explanation of the inner workings of the ANN is given here to aid understanding. An ANN is a method that enables computers to mimic real-world behaviour and to learn by themselves. Although a computer on its own is fast and solves well-defined tasks reliably, it cannot solve a problem that the user does not know how to formulate, or one for which the data are incomplete or random; the ANN helps the computer in this regard. The ANN was first proposed in 1958 by a psychologist, to study how humans recognize objects and interpret visual stimuli (Hornik et al., 1989; Rumelhart et al., 1986).

Just as neurons in the human brain are connected to one another, with the dendrites receiving information from other neurons and the axon passing information on, so are the neurons of an ANN (Hornik et al., 1989; Rumelhart et al., 1986). An ANN consists of three main types of layers: an input layer, a hidden layer, and an output layer (Muhammad Ehsan et al., 2017). Figure 4 shows a very basic ANN architecture.
The schematic in Figure 4 serves to explain the mathematics behind the ANN. The input layer consists of the $I = 5$ inputs available in the training dataset ($X_{train}$) used to predict the output; in general, a network may have one or many inputs depending on the application. In this work, for the PV power production prediction, the time stamp ($hr$, $d$), the hourly global solar radiation ($I_{rr}$), the hourly temperature ($T_{amb}$ or $T_{cell,best}$), and the hourly wind speed ($v$) are used as inputs, whereas the hourly power productions ($P$) are used as outputs. Each i-th input is connected to each h-th hidden neuron ($h = 1, \dots, H$) with its own weight ($w_{i,h}$, $i = 1, \dots, I$). The weights are initially random and are changed at each iteration. At each hidden neuron, every input value is multiplied by the weight of its connection, the products are summed, and an additional weight (the hidden bias $b_h$) of the connection between the bias neuron and that hidden neuron is added. Each hidden neuron then applies an activation function $g$, which transforms this net input into the value passed on to the output layer; the activation function can be viewed as a curve whose x-value is the net input and whose y-value is the neuron's output. Finally, the hidden-neuron outputs are multiplied by the weights of the connections between the hidden neurons and the output neuron ($w_{h,o}$, $h = 1, \dots, H$, $o = 1$), summed, and added to an additional weight (the output bias $b_o$) of the connection between the bias neuron and the output neuron, typically through a linear activation function, to give the final value. This value is then compared with the actual power output and an error value is measured.
Based on this error, the initially random weights are readjusted and the process is repeated to obtain a more accurate result; this is the so-called error Back-Propagation (BP) optimization algorithm (Rumelhart et al., 1986).
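Compactly, the forward pass described above computes P̂ = b_o + Σ_h w_{h,o} · g(b_h + Σ_i w_{i,h} x_i). The NumPy sketch below implements that forward pass and a textbook error back-propagation loop for a single hidden layer. It assumes g = tanh, a linear output neuron, and plain full-batch gradient descent on the mean squared error with a fixed learning rate; it illustrates BP in general and is not the training algorithm of the MATLAB toolbox used in this work. The toy data are synthetic.

```python
import numpy as np


def train_bp(X, y, n_hidden=10, lr=0.1, epochs=2000, seed=0):
    """One hidden layer (tanh) and one linear output neuron, trained by
    full-batch gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))  # w_{i,h}, random start
    b1 = np.zeros(n_hidden)                                  # hidden biases b_h
    w2 = rng.normal(0.0, 0.5, size=n_hidden)                 # w_{h,o}, random start
    b2 = 0.0                                                 # output bias b_o
    n = len(y)
    losses = []
    for _ in range(epochs):
        # Forward pass: weighted sums + bias, activation g = tanh, linear output
        H = np.tanh(X @ W1 + b1)
        y_hat = H @ w2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate dLoss/dy_hat back through the layers
        g_out = 2.0 * err / n
        grad_w2 = H.T @ g_out
        grad_b2 = g_out.sum()
        g_hid = np.outer(g_out, w2) * (1.0 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        grad_W1 = X.T @ g_hid
        grad_b1 = g_hid.sum(axis=0)
        # Readjust every weight against its gradient
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (W1, b1, w2, b2), losses


def predict(params, X):
    """Forward pass with trained weights."""
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1 + b1) @ w2 + b2


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))   # two normalized toy inputs
y = 0.5 * X[:, 0] + 0.3 * X[:, 1]          # toy target in [0, 0.8]
params, losses = train_bp(X, y)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.6f}")
```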
In this work, different candidate numbers of hidden neurons ($H$) and different candidate hidden-neuron activation functions ($g$) are explored to establish the optimum ANN architecture.

Phase III: Evaluating the Built-prediction Models
Once the prediction models are built using the training dataset ($X_{train}$), they are evaluated on the test dataset ($X_{test}$) in terms of their prediction performance using two well-known standard performance metrics from the literature (Al-Dahidi et al., 2018):
• Root Mean Square Error (RMSE) [kW] (Eq. (16)), which computes the deviation between the actual (true) and the predicted power productions obtained by the two prediction models. The model with the smallest RMSE value is the one most capable of capturing the hidden (unknown) mathematical relationship between the inputs and the output and, thus, of predicting the PV power production accurately, and vice versa.
• Coefficient of Determination (R²) [%] (Eq. (17)), which describes the proportion of the variability in the power production explained by the prediction models built on the considered inputs. A value of R² = 100% indicates that the variability has been fully explained by the considered inputs, whereas lower R² values indicate that, in addition to the considered inputs, other variables need to be taken into account during the development of the prediction models to fully justify their prediction outcomes.

$$RMSE = \sqrt{\frac{1}{N_{test}}\sum_{j=1}^{N_{test}}\left(P_j - \hat{P}_j\right)^2} \qquad (16)$$

$$R^2 = \left(1 - \frac{\sum_{j=1}^{N_{test}}\left(P_j - \hat{P}_j\right)^2}{\sum_{j=1}^{N_{test}}\left(P_j - \bar{P}\right)^2}\right) \times 100\% \qquad (17)$$

where $P_j$ and $\hat{P}_j$ are the j-th actual (true) and predicted PV power productions obtained by the two prediction models, $j = 1, \dots, N_{test}$, $N_{test}$ is the number of test patterns in the test dataset ($X_{test}$), and $\bar{P}$ is the mean value of the actual power productions.
The two considered metrics are calculated on the $N_{test}$ test patterns for the two prediction models, and the obtained values are compared with each other. Furthermore, the performance gain (PG) of each metric is computed (Eq. (18)):

$$PG_{Metric} = \frac{Metric_{T_{amb}} - Metric_{T_{cell,best}}}{Metric_{T_{amb}}} \times 100\% \qquad (18)$$

where $Metric_{T_{amb}}$ and $Metric_{T_{cell,best}}$ are the values of the considered performance metric (RMSE or R²) obtained for each prediction model when $T_{amb}$ and $T_{cell,best}$, respectively, are used in developing, optimizing, and evaluating the prediction model. Positive values of $PG_{RMSE}$ and negative values of $PG_{R^2}$ indicate the benefit of exploiting the cell temperature instead of the ambient temperature, and vice versa.
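The evaluation metrics and the performance gain can be sketched as follows. Assumptions: R² uses the standard definition with the mean of the actual values, the PG is taken relative to the T_amb-based metric, and night-time patterns are identified by zero actual power. The example arrays are made up; the last line applies the gain to the validation RMSEs reported in Section 5.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error, in the unit of the power (e.g., kW)."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def r2_percent(actual, predicted):
    """Coefficient of determination in %, using the mean of the actual values."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    ss_res = np.sum((a - p) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(100.0 * (1.0 - ss_res / ss_tot))


def performance_gain(metric_tamb, metric_tcell_best):
    """Relative change [%] of a metric when T_cell,best replaces T_amb."""
    return 100.0 * (metric_tamb - metric_tcell_best) / metric_tamb


def daytime_mask(actual):
    """True for patterns with non-zero actual power; night-time zeros are
    excluded from the evaluation, as described in the text."""
    return np.asarray(actual, dtype=float) > 0.0


actual = np.array([0.0, 120.0, 150.0, 0.0, 80.0])     # made-up values [kW]
predicted = np.array([3.0, 110.0, 155.0, 1.0, 90.0])  # made-up values [kW]
m = daytime_mask(actual)
print(rmse(actual[m], predicted[m]), r2_percent(actual[m], predicted[m]))
# Validation RMSEs from Section 5: 11.0150 kW (T_amb), 10.9784 kW (T_cell,best)
print(round(performance_gain(11.0150, 10.9784), 3))  # 0.332
```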

RESULTS
This Section presents, step by step, the results of applying the methodology proposed in Section 4 (Figure 3) to the ASU case study of Section 3.

Calculating the cell temperatures
The ten physics-based models investigated in this work are used to calculate the cell temperatures of the ASU solar PV system over the Y ≈ 3.5-year study period (16 May 2015 to 31 December 2018).
Table 1 reports the parameter values used to calculate the different cell temperatures for the p-Si modules of the ASU PV system under study (Duffie and Beckman, 1991; HOMER Pro. 2019; Mattei et al., 2006; Schwingshackl et al., 2013; Skoplaki et al., 2008). The cell temperatures obtained by the ten models are denoted as $T_{cell,1}$ to $T_{cell,10}$.
Once the cell temperatures are obtained, their correlations with the PV power production are calculated for each season of each year and for each year of the study period, as shown in Figure 5 (top and bottom, respectively).
The variation of these correlations among the models can be explained by whether or not the wind speed ($v_w$) is considered in the physics-based models used to calculate the cell temperatures (Section 4.1.1). Specifically:
• Model 10 and Model 1 do not incorporate the wind speed in calculating the cell temperatures;
• Model 6 and Model 7 directly incorporate the wind speed in calculating the cell temperatures;
• the remaining six models incorporate the wind speed through different formulations of the wind convection heat transfer coefficient ($h_w$) or of the heat exchange coefficient for the total surface of the PV module ($U_{PV}$).
Since the weather station is 171 m away from the ASU PV system under study, the available wind speed values might not be fully representative of the conditions at the PV panels' location; thus, including the wind speed in the cell temperature calculations might lead to inaccurate cell temperatures (as shown in Section 5.1.2).
To clarify the importance of calculating the correlation values, consider Figure 6. Although the irradiation was higher in Summer than in Spring, the power output in Summer was lower than that in Spring, owing to the higher ambient temperature in Summer and, hence, the higher cell temperature. One can also recognize that the cell temperature $T_{cell,10}$ has a higher correlation with the power output than the ambient temperature $T_{amb}$.
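The seasonal and yearly correlations can be computed by grouping the hourly records and applying Pearson's correlation within each group. A NumPy sketch, assuming meteorological seasons (December to February as Winter, and so on) and that each record carries its year and month; these grouping choices and the example numbers are illustrative, not taken from the study.

```python
import numpy as np

# Meteorological seasons (Northern Hemisphere): an assumption for illustration
SEASON = {12: "Winter", 1: "Winter", 2: "Winter", 3: "Spring", 4: "Spring", 5: "Spring",
          6: "Summer", 7: "Summer", 8: "Summer", 9: "Autumn", 10: "Autumn", 11: "Autumn"}


def group_keys(years, months):
    """Label each hourly record as 'YYYY-Season' (December kept in its calendar year)."""
    return [f"{y}-{SEASON[m]}" for y, m in zip(years, months)]


def correlation_by_group(keys, t_cell, power):
    """Pearson correlation between cell temperature and power within each group."""
    keys = np.asarray(keys)
    t_cell = np.asarray(t_cell, dtype=float)
    power = np.asarray(power, dtype=float)
    result = {}
    for k in np.unique(keys):
        mask = keys == k
        result[str(k)] = float(np.corrcoef(t_cell[mask], power[mask])[0, 1])
    return result


keys = group_keys([2016] * 6, [7, 7, 8, 1, 1, 2])
corr = correlation_by_group(keys, [30, 35, 40, 5, 10, 15], [200, 190, 180, 100, 120, 140])
print(corr)
```

Passing the year alone as the group key (e.g., `[str(y) for y in years]`) gives the per-year correlations.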

Validating the obtained cell temperatures
For each of the 24 measured cell temperatures of the ASU PV system, the corresponding weather variables, namely the solar irradiation, the ambient temperature at 1 m, and the wind speed at 10 m, were retrieved from the ASU weather station in order to estimate the PV cell temperatures using the ten investigated models.
Finally, the RMSE is computed for each model to determine which one gives the most accurate results. Figure 7 shows that $T_{cell,1}$ has the lowest RMSE (2.834°C) and hence provides the best representation of the actual PV cell temperature ($T_{cell,best}$). This temperature is used to substitute the ambient temperature and establish the updated dataset X′.

Phase II: Building the Prediction Models
Once the updated dataset (X′) is established, it is used to build the MLR and ANN prediction models.

Building the MLR Model
The MLR model is built using the training dataset to provide the solar PV power production predictions. The linear regression models obtained using either $T_{amb}$ or $T_{cell}$ are given by Eq. (19) and Eq. (20), respectively. It is worth mentioning that including the time stamp (i.e., the hour and day number) in the MLR would not be representative in this case: if the time stamp were included in the MLR, it would be correlated with (and thus excluded in favour of) the solar irradiation variable ($I_{rr}$). In any case, the obtained results show that the predictability of the solar power production does not change significantly, which indicates that the MLR cannot capture the hidden, apparently non-linear relationship between the inputs and the power output.
Looking at Eq. (19) and Eq. (20), one can notice that:
• As I_rr increases, the power production increases because more energy is incident on the PV system. This is captured by the positive regression coefficient associated with the I_rr variable;
• As T_cell (or T_amb) increases, the power production decreases because of the significant decrease in the output voltage compared with the marginal increase in the output current (Al-Bashir et al., 2020; Ba et al., 2018). This is captured by the negative regression coefficient associated with the T_cell (or T_amb) variable;
• As the wind speed v increases, the power production increases because the wind cools the PV panels and hence decreases the cell temperature. This is captured by the positive regression coefficient associated with the v variable.
With respect to the ANN prediction model, the model is built (using the training dataset) and optimized (using the validation dataset) in the Matlab NN Toolbox™ (Demuth et al., 2009) in terms of the number of hidden neurons, H, and the hidden neuron activation function, g, to provide accurate solar PV power production predictions. Specifically, we follow an exhaustive search procedure by considering:
1. Twenty different numbers of hidden neurons spanning the interval [2, 40] with a step size of 2;
2. Twelve different activation functions available in the Matlab NN Toolbox™ (Demuth et al., 2009), g = "Log-Sigmoid", "Tan-Sigmoid", "Linear", "Triangular Basis", "Radial Basis", "Elliot Symmetric Sigmoid", "Symmetric Hard-Limit", "Hard-Limit", "Positive Linear", "Normalized Radial Basis", "Saturating Linear", and "Symmetric Saturating Linear".
The effectiveness of each ANN architecture, established by a combination of the above choices, is examined by quantifying the prediction accuracy on the validation dataset (X_valid) using the RMSE (Eq. 16) and R² (Eq. 17) performance metrics. Specifically, a 5-fold cross validation procedure is used to robustly evaluate the ANN prediction performance in terms of the RMSE and R²: the training and validation patterns are sampled randomly from the input-output patterns available in the updated dataset (X′) with fractions of 50% (i.e., N_train = 15115 patterns) and 25% (i.e., N_valid = 7557 patterns), respectively. The cross validation procedure is then repeated 5 times, using different patterns for the training and validation datasets.
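The data partitioning described above can be sketched as repeated random sampling of disjoint training, validation, and test index sets. This minimal sketch rests on several assumptions. The total of 30230 patterns is inferred from N_train = 15115 being 50%. The remaining 25% is assumed to form the test set. Each trial reshuffles all patterns with a different seed, because the text does not state whether the test set is fixed across trials. The function name random_split is hypothetical. Strictly speaking, drawing random 50%/25% subsets in each trial is a repeated random hold-out rather than a partition into five disjoint folds.

```python
import random
from typing import List, Tuple


def random_split(n_patterns: int, f_train: float = 0.5, f_valid: float = 0.25,
                 seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle pattern indices and cut them into disjoint train/validation/test sets.

    Assumption: the patterns left after the training and validation
    fractions (here 25%) form the test set.
    """
    idx = list(range(n_patterns))
    random.Random(seed).shuffle(idx)  # independent, reproducible RNG per trial
    n_train = int(f_train * n_patterns)
    n_valid = int(f_valid * n_patterns)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])


# Five trials, each with a different random sampling of the patterns.
N_PATTERNS = 30230  # assumption: inferred from N_train = 15115 being 50%
trials = [random_split(N_PATTERNS, seed=trial) for trial in range(5)]
```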
The final metric values are then computed by averaging the metric values of the 5 different trials. Table 2 reports the modelling parameters of the optimum ANN architecture, found at the smallest RMSE value, i.e., RMSE = 10.9784 kW (using T_cell^best) and 11.0150 kW (using T_amb), and the largest R² value, i.e., R² = 96.8593% (using T_cell^best) and 96.8112% (using T_amb), on the validation dataset. For completeness, the metrics obtained at H = 25 when T_amb is used are RMSE = 11.2532 kW and R² = 96.7079%. These results support the improvement in prediction accuracy obtained when T_cell^best is used instead of T_amb.
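The exhaustive search over H and g, with metrics averaged over the trials, can be sketched as follows. The rmse and r2 functions use the standard definitions, which are assumed here to match Eq. (16) and Eq. (17). The evaluate callback is a hypothetical stand-in for training an ANN with H hidden neurons and activation g on one trial's training split and scoring it on the validation split. In this work the training itself was done in the Matlab NN Toolbox™. The sketch selects the architecture with the lowest mean validation RMSE.

```python
import math
from itertools import product
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

# Search space described in the text: 20 hidden-layer sizes x 12 activations.
HIDDEN_SIZES = list(range(2, 41, 2))  # H = 2, 4, ..., 40
ACTIVATIONS = [
    "Log-Sigmoid", "Tan-Sigmoid", "Linear", "Triangular Basis", "Radial Basis",
    "Elliot Symmetric Sigmoid", "Symmetric Hard-Limit", "Hard-Limit",
    "Positive Linear", "Normalized Radial Basis", "Saturating Linear",
    "Symmetric Saturating Linear",
]


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error between measured y and predicted y_hat."""
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y, y_hat)))


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical callback: (H, g, trial) -> (validation RMSE, validation R²).
Evaluator = Callable[[int, str, int], Tuple[float, float]]


def grid_search(evaluate: Evaluator,
                n_trials: int = 5) -> Tuple[Tuple[int, str], float, float]:
    """Return the (H, g) pair with the lowest mean RMSE and its mean RMSE and R²."""
    averaged: Dict[Tuple[int, str], Tuple[float, float]] = {}
    for h, g in product(HIDDEN_SIZES, ACTIVATIONS):
        scores = [evaluate(h, g, trial) for trial in range(n_trials)]
        averaged[(h, g)] = (mean(s[0] for s in scores),
                            mean(s[1] for s in scores))
    best = min(averaged, key=lambda k: averaged[k][0])
    return best, averaged[best][0], averaged[best][1]
```

With 20 values of H and 12 activation functions, the search covers 240 architectures. Over 5 trials, that is 1200 training runs.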

Phase III: Evaluating the Built Prediction Models
To demonstrate the effectiveness of replacing T_amb with the best obtained cell temperature, T_cell^best (i.e., of using the updated dataset X′, which contains T_cell^best, instead of the original dataset X, which contains T_amb, to develop the prediction models), Table 3 reports the average performance metrics obtained on the test dataset by the 5-fold cross validation for both prediction models when T_cell^best is used instead of T_amb, together with the computed performance gains. Looking at Table 3, one can recognize that:
• A small improvement in the prediction accuracy is gained by the ANN prediction model when T_cell^best is used instead of T_amb. Specifically, the enhancement reaches up to ~1.93% on the RMSE and 0.11% on the R² performance metric. Although these improvements in prediction accuracy might seem relatively small, they can be considered relevant to the PV system owner;
• The MLR model cannot exploit the benefits of using T_cell^best instead of T_amb and yields the same predictability results in both cases;
• The worse predictability obtained by the MLR compared with the ANN indicates the existence of a non-linear relationship between the PV power production and the corresponding weather and time stamp variables that cannot be captured by the simple MLR; hence, nonlinear prediction models are required.
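The performance gains in Table 3 can be computed as relative changes of each metric. This section does not give the exact gain definition. The sketch below therefore assumes the relative RMSE reduction and the relative R² increase with respect to the T_amb case. Both function names are hypothetical.

```python
def rmse_gain(rmse_amb: float, rmse_cell: float) -> float:
    """Assumed definition: % reduction in RMSE when T_cell^best replaces T_amb."""
    return 100.0 * (rmse_amb - rmse_cell) / rmse_amb


def r2_gain(r2_amb: float, r2_cell: float) -> float:
    """Assumed definition: % increase in R² when T_cell^best replaces T_amb."""
    return 100.0 * (r2_cell - r2_amb) / r2_amb
```

Applied to the validation-set values reported for the optimum ANN (RMSE of 11.0150 kW versus 10.9784 kW, R² of 96.8112% versus 96.8593%), these definitions give gains of about 0.33% and 0.05%. The ~1.93% and 0.11% figures in Table 3 refer to the test dataset and are not reproduced here.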
For completeness, the following possible sources of error could be avoided in the future to further enhance the improvements in prediction accuracy obtained when using PV cell temperatures:
• During the validation phase of the PV cell temperature models, the PV panels were measured randomly. Because the PV cells operate under different conditions (e.g., some are shaded, some are more exposed to wind), this could have led to selecting the wrong model;
• The weather station is 171 m away from the Faculty of Engineering. Thus, the weather variables used in calculating the cell temperatures, such as the wind speed, may not reflect the conditions at the panels and could have contributed to the error.

CONCLUSIONS
In this work, the effect of using the PV cell temperature to predict the PV power output has been investigated, and the results have been compared with those obtained when the ambient temperature is used instead. The case study is the 264 kWp PV system installed on the roof of the Faculty of Engineering at the Applied Science Private University (ASU), Amman, Jordan. Two main prediction models have been utilized, namely MLR and ANNs. The ANN exhibited the best performance metrics (RMSE and coefficient of determination, R²), mainly owing to its capability to capture the hidden (unknown) nonlinear relationship between the input-output patterns of the ASU solar PV system. The results displayed the importance of adopting the PV cell temperature in the prediction models; however, more research is needed to find a better way of measuring it.
Finally, some possible improvements are suggested for future work. K-type infrared sensors could be placed on the back of the PV panels to measure the module temperature continuously. Moreover, the use of ANN may be limiting; therefore, other Machine Learning (ML) approaches, such as Long Short-Term Memory (LSTM) networks, may aid in power prediction and improve the performance metrics.