Electricity Price Fundamentals in Hydrothermal Power Generation Markets Using Machine Learning and Quantile Regression Analysis

A hydrothermal power generation market is characterized by a strong dependence on water reservoir capacity and fossil fuel sources, which causes differences in marginal generation costs and high variability of the electricity spot price. Therefore, this study proposes an empirical approach to identify the price determinants and their effects on price dynamics. This paper presents two methodologies: a machine learning approach and a quantile regression analysis. The first method is used to validate the price determinants through a prediction process, and the second, the quantile regression, to identify the non-linear effects. The most important factors observed are total market demand, water reservoir capacity for generation, and fossil fuel consumption. The results offer a new perspective on the market structure and spot price volatility.


I. INTRODUCTION
The different reforms in electricity markets defined electricity as a commodity that can be sold, bought, and traded in a market (Berrie and Hoyle, 1985). However, its storage limitations give the market price characteristics such as seasonal patterns, high volatility, mean reversion, and price spikes (Huisman and Mahieu, 2003). Besides, modeling the price dynamics requires understanding its asymmetric distribution, high dispersion, and serial correlation (Ciarreta et al., 2011). Therefore, analyzing and predicting spot prices is a challenge for academics and market agents.
On the other hand, the market structure and generation technologies are fundamental factors in price formation. A hydrothermal power generation market presents: (i) significant differences in the marginal costs of the generation sector; (ii) a small renewable generation capacity; (iii) a strong dependence on exogenous variables such as fossil fuel prices and climatological factors; and (iv) higher risk and uncertainty for market agents. It has been observed that these features further increase price variability (Mosquera-López et al., 2017a; Fernández-Blanco et al., 2017; Cotia et al., 2019). Hence, it is relevant to recognize the determinants that explain the electricity price behavior in this market structure.
For this reason, the objective of this study was to identify the economic and technological fundamentals in a hydrothermal power generation market and to evaluate their effects on spot price dynamics. For the empirical analysis, the Colombian electricity market was selected. The methodology was divided into two parts: a machine learning approach and a quantile regression analysis. First, a Gaussian process regression (GPR) model was trained to validate the determinants and compute the spot price prediction for the next 6 months of the dataset. This method identifies complex patterns in a large volume of data and uses them to predict future behavior (Castelli et al., 2020; Díaz et al., 2019; Gonzalez-Briones et al., 2019; Imani et al., 2020; Ribeiro et al., 2020). Second, a quantile regression model was fitted because it allows modeling the seasonality of electricity prices and quantifying the non-linear effects of the determinants (Ma and Koenker, 2006; Maciejowska, 2020; Mosquera-López et al., 2017b; Uribe and Guillen, 2020).
Following Aggarwal et al. (2009) and Girish and Vijayalakshmi (2013), the spot price determinants were grouped into five categories: (i) market characteristics, (ii) fundamental factors, (iii) operational factors, (iv) strategic factors, and (v) historical factors. The first group includes variables such as energy supply and demand, electricity exports/imports, market-clearing quantity, and energy policy (Deng and Oren, 2006; Mandal et al., 2007; Mosquera-López and Nursimulu, 2019; Zhang et al., 2008). In the second group, the fundamental factors considered were price volatility, fuel price, weather factors, and hydrological conditions. Operational factors describe fundamentals such as the system load rate, electricity production (deficit/surplus), energy sources (nuclear, hydric, or thermal), line status and limits, and power transmission costs (He et al., 2010; Rodriguez and Anders, 2004; Zhang et al., 2008). Meanwhile, strategic factors correspond to energy purchasing agreements, bilateral contracts, bidding strategy, and market design (Crespo-Cuaresma et al., 2004; Kian and Keyhani, 2001; Rodriguez and Anders, 2004). Finally, in the fifth group, past observations of variables such as demand and supply, hydric reserves, and electricity price have been found to affect the present spot price dynamics (Ciarreta et al., 2011; Mandal et al., 2007).
However, for the power generation structure selected, the results of the empirical application showed that total market demand, water reservoir capacity for generation, and fossil fuel consumption are the most relevant determinants of the spot price. Also, this paper contributes to the analysis of market structure and offers a new perspective on the spot price distribution.
The rest of the paper is structured as follows: section 2 describes the structure of the Colombian electricity market. Section 3 presents the empirical methodologies, and section 4 describes the dataset. In section 5, the results are reported, and section 6 presents the conclusions.

COLOMBIAN ELECTRICITY MARKET
Since 1990, the Colombian energy sector has undergone relevant reforms. García et al. (2011) described how the liberalization process improved the electricity market by introducing competition in different sectors and, hence, abolishing the limitations of the vertical structure. Besides, the wholesale energy market (WEM) was created under a regulatory framework, and it operates through a spot trading structure. However, the electricity sector presents limitations such as low generation capacity and high demand, which prevent the structuring of a competitive market, so electricity prices cannot capture the relationship between supply and demand (Barrientos et al., 2012).
On the other hand, Colombia is part of a region with abundant hydric sources. According to International Energy Agency (IEA) statistics, in 2018, approximately 86% of power generation in Central and South America came from hydric and thermal generation. Colombia is one of these hydrothermal generation systems: hydroelectric power generation represents 68% and thermal power generation (gas, coal, and liquid fuels) 31% (Figure 1), while renewable sources do not have a representative share in the power generation matrix (0.21%).
Due to its dependence on hydrothermal power generation, the Colombian electricity sector is highly vulnerable to two exogenous factors: El Niño-Southern Oscillation (ENSO) and fossil fuel price fluctuations. Figure 2 shows the daily spot price dynamics for the period 2000-2019. Significant effects of ENSO were observed in four periods between 2003 and 2014; however, the most important shock was observed between 2015 and 2016, when the price reached its maximum peak and gas prices increased considerably. Besides, the thermal generation sector did not have an economic guarantee to cover the demand¹; hence, the state intervened in the market to avoid rationing (Botero-Duque et al., 2016; Montes, 2018).
¹ Thermal generation is a backup source for hydropower generation in two specific situations: high demand or low water reservoir levels.

METHODOLOGY
Two approaches were considered to analyze the fundamentals of the electricity spot price in a hydrothermal power generation market. First, a machine learning approach was used, through a GPR, to fit a multivariable model to predict daily electricity price and validate the importance of variables considered; second, a quantile regression model was fitted to evaluate the effects of these predictors on the electricity price dynamic.

Gaussian Process Regression Models
According to Rasmussen and Williams (2006) and The MathWorks (2020), GPR models are nonparametric kernel-based supervised learning models used for regression analysis and probabilistic classification. These models capture uncertainty and allow predictions when the data have unknown distributions. Besides, GPR is a powerful method to perform Bayesian inference, and it is particularly useful when data availability is limited (Aye and Heyns, 2017; Gonzalez-Briones et al., 2019).
A training set is defined as $\{(x_i, y_i);\ i=1,2,\dots,n\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$ are drawn from an unknown distribution. Starting from a linear regression model, a GPR model predicts the response variable by introducing latent variables, $f(x_i),\ i=1,2,\dots,n$, from a Gaussian process (GP), and explicit basis functions, $h$.
A GP is defined by its mean function, $m(x)$, and covariance function, $k(x,x')$. If $\{f(x),\ x \in \mathbb{R}^d\}$ is a GP, then $E(f(x)) = m(x)$ and $\mathrm{Cov}[f(x), f(x')] = E[\{f(x)-m(x)\}\{f(x')-m(x')\}] = k(x,x')$. Therefore, the following model is considered:
$$h(x)^{T}\beta + f(x)$$
where $f(x) \sim GP(0, k(x,x'))$, i.e., $f(x)$ is a zero-mean GP with covariance function $k(x,x')$. Besides, $h(x)$ is a set of basis functions that project the input $x$ into a new $p$-dimensional feature space ($\mathbb{R}^p$), and $\beta$ is a $p \times 1$ vector of basis function coefficients. This is a representation of the GPR model, and the response variable can be described as:
$$P(y_i \mid f(x_i), x_i) \sim N(y_i \mid h(x_i)^{T}\beta + f(x_i),\ \sigma^2)$$
Therefore, a GPR model is a probabilistic model. Furthermore, the GPR model is nonparametric because each observation $x_i$ has its own latent variable $f(x_i)$.
The joint distribution of the latent variables $f(x_1), f(x_2), f(x_3), \dots, f(x_n)$ in the GPR model is $P(f \mid X) \sim N(f \mid 0, K(X,X))$, close to a linear regression model, where $K(X,X)$ is the covariance matrix and can be parametrized by a set of kernel parameters, $\theta$. Hence, $k(x,x')$ is written as $k(x,x' \mid \theta)$ to explicitly indicate the dependence on the kernel parameters.

Kernel function options
The kernel parameters are based on the signal standard deviation $\sigma_f$ and the characteristic length scale $\sigma_l$. The characteristic length scale defines how far apart the input values $x_i$ can be for the response values to become uncorrelated. The standard deviation and the characteristic length scale must be greater than zero, which is enforced by the parametrization $\theta_1 = \log\sigma_l$ and $\theta_2 = \log\sigma_f$.
The following four built-in kernel functions with the same length scale were considered:
• Rational quadratic kernel
$$k(x_i, x_j \mid \theta) = \sigma_f^2\left(1 + \frac{r^2}{2\alpha\sigma_l^2}\right)^{-\alpha}$$
where $\sigma_l$ is the characteristic length scale, $\alpha$ is the positive-valued scale-mixture parameter, and $r$ is the Euclidean distance between $x_i$ and $x_j$.
• Squared exponential kernel
$$k(x_i, x_j \mid \theta) = \sigma_f^2\exp\left[-\frac{1}{2}\frac{(x_i - x_j)^{T}(x_i - x_j)}{\sigma_l^2}\right]$$
where $\sigma_l$ is the characteristic length scale and $\sigma_f$ is the signal standard deviation.
• Exponential kernel
$$k(x_i, x_j \mid \theta) = \sigma_f^2\exp\left(-\frac{r}{\sigma_l}\right)$$
where $\sigma_l$ is the characteristic length scale and $r$ is the Euclidean distance between $x_i$ and $x_j$.
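As an illustration of the kernel definitions, the following is a minimal sketch in plain Python of the rational quadratic and squared exponential kernels, together with the exponential kernel that the results section reports. The function names, the default value of alpha, and the use of the log parametrization as input are assumptions made for this example; they are not taken from the software used in the study.

```python
import math

def euclidean_distance(xi, xj):
    """Euclidean distance r between two input vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def unpack(theta):
    """Map theta = (log sigma_l, log sigma_f) to strictly positive values."""
    return math.exp(theta[0]), math.exp(theta[1])

def rational_quadratic(xi, xj, theta, alpha=1.0):
    # alpha is the scale-mixture parameter; 1.0 is an illustrative default.
    sigma_l, sigma_f = unpack(theta)
    r = euclidean_distance(xi, xj)
    return sigma_f ** 2 * (1.0 + r ** 2 / (2.0 * alpha * sigma_l ** 2)) ** (-alpha)

def squared_exponential(xi, xj, theta):
    sigma_l, sigma_f = unpack(theta)
    r = euclidean_distance(xi, xj)
    return sigma_f ** 2 * math.exp(-0.5 * r ** 2 / sigma_l ** 2)

def exponential(xi, xj, theta):
    sigma_l, sigma_f = unpack(theta)
    r = euclidean_distance(xi, xj)
    return sigma_f ** 2 * math.exp(-r / sigma_l)
```

Because the kernels take the logarithms of the parameters as input, any real-valued theta yields positive values of the length scale and the signal standard deviation, which is the purpose of the parametrization.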

Parameter estimation
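To estimate the parameters $\beta$, $\theta$, and $\sigma^2$ of a GPR model, the likelihood $P(y \mid X)$ must be maximized as a function of the parameters:
$$\hat{\beta}, \hat{\theta}, \hat{\sigma}^2 = \arg\max_{\beta,\theta,\sigma^2} \log P(y \mid X, \beta, \theta, \sigma^2)$$
Because $P(y \mid X, \beta, \theta, \sigma^2) = N(y \mid H\beta,\ K(X,X \mid \theta) + \sigma^2 I_n)$, the marginal log-likelihood function is as follows:
$$\log P(y \mid X, \beta, \theta, \sigma^2) = -\frac{1}{2}(y - H\beta)^{T}\left[K(X,X \mid \theta) + \sigma^2 I_n\right]^{-1}(y - H\beta) - \frac{n}{2}\log 2\pi - \frac{1}{2}\log\left|K(X,X \mid \theta) + \sigma^2 I_n\right|$$
where $H$ is the matrix of explicit basis functions evaluated at the training inputs, and $K(X,X \mid \theta)$ is the covariance matrix. To estimate the parameters, first, $\hat{\beta}(\theta, \sigma^2)$, which maximizes the log-likelihood with respect to $\beta$ for given $\theta$ and $\sigma^2$, is determined, and this estimate is used to compute the $\beta$-profiled likelihood. Second, the $\beta$-profiled log-likelihood is given by $\log P(y \mid X, \hat{\beta}(\theta,\sigma^2), \theta, \sigma^2)$, which is maximized over $\theta$ and $\sigma^2$ to find their estimates.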
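As a worked step, and assuming the standard generalized least squares result for a Gaussian likelihood (this closed form is not printed in the text, so it is offered here only as a hedged sketch), setting the gradient of the marginal log-likelihood with respect to $\beta$ to zero for fixed $\theta$ and $\sigma^2$ gives:

```latex
\hat{\beta}(\theta,\sigma^{2})
  = \left[ H^{T} \left( K(X,X \mid \theta) + \sigma^{2} I_n \right)^{-1} H \right]^{-1}
    H^{T} \left( K(X,X \mid \theta) + \sigma^{2} I_n \right)^{-1} y
```

Substituting this expression back into the log-likelihood leaves a function of $\theta$ and $\sigma^2$ only, which is the quantity maximized in the second step.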
Finally, during the estimation process, principal component analysis (PCA) was applied to avoid multicollinearity and dimensionality problems.

Response variable forecast
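To predict the value of a response variable $y_{new}$, given a new input vector $x_{new}$ and the training data, the density $P(y_{new} \mid y, X, x_{new})$ is defined by conditional probabilities:
$$P(y_{new} \mid y, X, x_{new}) = \frac{P(y_{new}, y \mid X, x_{new})}{P(y \mid X, x_{new})}$$
To find the joint density in the numerator, it is necessary to introduce the latent variables $f_{new}$ and $f$ corresponding to $y_{new}$ and $y$, respectively. Thus, it is possible to use the joint distribution for $y_{new}$, $y$, $f_{new}$, and $f$ to compute (9). The GP models assume that each response $y_i$ only depends on the corresponding latent variable $f_i$ and the feature vector $x_i$.
Once the density $P(y_{new} \mid y, X, x_{new})$ has been found, the expected value of the prediction $y_{new}$ at a new point $x_{new}$, given $y$, $X$, and the parameters $\beta$, $\theta$, $\sigma^2$, is:
$$E(y_{new} \mid y, X, x_{new}, \beta, \theta, \sigma^2) = h(x_{new})^{T}\beta + \sum_{i=1}^{n}\alpha_i\, k(x_{new}, x_i \mid \theta)$$
where $\alpha = \left(K(X,X \mid \theta) + \sigma^2 I_n\right)^{-1}(y - H\beta)$.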
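The expected value of the prediction can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the basis term is set to zero ($H\beta = 0$), the kernel is a squared exponential with fixed hyperparameters, and the names `sq_exp`, `solve`, and `gpr_predict_mean` are hypothetical.

```python
import math

def sq_exp(xi, xj, sigma_l=1.0, sigma_f=1.0):
    """Squared exponential kernel with fixed, illustrative hyperparameters."""
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return sigma_f ** 2 * math.exp(-0.5 * r2 / sigma_l ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gpr_predict_mean(X, y, x_new, sigma2=1e-2, kernel=sq_exp):
    """Posterior mean at x_new, assuming a zero basis term (H beta = 0)."""
    n = len(X)
    # K(X, X) + sigma^2 I_n
    K = [[kernel(X[i], X[j]) + (sigma2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)  # alpha = (K + sigma^2 I_n)^{-1} y
    return sum(alpha[i] * kernel(x_new, X[i]) for i in range(n))
```

With a very small noise variance the predicted mean at a training input is close to the observed response, and far from the training data the prediction falls back to the zero prior mean.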

Performance indicators
To check the GPR model performance, different calibration metrics were used: root mean square error (RMSE), R-squared ($R^2$), and mean absolute error (MAE). These metrics are defined as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where $n$ is the number of observations, $y_i$ is the i-th observed value, $\hat{y}_i$ is the i-th predicted value, and $\bar{y}$ is the mean of the observed values. For RMSE and MAE, lower values are desired, and for $R^2$, a value closer to one indicates better performance. Besides, the performance metrics of the estimated GPR model were compared with two supervised learning models: support vector machines (SVM) and tree-based methods. The performance metrics are reported in the results and discussion section.
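The three metrics can be computed with a few lines of plain Python. The sketch below uses the standard definitions; the function names and the toy data in the usage note are illustrative only.

```python
import math

def rmse(y, y_hat):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r_squared(y, y_hat):
    """One minus the ratio of residual to total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

For observed values [1, 2, 3, 4] and predictions [1, 2, 3, 6], the single error of 2 gives an RMSE of 1.0 and an MAE of 0.5, which shows how the RMSE penalizes one large deviation more heavily than the MAE.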

Quantile Regression Approach
Quantile regression is a semi-parametric approach with high flexibility that captures the stochastic relationship between variables, allows consistent estimation in non-Gaussian environments, and requires minimal distributional assumptions on the data generating process (Koenker, 2004; Ma and Koenker, 2006; Uribe and Guillen, 2020). To describe the quantile regression model, a linear regression model was assumed, where the response variable $Y_{i,t}$ represents the electricity spot prices and is related to a set of explanatory variables or fundamentals in a matrix $X_{i,t}$. Following Koenker and Bassett (1978), the quantile regression model can be written as:
$$Q_q(Y_{i,t} \mid X_{i,t}) = X'_{i,t}\beta_q$$
where $Y_{i,t}$ is a $(T \times 1)$ vector, with $T$ denoting the number of observations ($t = 1, 2, 3, \dots, T$). Besides, the matrix $X'_{i,t}$, of dimensions $(T \times d)$, contains $(d-1)$ predictors and a constant, and $\beta_q$ is a $(d \times 1)$ vector of unknown parameters for each quantile $q$, $q \in (0,1)$. The regression coefficients $\hat{\beta}_q$ of the quantile $q$ were estimated as the solution to the following minimization problem:
$$\hat{\beta}_q = \arg\min_{\beta_q}\left[\sum_{t:\,Y_{i,t} \ge X'_{i,t}\beta_q} q\left|Y_{i,t} - X'_{i,t}\beta_q\right| + \sum_{t:\,Y_{i,t} < X'_{i,t}\beta_q} (1-q)\left|Y_{i,t} - X'_{i,t}\beta_q\right|\right] \qquad (17)$$
and must be computed in separate regressions for each $i$, $i = 1, \dots, N$. According to Mosquera-López et al. (2017b) and Uribe and Guillen (2020), quantile regression is an extension of the least absolute deviation (LAD) estimator that allows robust estimation when the data present heavy tails, as is the case for electricity spot prices.
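The asymmetric weighting of residuals in the quantile regression objective can be illustrated with a minimal plain-Python sketch. It uses the check function $\rho_q(u) = u\,(q - 1[u<0])$, which is an equivalent way of writing the Koenker and Bassett objective, and an intercept-only model as a simplifying assumption; the function names and the toy sample are hypothetical.

```python
def check_loss(u, q):
    """Check function rho_q(u) = u * (q - 1[u < 0]) applied to a residual u."""
    return u * (q - (1.0 if u < 0 else 0.0))

def quantile_objective(y, fitted, q):
    """Sum of check losses over all residuals y_t - fitted_t."""
    return sum(check_loss(yi - fi, q) for yi, fi in zip(y, fitted))

def best_constant(y, q):
    """Intercept-only quantile regression (assumption: no predictors).

    The minimizer is searched among the observed values, since the
    objective is piecewise linear with kinks at the observations.
    """
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: quantile_objective(y, [c] * len(y), q))
```

For the toy sample [1, 2, 3, 4, 100], the minimizer at q = 0.5 is the median, 3, which is unaffected by the extreme value; this is the robustness property that motivates the approach for heavy-tailed spot prices.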

DATA
The fundamentals of the spot price are determined by the generation technologies. For example, in Central and South America, generation is based principally on hydroelectric and thermal power sources. In these cases, different studies have described the following determinants: demand, hydrological changes, fossil fuel price variation, investment decision making, the structure of the transmission system, and agent strategies (Barria and Rudnick, 2011; Barrientos-Marín and Toro-Martínez, 2017; Blazsek and Hernández, 2018; Samudio-Carter et al., 2019; Vaca et al., 2019; Xavier et al., 2016). Therefore, the first database contained variables such as (i) total demand: real, commercial, and National Interconnected System (NIS); (ii) reservoir levels: daily volume in percentage and generation capacity; (iii) climatological factors such as the quantity of water flowing into the reservoirs; and (iv) fossil fuel consumption: gas, coal, fuel oil, and kerosene. On the other hand, variables such as the bilateral bidding price, electricity imports/exports, or the price regulatory policies were not selected because the spot price is contained in their structure or because missing observations were identified.
From the variables described, correlation analysis was finally used to select the spot price determinants. Besides, considering the generation capacity (Figure 1), the volume of water available in the reservoirs and the consumption of fossil fuels from the two most important sources, gas and coal, were selected. Also, NIS demand was chosen because this variable is calculated based on the net generation of the plants.
These variables were chosen because they allow a parsimonious model that describes a classic supply and demand structure. The dataset used in this research represents the market structure and seeks to explain the spot price dynamics. Table 1 shows the variables, specifying data source and units.
In summary, the database is a balanced panel composed of daily data that starts in August 2009 and ends in December 2019. The period was determined by the availability of data with no methodological changes, and the current supply scheme for the generation sector is included (CREG 051 of 2009, article 10). Likewise, 2020 data were not selected because regulated and non-regulated demand decreased by 4.2% and 12.9%, respectively, during the first quarter due to the SARS-CoV-2 (COVID-19) pandemic (Vidal et al., 2020).
Table 2 reports summary statistics and the unit root test (augmented Dickey-Fuller, ADF) of the variables, and Figure 3 describes their dynamics during the sample period. The spot price presents high variability and dispersion, especially in the last quartile due to ENSO effects during 2015 and 2016, when the price increased to 1943 COP$/kWh². Demand shows dynamic growth and a correlation of 0.26 with the price, which is positive but weak, even though demand is a significant price determinant. Regarding water volume, high variability due to seasonal patterns and a negative correlation with the price were observed. Likewise, gas and coal are the sources used to supply demand when the hydropower system presents any limitation. Hence, these variables have a high dispersion in the last quartiles and a significant and positive correlation with the price. Finally, the ADF test was computed, and the results show evidence against the presence of a unit root in the variables at the 1% and 5% significance levels. Therefore, the variables do not require a stationarity transformation before the estimation.
² COP is the representative sign of the Colombian peso.
Table 2 (unit root test row). t-ADF: −4.63***, −8.53***, −3.70**, −4.10***, −6.06***. ** and *** indicate that the null hypothesis of a unit root is rejected at the 5% and 1% level, respectively. Source: Authors' analysis.

Determining, Training and Testing Set for Machine Learning Approach
Figure 4 summarizes the machine learning methodology applied to the variable set described. First, the dataset was imported from the XM Information System, explored, and processed to obtain descriptive statistics and identify the characteristics of the variables. In general, the variables were not transformed, except for the spot prices, due to outliers observed during the 2015-2016 period. Spot price outliers were replaced using the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) to avoid their effects on the prediction process and possible overfitting.
Second, a training set is used to train the model, while a validation set is used to evaluate how the model performs on the dataset through the performance indicators, and a final test set is used to confirm the model specification and identify overfitting. Therefore, the holdout method was used to divide the dataset into three parts: train (65%), validation (15%), and test (20%) sets³. In this stage of the process, the response variable and its predictors were defined. According to the GPR model described in equation (2), the response variable y is the spot price and the vector X contains the fundamentals: demand, water volume, and gas and coal consumption.
³ For the train, validation, and test sets, the period August 2009-July 2019 was used.
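The holdout partition can be sketched in plain Python as follows. The text does not state whether the observations were shuffled before splitting, so this example assumes an ordered (chronological) split; the function name `holdout_split` and its default shares are illustrative.

```python
def holdout_split(rows, train=0.65, validation=0.15):
    """Split rows, in their given order, into train, validation, and test parts.

    Assumption: no shuffling, so the test part holds the most recent rows.
    The test share is whatever remains after train and validation.
    """
    n = len(rows)
    n_train = round(n * train)
    n_val = round(n * validation)
    train_rows = rows[:n_train]
    val_rows = rows[n_train:n_train + n_val]
    test_rows = rows[n_train + n_val:]
    return train_rows, val_rows, test_rows
```

With 100 daily observations the three parts contain 65, 15, and 20 rows. Keeping the order avoids using future observations to fit a model that is then evaluated on earlier ones, which is one common reason to prefer an ordered split for time series.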
Third, the best models were identified through performance indicators and the prediction of the daily spot price for the period August 2019-December 2019 was implemented.

Determining Quantile Regression Model
Based on equation (17), the linear quantile regression model can be written as a function of the response variable and its predictors: where $P_t$ is the response variable, the spot price, while $D_t$ is the demand, $W_t$ is the water volume, and $C_t$ is the total gas and coal consumption.
For estimating the quantile regression model, the period August

EMPIRICAL RESULTS AND DISCUSSION
The main findings are presented below for the variables and timespan selected. First, the machine learning training results and performance metrics are described. Then, the daily spot price prediction is shown. Second, in this section, the results of the quantile regression analysis are described to identify the effects of the main determinants on the spot price.

Machine Learning Results
The performance of the GPR model fit is assessed using the RMSE, $R^2$, and MAE metrics⁴. Besides, the GPR model was compared with the support vector machine (SVM), a supervised learning method for regression and classification. This method is based on determining hyperplanes that maximize the margin between classes (Gao et al., 2008). The following SVM kernels were used:
• Quadratic
• Cubic
• Gaussian: fine, medium, and coarse.
On the other hand, tree-based methods were considered because of their fast fitting and prediction, low memory usage, and ease of interpretation. The models used were fine, medium, and coarse regression trees.
Besides, PCA was applied during the training process. The models were estimated with the first two principal components because these components explained 98% of the variation of the selected determinants.
Tables 3-5 report the performance metrics for the different models and kernels selected. Based on all performance metrics, the results show that the GPR exponential model performs best. In general, good performance was observed for the GPR models because the metrics for the three sets were similar, in contrast to the SVM models, which present a significant difference in the RMSE between the training set and the other two sets. This suggests possible overfitting in the SVM models. However, the SVM models presented a similar MAE in the validation and test sets, which could suggest that the models still have good predictive ability. Some differences were also observed in the tree regression metrics, but the medium and coarse models presented similar RMSE and MAE across the train, validation, and test sets. Finally, the $R^2$ shows the percentage of the variation of the dependent variable that is explained by the model, but some of these models are not linear, so the use of this indicator may be subject to criticism (Díaz et al., 2019).
According to Barrientos-Marín and Toro-Martínez (2017), another performance indicator is the mean absolute percentage error (MAPE). This metric describes the relative absolute deviation in per unit value. For each of the GPR models, the MAPE is 21%; for the SVM models, the lowest MAPE is 21%, obtained with the fine and medium Gaussian kernels on the test dataset. Likewise, the coarse regression tree has a MAPE equal to 22%.
⁴ RMSE and MAE metric values are in COP$/MWh.
In summary, better performance metrics were observed for the GPR models, especially the GPR exponential model. These models provide predictions for a given spectrum and a predictive distribution that allows computing the first and second moments: the mean and the standard deviation. Likewise, the kernels provide rankings of the input variables or a variance estimation of the data noise. Hence, the GPR models offer an alternative for analyzing a variable that presents mean reversion, spikes, and seasonal patterns.
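For reference, the MAPE mentioned above can be computed as in the following plain-Python sketch. It uses the usual definition and assumes that no observed value is zero; the function name and the toy values in the usage note are illustrative.

```python
def mape(y, y_hat):
    """Mean absolute percentage error, in percent.

    Assumption: every observed value in y is non-zero.
    """
    ratios = [abs((a - b) / a) for a, b in zip(y, y_hat)]
    return 100.0 * sum(ratios) / len(y)
```

For observed prices of 100 and 200 and predictions of 90 and 220, each prediction deviates by 10% of the observed value, so the MAPE is 10%. Because the error is scaled by the observed value, the metric can be compared across price levels.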
Therefore, the daily prediction was computed with this model for the period from August 2019 to December 2019 (Figure 5). The dynamics of the spot price generated by the selected predictors were observed, and it was concluded that the model gives a good approximation for lower prices, i.e., under 250 COP$/kWh. However, the prediction did not reach the true value for high prices, especially during the period from September to November.
Barrientos-Marín and Toro-Martínez (2017) described, for the Spanish market, an asymmetric response between high and low prices. When the price is high, the model does not expect that prices will be higher. Likewise, when the price is low, the model is not confident that prices will be lower. Therefore, the authors explained that their model could capture the behavior of agents who submit bids with low prices to compete. Nevertheless, Weron (2014) and Ziel (2016) pointed out that there is no standard structure for electricity markets. Hence, it is not possible to compare markets and performance metrics for the machine learning approach.
By contrast, the average spot price during July 2019 was 123.57 COP$/kWh, and during October 2019, the price reached an average of 390.4 COP$/kWh. A reduction in hydric sources during August and September could explain the sharp price increase; however, the water reservoirs were at 74% and 67% in August and September, respectively. Besides, the water reservoir percentage in October and November was approximately 69%. Therefore, a generation concentration index or an oligopolistic indicator must be considered because hydropower generators try to speculate when water sources decrease and, thus, increase the price in the following months (Aggarwal et al., 2009; Zhang and Luh, 2005).

Quantile Regression Results
For estimating the quantile regression model, the complete sample was used: August 2009-December 2019. Likewise, the response variable was not transformed to remove outliers because quantile regression models are robust to these observations and, according to Uribe and Guillen (2020), financial time series present crises and booms with extremely high or low observations. Figure 6 plots the spot price quantiles against the corresponding fraction of data. A low spot price was observed for the lower quantiles, approximately equal to 35 COP$/kWh, and around 147 COP$/kWh for the median price. From the lower to the higher quantiles, a smooth increase was identified; however, after the 85% quantile, the price presents a sharp peak related to exogenous effects during 2015 and 2016.
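A quantile curve of the kind described above can be sketched with the Python standard library. The example assumes the "inclusive" interpolation method of `statistics.quantiles`; the method actually used to draw the figure is not stated in the text, and the function name `decile_curve` and the toy sample are hypothetical.

```python
import statistics

def decile_curve(prices):
    """Return the nine cut points from the 10th to the 90th percentile.

    Assumption: the sample is treated as the full population of interest,
    hence method="inclusive".
    """
    return statistics.quantiles(prices, n=10, method="inclusive")
```

Plotting these cut points against the fractions 0.1 to 0.9 gives the empirical quantile curve; a steep rise in the last cut points is the pattern reported for the upper tail of the spot price.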
The linear model described in (21) was estimated for different percentiles of the distribution of electricity prices, i.e., from the 10th to the 90th percentile. Furthermore, gas and coal consumption were added together to analyze the proportion of fossil fuel consumption because these two sources represent around 22% of total generation capacity. The main results are summarized in Figure 7, and the quantile regression coefficients are presented in Appendix A.

Effects of the determinant variables of the electricity spot price for different percentiles

Demand effects
The sensitivity to changes in demand is positive and statistically significant, but its effects vary over the different spot price quantiles. In the 10th percentile, where the price is low, demand has a high impact, e.g., for a demand variation of 1%, the price variation is approximately 2%. However, from around the 20th to the 50th percentile, the demand impact decreases significantly. For prices between the 60th and the 80th percentile, the demand effect tends to stabilize, but for prices over the 80th percentile, the effect associated with a variation in demand is lower: with a demand variation of 1%, the price variation is equal to 1.4%. Given the inverse relationship between prices and demand, its impact is smaller in the high quantiles.
According to Barrientos et al. (2012), Barrientos-Marín and Toro-Martínez (2017), and García et al. (2011), demand is one of the most relevant spot price determinants. It is concluded that a positive demand shock gives the price a positive trend in the future. However, the effect is higher in the short term. Besides, the price captures the complex effects of supply and demand.

Water volume effects
The elasticity of the water volume is negative and statistically significant, independent of the quantile. An increased impact of water volume sensitivity was observed in the lower and higher quantiles, i.e., when the water reservoir capacity is high, it always leads to a reduction in electricity prices. In the first quantile, the price is low because of a high water volume. It was observed that a water volume variation of 1% causes a price variation equal to −0.41%. In the last quantiles, the impact is higher because water volume becomes the most important source and an alternative to reduce the spot price when the thermal plants are on. In the 20th-70th percentiles, the effects measured by quantile regression are similar to the median effects.
Hydraulic technology presents lower generation costs than thermal technology. However, hydric sources introduce high uncertainty for energy supply and market reliability. Given the seasonal patterns in hydric sources, electricity spot prices are lower in the rainy season and higher in the dry season (García et al., 2011). According to Barrientos-Marín and Toro-Martínez (2017), a positive shock to the available hydric capacity has a negative effect on the real price. Likewise, hydropower generation depends on future (unobservable) conditions; hence, this sector tries to influence spot prices.

Fossil fuel consumption effects
Positive and significant elasticities were observed for fossil fuel consumption. Around the 10th percentile, the effects are minor, but for prices over the 40th percentile, the effects become larger. This means that the thermal plants must turn on because of a decrease in water volume or an increase in demand and, as a result, generation costs and spot prices increase. In the 90th percentile, the price variation is approximately 1.25% when fossil fuel consumption varies by 1%.
According to Mosquera-López et al. (2017a), when the thermal generation plants are on, they present marginal costs up to 300% higher than those of hydropower plants. Therefore, the marginal generation costs show a relevant difference between the two most important generation technologies, which explains the price fluctuations.

CONCLUSIONS
Considering the Colombian power generation market structure, where hydropower generation is the most relevant source, followed by thermal power technology, a set of market fundamentals was validated through a price prediction using a trained machine learning model. Besides, by using quantile regression, the non-linear effects of these variables on the spot price were measured. In the sensitivity analyses for the different variables across the price distribution, it was observed how demand, water reservoir capacity, and fossil fuel consumption influence the price.
Therefore, positive changes were observed in the spot price through demand variations. When electricity consumption increases, all generation technologies must produce to meet demand. However, if the demand is not covered, the thermal power generation plants must turn on, affecting the price. By contrast, the elasticity of the water volume or reservoir capacity is negative, with an increased impact on the lower and higher quantiles. That is, seasonal patterns of reservoirs cause strong price fluctuations, e.g., each rainy season, the spot price decreases significantly. An important aspect is the generation sector's influence on the price through future speculation on water volume; for this reason, a fundamental that captures the oligopoly structure must be added.
Positive elasticities were found for fossil fuel consumption. It was revealed how gas and coal increased the price significantly in the last quantiles. Exogenous effects, such as dry seasons or demand changes, increase the spot price through generation costs.
Therefore, this study has described how changes in the magnitude of the fundamental variables in a hydrothermal power market explain the electricity spot price. The effect of reservoir changes represents the main risk factor for generators. Besides, the generation sector faces risk from fossil fuel price fluctuations; hence, generators cannot recover the costs through electricity price increases. Likewise, this study allowed identifying the importance of renewable energy sources, because they can smooth price volatility and prevent extreme changes caused by exogenous effects.
Finally, improving the model prediction will require the inclusion of a generation concentration index or agent strategies. However, the model can serve as a point of reference, given the characteristics of the hydrothermal generation sector and the exogenous factors that explain the electricity price dynamics.

APPENDIX A
Table A.I shows the quantile regression coefficients from the 10th to the 90th percentile. All coefficients are statistically significant at the 1% level.