Abstract

The price of Brent crude oil is very important to the global economy because it serves as one of the benchmarks by which other countries and organizations value their crude oil. The few earlier studies on modeling the Brent crude oil price relied predominantly on classical models, and the application of machine learning methods to the Brent crude oil price has been grossly understudied. In this study, we identified the optimal machine learning model (MLMD) amongst the Support Vector Regression (SVR), Random Forest (RF), Artificial Neural Network (ANN), and Deep Neural Network (DNN) in modeling the Brent crude oil price, and also showed that the optimal MLMD is a better fit to the Brent crude oil price than the classical Autoregressive Integrated Moving Average (ARIMA) model used in earlier studies. Daily secondary data from the U.S. Energy Information Administration were used. The results showed that the ANN and DNN models behaved alike, both outperformed the SVR and RF models, and both are chosen as the optimal MLMDs in modeling the Brent crude oil price. The ANN was also better than the classical ARIMA model, which performed very poorly. The ANN and DNN models are therefore suggested for close monitoring of the Brent crude oil price and for advance knowledge of future Brent crude oil price changes.

Keywords: Brent crude oil, Support vector regression, Random forest, Artificial neural network, Deep neural network, Machine learning.

JEL Classification: C10; C45; C50; C52; Q40.

Received: 24 June 2021 / Revised: 29 July 2021 / Accepted: 20 August 2021 / Published: 13 September 2021

Contribution/ Originality

The paper's primary contribution is to model the Brent crude oil price using different MLMDs and to show that the optimal MLMD performs better than the classical ARIMA model used in most earlier studies of the Brent crude oil price.

1. INTRODUCTION

Crude oil is one of the most widely used energy resources in economic activities. Different crude oils are available in the oil market, each with its own features and price. They include Dubai crude, West Texas Intermediate (WTI), Oman crude, Shanghai crude, Europe Brent crude (also called Brent crude), and Bonny Light crude. Brent crude has gained popularity and wide usage because it is both sweet and light, owing to its relatively low sulfur content and low density, which make it easy to refine into diesel and gasoline. The quantity of sulfur present in a crude oil determines the amount of processing needed to refine it, and since Brent has less than 1% sulfur, processing it is less demanding (EIA, 2021). Europe Brent is extracted from the North Sea in the Atlantic Ocean, which is bounded by the United Kingdom, France, Norway, the Netherlands, Denmark, Belgium, and Germany. Brent crude oil was discovered in the North Sea in 1859, although commercial exploration did not start until 1966. Exploration activities increased sharply in the early 1970s, which resulted in the construction of the first oil pipeline in 1975 (CFI, 2021). Europe Brent crude oil is ubiquitous and is used as a benchmark in determining the price of about two-thirds of other crude oils (EIA, 2021). Unlike West Texas Intermediate (WTI) crude oil, which is produced in landlocked areas and is therefore significantly more costly to transport, Brent crude oil is produced near the sea, which significantly reduces its transportation cost (EIA, 2021). Brent crude oil and WTI are the two most traded crude oils in the world, and Brent is the most traded of all the oil benchmarks. The price of Brent crude oil is influenced by different factors.
Some of the factors that influence the price of Brent crude oil are crude oil supply-demand stability, oil production levels, and geopolitical issues in global oil markets. The price of Brent crude oil is of economic importance because it significantly affects the way other countries and organizations value their crude oil and has a significant impact on the global economy. Because the Brent crude oil price is one of the benchmarks for crude oil prices, fluctuations in it may move the prices of other crude oils up or down, which in turn affects world trade. Oil price fluctuations affect the cost of producing goods, economic development and investment, with a negative or positive impact on various economic activities. In recent years, the price of crude oil has become volatile, leading to poor fiscal planning outcomes with severe consequences in some countries (Asaolu & Ilo, 2012). Close monitoring of oil and gas stocks (using robust and sensitive models), together with appropriate policies, will help curtail the volatility (risk) associated with oil price shocks. A few earlier studies (Al-Gounmeein & Ismail, 2021; Mensah, 2015; Xiang & Zhuang, 2013) have tried to monitor the Brent crude oil price closely using different linear and classical models, but the use of different MLMDs in modeling the Brent crude oil price has been largely understudied. This gap necessitated the present study. In this study, four MLMDs (SVR, ANN, RF and DNN) were considered in order to obtain the optimal MLMD for modeling the Brent crude oil price. In addition, the study demonstrates that the optimal MLMD is a better fit to the Brent crude oil price than the classical ARIMA model that has been used by researchers. The optimal model can be used to predict the Brent crude oil price for many years.
The plot of the price of Brent crude oil from 7th June, 2016 to 7th June, 2021 is given in Figure 1.

Figure-1. The plot of the price of Brent crude oil from 7th June, 2016 to 7th June, 2021.

2. LITERATURE REVIEW

The important role that Brent crude oil plays in the oil market and global economy has led researchers to try different methods of modeling its price. Mensah (2015) examined the Brent crude oil price using the ARIMA method. The first seventeen years of data were used to train the model and the last three years to confirm the accuracy of the forecasts. The ARIMA (1, 1, 1) model was the best when compared with other orders of the ARIMA model. Abdollahi and Ebrahimi (2020) used the autoregressive fractionally integrated moving average, adaptive neuro-fuzzy inference system and Markov-switching models to model the Brent crude oil price. They were motivated by the considerable impact the crude oil price has on various economic sectors. They compared the models using the RMSE, MAPE and Diebold-Mariano test, and found from the numerical results that a hybrid of the selected methods, weighted by a genetic algorithm, performs better than the other models.

Al-Gounmeein and Ismail (2021) also modeled the Brent crude oil price. They used two types of the Generalized Autoregressive Conditionally Heteroskedastic (GARCH) model, the standard and functional GARCH models, to study the volatility of the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model. They also compared two hybrid models, ARFIMA-sGARCH and ARFIMA-fGARCH, in modeling the Brent crude oil price. Their results show that the two hybrid models have the same accuracy level in terms of the RMSE value. Xiang and Zhuang (2013) fitted the ARIMA model to the Brent crude oil price and chose the ARIMA (1, 1, 1) model as the best because it had a good prediction effect.

3. METHODS

The data on Brent crude oil prices used in this research are secondary data from the U.S. Energy Information Administration, retrieved from the Federal Reserve Bank of St. Louis (FRED). The data are daily, from 7th June, 2016 to 7th June, 2021. To ensure that the optimal machine learning model is obtained, and that the model does not overfit the data and perform poorly on a new dataset, the data were split into two groups: the training and test sets. The training set (in-sample period) is 0.70 of the data, while the remaining 0.30 is the test set (out-of-sample period). The parameters of the model were estimated using the training set, while the model was validated using the test set. The rolling window estimation approach was used in this study: the fixed historical set of data (the training set) was used to predict future values continuously over a period of time (the test set).
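As an illustration of the 0.70/0.30 split described above, the following minimal sketch divides a time-ordered series without shuffling, so that the test set always follows the training set in time. The function name and the toy prices are hypothetical and are not the study's code or data.

```python
def chronological_split(series, train_fraction=0.70):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Index of the first test observation; round() guards against float error.
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Hypothetical toy prices, not the study's data.
prices = [50.1, 50.4, 49.8, 51.2, 52.0, 51.7, 53.1, 54.0, 53.6, 55.2]
train, test = chronological_split(prices)
```

Keeping the order intact matters for time series: a random split would let the model see observations that come after the ones it is asked to predict.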

Four MLMDs were used to model the Brent crude oil prices with the aim of obtaining the optimal MLMD for the Brent crude oil prices. The MLMDs considered are the SVR, ANN, RF and DNN. The time variable and the first and second lags of the Brent crude oil prices were used as inputs to the models. Most applications of machine learning methods to time series use the lags as explanatory variables (Lee, Kim, Lee, Kim, & Kim, 2018; Nwosu, Obite, & Bartholomew, 2021; Obite, Chukwu, Bartholomew, Nwosu, & Esiaba, 2021; Tealab, Hefny, & Badr, 2017). The optimal MLMD will be compared with a classical ARIMA model that was used by Xiang and Zhuang (2013) and Mensah (2015). The methods used in the ARIMA model are explained in detail in Obite, Olewuezi, Ugwuanyim, and Bartholomew (2020).
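The three inputs named above (a time variable plus the first and second lags) can be assembled as in the sketch below. The function name, the 1-based time index and the toy prices are assumptions made for illustration; the study does not state how the time variable was coded.

```python
def make_lagged_rows(prices, n_lags=2):
    """Return (X, y): each X row is [time index, lag 1, lag 2, ...], y is the current price."""
    X, y = [], []
    for t in range(n_lags, len(prices)):
        lags = [prices[t - k] for k in range(1, n_lags + 1)]
        X.append([t + 1] + lags)  # t + 1 is an assumed 1-based time index
        y.append(prices[t])
    return X, y


# Hypothetical toy prices, not the study's data.
prices = [50.0, 51.0, 49.5, 52.0, 53.5]
X, y = make_lagged_rows(prices)
```

The first two observations produce no row because their lags are not available, so a series of length n yields n - 2 usable rows.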

3.1. Support Vector Regression

The SVR proposed by Vapnik (1982) maps the input data into a high-dimensional feature space through a nonlinear mapping and then performs a regression in that feature space. The form of the function that SVR approximates is given in Equation 1.
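For orientation, the SVR regression function is commonly written in the literature in the form below, where \phi is the nonlinear feature map, w the weight vector in the feature space and b the bias. The symbols follow common textbook notation and are assumptions here; they are not necessarily the notation used in the study's own equations.

```latex
f(x) = w^{\top}\phi(x) + b
```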

3.2. Artificial Neural Network

The ANN is a machine learning method used for regression problems and was recently explained by Obite et al. (2021). The ANN model is represented mathematically in Equation 9.

The weights are assigned to the connection between the different nodes in the network. The quadratic error function given in Equation 10 is minimized to obtain the value of all the weights.

To reduce the difficulty in the convergence of the neural network, the min-max normalization technique was used to normalize the input variables (input nodes) (Nwosu et al., 2021; Obite et al., 2020).
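A minimal sketch of min-max normalization is given below. It assumes the minimum and maximum are taken from the training set and then reused on the test set, which is a common choice but is not stated in the study; the function names and numbers are hypothetical.

```python
def min_max_fit(column):
    """Return the (min, max) of a training column."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("column is constant; min-max scaling is undefined")
    return lo, hi


def min_max_apply(column, lo, hi):
    """Scale values so that lo maps to 0 and hi maps to 1."""
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical toy values, not the study's data.
train_column = [10.0, 20.0, 30.0]
lo, hi = min_max_fit(train_column)
scaled_train = min_max_apply(train_column, lo, hi)
scaled_test = min_max_apply([40.0], lo, hi)  # may fall outside [0, 1]
```

Under this assumption a test value beyond the training range is scaled outside the unit interval, which is worth keeping in mind when the test period contains prices never seen in training.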

3.3. Random Forest

The random forest model operates by assembling many decision trees and taking the mode of their outputs for a classification problem or the average for a regression problem. Details of the RF method are given in Nwosu et al. (2021) and Obite et al. (2021). At each node, the best feature within a random subset of features is searched for and used for splitting. This helps to fit a better model.

The steps involved in growing each tree are given below:

To grow each tree, draw a random sample of q cases with replacement from the data, where q is the number of cases in the data.
Choose a number d ≤ D and, at each node, randomly select d variables out of all the D explanatory variables. The best split on the d selected variables is identified and used to split the node.
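The two sampling steps above, together with the averaging used for regression, can be sketched as follows. This shows only the random draws, not the tree-growing itself, and the function names, seed and sizes are hypothetical.

```python
import random


def bootstrap_sample(n_cases, rng):
    """Step 1: draw n_cases row indices with replacement."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]


def candidate_variables(n_variables, d, rng):
    """Step 2: pick d distinct variable indices out of n_variables for one node."""
    if not 1 <= d <= n_variables:
        raise ValueError("d must satisfy 1 <= d <= n_variables")
    return sorted(rng.sample(range(n_variables), d))


def forest_prediction(tree_predictions):
    """For regression, the forest output is the average of the tree outputs."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
rows = bootstrap_sample(n_cases=8, rng=rng)
variables = candidate_variables(n_variables=3, d=2, rng=rng)
```

Because the rows are drawn with replacement, some cases appear more than once in a tree's sample and others not at all, which is what makes the trees differ from one another.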

Figure 2 depicts a random forest plot.

Figure-2. A random forest plot.

3.4. Deep Neural Network

The Deep Neural Network (DNN) is an ANN with multiple hidden layers. Because of its multiple hidden layers, the DNN model has the potential to learn more complex structure than the ANN model; however, if the complexity of the structure is minimal, the ANN model is preferred because it is simpler than the DNN model. The DNN model has one input layer with at least one node, multiple hidden layers with at least one node in each, and one output layer with at least one node. Every node is connected to all the nodes in the next layer, and the connections are assigned weights. The logistic activation function is used in the hidden layers, while the linear activation function is used in the output layer.
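The layer structure described above (logistic activation in the hidden layers, linear activation at the output) corresponds to a forward pass like the sketch below. The function names and the weights are hypothetical, the bias terms are an assumption, and no training step is shown.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights holds one row of input weights per node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """layers is a list of (weights, biases); only the last layer is linear."""
    out = inputs
    for index, (weights, biases) in enumerate(layers):
        is_output = index == len(layers) - 1
        out = dense(out, weights, biases, (lambda z: z) if is_output else logistic)
    return out


# Hypothetical weights for a network with 3 inputs, hidden layers of 1 and 6 nodes, 1 output.
layers = [
    ([[0.0, 0.0, 0.0]], [0.0]),       # first hidden layer: 1 node
    ([[0.0]] * 6, [0.0] * 6),         # second hidden layer: 6 nodes
    ([[1.0] * 6], [0.0]),             # output layer: 1 linear node
]
prediction = forward([0.2, 0.4, 0.6], layers)
```

With all hidden weights set to zero each hidden node outputs 0.5, so the example is easy to check by hand before real weights are substituted.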

3.5. Performance Measures

In choosing the optimal MLMD for the Europe Brent crude oil prices, three performance measures were used: the Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The model with the highest NSE and the lowest RMSE and MAPE values is the optimal MLMD. The modified Diebold-Mariano (MODM) test was also employed to test whether the difference in the predictive ability of the models is significant. The MODM test evaluates the null hypothesis that both models are similar in their predictive ability against the alternative hypothesis that the first model was outperformed by the second model, at an alpha level of 0.05.
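The three performance measures can be computed as in the sketch below, using their standard definitions. MAPE is returned as a fraction rather than a percentage, which appears consistent with the magnitudes reported in the tables but is an assumption; the function names and toy numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.03 means 3%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def nse(actual, predicted):
    """Nash-Sutcliffe efficiency: 1 minus error sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


# Hypothetical toy values, not the study's data.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = (rmse(actual, predicted), mape(actual, predicted), nse(actual, predicted))
```

An NSE of 1 means a perfect fit, 0 means the model is no better than predicting the mean of the observed values, and a negative NSE means it is worse than that mean.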

4. RESULTS

Within the study period, the Europe Brent crude oil price moved continuously upward and downward. It had an upward trend from 7th June, 2016 until it reached its peak within the study period ($86.07 per barrel) on 4th October, 2018. It then began a downward trend until it reached its lowest price within the study period ($9.12 per barrel) on 21st April, 2020, which was due to the Covid-19 pandemic. After that, it began a new upward trend that lasted until the last day of the study period. The average Europe Brent crude oil price within the study period is $57.18 per barrel.

The results of the four MLMDs (SVR, ANN, RF and DNN) are given below:

4.1. The Support Vector Regression Model

A grid search was used to select the optimal hyperparameters.

This gave us 88 different models. A 10-fold cross validation was used during the grid search to obtain the optimal parameters and the radial basis function was used as the kernel function. The result of the grid search is given in Figure 3.
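A grid search scored by k-fold cross-validation can be sketched generically as below. The study ran its search on an SVR in R; to keep the example self-contained, a one-variable ridge regression stands in for the SVR here, and the grid values, function names and toy data are all hypothetical.

```python
from itertools import product


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge(x, y, penalty, fit_intercept):
    """Stand-in model (not SVR): one-variable ridge regression."""
    x_mean = sum(x) / len(x) if fit_intercept else 0.0
    y_mean = sum(y) / len(y) if fit_intercept else 0.0
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    slope = sxy / (sxx + penalty)
    return slope, y_mean - slope * x_mean


def predict(model, x_value):
    slope, intercept = model
    return slope * x_value + intercept


def grid_search_cv(x, y, grid, k=10):
    """Try every grid combination; score each by mean squared error over k folds."""
    names = sorted(grid)
    folds = k_fold_indices(len(x), k)
    best_params, best_mse, n_models = None, float("inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        squared_error = 0.0
        for fold in folds:
            held_out = set(fold)
            x_train = [v for i, v in enumerate(x) if i not in held_out]
            y_train = [v for i, v in enumerate(y) if i not in held_out]
            model = fit_ridge(x_train, y_train, **params)
            squared_error += sum((y[i] - predict(model, x[i])) ** 2 for i in fold)
        mse = squared_error / len(x)
        n_models += 1
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse, n_models


# Hypothetical toy data and grid: y = 2x + 5, three penalties, two intercept options.
x = [float(i) for i in range(20)]
y = [2.0 * v + 5.0 for v in x]
grid = {"penalty": [0.0, 1.0, 10.0], "fit_intercept": [False, True]}
best_params, best_mse, n_models = grid_search_cv(x, y, grid)
```

The number of models evaluated is the product of the grid sizes, six in this toy grid; the 88 models reported in the study correspond to a grid whose combinations number 88.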

Looking carefully at Figure 4, we found several darker patches where the hyperparameter tuned alongside cost lies between 0 and 0.12 and the cost lies between 0 and 10. With the aid of the R software, a value of 0.04 for that hyperparameter and cost = 4 were chosen as the optimal hyperparameters. The number of support vectors for the optimal model is 605. The model has an RMSE of 1.158, MAPE of 0.0145, and NSE of 0.987 in the training set, while in the test set the RMSE is 17.419, MAPE is 0.4188, and NSE is -0.422.

Table-1. The different ANN models.

	Training			Test
Hidden node size	RMSE	MAPE	NSE	RMSE	MAPE	NSE
1	1.192	0.015	0.986	1.586	0.030	0.988
2	1.206	0.015	0.986	1.859	0.037	0.984
3	1.208	0.015	0.986	2.989	0.067	0.958
4	1.233	0.015	0.985	3.700	0.084	0.936
5	1.199	0.015	0.986	4.001	0.096	0.925
6	1.238	0.015	0.985	5.718	0.135	0.847
7	1.179	0.015	0.987	2.158	0.044	0.978
8	1.186	0.015	0.987	3.826	0.091	0.931

4.2. The Artificial Neural Network Model

In obtaining the optimal model for the ANN, hidden-layer node sizes ranging from 1 to 8 were used to fit the model. The performance of the ANN models with different hidden node sizes is given in Table 1. The model with only 1 node in the hidden layer has the best MAPE, RMSE and NSE in the test set, with training-set values close to those of the other models, and is considered the optimal ANN model for the Brent crude oil prices. Increasing the number of nodes in the hidden layer resulted in either overfitting or poor fitting of the Brent crude oil prices. Minimizing the quadratic error function gave us the weights of the model. The weights and the plot of the neural network are shown in Figure 5.

Figure-5. The ANN (3, 1, 1) model.

4.3. The Random Forest Model

Table 2 gives the results of the RF models trained to find the optimal number of variables, between 1 and 3, to be selected for splitting each node. Selecting 3 variables gave the optimal RF model on all performance measures in both sets.

Table-2. The different RF models.

	Training			Test
No. of Variables	RMSE	MAPE	NSE	RMSE	MAPE	NSE
1	0.588	0.008	0.997	11.976	0.281	0.328
2	0.567	0.007	0.997	7.343	0.151	0.747
3	0.558	0.007	0.997	7.202	0.147	0.757

4.4. The Deep Neural Network Model

The number of hidden layers was increased to 2 to help the network capture greater complexity in the data and learn the structural patterns within the Europe Brent crude oil prices. To obtain the optimal hyperparameters for the two hidden layers, 100 models were trained using different combinations of 1 to 10 nodes in each hidden layer. The performance of 5 of the 100 trained models is given in Table 3. The model with one node in the first hidden layer and six nodes in the second hidden layer was chosen as the optimal model. The weights of the model were obtained by minimizing the quadratic error function. The weights and the DNN plot are shown in Figure 6.

Table-3. The performance of five of the DNN models.

		Training set			Test set
First hidden node	Second hidden node	RMSE	MAPE	NSE	RMSE	MAPE	NSE
1	1	1.188	0.015	0.987	1.612	0.030	0.988
1	6	1.192	0.015	0.986	1.585	0.029	0.988
2	6	1.189	0.015	0.987	1.810	0.035	0.985
4	3	1.195	0.015	0.986	1.813	0.036	0.985
7	9	1.186	0.015	0.987	1.881	0.037	0.983

Figure-6. The deep neural network model.

4.5. Comparison of the SVR, ANN, RF and DNN Models

The four MLMDs behaved differently in both sets. The MAPE, RMSE, and NSE values and the MODM test were used to compare the SVR, ANN, RF and DNN models, as shown in Tables 4 and 5. A significant MODM test means that the first model was outperformed by the second model. Similarly, a smaller RMSE and MAPE and a higher NSE mean that a model is likely to be better, provided the MODM test is significant. The main focus in identifying the optimal MLMD is the test set: good performance there indicates that the model was not overfitted and is likely to perform well on a new dataset and in forecasting future Europe Brent crude oil prices. Though the SVR and RF models performed slightly better than the ANN and DNN models in the training set, they both performed poorly in the test set. The SVR model performed worst in the test set, as confirmed by the RMSE, MAPE, NSE and MODM test.

In the test set, the ANN and DNN had the lowest RMSE and MAPE, the highest NSE and a significant MODM test when compared with the SVR and RF models. The DNN and ANN models performed similarly in both sets, as confirmed by the RMSE, MAPE, NSE and MODM test, so both are chosen as the optimal MLMDs for modeling the Europe Brent crude oil prices. We suggest the use of the ANN model in predicting the Brent crude oil prices since it is less complex than the DNN model.

Table-4. Comparison of the different MLMDs.

	Training			Test
Model	RMSE	MAPE	NSE	RMSE	MAPE	NSE
SVR	1.159	0.014	0.987	17.120	0.411	-0.373
ANN (3, 1, 1)	1.192	0.015	0.986	1.586	0.030	0.988
DNN (3, 1, 6, 1)	1.192	0.015	0.986	1.585	0.029	0.988
RF	0.558	0.007	0.997	7.202	0.147	0.757

Table-5. MODM test for the different MLMDs.

			First Model
			SVR		ANN		RF		DNN
			Training	Test	Training	Test	Training	Test	Training	Test
Second Model	SVR	MDM			3.591	-12.064	-13.983	-14.067	3.533	-12.068
	SVR	P-value			0.00**	1.00	1.00	1.00	0.00**	1.00
	ANN	MDM	-3.59	12.06			-14.39	6.68	-0.16	-0.07
	ANN	P-value	1.00	0.00**			1.00	0.00**	0.56	0.53
	RF	MDM	13.98	14.07	14.39	-6.68			14.34	-6.69
	RF	P-value	0.00**	0.00**	0.00**	1.00			0.00**	1.00
	DNN	MDM	-3.53	12.07	0.16	0.07	-14.34	6.69
	DNN	P-value	1.00	0.00**	0.44	0.47	1.00	0.00**

Note: ** and bold mean that the test is significant, and the second model is better.
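To help read the MDM entries in Table 5, the sketch below computes a modified Diebold-Mariano statistic with the small-sample correction of Harvey, Leybourne and Newbold. It assumes one-step-ahead forecasts and squared-error loss, neither of which is stated in the study, and the function name and toy errors are hypothetical. The sign convention matches the table: a positive value favours the second model.

```python
import math


def modified_dm_statistic(errors_first, errors_second):
    """Modified DM statistic for horizon 1 under squared-error loss.

    Positive values mean the first model has the larger loss, so the second is better.
    """
    n = len(errors_first)
    d = [e1 ** 2 - e2 ** 2 for e1, e2 in zip(errors_first, errors_second)]
    d_bar = sum(d) / n
    gamma0 = sum((value - d_bar) ** 2 for value in d) / n  # variance of the loss differential
    if gamma0 == 0.0:
        raise ValueError("loss differential is constant; statistic is undefined")
    dm = d_bar / math.sqrt(gamma0 / n)
    correction = math.sqrt((n - 1) / n)  # small-sample factor when the horizon is 1
    return correction * dm


# Hypothetical forecast errors, not the study's data.
errors_first = [2.0, -2.0, 3.0, -1.0]
errors_second = [1.0, -1.0, 1.0, 0.0]
statistic = modified_dm_statistic(errors_first, errors_second)
swapped = modified_dm_statistic(errors_second, errors_first)
```

Swapping the two models flips the sign of the statistic, which is why mirrored cells in Table 5 carry opposite signs. The statistic is compared with a Student t distribution with n - 1 degrees of freedom to obtain the p-value.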

A plot of the true value (TV) and predicted values from the SVR, ANN, RF and DNN models is given in Figure 7. The plot shows how DNN and ANN accurately forecasted the test set with minimum errors. The SVR forecasts were very far from the true values.

Using the methods in Obite et al. (2021), the ARIMA model with order (0, 1, 0) was the best ARIMA model, and it performed very poorly when compared with the true values and the optimal ANN model in the test set, as shown in Figure 8.
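One way to see why an ARIMA (0, 1, 0) model can struggle over a long test window is that, without a drift term, it is a random walk whose forecast at every horizon equals the last observed value. The sketch below illustrates this. The function name, the optional drift argument and the toy numbers are assumptions for illustration, and whether the fitted model in this study included a drift term is not stated.

```python
def random_walk_forecast(history, horizon, drift=0.0):
    """Forecasts from ARIMA (0, 1, 0): last value plus drift times the step ahead."""
    last = history[-1]
    return [last + drift * step for step in range(1, horizon + 1)]


# Hypothetical toy series, not the study's data.
history = [1.0, 2.0, 3.0]
flat = random_walk_forecast(history, horizon=3)
trending = random_walk_forecast(history, horizon=3, drift=0.5)
```

A forecast that stays flat, or follows a straight line when drift is present, cannot follow the swings of a volatile price series when it is projected from a fixed origin across the whole test period.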

Figure-7. Plot of the forecasts from the SVR, ANN, DNN and RF models; and the True Value (TV) for the test set.

Figure-8. Plot of the forecasts from the ANN, and ARIMA models; and the True Value (TV) for the test set.

5. DISCUSSION

The analysis of the results revealed that the performance of the different MLMDs varied, which is in line with the model behaviour in Nwosu et al. (2021). Specifically, some models, such as the RF and SVR, performed well only on the set used to train them and performed poorly on the out-of-sample set. This variation in performance is also supported by the literature (Obite et al., 2021). The ANN and DNN behaved similarly in both the in-sample and out-of-sample sets and are selected as the optimal models. The ANN model, chosen for modeling the Brent crude oil price because it is less complex than the DNN model, outperformed the ARIMA model used by Xiang and Zhuang (2013) and Mensah (2015) in modeling the Brent crude oil price. The ARIMA model could not capture the volatile nature of the crude oil price (it is not sensitively adaptable) and is therefore a poor choice for modeling the Brent crude oil price when compared with the MLMDs. The forecasts of the ANN and DNN closely tracked the actual values in the out-of-sample set, as shown in Figure 7, and these models should be used as the optimal models for the Brent crude oil price. This helps explain why MLMDs have recently been promoted.

6. CONCLUSION

The price of Brent crude oil is very important to the global economy because it serves as one of the benchmarks by which other countries and organizations value their crude oil. The few earlier studies on modeling the Brent crude oil price concentrated on linear and classical models and paid little attention to MLMDs as a possible route to better accuracy and robustness. In fact, the use of MLMDs in modeling the Brent crude oil price has been largely untested. Therefore, in this study, we used different MLMDs (SVR, ANN, RF and DNN) to obtain the optimal MLMD for modeling the Brent crude oil price, and to show, using in-sample and out-of-sample evaluation, that the optimal MLMD is a better fit to the Brent crude oil price than the classical ARIMA model that has been used by researchers. The results show that the ANN and DNN models have high accuracy in predicting the Brent crude oil price, and they are suggested for close monitoring of the Brent crude oil price and for advance knowledge of future Brent crude oil prices.

We recommend that hybrid ARIMA-ANN and ARIMA-RF models be compared with the DNN and ANN models. We expect the proposed hybrids to yield high performance.

Funding: This study received no specific financial support.

Competing Interests: The authors declare that they have no competing interests.

Acknowledgement: All authors contributed equally to the conception and design of the study.

REFERENCES

Abdollahi, H., & Ebrahimi, S. B. (2020). A new hybrid model for forecasting Brent crude oil price. Energy, 200, 117520. Available at: https://doi.org/10.1016/j.energy.2020.117520.

Al-Gounmeein, R. S., & Ismail, M. T. (2021). Modelling and forecasting monthly Brent crude oil prices: A long memory and volatility approach. Statistics in Transition New Series, 22(1), 29-54. Available at: https://doi.org/10.21307/stattrans-2021-002.

Asaolu, T., & Ilo, B. (2012). The Nigerian stock market and oil price: A cointegration analysis. Kuwait Chapter of Arabian Journal of Business and Management Review, 1(5), 39-54.

CFI. (2021). History of North Sea Brent crude. Retrieved from https://corporatefinanceinstitute.com/resources/knowledge/other/north-sea-brent-crude/.

EIA. (2021). Changing quality mix is affecting crude oil price differentials and refining decisions. Retrieved from https://www.eia.gov/todayinenergy/detail.php?id=33012.

EIA. (2021). Drop in U.S. gasoline prices reflects decline in crude oil costs. Retrieved from https://www.eia.gov/todayinenergy/detail.php?id=6850.

EIA. (2021). Transportation constraints and export costs widen the Brent-WTI crude oil price spread. Retrieved from https://www.eia.gov/todayinenergy/detail.php?id=33752.

Lee, J., Kim, C.-G., Lee, J. E., Kim, N. W., & Kim, H. (2018). Application of artificial neural networks to rainfall forecasting in the Geum River basin, Korea. Water, 10(10), 1448. Available at: https://doi.org/10.3390/w10101448.

Mensah, E. K. (2015). Box-Jenkins modelling and forecasting of Brent crude oil price. Munich Personal RePEc Archive, No. 67748.

Nwosu, U. I., Obite, C. P., & Bartholomew, D. C. (2021). Modeling US Dollar and Nigerian Naira exchange rates during COVID-19 pandemic period: Identification of a high-performance model for new application. Journal of Mathematics and Statistics Studies, 2(1), 40-52. Available at: https://doi.org/10.32996/jmss.2021.2.1.5.

Obite, C. P., Chukwu, A., Bartholomew, D. C., Nwosu, U. I., & Esiaba, G. E. (2021). Classical and machine learning of crude oil production in Nigeria: Identification of an eminent model for application. Energy Reports, 7, 3497-3505. Available at: https://doi.org/10.1016/j.egyr.2021.06.005.

Obite, C. P., Olewuezi, N. P., Ugwuanyim, G. U., & Bartholomew, D. C. (2020). Multicollinearity effect in regression analysis: A feed forward artificial neural network approach. Asian Journal of Probability and Statistics, 6(1), 22-33. Available at: https://doi.org/10.9734/AJPAS/2020/v6i130151.

Tealab, A., Hefny, H., & Badr, A. (2017). Forecasting of nonlinear time series using ANN. Future Computing and Informatics Journal, 2(1), 39-47. Available at: https://doi.org/10.1016/j.fcij.2017.05.001.

Vapnik, V. N. (1982). Estimation of dependences based on empirical data. Addendum 1, New York: Springer-Verlag.

Xiang, Y., & Zhuang, X. H. (2013). Application of ARIMA model in short-term prediction of international crude oil price. Advanced Materials Research, 798-799, 979-982. Available at: https://doi.org/10.4028/www.scientific.net/amr.798-799.979.

Views and opinions expressed in this article are the views and opinions of the author(s), Quarterly Journal of Econometrics Research shall not be responsible or answerable for any loss, damage or liability etc. caused in relation to/arising out of the use of the content.
