We’ll first estimate a Gaussian regression using the
GauPro
package in R on a CPU, then using
scikit-learn
in Python on a CPU, and finally estimate the
same regression on a GPU in Python using GPJax
from the
Jax
machine learning framework.
Getting started with Gaussian Processes in easier in R which is why we’ll look at an implementation in that language first. See the GauPro vignettes here.
Data
We’ll pull crude oil and natural gas data from the EIA’s API using my R package updated for EIA’s v2 API.
library(devtools)
## Loading required package: usethis
library(GauPro)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
install_github("Matt-Brigida/EIAdata", ref = "v2_fix")
library(EIAdata)
key <- source("~/eia_key")$value
weekly_ng <- getEIA("NG.RNGC1.W", key=key)
Sys.sleep(10)
weekly_cl <- getEIA("PET.RCLC1.W", key=key)
data <- merge.xts(weekly_ng, weekly_cl, join="inner")
data
## write to csv to also test in python-------
# write.zoo(data, "./ng_cl_data.csv", sep=",")
data <- read.csv("ng_cl_data.csv")
data <- as.xts(data, order.by=as.Date(data$Index))[, c(2,3)]
We’ll convert the data into logs and plot:
data_post_2000 <- data['2000-01-01/']
ng <- as.numeric(data_post_2000$NG.RNGC1.W)
lng <- log(as.numeric(data_post_2000$NG.RNGC1.W))
cl <- as.numeric(data_post_2000$PET.RCLC1.W)
lcl <- log(as.numeric(data_post_2000$PET.RCLC1.W))
# plot(cl, ng)
plot(lcl, lng)
GauPro in R
Gaussian Regression
gpl <- GauPro(lcl, lng, parallel=FALSE)
plot(gpl)
We can get a prediction given an input point:
gpl$predict(2.)
## [,1]
## [1,] 1.35412
However we also get a standard error for the prediction
(se
below).
gpl$predict(2., se=TRUE)
## mean s2 se
## 1 1.35412 0.3420651 0.5848634
gpl$predict(3.)
## [,1]
## [1,] 0.8681941
gpl$predict(4.)
## [,1]
## [1,] 1.314578
kern.exp <- Exponential$new(0)
gpk.exp <- GauPro_kernel_model$new(matrix(lcl, ncol=1), lng, kernel=kern.exp, parallel=FALSE)
## Expected run time: 116 seconds
plot(gpk.exp)
gpk.exp$predict(2.)
## [1] 0.8061206
gpk.exp$predict(3.)
## [1] 0.8631025
gpk.exp$predict(4.)
## [1] 1.303579
Scikit-Learn in Python
import pandas as pd
pd.read_csv("ng_cl_data.csv")
## Index NG.RNGC1.W PET.RCLC1.W
## 0 1994-01-14 2.231 14.63
## 1 1994-01-21 2.297 15.05
## 2 1994-01-28 2.404 15.31
## 3 1994-02-04 2.506 15.73
## 4 1994-02-11 2.369 14.87
## ... ... ... ...
## 1526 2023-04-14 2.114 81.84
## 1527 2023-04-21 2.269 79.20
## 1528 2023-04-28 2.292 76.33
## 1529 2023-05-05 2.188 71.16
## 1530 2023-05-12 2.230 72.07
##
## [1531 rows x 3 columns]