Gaussian Processes

Matt Brigida

We’ll first estimate a Gaussian regression using the GauPro package in R on a CPU, then using scikit-learn in Python on a CPU, and finally estimate the same regression on a GPU in Python using GPJax from the Jax machine learning framework.

Getting started with Gaussian Processes in easier in R which is why we’ll look at an implementation in that language first. See the GauPro vignettes here.

Data

We’ll pull crude oil and natural gas data from the EIA’s API using my R package updated for EIA’s v2 API.

library(devtools)
## Loading required package: usethis
library(GauPro)
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
install_github("Matt-Brigida/EIAdata", ref = "v2_fix")

library(EIAdata)

key <- source("~/eia_key")$value

weekly_ng <- getEIA("NG.RNGC1.W", key=key)
Sys.sleep(10)
weekly_cl <- getEIA("PET.RCLC1.W", key=key)

data <- merge.xts(weekly_ng, weekly_cl, join="inner")
data
## write to csv to also test in python-------
# write.zoo(data, "./ng_cl_data.csv", sep=",")
data <- read.csv("ng_cl_data.csv")
data <- as.xts(data, order.by=as.Date(data$Index))[, c(2,3)]

We’ll convert the data into logs and plot:

data_post_2000 <- data['2000-01-01/']
ng <- as.numeric(data_post_2000$NG.RNGC1.W)
lng <- log(as.numeric(data_post_2000$NG.RNGC1.W))
cl <- as.numeric(data_post_2000$PET.RCLC1.W)
lcl <- log(as.numeric(data_post_2000$PET.RCLC1.W))

# plot(cl, ng)
plot(lcl, lng)

GauPro in R

Gaussian Regression

gpl <- GauPro(lcl, lng, parallel=FALSE)
plot(gpl)

We can get a prediction given an input point:

gpl$predict(2.)
##         [,1]
## [1,] 1.35412

However we also get a standard error for the prediction (se below).

gpl$predict(2., se=TRUE)
##      mean        s2        se
## 1 1.35412 0.3420651 0.5848634
gpl$predict(3.)
##           [,1]
## [1,] 0.8681941
gpl$predict(4.)
##          [,1]
## [1,] 1.314578
kern.exp <- Exponential$new(0)
gpk.exp <- GauPro_kernel_model$new(matrix(lcl, ncol=1), lng, kernel=kern.exp, parallel=FALSE)
## Expected run time: 116 seconds
plot(gpk.exp)

gpk.exp$predict(2.)
## [1] 0.8061206
gpk.exp$predict(3.)
## [1] 0.8631025
gpk.exp$predict(4.)
## [1] 1.303579

Scikit-Learn in Python

Here is a good tutorial

import pandas as pd
pd.read_csv("ng_cl_data.csv")
##            Index  NG.RNGC1.W  PET.RCLC1.W
## 0     1994-01-14       2.231        14.63
## 1     1994-01-21       2.297        15.05
## 2     1994-01-28       2.404        15.31
## 3     1994-02-04       2.506        15.73
## 4     1994-02-11       2.369        14.87
## ...          ...         ...          ...
## 1526  2023-04-14       2.114        81.84
## 1527  2023-04-21       2.269        79.20
## 1528  2023-04-28       2.292        76.33
## 1529  2023-05-05       2.188        71.16
## 1530  2023-05-12       2.230        72.07
## 
## [1531 rows x 3 columns]

Pyro