38  Internal Utility Functions

The examples in Chapter 38 require that the following packages be attached to the search path.

library(groupedHyperframe)
library(groupedHyperframe.random)
library(maxEff)
# Registered S3 method overwritten by 'pROC':
#   method   from            
#   plot.roc spatstat.explore

38.1 'add_numeric_'

The internal class 'add_numeric_' defined in package maxEff v0.2.1 inherits from the class 'call', with additional attributes

  • attr(., 'effsize'), a numeric scalar, the regression coefficient, i.e., the effect size effsize, of the additional numeric predictor
  • attr(., 'model'), the regression model with the additional numeric predictor (see the sketch below)
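
Either attribute may be inspected with base::attr(). Below is a minimal sketch, assuming the training models a0 from the earlier chapters are available in the workspace.

Sketch: attributes of a0[[1L]]
a0[[1L]] |>
  attr(which = 'effsize', exact = TRUE) # numeric scalar, the effect size effsize
a0[[1L]] |>
  attr(which = 'model', exact = TRUE) |> # the regression model
  class()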

The S3 method base::print.default() displays each 'add_numeric_' object.

Example: training models a0, 1st element
a0[[1L]]
Example: training models a0, 2nd element
a0[[2L]]

The S3 method spatstat.geom::with.hyperframe() obtains a selected numeric predictor when the stored call is passed to its parameter ee.

Example: 1st selected numeric predictor
s0 |>
  with(ee = a0[[1L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
s1 |>
  with(ee = a0[[1L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
Example: 2nd selected numeric predictor
s0 |>
  with(ee = a0[[2L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
s1 |>
  with(ee = a0[[2L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()

The S3 method predict.add_numeric_() is the workhorse of the S3 method predict.add_numeric().

Example: predict.add_numeric_(); predicted models a1, 1st element
a11 = a0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(a1[[1L]], a11))
Example: predict.add_numeric_(); predicted models a1, 2nd element
a12 = a0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(a1[[2L]], a12))  

38.2 'add_dummy_'

The internal class 'add_dummy_' defined in package maxEff v0.2.1 inherits from the class 'node1' (Chapter 24), with additional attributes

  • attr(., 'p1'), a numeric scalar between 0 and 1, the probability that the additional logical predictor is TRUE in the training set
  • attr(., 'effsize'), a numeric scalar, the regression coefficient, i.e., the effect size effsize, of the additional logical predictor
  • attr(., 'model'), the regression model with the additional logical predictor (see the sketch below)
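
The stored effect sizes of all training models may be collected with base::vapply() and base::attr(). Below is a minimal sketch, assuming the training models b0 from the earlier chapters behave as a plain list of 'add_dummy_' objects.

Sketch: effect sizes stored in b0
b0 |>
  vapply(FUN = attr, which = 'effsize', exact = TRUE, FUN.VALUE = NA_real_)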

The S3 method base::print.default() displays each 'add_dummy_' object.

Example: training models b0 in training set s0: 1st element
b0[[1L]]
Example: training models b0 in training set s0: 2nd element
b0[[2L]]
Example: training models c0 in test-subset of training set s0: 1st element
c0[[1L]]
Example: training models c0 in test-subset of training set s0: 2nd element
c0[[2L]]

The S3 method predict.node1() evaluates a dichotomizing rule in a hyper data frame. Note that the user must call the S3 method predict.node1() explicitly; otherwise the S3 generic stats::predict() would dispatch to predict.add_dummy_().
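
The difference in dispatch may be seen by comparing the classes of the two return values. Below is a minimal sketch, assuming the training models b0 and the test set s1 are available in the workspace.

Sketch: explicit predict.node1() vs. the generic predict()
b0[[1L]] |>
  predict.node1(newdata = s1) |> # the dichotomized, i.e., logical, predictor
  class()
b0[[1L]] |>
  predict(newdata = s1) |> # dispatches to predict.add_dummy_(); a model with the additional predictor
  class()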

Example: predict.node1(); 1st selected logical predictor
b0[[1L]] |> 
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()  
b0[[1L]] |> 
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()
Example: predict.node1(); 2nd selected logical predictor
b0[[2L]] |> 
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins() 
b0[[2L]] |> 
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()  
Example: predict.node1(); 1st selected logical predictor via repeated partitions
c0[[1L]] |>
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()
c0[[1L]] |>
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()
Example: predict.node1(); 2nd selected logical predictor via repeated partitions
c0[[2L]] |>
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()
c0[[2L]] |>
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()

The S3 method predict.add_dummy_() is the workhorse of the S3 method predict.add_dummy().

Example: predict.add_dummy_(); predicted models b1: 1st element
b11 = b0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(b1[[1L]], b11))
Example: predict.add_dummy_(); predicted models b1: 2nd element
b12 = b0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(b1[[2L]], b12))  
Example: predict.add_dummy_(); predicted models c1: 1st element
c11 = c0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(c1[[1L]], c11))
Example: predict.add_dummy_(); predicted models c1: 2nd element
c12 = c0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(c1[[2L]], c12))  

38.3 grouped_rppp()

Function groupedHyperframe.random::grouped_rppp() implements the matrix parameterization using advanced R language operations. The code snippet shown inside function grouped_rppp() in Section 4.2 cannot be evaluated on its own outside function grouped_rppp()!

Previously: p_Matern
set.seed(37); (n = sample(x = 1:4, size = 3L, replace = TRUE)) 
# [1] 2 3 4
set.seed(39); p_Matern = mapply(
  FUN = mvrnorm2, 
  mu = list(kappa = c(3,2), mu = c(10,5), scale = c(.4,.2), meanlog = c(3,5), sdlog = c(.4,.2)), 
  sd = list(kappa = .2, mu = .5, scale = .05, meanlog = .1, sdlog = .01), 
  MoreArgs = list(n = 3L), 
  SIMPLIFY = FALSE
) |>
  within.list(expr = {
    kappa = pmax(kappa, 1 + .Machine$double.eps)
    mu = pmax(mu, 1 + .Machine$double.eps)
    scale = pmax(scale, .Machine$double.eps)
    sdlog = pmax(sdlog, .Machine$double.eps)
  })
Advanced: without language operation
tryCatch(expr = {
  p_Matern |> 
    with.default(expr = {
      spatstat.random::rMatClust(kappa = kappa, scale = scale, mu = mu)
    })
}, error = identity)
# <simpleError: 'scale' should be a single number>

The native pipe operator |> successfully passes the code snippet into function grouped_rppp(), while the pipe operator magrittr::`%>%` (Bache and Wickham 2025, v2.0.4) does not.

Advanced: language operation via native pipe |>
p_Matern |> 
  with.default(expr = {
    rMatClust(kappa = kappa, scale = scale, mu = mu) |> 
      grouped_rppp(n = n)
  })
# Grouped Hyperframe: ~g1/g2
# 
# 9 g2 nested in
# 3 g1
# 
# Preview of first 10 (or less) rows:
# 
#     ppp g1 g2
# 1 (ppp)  1  1
# 2 (ppp)  1  2
# 3 (ppp)  2  1
# 4 (ppp)  2  2
# 5 (ppp)  2  3
# 6 (ppp)  3  1
# 7 (ppp)  3  2
# 8 (ppp)  3  3
# 9 (ppp)  3  4
Advanced: language operation via magrittr::`%>%`
library(magrittr)
tryCatch(expr = {
  p_Matern |> 
    with.default(expr = {
      rMatClust(kappa = kappa, scale = scale, mu = mu) %>% 
        grouped_rppp(n = n)
    })
}, error = identity)
# <notSubsettableError in i[[1L]]: object of type 'symbol' is not subsettable>
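
The contrast is already visible at parse time: the native pipe |> is rewritten into an ordinary nested call while the expression is parsed, so function grouped_rppp() receives the unevaluated rMatClust(.) call as its argument, while with magrittr::`%>%` the callee appears to receive only a placeholder symbol rather than the full call, consistent with the 'not subsettable' error above. The same behaviour underlies the .rppp() examples in Section 38.7. Below is a minimal sketch using base::quote(); the rMatClust() arguments are those of the example above.

Sketch: parse-time behaviour of |> vs. %>%
quote(rMatClust(kappa = kappa, scale = scale, mu = mu) |> grouped_rppp(n = n))
# the parser has already rewritten this into grouped_rppp(rMatClust(...), n = n)
quote(rMatClust(kappa = kappa, scale = scale, mu = mu) %>% grouped_rppp(n = n))
# the %>% pipe remains an unexpanded call to the function `%>%`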

38.4 mvrnorm2()

Function groupedHyperframe.random::mvrnorm2() is a wrapper of the multivariate normal simulator MASS::mvrnorm() (Venables and Ripley 2002) that accepts the standard deviation(s) \(\sigma\) via the parameter sd.

  • the parameter sd (\(\sigma\)) may be a numeric scalar, indicating equal diagonal variances and zero covariances;
  • the parameter sd (\(\sigma\)) may be a numeric vector of the same length as the parameter mu (\(\mu\)), indicating element-wise diagonal variances and zero covariances;
  • to specify a full variance-covariance matrix \(\Sigma\), the user should use function MASS::mvrnorm() (Venables and Ripley 2002) directly.
Example: function mvrnorm2(), scalar \(\sigma\)
set.seed(12); a1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = diag(x = .9^2, nrow = 2L))
set.seed(12); a2 = mvrnorm2(n = 3L, mu = c(0, 0), sd = .9)
stopifnot(identical(a1, a2))
Example: function mvrnorm2(), vector \(\sigma\)
set.seed(42); b1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = diag(x = c(.9, 1.1)^2, nrow = 2L))
set.seed(42); b2 = mvrnorm2(n = 3L, mu = c(0, 0), sd = c(.9, 1.1))
stopifnot(identical(b1, b2))
Example: function mvrnorm2(), matrix \(\Sigma\)
(R = matrix(c(1, .5, .5, 1), nrow = 2L)) # correlation matrix
#      [,1] [,2]
# [1,]  1.0  0.5
# [2,]  0.5  1.0
(S = c(.9, 1.1) * R * rep(c(.9, 1.1), each = 2L)) # variance-covariance matrix
#       [,1]  [,2]
# [1,] 0.810 0.495
# [2,] 0.495 1.210
set.seed(23); c1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = S)
set.seed(23); c2 = mvrnorm2(n = 3L, mu = c(0, 0), Sigma = S)
stopifnot(identical(c1, c2))

38.5 statusPartition()

Function maxEff::statusPartition() (v0.2.1)

  1. splits a right-censored survival::Surv object by its survival status, i.e., observed vs. censored;
  2. partitions the observed and the censored subjects, respectively, into test/training sets.

See Section 12.2 for the usage of the terms “split” vs. “partition”.

Consider a toy example based on the survival::capacitor data.

Data: right-censored Surv object capacitor_failure
capacitor_failure = survival::capacitor |> 
  with(expr = survival::Surv(time, status))
capacitor_failure
#  [1]  439   904  1092  1105   572   690   904  1090   315   315   439   628   258   258   347   588   959  1065  1065  1087   216   315   455   473   241   315   332   380   241   241   435   455 
# [33] 1105+ 1105+ 1105+ 1105+ 1090+ 1090+ 1090+ 1090+  628+  628+  628+  628+  588+  588+  588+  588+ 1087+ 1087+ 1087+ 1087+  473+  473+  473+  473+  380+  380+  380+  380+  455+  455+  455+  455+

Function statusPartition() is intended to avoid the situation in which a Cox proportional hazards model survival::coxph() is degenerate in one or more of the partitioned data sets because all subjects in that partition are censored.
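
To see the concern, one may attempt a Cox model fit on an all-censored subset of the capacitor data and capture whatever condition arises. Below is a minimal sketch; the covariate z is hypothetical, simulated only so that the formula has a regressor, and is not part of survival::capacitor.

Sketch: Cox model on an all-censored subset
cens = survival::capacitor |>
  subset(subset = (status == 0L)) # all censored subjects
set.seed(1); cens$z = rnorm(n = nrow(cens)) # hypothetical covariate, for illustration only
tryCatch(expr = {
  survival::coxph(survival::Surv(time, status) ~ z, data = cens)
}, warning = identity, error = identity)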

Example: statusPartition()
set.seed(12); id = capacitor_failure |>
  statusPartition(times = 1L, p = .5)
capacitor_failure[id[[1L]], 2L] |> 
  table() # balanced by survival status
# 
#  0  1 
# 16 16
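
The held-out complement of the partition may be inspected in the same manner. Below is a minimal sketch, assuming id[[1L]] is a plain integer index vector as used above.

Sketch: survival status in the held-out complement
capacitor_failure[-id[[1L]], 2L] |>
  table()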

Function statusPartition() is an extension of the very popular function caret::createDataPartition(), which stratifies a Surv object by the quantiles of its survival time (as of package caret v7.0.1).

Review: caret::createDataPartition(), not balanced by survival status
set.seed(12); id0 = capacitor_failure |>
  caret::createDataPartition(times = 1L, p = .5)
capacitor_failure[id0[[1L]], 2L] |> 
  table()
# 
#  0  1 
# 19 14

38.6 rfactor()

Function groupedHyperframe.random::rfactor() is a wrapper of function base::sample.int(). Function rfactor()

  • takes the random sample size as its first parameter n, similar to functions stats::rlnorm(), stats::rnbinom(), etc.;
  • returns a factor.
Example: rfactor()
set.seed(18); rfactor(n = 20L, prob = c(4,2,3))
#  [1] 2 3 2 1 1 3 1 3 1 1 3 3 1 1 2 1 3 1 2 1
# Levels: 1 2 3
Example: rfactor() with levels
set.seed(18); rfactor(n = 20L, prob = c(4,2,3), levels = letters[1:3])
#  [1] b c b a a c a c a a c c a a b a c a b a
# Levels: a b c
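
The sampling weights prob need not sum to 1. Assuming they are passed through to base::sample.int() as probability weights, the empirical proportions should approximate prob/sum(prob), i.e., roughly 4/9, 2/9 and 3/9 here. Below is a minimal sketch with a larger sample.

Sketch: empirical proportions of rfactor()
set.seed(18); rfactor(n = 1e4L, prob = c(4,2,3)) |>
  table() |>
  proportions()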

38.7 .rppp()

Function groupedHyperframe.random::.rppp() (v0.2.0.20251031) implements the vectorized parameterization using advanced R language operations. The code snippet shown inside function .rppp() in Section 4.1 cannot be evaluated on its own outside function .rppp()!

Advanced: without language operation
tryCatch(expr = {
  spatstat.random::rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06))
}, error = identity)
# <simpleError: 'scale' should be a single number>

The native pipe operator |> successfully passes the code snippet into function .rppp(), while the pipe operator magrittr::`%>%` (Bache and Wickham 2025, v2.0.4) does not.

Advanced: language operation via native pipe |>
set.seed(12); r = rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06)) |>
  .rppp()
# Point-pattern simulated by `spatstat.random::rMatClust()`
# 
Advanced: language operation via magrittr::`%>%`
library(magrittr)
tryCatch(expr = {
  rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06)) %>% 
    .rppp()
}, error = identity)
# <notSubsettableError in i[[1L]]: object of type 'symbol' is not subsettable>