38  Internal Utility Functions

The examples in Chapter 38 require that the following packages be attached to the search path.

library(groupedHyperframe)
library(groupedHyperframe.random)
library(maxEff)
# Registered S3 method overwritten by 'pROC':
#   method   from            
#   plot.roc spatstat.explore

38.1 'add_numeric_'

The internal class 'add_numeric_' defined in package maxEff v0.2.1 inherits from the class 'call', with additional attributes

  • attr(., 'effsize'), a numeric scalar, the regression coefficient, i.e., the effect size effsize, of the additional numeric predictor
  • attr(., 'model'), the regression model with the additional numeric predictor (see the sketch below)
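
Either attribute may be inspected with base::attr(). Below is a minimal sketch, assuming the training models a0 from the earlier chapters are available in the workspace.

Sketch: attributes of a0[[1L]]
a0[[1L]] |>
  attr(which = 'effsize', exact = TRUE) # numeric scalar, the effect size effsize
a0[[1L]] |>
  attr(which = 'model', exact = TRUE) |> # the regression model
  class()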

The S3 method base::print.default() displays each 'add_numeric_' object.

Example: training models a0, 1st element
a0[[1L]]
Example: training models a0, 2nd element
a0[[2L]]

The S3 method spatstat.geom::with.hyperframe() obtains a selected numeric predictor when the stored call is passed to its parameter ee.

Example: 1st selected numeric predictor
s0 |>
  with(ee = a0[[1L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
s1 |>
  with(ee = a0[[1L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
Example: 2nd selected numeric predictor
s0 |>
  with(ee = a0[[2L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
s1 |>
  with(ee = a0[[2L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()

The S3 method predict.add_numeric_() is the workhorse of the S3 method predict.add_numeric().

Example: predict.add_numeric_(); predicted models a1, 1st element
a11 = a0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(a1[[1L]], a11))
Example: predict.add_numeric_(); predicted models a1, 2nd element
a12 = a0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(a1[[2L]], a12))  

38.2 'add_dummy_'

The internal class 'add_dummy_' defined in package maxEff v0.2.1 inherits from the class 'node1' (Chapter 24), with additional attributes

  • attr(., 'p1'), a numeric scalar between 0 and 1, the probability that the additional logical predictor is TRUE in the training set
  • attr(., 'effsize'), a numeric scalar, the regression coefficient, i.e., the effect size effsize, of the additional logical predictor
  • attr(., 'model'), the regression model with the additional logical predictor (see the sketch below)
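
The stored effect sizes of all training models may be collected with base::vapply() and base::attr(). Below is a minimal sketch, assuming the training models b0 from the earlier chapters behave as a plain list of 'add_dummy_' objects.

Sketch: effect sizes stored in b0
b0 |>
  vapply(FUN = attr, which = 'effsize', exact = TRUE, FUN.VALUE = NA_real_)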

The S3 method base::print.default() displays each 'add_dummy_' object.

Example: training models b0 in training set s0: 1st element
b0[[1L]]
Example: training models b0 in training set s0: 2nd element
b0[[2L]]
Example: training models c0 in test-subset of training set s0: 1st element
c0[[1L]]
Example: training models c0 in test-subset of training set s0: 2nd element
c0[[2L]]

The S3 method predict.node1() evaluates a dichotomizing rule in a hyper data frame. Note that the user must call the S3 method predict.node1() explicitly; otherwise the S3 generic stats::predict() would dispatch to predict.add_dummy_().
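
The difference in dispatch may be seen by comparing the classes of the two return values. Below is a minimal sketch, assuming the training models b0 and the test set s1 are available in the workspace.

Sketch: explicit predict.node1() vs. the generic predict()
b0[[1L]] |>
  predict.node1(newdata = s1) |> # the dichotomized, i.e., logical, predictor
  class()
b0[[1L]] |>
  predict(newdata = s1) |> # dispatches to predict.add_dummy_(); a model with the additional predictor
  class()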

Example: predict.node1(); 1st selected logical predictor
b0[[1L]] |> 
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()  
b0[[1L]] |> 
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()
Example: predict.node1(); 2nd selected logical predictor
b0[[2L]] |> 
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins() 
b0[[2L]] |> 
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()  
Example: predict.node1(); 1st selected logical predictor via repeated partitions
c0[[1L]] |>
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()
c0[[1L]] |>
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()
Example: predict.node1(); 2nd selected logical predictor via repeated partitions
c0[[2L]] |>
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()
c0[[2L]] |>
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()

The S3 method predict.add_dummy_() is the workhorse of the S3 method predict.add_dummy().

Example: predict.add_dummy_(); predicted models b1: 1st element
b11 = b0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(b1[[1L]], b11))
Example: predict.add_dummy_(); predicted models b1: 2nd element
b12 = b0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(b1[[2L]], b12))  
Example: predict.add_dummy_(); predicted models c1: 1st element
c11 = c0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(c1[[1L]], c11))
Example: predict.add_dummy_(); predicted models c1: 2nd element
c12 = c0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(c1[[2L]], c12))  

38.3 grouped_rppp()

Function groupedHyperframe.random::grouped_rppp() implements the matrix parameterization using advanced R language operations. The code snippet shown inside function grouped_rppp() in Section 4.2 cannot be evaluated on its own outside function grouped_rppp()!

Previously: p_Matern
set.seed(37); (n = sample(x = 1:4, size = 3L, replace = TRUE)) 
# [1] 2 3 4
set.seed(39); p_Matern = mapply(
  FUN = mvrnorm2, 
  mu = list(kappa = c(3,2), mu = c(10,5), scale = c(.4,.2), meanlog = c(3,5), sdlog = c(.4,.2)), 
  sd = list(kappa = .2, mu = .5, scale = .05, meanlog = .1, sdlog = .01), 
  MoreArgs = list(n = 3L), 
  SIMPLIFY = FALSE
) |>
  within.list(expr = {
    kappa = pmax(kappa, 1 + .Machine$double.eps)
    mu = pmax(mu, 1 + .Machine$double.eps)
    scale = pmax(scale, .Machine$double.eps)
    sdlog = pmax(sdlog, .Machine$double.eps)
  })
Advanced: without language operation
tryCatch(expr = {
  p_Matern |> 
    with.default(expr = {
      spatstat.random::rMatClust(kappa = kappa, scale = scale, mu = mu)
    })
}, error = identity)
# <simpleError: 'scale' should be a single number>

The native pipe operator |> successfully passes the code snippet into function grouped_rppp(), while the pipe operator magrittr::`%>%` (Bache and Wickham 2025, v2.0.4) does not.

Advanced: language operation via native pipe |>
p_Matern |> 
  with.default(expr = {
    rMatClust(kappa = kappa, scale = scale, mu = mu) |> 
      grouped_rppp(n = n)
  })
# Grouped Hyperframe: ~g1/g2
# 
# 9 g2 nested in
# 3 g1
# 
# Preview of first 10 (or less) rows:
# 
#     ppp g1 g2
# 1 (ppp)  1  1
# 2 (ppp)  1  2
# 3 (ppp)  2  1
# 4 (ppp)  2  2
# 5 (ppp)  2  3
# 6 (ppp)  3  1
# 7 (ppp)  3  2
# 8 (ppp)  3  3
# 9 (ppp)  3  4
Advanced: language operation via magrittr::`%>%`
library(magrittr)
tryCatch(expr = {
  p_Matern |> 
    with.default(expr = {
      rMatClust(kappa = kappa, scale = scale, mu = mu) %>% 
        grouped_rppp(n = n)
    })
}, error = identity)
# <notSubsettableError in i[[1L]]: object of type 'symbol' is not subsettable>
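
The contrast is already visible at parse time: the native pipe |> is rewritten into an ordinary nested call while the expression is parsed, so function grouped_rppp() receives the unevaluated rMatClust(.) call as its argument, while with magrittr::`%>%` the callee appears to receive only a placeholder symbol rather than the full call, consistent with the 'not subsettable' error above. The same behaviour underlies the .rppp() examples in Section 38.7. Below is a minimal sketch using base::quote(); the rMatClust() arguments are those of the example above.

Sketch: parse-time behaviour of |> vs. %>%
quote(rMatClust(kappa = kappa, scale = scale, mu = mu) |> grouped_rppp(n = n))
# the parser has already rewritten this into grouped_rppp(rMatClust(...), n = n)
quote(rMatClust(kappa = kappa, scale = scale, mu = mu) %>% grouped_rppp(n = n))
# the %>% pipe remains an unexpanded call to the function `%>%`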

38.4 mvrnorm2()

Function groupedHyperframe.random::mvrnorm2() is a wrapper of the multivariate normal simulator MASS::mvrnorm() (Venables and Ripley 2002) that accepts the standard deviation(s) \(\sigma\) via the parameter sd.

  • the parameter sd (\(\sigma\)) may be a numeric scalar, indicating equal diagonal variances and zero covariances;
  • the parameter sd (\(\sigma\)) may be a numeric vector of the same length as the parameter mu (\(\mu\)), indicating element-wise diagonal variances and zero covariances;
  • to specify a full variance-covariance matrix \(\Sigma\), the user should use function MASS::mvrnorm() (Venables and Ripley 2002) directly.
Example: function mvrnorm2(), scalar \(\sigma\)
set.seed(12); a1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = diag(x = .9^2, nrow = 2L))
set.seed(12); a2 = mvrnorm2(n = 3L, mu = c(0, 0), sd = .9)
stopifnot(identical(a1, a2))
Example: function mvrnorm2(), vector \(\sigma\)
set.seed(42); b1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = diag(x = c(.9, 1.1)^2, nrow = 2L))
set.seed(42); b2 = mvrnorm2(n = 3L, mu = c(0, 0), sd = c(.9, 1.1))
stopifnot(identical(b1, b2))
Example: function mvrnorm2(), matrix \(\Sigma\)
(R = matrix(c(1, .5, .5, 1), nrow = 2L)) # correlation matrix
#      [,1] [,2]
# [1,]  1.0  0.5
# [2,]  0.5  1.0
(S = c(.9, 1.1) * R * rep(c(.9, 1.1), each = 2L)) # variance-covariance matrix
#       [,1]  [,2]
# [1,] 0.810 0.495
# [2,] 0.495 1.210
set.seed(23); c1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = S)
set.seed(23); c2 = mvrnorm2(n = 3L, mu = c(0, 0), Sigma = S)
stopifnot(identical(c1, c2))

38.5 statusPartition()

Function maxEff::statusPartition() (v0.2.1)

  1. splits a right-censored survival::Surv object by its survival status, i.e., observed vs. censored;
  2. partitions the observed and the censored subjects, respectively, into test/training sets.

See Section 12.2 for the usage of the terms “split” vs. “partition”.

Consider a toy example based on the survival::capacitor data.

Data: right-censored Surv object capacitor_failure
capacitor_failure = survival::capacitor |> 
  with(expr = survival::Surv(time, status))
capacitor_failure
#  [1]  439   904  1092  1105   572   690   904  1090   315   315   439   628   258   258   347   588   959  1065  1065  1087   216   315   455   473   241   315   332   380   241   241   435   455 
# [33] 1105+ 1105+ 1105+ 1105+ 1090+ 1090+ 1090+ 1090+  628+  628+  628+  628+  588+  588+  588+  588+ 1087+ 1087+ 1087+ 1087+  473+  473+  473+  473+  380+  380+  380+  380+  455+  455+  455+  455+

Function statusPartition() is intended to avoid the situation in which a Cox proportional hazards model survival::coxph() is degenerate in one or more of the partitioned data sets because all subjects in that partition are censored.
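
To see the concern, one may attempt a Cox model fit on an all-censored subset of the capacitor data and capture whatever condition arises. Below is a minimal sketch; the covariate z is hypothetical, simulated only so that the formula has a regressor, and is not part of survival::capacitor.

Sketch: Cox model on an all-censored subset
cens = survival::capacitor |>
  subset(subset = (status == 0L)) # all censored subjects
set.seed(1); cens$z = rnorm(n = nrow(cens)) # hypothetical covariate, for illustration only
tryCatch(expr = {
  survival::coxph(survival::Surv(time, status) ~ z, data = cens)
}, warning = identity, error = identity)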

Example: statusPartition()
set.seed(12); id = capacitor_failure |>
  statusPartition(times = 1L, p = .5)
capacitor_failure[id[[1L]], 2L] |> 
  table() # balanced by survival status
# 
#  0  1 
# 16 16
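
The held-out complement of the partition may be inspected in the same manner. Below is a minimal sketch, assuming id[[1L]] is a plain integer index vector as used above.

Sketch: survival status in the held-out complement
capacitor_failure[-id[[1L]], 2L] |>
  table()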

Function statusPartition() is an extension of the very popular function caret::createDataPartition(), which stratifies a Surv object by the quantiles of its survival time (as of package caret v7.0.1).

Review: caret::createDataPartition(), not balanced by survival status
set.seed(12); id0 = capacitor_failure |>
  caret::createDataPartition(times = 1L, p = .5)
capacitor_failure[id0[[1L]], 2L] |> 
  table()
# 
#  0  1 
# 19 14

38.6 rfactor()

Function groupedHyperframe.random::rfactor() is a wrapper of function base::sample.int(). Function rfactor()

  • takes the random sample size as its first parameter n, similar to functions stats::rlnorm(), stats::rnbinom(), etc.;
  • returns a factor.
Example: rfactor()
set.seed(18); rfactor(n = 20L, prob = c(4,2,3))
#  [1] 2 3 2 1 1 3 1 3 1 1 3 3 1 1 2 1 3 1 2 1
# Levels: 1 2 3
Example: rfactor() with levels
set.seed(18); rfactor(n = 20L, prob = c(4,2,3), levels = letters[1:3])
#  [1] b c b a a c a c a a c c a a b a c a b a
# Levels: a b c
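
The sampling weights prob need not sum to 1. Assuming they are passed through to base::sample.int() as probability weights, the empirical proportions should approximate prob/sum(prob), i.e., roughly 4/9, 2/9 and 3/9 here. Below is a minimal sketch with a larger sample.

Sketch: empirical proportions of rfactor()
set.seed(18); rfactor(n = 1e4L, prob = c(4,2,3)) |>
  table() |>
  proportions()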

38.7 .rppp()

Function groupedHyperframe.random::.rppp() (v0.2.0.20251031) implements the vectorized parameterization using advanced R language operations. The code snippet shown inside function .rppp() in Section 4.1 cannot be evaluated on its own outside function .rppp()!

Advanced: without language operation
tryCatch(expr = {
  spatstat.random::rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06))
}, error = identity)
# <simpleError: 'scale' should be a single number>

The native pipe operator |> successfully passes the code snippet into function .rppp(), while the pipe operator magrittr::`%>%` (Bache and Wickham 2025, v2.0.4) does not.

Advanced: language operation via native pipe |>
set.seed(12); r = rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06)) |>
  .rppp()
# Point-pattern simulated by `spatstat.random::rMatClust()`
# 
Advanced: language operation via magrittr::`%>%`
library(magrittr)
tryCatch(expr = {
  rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06)) %>% 
    .rppp()
}, error = identity)
# <notSubsettableError in i[[1L]]: object of type 'symbol' is not subsettable>