library(survival)
library(maxEff)
# Loading required package: groupedHyperframe
# Registered S3 method overwritten by 'pROC':
#   method   from
#   plot.roc spatstat.explore

6 Predictor with Maximum Effect Size
The examples in Chapter 6 require that the search path contains the following namespaces. See the explanation of the function name conflict in Section 7.3.5.
Search path on the author’s computer running RStudio (Posit Team 2025):
search()
# [1] ".GlobalEnv" "package:maxEff" "package:groupedHyperframe" "package:survival" "package:stats" "package:graphics" "package:grDevices"
# [8] "package:utils" "package:datasets" "package:methods" "Autoloads" "package:base"

🚧 Chapter 6 is being rewritten for language. All code here is correct, though. Expected delivery by 2025-12-31.
Chapter 6 documents methods to select, from many candidate numeric predictors, the one whose addition to a regression model yields the maximum effect size.
6.1 Compute Aggregated Quantiles
Ki67q = groupedHyperframe::Ki67 |>
within.data.frame(expr = {
x = y = NULL # remove x- and y-coords for non-spatial application
}) |>
as.groupedHyperframe(group = ~ patientID/tissueID) |>
quantile(probs = seq.int(from = .01, to = .99, by = .01)) |>
aggregate(by = ~ patientID)

A hyper data frame Ki67q with aggregated quantiles:
Ki67q |>
head()
# Hyperframe:
# Tstage PFS adj_rad adj_chemo histology Her2 HR node race age patientID logKi67.quantile
# 1 2 100+ FALSE FALSE 3 TRUE TRUE TRUE White 66 PT00037 (numeric)
# 2 1 22 FALSE FALSE 3 FALSE TRUE FALSE Black 42 PT00039 (numeric)
# 3 1 99+ FALSE NA 3 FALSE TRUE FALSE White 60 PT00040 (numeric)
# 4 1 99+ FALSE TRUE 3 FALSE TRUE TRUE White 53 PT00042 (numeric)
# 5 1 112 TRUE TRUE 3 FALSE TRUE TRUE White 52 PT00054 (numeric)
# 6 4 12 TRUE FALSE 2 TRUE TRUE TRUE Black 51 PT00059 (numeric)

6.2 Data Partition
Partition into a training (80%) and test (20%) set.
set.seed(32); id = sample.int(n = nrow(Ki67q), size = nrow(Ki67q)*.8)
s0 = Ki67q[id, , drop = FALSE] # training set
s1 = Ki67q[-id, , drop = FALSE] # test set
dim(s0)
# [1] 497 12
dim(s1)
# [1] 125 12

6.3 Starting Model
Let’s consider a starting model of the endpoint PFS with predictor Tstage on the training set s0. As of package spatstat.geom v3.6.1, the S3 method spatstat.geom::as.data.frame.hyperframe() warns on hypercolumns that cannot be converted to a data frame. Therefore, users should use suppressWarnings(), or set #| warning: false in the R Markdown code chunk.
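As a minimal, self-contained illustration of the suppressWarnings() advice (a generic stand-in; the actual warning comes from spatstat.geom::as.data.frame.hyperframe()):

```r
# suppressWarnings() evaluates an expression and discards any warnings it
# raises; this is how the hyperframe coercion warning can be silenced,
# e.g. suppressWarnings(as.data.frame(s0)).
noisy = function() {
  warning('hypercolumn(s) not converted') # stand-in for the real warning
  data.frame(Tstage = 1:3)
}
d = noisy() |> suppressWarnings() # value is returned; warning is discarded
```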
m0 = coxph(PFS ~ Tstage, data = s0)

Starting coxph model m0:
summary(m0)
# Call:
# coxph(formula = PFS ~ Tstage, data = s0)
#
# n= 497, number of events= 91
#
# coef exp(coef) se(coef) z Pr(>|z|)
# Tstage 0.6725 1.9591 0.1061 6.34 2.29e-10 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# exp(coef) exp(-coef) lower .95 upper .95
# Tstage 1.959 0.5104 1.591 2.412
#
# Concordance= 0.654 (se = 0.028 )
# Likelihood ratio test= 32.23 on 1 df, p=1e-08
# Wald test = 40.2 on 1 df, p=2e-10
# Score (logrank) test = 43.73 on 1 df, p=4e-11

6.4 Adding numeric Predictor
a0 = m0 |>
add_numeric(x = ~ logKi67.quantile) |>
sort_by(y = abs(effsize), decreasing = TRUE) |>
head(n = 2L)

The pipeline above consists of three steps.
First, function add_numeric()
- considers each “slice” of the numeric-hypercolumn logKi67.quantile as an additional numeric predictor. Users are encouraged to read the package groupedHyperframe vignette, Appendix Section anylist, for more details;
- updates the starting model m0 with each one of the additional numeric predictors, respectively;
- returns an 'add_numeric' object, which inherits from the class 'add_'. This is a list of objects with internal class 'add_numeric_' (Section 38.1).
Second, the S3 method sort_by.add_() sorts the input by the absolute values of the regression coefficients, i.e., effect size effsize, of the additional numeric predictors.
Lastly, the S3 method utils:::head.default() chooses the top n elements from the input.
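The three steps can be mimicked in base R with the survival package — a toy sketch on the lung data with simulated candidate predictors, not the maxEff internals (the names x., q1, q2, q3 are hypothetical):

```r
library(survival)
d = survival::lung
m = coxph(Surv(time, status) ~ age, data = d) # starting model, like m0
# three toy candidate numeric predictors, analogous to the "slices" of
# the numeric-hypercolumn logKi67.quantile
set.seed(1)
X = matrix(rnorm(nrow(d) * 3L), ncol = 3L,
           dimnames = list(NULL, c('q1', 'q2', 'q3')))
# step 1: update the starting model with each candidate, respectively
effsize = vapply(colnames(X), FUN.VALUE = 0, FUN = \(j) {
  d$x. = X[, j]
  coef(update(m, . ~ . + x., data = d))[['x.']]
})
# step 2: sort by absolute effect size; step 3: keep the top candidates
effsize |> abs() |> sort(decreasing = TRUE) |> head(n = 2L)
```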
The S3 method print.add_numeric() displays the calls to the selected numeric predictors.
a0
# logKi67.quantile["6%"]
# logKi67.quantile["5%"]

The S3 method predict.add_numeric() applies the starting model structure in m0, as well as the additional numeric predictors selected by a0, to the test set s1. The author categorizes this functionality under the S3 generic stats::predict(), instead of stats::update(), to emphasize that this “update” is on a newdata, even though the workhorse function is the S3 method stats:::update.default().
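Conceptually, this refit-on-new-data step resembles calling update() with a new data argument; a hedged base-R sketch with the survival package (the lung data stands in for s0 and s1):

```r
library(survival)
train = survival::lung[1:120, ]  # stand-in for the training set s0
test  = survival::lung[121:228, ] # stand-in for the test set s1
m = coxph(Surv(time, status) ~ age, data = train)
# same model structure, re-estimated on the new data
m_new = update(m, data = test)
```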
a1 = a0 |>
predict(newdata = s1)

Predicted models a1:
a1
# logKi67.quantile["6%"] :
# Call:
# coxph(formula = PFS ~ Tstage + x., data = newd)
#
# coef exp(coef) se(coef) z p
# Tstage 0.5900 1.8041 0.1959 3.012 0.0026
# x. 0.6071 1.8352 0.8687 0.699 0.4846
#
# Likelihood ratio test=9.23 on 2 df, p=0.009926
# n= 125, number of events= 27
#
# logKi67.quantile["5%"] :
# Call:
# coxph(formula = PFS ~ Tstage + x., data = newd)
#
# coef exp(coef) se(coef) z p
# Tstage 0.5886 1.8015 0.1961 3.002 0.00269
# x. 0.6276 1.8732 0.8746 0.718 0.47299
#
# Likelihood ratio test=9.25 on 2 df, p=0.009794
# n= 125, number of events= 27

6.5 Adding logical Predictor
6.5.1 Naive Practice
b0 = m0 |>
add_dummy(x = ~ logKi67.quantile) |>
subset(subset = p1 > .05 & p1 < .95) |>
sort_by(y = abs(effsize), decreasing = TRUE) |>
head(n = 2L)

The pipeline above consists of four steps.
First, function add_dummy()
- dichotomizes (Chapter 24) each “slice” of the numeric-hypercolumn logKi67.quantile into a logical variable;
- removes the duplicated logical variables;
- updates the starting model m0 with each one of the additional logical predictors, respectively;
- returns an 'add_dummy' object, which inherits from the class 'add_'. This is a list of objects with internal class 'add_dummy_' (Section 38.2).
Second, the S3 method subset.add_dummy() subsets the input by the balance of the partition of the additional logical predictor.
Third, the S3 method sort_by.add_() sorts the input by the absolute value of the regression coefficient, i.e., effect size effsize, of the additional logical predictor.
Lastly, the S3 method utils:::head.default() chooses the top n elements from the input.
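The four steps can be mimicked in base R with the survival package — a toy sketch on the lung data, not the maxEff internals (wt.loss stands in for a quantile slice; the cutoff grid and the name x. are hypothetical):

```r
library(survival)
d = survival::lung
m = coxph(Surv(time, status) ~ age, data = d) # starting model, like m0
cuts = quantile(d$wt.loss, probs = seq.int(from = .1, to = .9, by = .1),
                na.rm = TRUE)
# step 1: dichotomize at each candidate cutoff; record balance & effect size
res = vapply(cuts, FUN.VALUE = c(p1 = 0, effsize = 0), FUN = \(cut) {
  d$x. = (d$wt.loss >= cut) # logical predictor
  c(p1 = mean(d$x., na.rm = TRUE), # balance of the partition
    effsize = coef(update(m, . ~ . + x., data = d))[['x.TRUE']])
}) |> t()
# step 2: subset by balance; steps 3-4: sort by |effsize|, keep the top
res = res[res[, 'p1'] > .05 & res[, 'p1'] < .95, , drop = FALSE]
res[order(abs(res[, 'effsize']), decreasing = TRUE), , drop = FALSE] |>
  head(n = 2L)
```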
The S3 method print.add_dummy() displays the calls to the selected logical predictors.
b0
# logKi67.quantile["95%"]>=6.76825904668107
# logKi67.quantile["94%"]>=6.72093005757954

The S3 method predict.add_dummy() applies the starting model structure in m0, as well as the additional logical predictors selected by b0, to the test set s1.
b1 = b0 |>
predict(newdata = s1)

Predicted models b1:
b1
# logKi67.quantile["95%"]>=6.76825904668107 :
# Call:
# coxph(formula = PFS ~ Tstage + x., data = newd)
#
# coef exp(coef) se(coef) z p
# Tstage 0.6287 1.8753 0.1922 3.271 0.00107
# x.TRUE -0.2630 0.7687 0.4693 -0.560 0.57515
#
# Likelihood ratio test=9.05 on 2 df, p=0.01086
# n= 125, number of events= 27
#
# logKi67.quantile["94%"]>=6.72093005757954 :
# Call:
# coxph(formula = PFS ~ Tstage + x., data = newd)
#
# coef exp(coef) se(coef) z p
# Tstage 0.61820 1.85558 0.19162 3.226 0.00125
# x.TRUE -0.09157 0.91249 0.46920 -0.195 0.84526
#
# Likelihood ratio test=8.78 on 2 df, p=0.01238
# n= 125, number of events= 27

6.5.2 via Repeated Partitions
set.seed(83); c0 = m0 |>
add_dummy_partition(~ logKi67.quantile, times = 20L, p = .8) |>
subset(subset = p1 > .15 & p1 < .85) |>
sort_by(y = abs(effsize), decreasing = TRUE) |>
head(n = 2L)

The pipeline above consists of four steps.
First, function add_dummy_partition() creates a dichotomizing rule for each additional numeric predictor in the following steps.
- Generates twenty (20L) partitions of the training set s0 using functions caret::createDataPartition() or statusPartition() (Section 38.5) at a .8 vs. .2 = (1-.8) ratio.
- For the \(i\)-th partition \((i=1,\cdots,20)\) of the training set s0,
  - creates a dichotomizing rule for the numeric predictor in the \(i\)-th training-subset of s0 using function node1() (Chapter 24);
  - applies this dichotomizing rule to the numeric predictor in the \(i\)-th test-subset of s0;
  - updates the starting model m0 with the \(i\)-th test-subset of s0, as well as the logical predictor from the previous step;
  - records the estimated regression coefficient, i.e., effect size effsize, of the logical predictor.
- Selects the dichotomizing rule based on the partition with the median effect size of the logical predictor in its own, specific test-subset of s0.
- Returns an 'add_dummy' object.
Second, the S3 method subset.add_dummy() subsets the input by the balance of the partition of the additional logical predictor in its own, specific test-subset of s0.
Third, the S3 method sort_by.add_() sorts the input by the absolute value of the regression coefficient, i.e., effect size effsize, of the additional logical predictor in its own, specific test-subset of s0.
Lastly, the S3 method utils:::head.default() chooses the top n elements from the input.
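The repeated-partition scheme can likewise be mimicked in base R — a hedged sketch, not the package internals (plain sample.int() replaces caret::createDataPartition()/statusPartition(), and a median split of the training-subset replaces node1()'s rule):

```r
library(survival)
d = survival::lung |>
  subset(subset = complete.cases(time, status, age, wt.loss))
m = coxph(Surv(time, status) ~ age, data = d) # starting model, like m0
set.seed(83)
eff = replicate(n = 20L, expr = {
  i = sample.int(n = nrow(d), size = floor(.8 * nrow(d)))
  cut = median(d[i, ]$wt.loss) # rule learned on the training-subset
  te = d[-i, ]
  te$x. = (te$wt.loss >= cut)  # rule applied to the test-subset
  c(cut = cut,
    effsize = coef(update(m, . ~ . + x., data = te))[['x.TRUE']])
})
# select the rule whose test-subset effect size is closest to the median
sel = which.min(abs(eff['effsize', ] - median(eff['effsize', ])))
eff[, sel]
```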
Similar to Section 6.5.1, the S3 method print.add_dummy() displays the calls to the selected logical predictors.
c0
# logKi67.quantile["96%"]>=6.8055032497215
# logKi67.quantile["98%"]>=7.25821545225641

Similar to Section 6.5.1, the S3 method predict.add_dummy() applies the starting model structure in m0, as well as the additional logical predictors selected by c0, to the test set s1.
c1 = c0 |>
predict(newdata = s1)

Predicted models c1:
c1
# logKi67.quantile["96%"]>=6.8055032497215 :
# Call:
# coxph(formula = PFS ~ Tstage + x., data = newd)
#
# coef exp(coef) se(coef) z p
# Tstage 0.6275 1.8730 0.1924 3.261 0.00111
# x.TRUE -0.3212 0.7253 0.4695 -0.684 0.49392
#
# Likelihood ratio test=9.19 on 2 df, p=0.01012
# n= 125, number of events= 27
#
# logKi67.quantile["98%"]>=7.25821545225641 :
# Call:
# coxph(formula = PFS ~ Tstage + x., data = newd)
#
# coef exp(coef) se(coef) z p
# Tstage 0.6346 1.8862 0.1915 3.313 0.000923
# x.TRUE -0.6196 0.5382 0.3935 -1.575 0.115309
#
# Likelihood ratio test=11.19 on 2 df, p=0.00372
# n= 125, number of events= 27