No genlasso #595

dajmcdon · 2025-01-23T00:19:48Z

Checklist

Please:

Make sure this PR is against "dev", not "main" (unless this is a release
PR).
Request a review from one of the current main reviewers:
brookslogan, nmdefries.
Make sure to bump the version number in DESCRIPTION. Always increment
the patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
Describe changes made in NEWS.md, making sure breaking changes
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
See DEVELOPMENT.md for more information on the development
process.

Change explanations for reviewer

This completes the process of removing the genlasso dependency (because it required a C compiler).

We replace the functionality with glmgen/trendfilter, but list it in Suggests and check that it is installed if that method is requested.
Also refactor growth_rate() to be a bit less clunky. It could (and often did) return vectors of unexpected lengths (due to NAs or duplicated x-values). This does not play nicely with mutate(), which was the main example use case.
Add tests for the (many) edge cases of argument interaction.
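
To illustrate the length problem described above, here is a minimal base R sketch. Both helpers (drop_na_rate() and stable_rate()) are hypothetical stand-ins written for this example; they are not the epiprocess implementation and ignore x-spacing entirely.

```r
# Hypothetical estimator that silently drops NAs before computing a
# relative change. Its output is shorter than its input.
drop_na_rate <- function(y) {
  y <- y[!is.na(y)]
  diff(y) / head(y, -1)
}

# Length-stable variant: compute on the non-missing values, then write the
# results back into a vector with the same length as the input.
stable_rate <- function(y) {
  out <- rep(NA_real_, length(y))
  ok <- which(!is.na(y))
  if (length(ok) >= 2) {
    out[ok[-length(ok)]] <- diff(y[ok]) / head(y[ok], -1)
  }
  out
}

y <- c(1, 2, NA, 4, 8)
length(drop_na_rate(y)) # shorter than length(y)
length(stable_rate(y))  # same as length(y)
```

A column-wise verb such as mutate() expects a result whose length matches the number of rows (or is 1), which is why only the second shape is safe there.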

Note that this does entail breaking changes.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Resolves #57 (Use glmgen in growth rate function when ready)
Closes #585 (Move genlasso to Suggests:/Enhances:)

dshemetov · 2025-01-24T20:57:42Z

/preview-docs

github-actions · 2025-01-24T21:04:24Z

🚀 Deployed on https://679400526c295d36604a015e--epiprocess.netlify.app

brookslogan · 2025-01-24T21:32:04Z

@dshemetov you got ahead of me, I was just wanting to test this out!! Very useful already.

@dajmcdon how compatible are genlasso and trendfilter supposed to be? There appear to be some changes from dev:

[plot from dev not preserved]

to this PR:

[plot from this PR not preserved]

and trendfilter seems like it might be a bit under-regularized in this case.

dajmcdon · 2025-01-24T21:55:27Z

The implementation is completely different. genlasso is a path algorithm, while trendfilter uses ADMM with warm starts (from Ramdas and Tibshirani, 2016). So the lambdas visited are likely quite different. The default here is to use the CV minimizer (in both cases), so they shouldn't be too far off, modulo the possible lambdas that could be used. We could switch to the 1se rule if we want more regularization.
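
For readers unfamiliar with the two selection rules mentioned above, here is a small base R sketch of how the CV minimizer and the 1se choice are typically computed from a CV error curve. The numbers are made up for illustration and do not come from genlasso or trendfilter; the variable names are likewise assumptions.

```r
# Toy CV summary: made-up values, one entry per candidate lambda.
lambda  <- c(10, 5, 2, 1, 0.5)
cv_mean <- c(3.0, 2.2, 2.0, 1.95, 1.9)
cv_se   <- c(0.2, 0.2, 0.2, 0.2, 0.2)

# CV minimizer: the lambda with the smallest mean CV error.
i_min <- which.min(cv_mean)
lambda_min <- lambda[i_min]

# 1se rule: the largest lambda (most regularization) whose mean CV error
# is within one standard error of the minimum.
threshold <- cv_mean[i_min] + cv_se[i_min]
lambda_1se <- max(lambda[cv_mean <= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

In this toy curve the minimizer sits at the smallest lambda tried, while the 1se rule picks a larger one; both depend on which lambdas the algorithm actually visits.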

brookslogan

Dumping some notes here. I plan to make at least some minor editing commits and maybe take another look / do actual testing when my mind's a bit more fresh.

R/growth_rate.R

NEWS.md

R/growth_rate.R

dshemetov

Installing trendfilter wasn't so bad after installing g++ on Ubuntu, so hopefully this alleviates installation issues, thanks!
Did some sanity checking of the code, as far as I could follow it (mostly the logic here, don't quite understand the external package parameters enough to comment on those)
Did some testing of the functions to make sure we're getting sensible values. Assuming those are fine, this looks good to me.

brookslogan · 2025-01-28T18:24:50Z

When I run tests locally, I get a lot of warnings like:

subscript out of bounds (index 29 >= vector size 29)

and at some point something like "partial match of tol = to tolerance".

[Ah, now I can see both again. Here's an example of one:

── Warning (test-growth_rate.R:25:3): new setup args and warnings are as expected ───
subscript out of bounds (index 29 >= vector size 29)
Backtrace:
     ▆
  1. ├─testthat::expect_length(...) at test-growth_rate.R:25:3
  2. │ └─testthat::quasi_label(enquo(object), arg = "object") at testthat/R/expect-length.R:18:3
  3. │   └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
  4. └─epiprocess::growth_rate(y = c(1:20, NA, 22:30), method = "trend_filter")
  5.   ├─stats::predict(obj, newx = x0, which_lambda = which_lambda) at epiprocess/R/growth_rate.R:255:9
  6.   └─trendfilter:::predict.cv_trendfilter(obj, newx = x0, which_lambda = which_lambda)
  7.     ├─stats::predict(object$full_fit, newx, which_lambda, ...)
  8.     └─trendfilter:::predict.trendfilter(...)
  9.       └─base::apply(...)
 10.         └─trendfilter (local) FUN(newX[, i], ...)
 11.           └─dspline::dspline_interp(th, object$k, object$x, newx)
 12.             └─dspline:::rcpp_dspline_interp(v, k, xd, x, implicit)

and an example of the other:

── Warning (test-growth_rate.R:187:3): trendfilter growth_rate implementation ───────
partial match of 'tol' to 'tolerance'
Backtrace:
    ▆
 1. ├─testthat::expect_length(...) at test-growth_rate.R:187:3
 2. │ └─testthat::quasi_label(enquo(object), arg = "object") at testthat/R/expect-length.R:18:3
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo)) at testthat/R/quasi-label.R:45:3
 4. └─epiprocess::growth_rate(y = z, method = "trend_filter", params = growth_rate_params(nfolds = 3))
 5.   └─trendfilter::cv_trendfilter(...) at epiprocess/R/growth_rate.R:246:9
 6.     └─trendfilter::trendfilter(...)
 7.       └─trendfilter:::admm_lambda_seq(...)

]

At least some from these parts of my .Rprofile

options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchAttr = TRUE,
  warnPartialMatchDollar = TRUE,
  useFancyQuotes = FALSE
)

## seems to work on (currently installed as of time of writing) 4.0.5 though doesn't seem to be documented in ?Logic like in some later version
## thought this was only about [[ not [ though
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_"=TRUE)

Are these epiprocess or upstream problems? Do they need to be addressed? The subscript-out-of-bounds warning seems a bit fishy; the partial match seems more innocuous.
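
As a rough illustration of where a partial-match warning like the one above can come from, here is a base R sketch using `$` on a list with warnPartialMatchDollar enabled. The list `ctrl` is hypothetical; whether trendfilter's internals do exactly this is an assumption, not something confirmed in this thread.

```r
# Hypothetical control list with a fully spelled-out element name.
ctrl <- list(tolerance = 1e-4)

# Turn on the warning temporarily and capture its message.
old <- options(warnPartialMatchDollar = TRUE)
msg <- tryCatch(
  ctrl$tol, # `$` partially matches "tol" to "tolerance"
  warning = function(w) conditionMessage(w)
)
options(old) # restore the previous setting

msg

# `[[` matches exactly by default, so the abbreviated name finds nothing.
ctrl[["tol"]]
```

Spelling out the full element name (or using `[[`) avoids the warning regardless of the option settings.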

brookslogan · 2025-01-28T19:42:16Z

@dajmcdon I finished the rewords I was hoping to do + some of the minor tasks. There's still some lingering stuff that I'm hoping you'll be able to decide what to do with & handle.

(I've linted those overly long lines in another PR.)

dajmcdon · 2025-01-29T06:11:35Z

OK. By my count, there are two outstanding issues (after the above commits):

The subscript out of bounds (index 29 >= vector size 29) warning. I can't reproduce this locally, but it happens in CI.
The question of "are the new results too wiggly?" I think this is up for debate. @lcbrooks what do you think about using lambda.1se as the default instead of the minimizer (it was the minimizer previously)? I'm not sure what could account for the difference, though Ryan may know.

dajmcdon · 2025-01-29T21:31:25Z

@brookslogan following up on the point about under-regularization. I did a bit of investigating. The previous implementation used the CV minimizer, but capped the number of steps in the solution path at 1000. Because of this, lambda_min == min(lambda), and min(lambda) from genlasso is much larger than min(lambda) from trendfilter. The 1000 steps don't "go down the path" as far as in the other implementation, visiting only smoother solutions. So the cap effectively forces more smoothness as a side effect of other algorithmic choices.

However, the trendfilter implementation also produces a questionable level of smoothness. CV there also has the property that lambda_min == min(lambda). So ideally, we would consider even smaller levels of regularization!

I'm not sure how much more to beat on this. We could do some other default choice, but I think this is a larger problem (one I've encountered before) that we shouldn't necessarily handle here. We can put it on the tooling agenda for next time we meet with Ryan.

library(dplyr)
library(epiprocess) # assumed source of cases_deaths_subset

x <- cases_deaths_subset |>
  filter(geo_value == "pa" & time_value >= "2020-06-01") |>
  select(tv = time_value, cases = cases_7d_av) |>
  arrange(tv) |>
  mutate(tv = as.numeric(tv))

# using internal defaults from current epiprocess implementation
gen <- genlasso::trendfilter(x$cases, ord = 3L, maxsteps = 1000L)
cvgen <- genlasso::cv.trendfilter(gen, k = 3)

tf <- trendfilter::cv_trendfilter(x$cases, x$tv, nfolds = 3L) # chosen to match

# both minimizers are at the minimum lambda
plot(cvgen)
plot(tf)

# convert to growth rates using lambda_min
genf <- gen$beta[, cvgen$i.min]
tff <- predict(tf, which_lambda = "lambda_min")
# first differences, repeating the last value to keep the original length
dgenf <- diff(genf) / diff(x$tv)
dgenf <- c(dgenf, dgenf[length(dgenf)])
dtff <- diff(tff) / diff(x$tv)
dtff <- c(dtff, dtff[length(dtff)])

# genlasso in black, trendfilter in red
plot(dgenf / genf, type = "l", col = 1)
lines(dtff / tff, col = 2)

brookslogan · 2025-01-29T21:45:28Z

Nice catches! I do think the current plot looks pretty weird; part of that might be under-regularization; part might be the degree of the polynomial.

I could believe there is a rapid change around Dec & Jan that's oversmoothed in genlasso, but all these other little bumps seem suspect.
The small bumps may look particularly unnatural due to the degree of the spline, but we may sort of be constrained there by attempting apples-to-apples comparison with smoothing splines.

I think in the short term, given that upstream changes may need to be made, we should decide among:

A. tweaking this text to be more accurate:

In this particular example, the trend filtering estimates of growth rate appear to be much more stable than those from the smoothing spline, and also much more stable than the estimates from local relative changes and linear regressions.

B. maybe trying 1se as the default, seeing if it makes things look better, and remembering to revisit if something changes upstream / the lambda range changes (which might not only change the "min" selection but also "1se"). Conceptually I like 1se's motivation as a default, but I'm afraid there will be even more weird situations than with min, & we're already hitting one with min. Rather than worry about them all now, we could just go with whatever makes this example look better & try to iterate if we/users encounter issues later.

C. both.

dajmcdon · 2025-01-29T22:09:34Z

I'm fine with A, but against B (and C).

Based on the above investigation, B won't have an effect here (it gives the same result as lambda min). CV is not a great estimator for this data, at least as implemented: leave one fold out, with folds assigned as every vth observation. The defaults should remain.
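
For reference, "folds assigned as every vth observation" can be sketched in base R as below. This is an illustration of that assignment scheme under assumed names (`n`, `v`, `fold_id`); it is not taken from the trendfilter source.

```r
n <- 12L # number of observations (toy value)
v <- 3L  # number of folds (toy value)

# Observation i goes to fold ((i - 1) %% v) + 1, i.e. 1, 2, 3, 1, 2, 3, ...
fold_id <- rep_len(seq_len(v), n)

# Indices held out when fold 1 is left out.
held_out <- which(fold_id == 1L)

# Immediate neighbors of the held-out points that fall inside 1..n.
neighbors <- setdiff(c(held_out - 1L, held_out + 1L), c(0L, n + 1L))

# With this scheme, no neighbor of a held-out point is itself held out.
any(fold_id[neighbors] == 1L)
```

With this layout every held-out point is surrounded by training points from the other folds, which is worth keeping in mind when interpreting the CV curve.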

We should be thinking of the previously "nice smooth result" as nothing more than an artefact that conveniently looked how we wanted. Along the lines of running gradient descent for only a few iterations and saying, "yeah, I stopped way early, but I like the answer!".

tests/testthat/test-growth_rate.R

The way `trendfilter` called `dspline_interp` seemed to need the fix in dspline 1.0.1, at least the way we used it in some epiprocess tests (cmu-delphi/epiprocess#595 (comment)).

dajmcdon and others added 14 commits December 11, 2024 13:07

rm genlasso

cc07ab0

fix trendfilter + growth_rate

975e783

add tests for parameter constructor and splines

a236706

pass minimal tests

967f16a

docs: document (GHA)

2f8b06f

style: styler (GHA)

b41b5fb

slight modifications to the vignette

bbf5a07

add install instructions to vignette

92f3eb0

Merge branch 'no-genlasso' of https://github.com/cmu-delphi/epiprocess …

a6e8785

…into no-genlasso

redocument

f2c9383

merge dev, bump version.

5a242a1

Merge branch 'dev' into no-genlasso # Conflicts: # DESCRIPTION

redocument

0e10f94

add tests, pass checks

5101627

style: styler (GHA)

6d7091c

dajmcdon requested review from dshemetov and brookslogan January 23, 2025 00:20

dajmcdon added 2 commits January 23, 2025 12:02

handle annoying lintr crud

61628fd

add missing fn for pkgdown

3b77ef6

Merge branch 'dev' into no-genlasso

bcdb216

docs: document (GHA)

8758128

brookslogan reviewed Jan 24, 2025

brookslogan self-requested a review January 24, 2025 23:17

dshemetov approved these changes Jan 25, 2025

brookslogan added 3 commits January 28, 2025 09:46

docs(growth_rate): reword + typo

5280faf

Use glmgen/trendfilter installation command as suggestion

0426ec4

Make growth_rate(method = "trendfilter") without {trendfilter} a ha…

393370f

…rd error

brookslogan and others added 2 commits January 28, 2025 10:06

refactor(growth_rate): preoptimize/normalize some ops

341be94

docs: document (GHA)

d10c440

brookslogan and others added 3 commits January 28, 2025 11:38

fix(growth_rate): single_lambda check

8c92285

Rename growth_rate_global_params() -> growth_rate_params()

b0d3236

docs: document (GHA)

2baa9e0

brookslogan and others added 7 commits January 28, 2025 13:03

Further NEWS rewording

730bc55

remove scaling by sd (now done internally by the method)

264fbcf

remove sorting of x0, not really needed

5a037bc

rm straggling sdy

2ce5daa

satisfy the linter

5612df0

fix rcmd check errors

85e6530

Revert "satisfy the linter"

d5c3498

This reverts commit 5612df0.

brookslogan reviewed Jan 29, 2025

tests/testthat/test-growth_rate.R

brookslogan mentioned this pull request Jan 30, 2025

Better reactions to NAs in dspline_interp(), and implicit = TRUE generating garbage values glmgen/dspline#16

Closed

Re-lint test line lengths

059b968

brookslogan mentioned this pull request Feb 10, 2025

Require dspline >= 1.0.1 glmgen/trendfilter#13

Merged

brookslogan merged commit aa18382 into dev Feb 10, 2025
4 checks passed

brookslogan deleted the no-genlasso branch February 10, 2025 20:10

dajmcdon mentioned this pull request Feb 11, 2025

Hotfix growth rate cmu-delphi/epipredict#437

Merged

5 tasks

dshemetov mentioned this pull request Feb 12, 2025

fix: update recipes internal function name cmu-delphi/epipredict#439

Merged

5 tasks
