No genlasso #595
Merge branch 'dev' into no-genlasso (conflicts: DESCRIPTION)
/preview-docs
🚀 Deployed on https://679400526c295d36604a015e--epiprocess.netlify.app
@dshemetov you got ahead of me, I was just wanting to test this out!! Very useful already. @dajmcdon how compatible are genlasso and trendfilter supposed to be? There appear to be some changes from dev:
The implementation is completely different.
Dumping some notes here. I plan to make at least some minor editing commits and maybe take another look / do actual testing when my mind's a bit more fresh.
- Installing `trendfilter` wasn't so bad after installing g++ on Ubuntu, so hopefully this alleviates installation issues, thanks!
- Did some sanity checking of the code, as far as I could follow it (mostly the logic here; I don't understand the external package parameters well enough to comment on those).
- Did some testing of the functions to make sure we're getting sensible values. Assuming those are fine, this looks good to me.
When I run tests locally, I get a lot of warnings like: …
and at some point something like "partial match of …". (Edit: Ah, now I can see both again. Here's an example of one: … and an example of the other: …)
At least some of these come from these parts of my .Rprofile:
options(
warnPartialMatchArgs = TRUE,
warnPartialMatchAttr = TRUE,
warnPartialMatchDollar = TRUE,
useFancyQuotes = FALSE
)
## seems to work on (currently installed as of time of writing) 4.0.5 though doesn't seem to be documented in ?Logic like in some later version
## thought this was only about [[ not [ though
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_"=TRUE)
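For anyone else hitting these: the "partial match" warnings are what the `warnPartialMatch*` options above are designed to surface. A minimal base-R sketch of the two kinds, assuming nothing beyond base R; `plot_like()` and `obs` are made-up names for illustration, standing in for something like `plot(..., ty = "l")` and a `$` lookup on a list:

```r
# Turn on the same partial-matching warnings as in the .Rprofile above,
# remembering the previous values so they can be restored.
old <- options(warnPartialMatchArgs = TRUE, warnPartialMatchDollar = TRUE)

# Hypothetical function with a `type` argument, standing in for plot().
plot_like <- function(x, type = "p") type

# Evaluate `expr` and return the warning message it raises (NA if none).
catch_warning <- function(expr) {
  tryCatch({ expr; NA_character_ }, warning = function(w) conditionMessage(w))
}

# `ty =` is only a prefix of `type =`, so argument matching warns.
arg_msg <- catch_warning(plot_like(1, ty = "l"))

# `$ca` is only a prefix of the element name `cases`, so `$` warns.
obs <- list(cases = 1:3)
dollar_msg <- catch_warning(obs$ca)

# Restore the previous option values.
options(old)

print(arg_msg)
print(dollar_msg)
```

Spelling out the full argument or element name (`type = "l"`, `obs$cases`) silences both.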
@dajmcdon I finished the rewordings I was hoping to do, plus some of the minor tasks. There's still some lingering stuff that I'm hoping you'll be able to decide what to do with and handle. (I've linted those overly long lines in another PR.)
OK. By my count, there are two outstanding issues (after the above commits).
@brookslogan following up on the point about under-regularization: I did a bit of investigating. The previous implementation used the … However, the … I'm not sure how much more to beat on this. We could choose some other default, but I think this is a larger problem (one I've encountered before) that we shouldn't necessarily handle here. We can put it on the tooling agenda for the next time we meet with Ryan.
library(dplyr)
library(epiprocess) # provides the cases_deaths_subset example data
x <- cases_deaths_subset |>
  filter(geo_value == "pa" & time_value >= "2020-06-01") |>
  select(tv = time_value, cases = cases_7d_av) |>
  arrange(tv) |>
  mutate(tv = as.numeric(tv))
# using internal defaults from current epiprocess implementation
gen <- genlasso::trendfilter(x$cases, ord = 3L, maxsteps = 1000L)
cvgen <- genlasso::cv.trendfilter(gen, k = 3)
# nfolds = 3L chosen to match the k = 3 folds used for genlasso above
tf <- trendfilter::cv_trendfilter(x$cases, x$tv, nfolds = 3L)
# both minimizers are at the minimum lambda
plot(cvgen)
plot(tf)
# convert to growth rates using lambda_min
genf <- gen$beta[, cvgen$i.min]
tff <- predict(tf, which_lambda = "lambda_min")
# forward differences, repeating the last slope to keep the original length
dgenf <- diff(genf) / diff(x$tv)
dgenf <- c(dgenf, dgenf[length(dgenf)])
dtff <- diff(tff) / diff(x$tv)
dtff <- c(dtff, dtff[length(dtff)])
# relative growth rates: genlasso in black, trendfilter in red
plot(dgenf / genf, type = "l", col = 1)
lines(dtff / tff, col = 2)
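The growth-rate conversion at the end of that snippet can be factored into a small helper, which makes the edge convention easier to see. This is a sketch only, not what `growth_rate()` does internally; `rel_growth()` is a hypothetical name, and it follows the snippet's convention of forward differences with the last slope repeated.

```r
# Relative growth rate f'(x) / f(x) from fitted values `fit` at positions
# `pos`, using forward differences and repeating the last slope so the
# output has the same length as the input.
rel_growth <- function(fit, pos) {
  stopifnot(length(fit) == length(pos), length(pos) >= 2)
  slope <- diff(fit) / diff(pos)
  slope <- c(slope, slope[length(slope)])
  slope / fit
}

# Toy check on exact exponential growth at rate 0.1 per unit of `pos`.
pos <- 0:10
fit <- exp(0.1 * pos)
g <- rel_growth(fit, pos)
print(round(g, 4))
```

On this toy input every entry except the last equals exp(0.1) - 1; the last one is 1 - exp(-0.1) because the repeated slope is divided by a larger fitted value, so the final point of such a curve is slightly biased by construction.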
Nice catches! I do think the current plot looks pretty weird; part of that might be under-regularization; part might be the degree of the polynomial.
I think in the short term, given that upstream changes may need to be made, we should decide among:
A. tweaking this text to be more accurate: …
B. maybe trying 1se as the default, seeing if it makes things look better, and remembering to revisit if something changes upstream or the lambda range changes (which might change not only the "min" selection but also "1se"). Conceptually I like 1se's motivation as a default, but I'm afraid there will be even more weird situations than with min, and we're already hitting one with min. Rather than worry about them all now, we could just go with whatever makes this example look better and iterate if we or users encounter issues later.
C. both.
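For comparing options B and "min": the min rule picks the lambda with the smallest mean CV error, while the one-standard-error rule picks the largest (most regularized) lambda whose CV error is within one standard error of that minimum. A minimal sketch of both rules on made-up CV results; `pick_lambda()` and the numbers are hypothetical, and this is not the `trendfilter` package's own code.

```r
# Choose lambda from cross-validation results.
#   lambda: candidate penalty values
#   cv_err: mean CV error for each lambda
#   cv_se:  standard error of that mean
pick_lambda <- function(lambda, cv_err, cv_se) {
  i_min <- which.min(cv_err)
  # Candidates whose error is within one SE of the minimum error
  within <- cv_err <= cv_err[i_min] + cv_se[i_min]
  c(lambda_min = lambda[i_min], lambda_1se = max(lambda[within]))
}

lambda <- c(100, 30, 10, 3, 1)
cv_err <- c(5.0, 3.2, 2.9, 2.8, 2.85)
cv_se  <- c(0.3, 0.3, 0.3, 0.3, 0.3)
print(pick_lambda(lambda, cv_err, cv_se))
```

If no larger lambda has an error within one SE of the minimum, the two rules return the same value.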
I'm fine with A, but against B (and C). Based on the above investigation, B won't have an effect here (it gives the same result as lambda min). CV is not a great estimator for this data, at least when implemented as leave-one-fold-out with folds assigned as every vth observation. The defaults should remain. We should think of the previous "nice smooth result" as nothing more than an artefact that conveniently looked how we wanted, along the lines of running gradient descent for only a few iterations and saying, "yeah, I stopped way early, but I like the answer!"
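To make the fold scheme concrete: "every vth observation" is read here as observation i going to fold ((i - 1) mod v) + 1, so each held-out fold is a regularly spaced subsample of the series. A sketch of that assignment and the leave-one-fold-out split; `every_vth_folds()` is a hypothetical helper, and whether `trendfilter::cv_trendfilter()` assigns folds exactly this way is an assumption.

```r
# Assign n ordered observations to v folds by cycling 1, 2, ..., v, 1, 2, ...
every_vth_folds <- function(n, v) ((seq_len(n) - 1L) %% v) + 1L

folds <- every_vth_folds(n = 10, v = 3)
print(folds)

# Leave-one-fold-out: for each fold k, a model would be fit on train_idx
# and evaluated on the held-out test_idx.
for (k in seq_len(3)) {
  test_idx  <- which(folds == k)
  train_idx <- which(folds != k)
  cat("fold", k, "test:", test_idx, "| train size:", length(train_idx), "\n")
}
```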
The way `trendfilter` called `dspline_interp` seemed to need the fix in dspline 1.0.1, at least the way we used it in some epiprocess tests (cmu-delphi/epiprocess#595 (comment)).
Checklist
Please:
- … PR).
- … brookslogan, nmdefries.
- … `DESCRIPTION`. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
- … (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).
- … process.
Change explanations for reviewer
- This completes the process of removing the `genlasso` dependency (because it required a C compiler).
- … `glmgen/trendfilter` …
- … `growth_rate()` to be a bit less clunky. It could (and often did) return vectors of unexpected lengths (due to NAs or duplicated x-values). This does not play nicely with `mutate()`, which was the main example use case. Note that this does entail breaking changes.
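The length problem above has a simple general fix: estimate on the unique, non-missing x values, then map the results back onto every input row with `match()`, so the output always has exactly one entry per row (which is what `mutate()` needs). A base-R sketch under that assumption; `align_to_input()` and the toy estimator are hypothetical names, not the refactored `growth_rate()` internals.

```r
# Run `estimator(x, y)` on cleaned data (complete rows, unique x, sorted by x)
# and return a result aligned to the original rows.
align_to_input <- function(x, y, estimator) {
  ok <- which(!is.na(x) & !is.na(y))
  ok <- ok[!duplicated(x[ok])]      # keep the first row for each x value
  ord <- ok[order(x[ok])]           # sort the kept rows by x
  est <- estimator(x[ord], y[ord])  # one value per unique x
  est[match(x, x[ord])]             # NA for rows whose x was dropped or NA
}

# Toy estimator: forward-difference slope, NA at the end to keep length.
fwd_slope <- function(x, y) c(diff(y) / diff(x), NA)

x <- c(1, 2, 2, NA, 3, 4)
y <- c(10, 20, 20, 5, 40, 80)
out <- align_to_input(x, y, fwd_slope)
print(out)
```

Keeping the first row for a duplicated x is one choice; averaging the duplicated y values would be an equally reasonable alternative.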
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch
… `glmgen` in growth rate function when ready #57