Skip to content

Feature request: more lenient check_nondata_cols #3835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
teunbrand opened this issue Feb 25, 2020 · 8 comments
Closed

Feature request: more lenient check_nondata_cols #3835

teunbrand opened this issue Feb 25, 2020 · 8 comments

Comments

@teunbrand
Copy link
Collaborator

Currently data.frame columns are dropped when they don't satisfy the checks of check_nondata_cols which looks like the following:

check_nondata_cols <- function(x) {
  idx <- (vapply(x, function(x) is.null(x) || rlang::is_vector(x), logical(1)))
  names(x)[which(!idx)]
}

The reason I would like it to be more lenient is such that vector-like columns which technically aren't vectors could be accepted, specifically for the virtual S4Vectors::Vector-class (and non-virtual classes containing this). This would open up channels to have Bioconductor work with ggplot2 more naturally, provided some extra infrastructure is implemented at the scale/coordinate level (which could befit a ggplot extention package).

I would propose to change check_nondata_cols to the following:

check_nondata_cols <- function(x) {
  idx <- (vapply(x, function(x) is.null(x) || rlang::is_vector(x) || inherits(x, "Vector"), logical(1)))
  names(x)[which(!idx)]
}

Thanks you for reading and considering!

@clauswilke
Copy link
Member

I don't think the intend was ever to exclude meaningful data columns. This was just to filter out nonsensical things, such as functions.

However, the question is why rlang::is_vector(x) fails. Maybe it requires modification? Could you provide a simple reprex where you create an object of a type that would fail the check but can be reasonably considered as a vector?

@teunbrand
Copy link
Collaborator Author

I think rlang::is_vector() (and also base::is.vector()) always return FALSE for many bioconductor S4 classes since their internal representation is not as a vector but most relevant data is stored in slots, akin to S3 attributes, that often behave in a parallel way to vectors.

Below is a minimal reprex, keep in mind though that the Bioconductor S4 Vector virtual class actually has methods that make it behave the way vectors behave, while this reprex does not.

library(ggplot2)

setClass("fakeinteger", representation = list(x = "integer"), 
         prototype = list(x = integer()))

y <- new("fakeinteger", x = 1:5)

ggplot2:::check_nondata_cols(list(y = y))
#> [1] "y"

Created on 2020-02-25 by the reprex package (v0.3.0)

Below is a more realistic reprex for two actual bioconductor classes that behave near identical to vectors.

# BiocManager::install("S4Vectors")
suppressPackageStartupMessages(library(S4Vectors))
library(ggplot2)

# Factor is a class wherein levels can be pretty much anything
x <- Factor(levels = complex(real = 1:5, imaginary = 5:1), index = 1:3)
# Rle is a run length encoded Vector that behaves like a vector, 
# in contrast to base::rle
y <- Rle(rep(1:5, 5:1))

# They fail the non-data column check
ggplot2:::check_nondata_cols(list(x = x, y = y))
#> [1] "x" "y"

Created on 2020-02-25 by the reprex package (v0.3.0)

The Rle and Factor classes have length(), subsetting, subassignment, concatenation and most of the usual behaviour you'd expect from an vector, it is just the internal representation that is different.

@clauswilke
Copy link
Member

Ok. In this case, if inherits(x, "Vector") captures a large proportion of the data types that might likely have this feature, I think explicitly allowing it is the right strategy. Do you want to prepare a PR? Just make sure you document it so we understand in the future what this check is there for.

@teunbrand
Copy link
Collaborator Author

Yes, most S4 data classes in Bioconductor I know of probably inherit from Vector in one way or another.
A small complication likely lies in discriminating linear vectors from rectangular 'vectors' (such as S4Vectors::DataFrame), so I'd suppose the check would have to be

(inherits(x, "Vector") && is.null(dim))

I'll go start preparing a PR!

@clauswilke
Copy link
Member

Are rectangular vectors a problem? We've talked about allowing data frame columns, and this wouldn't be any different I think. No need to ensure dim is NULL.

@teunbrand
Copy link
Collaborator Author

teunbrand commented Feb 25, 2020

I don't necessarily think they are problematic. Just to check: data.frame columns are okay then? Out of curiosity, could you point me to the relevant conversation? (so I might glimpse an example)

@clauswilke
Copy link
Member

Yes, data frame columns are Ok:

rlang::is_vector(data.frame(x = 1, y = 2))
#> [1] TRUE

Created on 2020-02-25 by the reprex package (v0.3.0)

Out of curiosity, could you point me to the relevant conversation?

All relevant conversations happen on Twitter: https://twitter.com/hadleywickham/status/1009674367624777728?s=20

clauswilke pushed a commit that referenced this issue Mar 8, 2020
* Let check_nondata_cols allow Vector class

* Remove as_tibble from as_gg_data_frame

* Add NEWS.md bullet

* Rephrase news bullet
@clauswilke
Copy link
Member

Closed via #3837.

sthagen added a commit to sthagen/tidyverse-ggplot2 that referenced this issue Mar 9, 2020
Let check_nondata_cols allow Vector class (tidyverse#3835) (tidyverse#3837)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants