Skip to content

SparseSeries accepts scipy.sparse.spmatrix in constructor #16617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

kernc
Copy link
Contributor

@kernc kernc commented Jun 6, 2017

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick look. this is probably going to be tricky to make work.

@@ -25,6 +25,9 @@ Enhancements
has been added to return the group order (:issue:`11642`); see
:ref:`here <groupby.ngroup>`.


- ``SparseSeries`` and ``SparseArray`` now support 1d ``scipy.sparse.spmatrix`` in constructor. Additionally, ``SparseDataFrame`` can be assigned columns of ``scipy.sparse.spmatrix``; see :ref:`here <sparse.scipysparse_series>`. (:issue:`15634`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be for 0.21.0

else:
# 2d; make it iterable
value = list(value.tocsc().T)
super().__setitem__(key, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the fully qualified call

@@ -722,6 +726,9 @@ def combine_first(self, other):

def to_coo(self, row_levels=(0, ), column_levels=(1, ), sort_labels=False):
"""
DEPRECATED; instead, make a SparseSeries with a two-level index,
unstack it, then use .to_coo() on the resulting SparseDataFrame.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the deprecated sphinx directive (I think we are changing these all over)

@@ -779,6 +786,9 @@ def to_coo(self, row_levels=(0, ), column_levels=(1, ), sort_labels=False):
@classmethod
def from_coo(cls, A, dense_index=False):
"""
DEPRECATED; instead, pass 1d scipy.sparse matrices directly into
SparseSeries constructor, and 2d into SparseDataFrame constructor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type labels Jun 6, 2017
@kernc
Copy link
Contributor Author

kernc commented Jun 6, 2017

this is probably going to be tricky to make work.

Why do you think so? It does seem to work for the moment. 😃

@codecov
Copy link

codecov bot commented Jun 7, 2017

Codecov Report

Merging #16617 into master will decrease coverage by <.01%.
The diff coverage is 84.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16617      +/-   ##
==========================================
- Coverage   90.96%   90.95%   -0.01%     
==========================================
  Files         161      161              
  Lines       49263    49287      +24     
==========================================
+ Hits        44810    44827      +17     
- Misses       4453     4460       +7
Flag Coverage Δ
#multiple 88.71% <84.84%> (-0.01%) ⬇️
#single 40.22% <30.3%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/series.py 95.1% <100%> (+0.03%) ⬆️
pandas/core/indexing.py 94.01% <100%> (ø) ⬆️
pandas/core/sparse/frame.py 94.77% <100%> (+0.51%) ⬆️
pandas/core/sparse/array.py 91.62% <100%> (+0.16%) ⬆️
pandas/core/internals.py 93.56% <54.54%> (+0.13%) ⬆️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 10c17d4...293bb47. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 7, 2017

Codecov Report

Merging #16617 into master will decrease coverage by <.01%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16617      +/-   ##
==========================================
- Coverage   90.96%   90.95%   -0.01%     
==========================================
  Files         161      161              
  Lines       49263    49287      +24     
==========================================
+ Hits        44810    44827      +17     
- Misses       4453     4460       +7
Flag Coverage Δ
#multiple 88.71% <85.71%> (-0.01%) ⬇️
#single 40.22% <28.57%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 94.77% <100%> (+0.51%) ⬆️
pandas/core/sparse/array.py 91.62% <100%> (+0.16%) ⬆️
pandas/core/sparse/series.py 95.1% <100%> (+0.03%) ⬆️
pandas/core/indexing.py 94.01% <100%> (ø) ⬆️
pandas/core/internals.py 93.56% <61.53%> (+0.13%) ⬆️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 10c17d4...ef03e73. Read the comment docs.

kernc added 2 commits June 7, 2017 18:17
In Python 2, inner-block anonymous exception seems to overwrite the
outer-block anonymous exception. We're supposed to re-raise the latter.
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good. a couple of comments.

sdf[['z', 'w']] = sp_arr[:, [7, 8]]
sdf.iloc[:, -5:]

Below interface is deprecated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say that this is deprecated in 0.21.0

@@ -25,6 +25,10 @@ New features
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)

- ``SparseSeries`` and ``SparseArray`` now support 1d ``scipy.sparse.spmatrix`` in constructor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the constructor

@@ -25,6 +25,10 @@ New features
- Added `__fspath__` method to :class`:pandas.HDFStore`, :class:`pandas.ExcelFile`,
and :class:`pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)

- ``SparseSeries`` and ``SparseArray`` now support 1d ``scipy.sparse.spmatrix`` in constructor.
Additionally, ``SparseDataFrame`` can be assigned columns of ``scipy.sparse.spmatrix``;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this 2nd sentence a separate bullet point (you can use same issue on both of them, or 2nd one should be the PR number maybe)

kind=self._default_kind)
else:
# 2d; make it iterable
value = list(value.tocsc().T)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this materialize?

spm = csr_matrix(np.arange(len(sdf))).T
sdf['X'] = spm
assert _equal(sdf[['X']].to_coo(), spm)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comparision on the scipy side is fine, but also let's compare with assert_sparse_series/frame_equal


# 1d row -- changing series contents not yet supported
spm = csr_matrix(np.arange(sdf.shape[1], dtype=float))
idx = np.zeros(sdf.shape[0], dtype=bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test with .loc/.iloc as well (might already be another issue about this, if not and its too complicated for here, then create a new issue)

@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

can you rebase / update

@jreback
Copy link
Contributor

jreback commented Sep 10, 2017

@kernc if you have time can you rebase / update

@kernc
Copy link
Contributor Author

kernc commented Sep 10, 2017

For some reason (probably assert_raises in test_setitem_spmatrix()), I had a while ago decided to first implement setitem on SparseSeries that I have 80% done waiting in a feature branch somewhere. If you insist, I can push this first, however.

@jreback
Copy link
Contributor

jreback commented Sep 10, 2017

oh we can do the other first
was just looking thru open prs

what was that number?

@jreback jreback removed this from the 0.21.0 milestone Sep 23, 2017
@kernc
Copy link
Contributor Author

kernc commented Oct 4, 2017

Right, it's #17785.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

closing as stale. if you want to continue working, pls ping.

@jreback jreback closed this Jan 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API/DEPR: deprecate SparseSeries.from_coo and accept in constructor
3 participants