Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Incremental (delta) update #928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental (delta) update #928
Changes from all commits
fa09d0b
99a5327
f01b3a2
8fa1534
67824e6
c11d797
ee6640d
5e446b5
71c3469
83366aa
a22916c
d9e4f26
c622ba4
58c27f0
ad5ee5a
079ba7a
de026e3
d1d066e
046731b
9f52c8b
802a934
f3a7b12
3e7fa37
07968ac
d504461
d7b8623
0464c16
8093000
b505df7
0dd71a2
932f64b
b2430c4
c02d689
638e737
2b29498
626ba5a
e1c0325
b0958a8
400a991
59d1713
ce35a6e
e085280
735af02
f3ebf97
59b7666
055ce91
eaa8dc6
7d0a283
95a206b
de72329
55269ab
b7b16ba
723a1a6
9135b2d
c670e33
773b22d
e8de5f2
e8d6f2d
803345a
08a4c1b
b0470ce
5470104
114e80a
2ab1759
594ef7d
567d63f
5ca0689
3debff1
10b95cd
e1f60c7
be704a2
5decfeb
ab9f9a3
e2f5bf3
ee27458
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check warning on line 84 in src/datachain/delta.py
src/datachain/delta.py#L84
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this logic work with studio flags ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I assumed all datasets are present locally and I think this will always be the case since the only case where one of the dependencies is not present locally, but in Studio, is when we pull dataset from it (we don't pull it's dependencies). But when using delta update, user is creating / saving dataset so this is not about pulling existing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, will I be able to use something like
from_dataset(studio=True, delta=True)
? (and it automatically pulls the versions of the dataset that I need .. wdyt?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in this example:
This will happen:
.from_dataset(studio=True)
works generally.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if someone else is running this code and there are already a few versions of the dataset on Studio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be the same, the last version of
my-studio-ds
would be fetched locally and from that on, it would use that local dataset.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, should we actually check / pull two versions from Studio in this case (initial run) - to be able to get the diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this one is also not resolved yet ... can be a followup, but it seems like an issue for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does it mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change I had some issues in
diff
codebase (stopped working for some cases). I've tried to investigated but it took longer time than expected so for temporary fix I did a workaround and wanted to create an issue for it to understand real cause. Looks like I just put TODO and forgot to create an issue - will do it now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it fixed? can we cleanup this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I will create separate issue to deal with this later on. I don't think it's high priority as this workaround works fine for now. It's tricky to fix though so that's why I don't want to spent too much time on it now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when does it happen? if we have an issue with context - let's put a link in this todo please ... it should be clear as much as possible from the context what is happening here (and why)
what is
sys=True
- it's not documented it seems (why?)Check warning on line 164 in src/datachain/diff/__init__.py
src/datachain/diff/__init__.py#L164
Check warning on line 194 in src/datachain/lib/dc/datachain.py
src/datachain/lib/dc/datachain.py#L194
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it public? do we want it to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason why it shouldn't be public. User could check if some chain is in "delta" mode or not. It is also used in some other internal methods for which it doesn't need to be public.
I can make it private as well, don't have strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, that's fine .. but if we keep it public we need proper docs for it then ... and an example if you have something in mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor things that is left here @ilongin ... let's please take care of it