Generated from the 140,000 most-starred projects on GitHub as of October 2016. Built with the legacy pipeline, without splitting or stemming, and later converted with some loss of quality.
Example:
from sourced.ml.models import Id2Vec
id2vec = Id2Vec().load("92609e70-f79c-46b5-8419-55726e873cfc")
print("Number of tokens:", len(id2vec))
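The description notes that the legacy pipeline applied no splitting, so it may help to see what splitting usually means for source code identifiers. The following is a minimal sketch under stated assumptions, not the sourced.ml implementation: `split_identifier` and its regular expression are hypothetical and only illustrate breaking an identifier into lowercase sub-tokens. Stemming, which would need a separate stemmer, is left out.

```python
import re

# Hypothetical pattern, not taken from sourced.ml: matches acronym runs,
# capitalized or lowercase words, and digit runs inside an identifier.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(name):
    """Return the lowercase sub-tokens of a camelCase or snake_case identifier."""
    return [part.lower() for part in _SUBTOKEN.findall(name)]


for identifier in ("readFileName", "HTTPServer", "max_buffer_size"):
    print(identifier, "->", split_identifier(identifier))
```

Since this model was built without such a step, its tokens were presumably not normalized this way, so a lookup may need the token in whatever form the legacy pipeline stored it.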
| Field | Value |
|---|---|
| ID | 92609e70-f79c-46b5-8419-55726e873cfc |
| Uploaded | 2017-06-18 17:37:06.255615 |
| Version | 1.0.0 |
| File | https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fid2vec%2F92609e70-f79c-46b5-8419-55726e873cfc.asdf |
| Size | 1.1 GB |
| Data collection date | October 2016 |
| Number of (sub)tokens | 5,720,096 |
| Number of repositories | 112,273 |
| License | |