Commit 4c15108

Merge pull request #2 from bearloga/master
Addresses minor prereq issues
2 parents edf5571 + 108becf commit 4c15108

3 files changed: +4 -5 lines changed

README.md

+1 -2
@@ -27,7 +27,7 @@ Here are some practical, related topics we will cover for each algorithm:
 - Software
 - Scalability
 
-Instructions for how to install the neccessary software for this tutorial is available [here](tutorial-installation.md). Data for the tutorial can be downloaded by running `./data/get-data.sh`.
+Instructions for how to install the neccessary software for this tutorial is available [here](tutorial-installation.md). Data for the tutorial can be downloaded by running `./data/get-data.sh` (requires **wget**).
 
 ## Dimensionality Issues
 Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
@@ -66,7 +66,6 @@ For each algorithm, we will provide examples of open source R packages that impl
 ## Scalability
 We will address scalability issues inherent to the algorithm and discuss algorithmic or technological solutions to scalability concerns for "big data."
 
-
 # Resources
 
 Where to learn more?

data/get-mnist.sh

+2 -2
@@ -6,6 +6,6 @@ wget https://h2o-public-test-data.s3.amazonaws.com/bigdata/laptop/mnist/test.csv
 gunzip train.csv.gz
 gunzip test.csv.gz
 
-mv train.csv mnist_train.cv
-mv test.csv mnist_test.cv
+mv train.csv mnist_train.csv
+mv test.csv mnist_test.csv
 
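
The README change in this commit documents that the download script requires **wget**. As a rough sketch only, and not part of this commit, a script like this one could check for its prerequisites up front instead of failing partway through. The helper name `require_cmd` is hypothetical and does not appear in the repository.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not in the repository):
# succeed if the named command is on PATH, otherwise print an error and fail.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' is required but was not found on PATH" >&2
        return 1
    fi
}

# Intended use near the top of a download script, before any download step:
#   require_cmd wget   || exit 1
#   require_cmd gunzip || exit 1
```

Checking before the first `wget` call means a missing tool is reported once, with a clear message, rather than surfacing later as a failed `gunzip` or `mv` on files that were never downloaded.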

tutorial-installation.md

+1 -1
@@ -42,7 +42,7 @@ pip3 install -U jupyter
 Install the IRkernel binary in R. More info [here](https://irkernel.github.io/installation/).
 
 ```r
-install.packages(c('repr', 'pbdZMQ', 'devtools'))
+install.packages(c('repr', 'pbdZMQ', 'devtools'), repos = c(CRAN = "https://cran.rstudio.com"))
 library(devtools)
 devtools::install_github('IRkernel/IRdisplay')
 devtools::install_github('IRkernel/IRkernel')
