
Commit 43f5b0c ("initial port of v4 files")
1 parent: 06dbbdf

File tree: 255 files changed, +309801 -302370 lines


CHANGELOG.md (+52 -5)

@@ -1,6 +1,53 @@
 ## Changelog
 
-### 2022-08-03 - [V3.0]
+### Roadmap
+
+- Incorporate the FlashConv implementation of faster FFT convolution.
+- Add a setup.py file for independent installation.
+- Make some small improvements to S4 and improve experiment configs (e.g. LRA).
+
+
+### [4.0.0a] - 2023-02-28
+
+
+#### Breaking Changes to Models
+- The CUDA kernel has been updated and must be recompiled.
+- The `log_dt` parameter inside the S4 kernel now has shape `(H, 1)` instead of `(H,)`, where `H` is the model dimension (`d_model`).
+- The DPLR S4 kernel had a parameter `inv_w_real` and the diagonal S4 kernel (S4D) had a parameter called `inv_A_real`. They are now both called `A_real`. The NPLR S4 kernel's parameter `w_imag` is now called `A_imag`.
+
+To address differences between models trained on earlier versions and the current V4:
+- The CUDA kernel should be re-compiled when moving between versions of this codebase.
+- The script `checkpoints/port_v3_to_v4.py` can be used to convert models (see below).
+
+#### Repository Restructuring
+
+- Information about specific papers and models (e.g. model descriptions, overviews of code, documentation of experiments) has been moved into the `models/` folder.
+- The standalone S4 module has been moved from `src/models/s4/` to `models/s4/`.
+- The general sequence modeling framework under [src/models/sequence/](src/models/sequence/) has been reorganized. The old state space modules `src/models/sequence/ss/` have been removed; the S4 module has been broken into a generic convolution block in [src/models/sequence/modules/](src/models/sequence/modules/), and the inner linear SSM kernel has moved to [src/models/sequence/kernels/](src/models/sequence/kernels/).
+- More experiments have been added to [configs/experiments/](configs/experiments/) with improved structuring.
+
+
+#### New CUDA Kernels
+- The Cauchy CUDA kernel has been updated and must be recompiled.
+- There is now a CUDA kernel for the Vandermonde operation of S4D, speeding it up over the naive and `pykeops` versions. S4D should now be faster than S4 in all versions (naive, pykeops, or CUDA kernel).
+
+#### New Utility Scripts
+- The `/checkpoints/` folder can be used to store checkpoints and contains several scripts for working with them. See `/checkpoints/README.md` for detailed usage.
+- `/checkpoints/evaluate.py` takes a trained model and prints metrics on evaluation datasets.
+- `/checkpoints/port_v3_to_v4.py` converts a model from V3 to V4 code (a rough sketch of the parameter renames follows this diff).
+
+
+#### New Models
+- [S4ND](models/s4nd/)
+- Recent new models based on or closely related to S4, such as [GSS and Mega](models/related/)
+- Other [long convolution kernels](src/models/sequence/kernels/), such as a simple "wide kernel CNN" baseline (`model.layer.mode=conv`)
+
+#### S4 Layer
+- `model.layer.measure` has been renamed to `model.layer.init`. The name `measure` originally referred to approximation measures in the HiPPO theory, but these are only used as initializations in trainable SSM models. There are also many more initializations not based on the HiPPO theory, such as the simple S4D-Lin model from the [minimal S4D standalone](models/s4/).
+- TODO: document some of the new features
+
+
+### [3.0.0] - 2022-08-03
 
 #### Models and Features
 - Updated version of S4 module, including new measures and theory from [[How to Train Your HiPPO](https://arxiv.org/abs/2206.12037)] (https://github.com/HazyResearch/state-spaces/issues/21, https://github.com/HazyResearch/state-spaces/issues/54)
@@ -41,18 +88,18 @@ Note that there have been various refactors and miscellaneous changes which may
 - Reorganized the [README](README.md) and added much more [documentation](README.md#readmes) for using this codebase
 
 
-### 2022-05-01 - [V2.1]
+### [2.1.0] - 2022-05-01
 - Minor updates to S4 modules
 - By default, S4 no longer requires installing Pykeops or a custom CUDA kernel.
 - New S4D (S4-diagonal) standalone model found at `src/models/sequence/ss/standalone/s4d.py`. Simple variant using diagonal SSMs that recovers S4's performance on most tasks. Can be run with any existing experiment config with the additional flag `model/layer=s4d` on the command line.
 - New [LRA configs](#long-range-arena-lra) for updated S4 code, with an average score of ~86
 
-### 2022-02-27 - [V2]
+### [2.0.0] - 2022-02-27
 Code release for SaShiMi audio model
 
-### 2022-01-29 - [V1.1]
+### [1.1.0] - 2022-01-29
 Added configs for time series datasets from the Informer paper (https://github.com/HazyResearch/state-spaces/issues/4)
 
-### 2021-11-18 - [V1]
+### [1.0.0] - 2021-11-18
 First release of this repository containing the S4 module and configs to reproduce sCIFAR, Speech Commands, Long Range Arena, and WikiText-103 results
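The supported conversion path is `checkpoints/port_v3_to_v4.py`. To make the breaking changes above concrete, here is a minimal sketch of the key remapping they imply for a raw state dict of PyTorch tensors; the key-suffix matching below is an illustrative assumption, not the actual logic of the port script.

```python
def port_v3_keys_to_v4(state_dict):
    """Illustrative remapping of V3 S4 kernel parameters to their V4 names."""
    new_sd = {}
    for key, tensor in state_dict.items():
        new_key = key
        # DPLR/NPLR kernel renames: inv_w_real -> A_real, w_imag -> A_imag.
        if key.endswith("inv_w_real"):
            new_key = key[:-len("inv_w_real")] + "A_real"
        elif key.endswith("w_imag"):
            new_key = key[:-len("w_imag")] + "A_imag"
        # Diagonal (S4D) kernel rename: inv_A_real -> A_real.
        elif key.endswith("inv_A_real"):
            new_key = key[:-len("inv_A_real")] + "A_real"
        # log_dt gains a trailing singleton dimension: (H,) -> (H, 1).
        if new_key.endswith("log_dt") and tensor.dim() == 1:
            tensor = tensor.unsqueeze(-1)
        new_sd[new_key] = tensor
    return new_sd
```

The real script presumably handles more than this (for example, module paths that moved in the repository restructuring), so prefer it for actual conversions.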

README.md (+29 -129)

@@ -1,49 +1,12 @@
 # Structured State Spaces for Sequence Modeling
 
-This repository provides implementations and experiments for the following papers.
+This repository provides the official implementations and experiments for models related to [S4](https://arxiv.org/abs/2111.00396),
+including [HiPPO](https://arxiv.org/abs/2008.07669), [LSSL](https://arxiv.org/abs/2110.13985), [SaShiMi](https://arxiv.org/abs/2202.09729),
+[DSS](https://arxiv.org/abs/2203.14343), [HTTYH](https://arxiv.org/abs/2206.12037), [S4D](https://arxiv.org/abs/2206.11893),
+and [S4ND](https://arxiv.org/abs/2210.06583).
 
-## S4D
-
-![S4D](assets/s4d.png "S4D: The diagonal variant of S4")
-> **On the Parameterization and Initialization of Diagonal State Space Models**\
-> Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré\
-> Paper: https://arxiv.org/abs/2206.11893
-
-Other variants including [DSS](https://github.com/ag1988/dss) and [GSS](https://arxiv.org/abs/2206.13947) are also supported. DSS is the predecessor to S4D that is also available in its own [fork](https://github.com/ag1988/dss).
-
-## HTTYH
-
-![HTTYH](assets/httyh.png "Basis Functions for S4 Variants")
-> **How to Train Your HiPPO: State Spaces with Generalized Orthogonal Basis Projections**\
-> Albert Gu*, Isys Johnson*, Aman Timalsina, Atri Rudra, Christopher Ré\
-> Paper: https://arxiv.org/abs/2206.12037
-
-## SaShiMi (ICML 2022 - Long Talk)
-
-![SaShiMi](assets/sashimi.png "SaShiMi Architecture")
-> **It's Raw! Audio Generation with State-Space Models**\
-> Karan Goel, Albert Gu, Chris Donahue, Christopher Ré\
-> Paper: https://arxiv.org/abs/2202.09729
-
-## S4 (ICLR 2022 - Outstanding Paper HM)
-
-![Structured State Spaces](assets/s4.png "Properties of Structured State Spaces")
-> **Efficiently Modeling Long Sequences with Structured State Spaces**\
-> Albert Gu, Karan Goel, Christopher Ré\
-> Paper: https://arxiv.org/abs/2111.00396
-
-## LSSL (NeurIPS 2021)
-
-![Linear State Space Layer](assets/splash.png "Properties of State Spaces")
-> **Combining Recurrent, Convolutional, and Continuous-time Models with the Linear State Space Layer**\
-> Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré\
-> Paper: https://arxiv.org/abs/2110.13985
-
-## HiPPO (NeurIPS 2020 - Spotlight)
-![HiPPO Framework](assets/hippo.png "HiPPO Framework")
-> **HiPPO: Recurrent Memory with Optimal Polynomial Projections**\
-> Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, Christopher Ré\
-> Paper: https://arxiv.org/abs/2008.07669
+Project-specific information for each of these models, including an overview of the source code and specific experiment reproductions,
+can be found under [models/](models/).
 
 
 ## Table of Contents

@@ -52,15 +15,10 @@ Setting up the environment and porting S4 to external codebases:
 - [Setup](#setup)
 - [Getting Started with S4](#getting-started-with-s4)
 
-Reproducing experiments from the papers:
-- [Experiments](#experiments)
-- [SaShiMi](sashimi/)
-
 Using this repository for training models:
 - [Training](#training)
 - [Generation](#generation)
 - [Repository Structure](#overall-repository-structure)
-- [READMEs](#readmes)
 - [Citation](#citation)
 
 ### Changelog
@@ -78,18 +36,18 @@ See [CHANGELOG.md](CHANGELOG.md)
 This repository requires Python 3.8+ and Pytorch 1.10+.
 Other packages are listed in [requirements.txt](./requirements.txt).
 
-### Cauchy Kernel
+### Structured Kernels
 
-A core operation of S4 is the "Cauchy kernel" described in the [paper](https://arxiv.org/abs/2111.00396).
-This is actually a very simple operation; a naive implementation of this operation can be found in the [standalone](src/models/s4/s4.py) in the function `cauchy_naive`.
+Core operations of S4 are the Cauchy and Vandermonde kernels described in the [paper](https://arxiv.org/abs/2111.00396).
+These are very simple matrix multiplications; naive implementations can be found in the [standalone](models/s4/s4.py) in the functions `cauchy_naive` and `log_vandermonde_naive` (sketched after this diff).
 However, as the paper describes, this has suboptimal memory usage that currently requires a custom kernel to overcome in PyTorch.
 
 Two more efficient methods are supported. The code will automatically detect if either of these is installed and call the appropriate kernel.
 
 #### Custom CUDA Kernel
 
 This version is faster but requires manual compilation for each machine environment.
-Run `python setup.py install` from the directory `extensions/cauchy/`.
+Run `python setup.py install` from the directory `extensions/kernels/`.
 
 #### Pykeops
 

@@ -109,20 +67,20 @@ See [notebooks/](notebooks/) for visualizations explaining some concepts behind
 ### Example Train Script (External Usage)
 
 [example.py](example.py) is a self-contained training script for MNIST and CIFAR that imports the standalone S4 file. The default settings `python example.py` reaches 88% accuracy on sequential CIFAR with a very simple S4D model of 200k parameters.
-This script can be used as an example for using S4 in external repositories.
+This script can be used as an example for using S4 variants in external repositories.
 
 ### Training with this Repository (Internal Usage)
 
 This repository aims to provide a very flexible framework for training sequence models. Many models and datasets are supported.
 
-Basic usage is `python -m train`, or equivalently
+The basic entrypoint is `python -m train`, or equivalently
 ```
 python -m train pipeline=mnist model=s4
 ```
 which trains an S4 model on the Permuted MNIST dataset.
 This should get to around 90% after 1 epoch which takes 1-3 minutes depending on GPU.
 
-More examples of using this repository can be found in [Experiments](#experiments) and [Training](#training).
+More examples of using this repository are documented throughout. See [Training](#training) for an overview.
 
 ### Optimizer Hyperparameters
 
@@ -136,14 +94,12 @@ See the method `register` in the model (e.g. [s4d.py](src/models/s4/s4d.py)) and
 Our logic for setting these parameters can be found in the `OptimModule` class under `src/models/sequence/ss/kernel.py` and the corresponding optimizer hook in `SequenceLightningModule.configure_optimizers` under `train.py`
 -->
 
-### HiPPO/S4 Visualizations
-
-Figures from the HTTYH and S4D papers can be visualized from [notebooks/](notebooks/). These include [animations](notebooks/hippo_function_approximation.ipynb) of HiPPO and S4 that were used in various S4 talks. The animation code can also be found in a [.py file](src/models/hippo/visualizations.py) instead of notebook.
 
-## Experiments
+## Training
 
-Instructions for reproducing experiments from the papers can be found in [experiments.md](experiments.md).
+The core training infrastructure of this repository is based on [Pytorch-Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) with a configuration scheme based on [Hydra](https://hydra.cc/docs/intro/).
 
+The main entrypoint is `train.py` and configs are found in `configs/`.
 
 ### Data
 

@@ -156,15 +112,8 @@ The README inside this subdirectory documents how to download and organize other
 Models are defined in [src/models](src/models). See the README in this subdirectory for an overview.
 
 
-
-## Training
-
-The core training infrastructure of this repository is based on [Pytorch-Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) with a configuration scheme based on [Hydra](https://hydra.cc/docs/intro/).
-
-The main entrypoint is `train.py` and configs are found in `configs/`.
-
 ### Configs and Hyperparameters
-Pre-defined configs for many end-to-end experiments are provided (see [experiments.md](experiments.md)).
+Pre-defined configs reproducing end-to-end experiments from the papers are provided, found under project-specific information in [models/](models/), such as for the [original S4 paper](models/s4/experiments.md).
 
 Configs can also be easily modified through the command line.
 An example experiment is

@@ -175,6 +124,7 @@ This uses the Permuted MNIST task with an S4 model with a specified number of la
 
 See [configs/README.md](configs/) for more detailed documentation about the configs.
 
+
 #### Hydra
 
 It is recommended to read the [Hydra documentation](https://hydra.cc/docs/intro/) to fully understand the configuration framework. For help launching specific experiments, please file an issue.
@@ -274,80 +224,30 @@ This option only needs the path to the Hydra experiment folder and the desired c
 
 ## Overall Repository Structure
 ```
-configs/         config files for model, data pipeline, training loop, etc.
-data/            default location of raw data
-extensions/      CUDA extension for Cauchy kernel
-src/             main source code for models, datasets, etc.
-    callbacks/   training loop utilities (e.g. checkpointing)
-    dataloaders/ dataset and dataloader definitions
-    models/      model definitions
-    tasks/       encoder/decoder modules to interface between data and model backbone
+configs/         Config files for model, data pipeline, training loop, etc.
+data/            Default location of raw data
+extensions/      CUDA extensions (Cauchy and Vandermonde kernels)
+src/             Main source code for models, datasets, etc.
+    callbacks/   Training loop utilities (e.g. checkpointing)
+    dataloaders/ Dataset and dataloader definitions
+    models/      Model definitions
+    tasks/       Encoder/decoder modules to interface between data and model backbone
     utils/
-sashimi/         SaShiMi README and additional code (generation, metrics, MTurk)
+models/          Model-specific information (code, experiments, additional resources)
 example.py       Example training script for using S4 externally
 train.py         Training entrypoint for this repo
 generate.py      Autoregressive generation script
 ```
 
-## READMEs
-In addition to this top level README, several READMEs detailing the usage of this repository are organized in subdirectories.
-
-- [src/dataloaders/README.md](src/dataloaders/)
-- [src/models/README.md](src/models/)
-- [src/models/s4/README.md](src/models/s4/)
-- [experiments.md](experiments.md)
-- [configs/README.md](configs/)
-- [configs/model/README.md](configs/model/)
-- [configs/experiment/README.md](configs/experiment/)
-- [sashimi/README.md](sashimi/)
-
-
-
 
 ## Citation
-If you use this codebase, or otherwise found our work valuable, please cite:
-```
-@article{gu2022s4d,
-  title={On the Parameterization and Initialization of Diagonal State Space Models},
-  author={Gu, Albert and Gupta, Ankit and Goel, Karan and R\'e, Christopher},
-  journal={arXiv preprint arXiv:2206.11893},
-  year={2022}
-}
-
-@article{gu2022hippo,
-  title={How to Train Your HiPPO: State Space Models with Generalized Basis Projections},
-  author={Gu, Albert and Johnson, Isys and Timalsina, Aman and Rudra, Atri and R\'e, Christopher},
-  journal={arXiv preprint arXiv:2206.12037},
-  year={2022}
-}
-
-@article{goel2022sashimi,
-  title={It's Raw! Audio Generation with State-Space Models},
-  author={Goel, Karan and Gu, Albert and Donahue, Chris and R{\'e}, Christopher},
-  journal={International Conference on Machine Learning ({ICML})},
-  year={2022}
-}
+If you use this codebase, or otherwise found our work valuable, please cite the S4 paper and [other relevant papers](models/README.md#citations).
 
+```
 @inproceedings{gu2022efficiently,
   title={Efficiently Modeling Long Sequences with Structured State Spaces},
   author={Gu, Albert and Goel, Karan and R\'e, Christopher},
   booktitle={The International Conference on Learning Representations ({ICLR})},
   year={2022}
 }
-
-@article{gu2021combining,
-  title={Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers},
-  author={Gu, Albert and Johnson, Isys and Goel, Karan and Saab, Khaled and Dao, Tri and Rudra, Atri and R{\'e}, Christopher},
-  journal={Advances in neural information processing systems},
-  volume={34},
-  year={2021}
-}
-
-@article{gu2020hippo,
-  title={HiPPO: Recurrent Memory with Optimal Polynomial Projections},
-  author={Gu, Albert and Dao, Tri and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
-  journal={Advances in neural information processing systems},
-  volume={33},
-  year={2020}
-}
 ```
assets/properties.png (204 KB)

assets/s4nd.png (2.38 MB)

checkpoints/README.md (new file, +33)

Scripts for working with model checkpoints. Also serves as a convenient place to store specific checkpoints.


### Generation script
The generation script lies outside this folder and is documented in the main README.
An example usage is
```
python -m generate experiment=lm/s4-wt103 checkpoint_path=checkpoints/s4-wt103.ckpt n_samples=1 l_sample=16384 l_prefix=8192 decode=text
```

### Evaluation script

The evaluation script `evaluate.py` follows a similar interface to the generation script.
```
python -m evaluate wandb=null experiment=lm/s4-wt103 train.ckpt='/dfs/scratch1/albertgu/projects/hippo/checkpoints/new_wt103_test_new.ckpt' trainer.devices=1 loader.batch_size=1
```
Note that the numbers reported in the papers are those logged during training, not the numbers reported by this script, which may differ slightly.

### Converting a .ckpt (PyTorch Lightning) checkpoint to .pt (PyTorch)
```
python -m checkpoints.convert_pl_to_pt checkpoints/<name>.ckpt
```
This example creates a file `checkpoints/<name>.pt`.

### Converting a V3 model to V4
```
python -m checkpoints.port_v3_to_v4 checkpoint_path=checkpoints/s4-wt103-v3.ckpt
```
This script follows the structure of the generation script and supports a few more advanced options. You can convert a model and immediately test it on a batch by passing `test_model=true`; this requires a valid experiment configuration so that a model and dataloader can be constructed. The two options from the `generate.py` script for loading from either a `checkpoint_path` or an `experiment_path` argument also apply here.
```
python -m checkpoints.port_v3_to_v4 test_model=true checkpoint_path=checkpoints/s4-wt103-v3.ckpt experiment=lm/s4-wt103 trainer.devices=1 loader.batch_size=1
```

checkpoints/convert_pl_to_pt.py (new file, +16)

```python
import argparse
import torch
from pathlib import Path

# Imported so that any classes the Lightning checkpoint may reference are
# importable when the file is unpickled.
from train import SequenceLightningModule


parser = argparse.ArgumentParser()
parser.add_argument("ckpt_path", type=str)
args = parser.parse_args()

# Load the PyTorch Lightning checkpoint and extract the bare state dict.
ckpt = torch.load(args.ckpt_path, map_location='cuda')
state_dict = ckpt['state_dict']

# Save the state dict next to the original file, with a .pt extension.
torch.save(state_dict, Path(args.ckpt_path).with_suffix(".pt"))
```
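As a usage note, the exported `.pt` file holds a plain state dict, so downstream code can load it without any Lightning dependency. A minimal sketch follows, using a stand-in module rather than an actual model class from this repository:

```python
import torch
import torch.nn as nn

# Stand-in architecture for illustration only; in practice, instantiate the
# same model the checkpoint was trained with so the parameter keys line up.
class TinyModel(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)

model = TinyModel()
state_dict = torch.load("checkpoints/example.pt", map_location="cpu")
# strict=False only because this stand-in's keys won't match a real checkpoint.
model.load_state_dict(state_dict, strict=False)
model.eval()
```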
