## Changelog
### Roadmap
- Incorporate FlashConv implementation of faster FFT convolution.
- Add setup.py file for independent installation
- Make small improvements to S4 and improve experiment configs (e.g. LRA)
### [4.0.0a] - 2023-02-28
#### Breaking Changes to Models
- The CUDA kernel has been updated and must be recompiled.
- The `log_dt` parameter inside the S4 kernel now has shape `(H, 1)` instead of `(H,)`, where `H` is the model dimension (`d_model`).
- The DPLR S4 kernel had a parameter `inv_w_real` and the diagonal S4 kernel (S4D) had a parameter called `inv_A_real`. They are now both called `A_real`. The NPLR S4 kernel's parameter `w_imag` is now called `A_imag`.
To address differences between models trained on earlier versions and the current V4:
- The CUDA kernel should be re-compiled if moving between versions of this codebase.
- The script `checkpoints/port_v3_to_v4.py` can be used to convert models (see below).
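The conversion itself is mechanical: rename the affected kernel parameters and reshape `log_dt`. Below is a minimal sketch of the idea using hypothetical flat key names and NumPy arrays; it is not the actual logic of `checkpoints/port_v3_to_v4.py`, which operates on full PyTorch state dicts and may also transform parameter values.

```python
import numpy as np

def port_v3_kernel_params(state_dict):
    """Illustrative V3 -> V4 conversion sketch (hypothetical key names).

    The real conversion lives in checkpoints/port_v3_to_v4.py; this only
    demonstrates the renames and the log_dt reshape described above.
    """
    renames = {"inv_w_real": "A_real", "inv_A_real": "A_real", "w_imag": "A_imag"}
    converted = {}
    for key, value in state_dict.items():
        key = renames.get(key, key)
        if key == "log_dt" and value.ndim == 1:
            value = value[:, None]  # (H,) -> (H, 1)
        converted[key] = value
    return converted
```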
#### Repository Restructuring
- Information about specific papers and models (e.g. model descriptions, overviews of code, documentation of experiments) has been moved into the `models/` folder.
- Standalone S4 module has been moved from `src/models/s4/` to `models/s4/`.
- General sequence modeling framework under [src/models/sequence/](src/models/sequence/) has been reorganized. The old state space modules `src/models/sequence/ss/` have been removed; the S4 module has been broken into a generic convolution block in [src/models/sequence/modules/](src/models/sequence/modules/) and the inner linear SSM kernel moved to [src/models/sequence/kernels/](src/models/sequence/kernels/).
- More experiments have been added to [configs/experiments/](configs/experiments/) with improved structuring.
#### New CUDA Kernels
- The Cauchy CUDA kernel has been updated and must be recompiled.
- There is now a CUDA kernel for the Vandermonde operation of S4D, speeding it up over the naive and `pykeops` versions. S4D should now be faster than S4 in all versions (naive, pykeops, or CUDA kernel).
#### New Utility Scripts
- The `/checkpoints/` folder can be used to store checkpoints and contains several scripts for working with them. See `/checkpoints/README.md` for detailed usage.
  - `/checkpoints/evaluate.py` takes a trained model and prints metrics on evaluation datasets.
  - `/checkpoints/port_v3_to_v4.py` converts a model from V3 to V4 code.
#### New Models
- [S4ND](models/s4nd/)
- Recent new models based on or closely related to S4, such as [GSS and Mega](models/related/)
- Other [long convolution kernels](src/models/sequence/kernels/), such as a simple "wide kernel CNN" baseline (`model.layer.mode=conv`)
#### S4 layer
- `model.layer.measure` has been renamed to `model.layer.init`. The name `measure` originally referred to approximation measures in the HiPPO theory, but these are used only as initializations in trainable SSM models. There are also many more initializations not based on the HiPPO theory, such as the simple S4D-Lin initialization from the [minimal S4D standalone](models/s4/).
- TODO: document some of the new features
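As a concrete example of an initialization not tied to HiPPO theory, the S4D paper's S4D-Lin scheme places the eigenvalues of the diagonal state matrix at `A_n = -1/2 + iπn`. A NumPy sketch of just that formula (illustrative; not this repository's code):

```python
import numpy as np

def s4d_lin_init(d_state):
    # S4D-Lin: eigenvalues A_n = -1/2 + i*pi*n for n = 0, ..., d_state/2 - 1.
    # Only half the eigenvalues are stored, since they come in conjugate pairs.
    n = np.arange(d_state // 2)
    return -0.5 + 1j * np.pi * n
```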
### [3.0.0] - 2022-08-03
#### Models and Features
- Updated version of the S4 module, including new measures and theory from [How to Train Your HiPPO](https://arxiv.org/abs/2206.12037) (https://github.com/HazyResearch/state-spaces/issues/21, https://github.com/HazyResearch/state-spaces/issues/54)
- Reorganized the [README](README.md) and added much more [documentation](README.md#readmes) for using this codebase
### [2.1.0] - 2022-05-01
- Minor updates to S4 modules
- By default, S4 no longer requires installing Pykeops or a custom CUDA kernel.
- New S4D (S4-diagonal) standalone model found at `src/models/sequence/ss/standalone/s4d.py`. Simple variant using diagonal SSMs that recovers S4's performance on most tasks. Can be run with any existing experiment config with the additional flag `model/layer=s4d` on the command line.
- New [LRA configs](#long-range-arena-lra) for updated S4 code, with an average score of ~86
### [2.0.0] - 2022-02-27
Code release for SaShiMi audio model
### [1.1.0] - 2022-01-29
Added configs for time series datasets from the Informer paper (https://github.com/HazyResearch/state-spaces/issues/4)
### [1.0.0] - 2021-11-18
First release of this repository containing the S4 module and configs to reproduce sCIFAR, Speech Commands, Long Range Arena, and WikiText-103 results

This repository requires Python 3.8+ and PyTorch 1.10+.
Other packages are listed in [requirements.txt](./requirements.txt).
### Structured Kernels
The core operations of S4 are the Cauchy and Vandermonde kernels described in the [paper](https://arxiv.org/abs/2111.00396).
These are very simple matrix multiplications; naive implementations can be found in the [standalone](models/s4/s4.py) in the functions `cauchy_naive` and `log_vandermonde_naive`.
However, as the paper describes, these have suboptimal memory usage that currently requires a custom kernel to overcome in PyTorch.
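For intuition, both operations fit in a few lines. The sketch below uses NumPy and simplified signatures (the repository's `cauchy_naive` and `log_vandermonde_naive` operate on batched complex PyTorch tensors); the memory problem comes from materializing the full intermediate matrix, which the custom kernels avoid:

```python
import numpy as np

def cauchy_naive(v, z, w):
    # Cauchy kernel: k(z_j) = sum_i v_i / (z_j - w_i)
    return (v[None, :] / (z[:, None] - w[None, :])).sum(axis=-1)

def vandermonde_naive(v, x, L):
    # Vandermonde product: k_l = sum_i v_i * x_i**l for l = 0, ..., L-1
    V = x[None, :] ** np.arange(L)[:, None]  # (L, N) Vandermonde matrix
    return V @ v
```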
Two more efficient methods are supported. The code will automatically detect if either of these is installed and call the appropriate kernel.
#### Custom CUDA Kernel
This version is faster but requires manual compilation for each machine environment.
Run `python setup.py install` from the directory `extensions/kernels/`.
#### Pykeops
### Example Train Script (External Usage)
[example.py](example.py) is a self-contained training script for MNIST and CIFAR that imports the standalone S4 file. The default settings `python example.py` reaches 88% accuracy on sequential CIFAR with a very simple S4D model of 200k parameters.
This script can be used as an example for using S4 variants in external repositories.
### Training with this Repository (Internal Usage)
This repository aims to provide a very flexible framework for training sequence models. Many models and datasets are supported.
The basic entrypoint is `python -m train`, or equivalently
```
python -m train pipeline=mnist model=s4
```
which trains an S4 model on the Permuted MNIST dataset.
This should reach around 90% accuracy after 1 epoch, which takes 1-3 minutes depending on the GPU.
More examples of using this repository are documented throughout. See [Training](#training) for an overview.
### Optimizer Hyperparameters
Our logic for setting these parameters can be found in the `OptimModule` class under `src/models/sequence/ss/kernel.py` and the corresponding optimizer hook in `SequenceLightningModule.configure_optimizers` under `train.py`
-->
## Training
The core training infrastructure of this repository is based on [Pytorch-Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) with a configuration scheme based on [Hydra](https://hydra.cc/docs/intro/).
The main entrypoint is `train.py` and configs are found in `configs/`.
### Data
Models are defined in [src/models](src/models). See the README in this subdirectory for an overview.
### Configs and Hyperparameters
Pre-defined configs reproducing end-to-end experiments from the papers are provided with the project-specific information in [models/](models/); see for example the [original S4 paper's experiments](models/s4/experiments.md).
Configs can also be easily modified through the command line.
170
119
An example experiment is
This uses the Permuted MNIST task with an S4 model with a specified number of layers.
See [configs/README.md](configs/) for more detailed documentation about the configs.
#### Hydra
It is recommended to read the [Hydra documentation](https://hydra.cc/docs/intro/) to fully understand the configuration framework. For help launching specific experiments, please file an issue.
## Overall Repository Structure
```
configs/         Config files for model, data pipeline, training loop, etc.
data/            Default location of raw data
extensions/      CUDA extensions (Cauchy and Vandermonde kernels)
src/             Main source code for models, datasets, etc.
    callbacks/       Training loop utilities (e.g. checkpointing)
    dataloaders/     Dataset and dataloader definitions
    models/          Model definitions
    tasks/           Encoder/decoder modules to interface between data and model backbone
    utils/
models/          Model-specific information (code, experiments, additional resources)
example.py       Example training script for using S4 externally
train.py         Training entrypoint for this repo
generate.py      Autoregressive generation script
291
241
292
-
## READMEs
293
-
In addition to this top level README, several READMEs detailing the usage of this repository are organized in subdirectories.
This script follows the structure of the generation script and supports a few more advanced options. You can convert the model and test it on a batch immediately by passing in `test_model=true`. This requires a valid experiment configuration so that a model and dataloader can be constructed. The two options for loading the `generate.py` script from either a `checkpoint_path` or `experiment_path` argument also apply here.
0 commit comments