# prior_knowledge_matrix_for_sequence_tagging

**Paper name (useful phrases)**:
* Bring prior knowledge
* Constraints in sequence tagging (and possibly beyond)
* Use this knowledge in the training process
* Add it to the loss
* Prior knowledge matrix
* Faster convergence
* Unsupervised (no labels are needed to compute the Gumbel loss; see the sketch below)

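A minimal sketch of the prior-knowledge loss idea, assuming a 0/1 matrix `allowed` where `allowed[i, j] = 1` iff tag `j` may follow tag `i`. All names, shapes, and the toy constraint below are illustrative assumptions, not the final design:

```python
import torch
import torch.nn.functional as F

num_tags, batch, seq_len = 5, 2, 7

# Prior knowledge matrix: allowed[i, j] = 1 if tag j may follow tag i
allowed = torch.ones(num_tags, num_tags)
allowed[0, 2] = 0.0  # e.g. an I- tag may not directly follow O in IOB2

logits = torch.randn(batch, seq_len, num_tags)  # model outputs (unnormalized)

# Differentiable (approximately one-hot) tag samples; no gold labels needed.
# hard=True would instead give straight-through one-hot samples.
samples = F.gumbel_softmax(logits, tau=1.0, hard=False, dim=-1)

# Expected "forbiddenness" of each adjacent tag pair:
# samples[t] @ (1 - allowed) @ samples[t + 1]
prev, nxt = samples[:, :-1], samples[:, 1:]
penalty = torch.einsum('bsi,ij,bsj->bs', prev, 1.0 - allowed, nxt)
prior_loss = penalty.mean()
```

Because the penalty only needs the model's own sampled predictions, it can be computed on unlabeled text, which is what makes the term unsupervised.
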
**Data**:
* [Datasets for Entity Recognition](https://github.com/juand-r/entity-recognition-datasets) - use this!
* [Annotated Corpus for Named Entity Recognition](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/kernels)

**Links for relevant papers, articles, implementations**:
* [Categorical Reparameterization with Gumbel-Softmax](https://arxiv.org/abs/1611.01144)
* [The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables](https://arxiv.org/abs/1611.00712)
* [Neural Networks gone wild! They can sample from discrete distributions now!](https://anotherdatum.com/gumbel-gan.html)
* [ ] [Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation](https://link.springer.com/article/10.1007/s00521-018-3868-4)
* [ ] [Enhancing Neural Sequence Labeling with Position-Aware Self-Attention](https://arxiv.org/pdf/1908.09128.pdf)
* [ ] [Inference constraints (not in training) in allennlp](https://github.com/allenai/allennlp/blob/master/allennlp/modules/conditional_random_field.py)
* [ ] [Prior initialization for impossible transitions (-10000)](https://github.com/threelittlemonkeys/lstm-crf-pytorch/blob/master/model.py) - see the sketch after this list

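A hedged sketch of the "-10000 initialization" trick from the last link: impossible tag transitions get a large negative score in a CRF-style transition matrix, so decoding effectively never selects them. The toy tag set is an assumption:

```python
import torch

tags = ['O', 'B-PER', 'I-PER', 'B-LOC', 'I-LOC']  # toy IOB2 tag set
num_tags = len(tags)

# transitions[i, j]: score of moving from tag i to tag j
transitions = torch.randn(num_tags, num_tags)
IMPOSSIBLE = -10000.0

for i, src in enumerate(tags):
    for j, dst in enumerate(tags):
        # IOB2 rule: I-X may only follow B-X or I-X of the same type X
        if dst.startswith('I-') and src[2:] != dst[2:]:
            transitions[i, j] = IMPOSSIBLE
```
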
**Useful links**:
* [Attention? Attention!](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html) - Lilian Weng Blog
* [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/) - Jay Alammar Blog
* [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html) - Harvard Blog
* [15 Free Datasets and Corpora for Named Entity Recognition (NER)](https://lionbridge.ai/datasets/15-free-datasets-and-corpora-for-named-entity-recognition-ner/)
* [PyTorch-Tutorial-to-Sequence-Labeling](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Sequence-Labeling)
* [Sequence tagging example](http://www.cse.chalmers.se/~richajo/nlp2019/l6/Sequence%20tagging%20example.html)
* NER: [tutorial](https://cs230.stanford.edu/blog/namedentity/) - [github](https://github.com/cs230-stanford/cs230-code-examples)
* [Approaching a Named Entity Recognition (NER) — End to End Steps](https://mc.ai/approaching-a-named-entity-recognition-ner%E2%80%8A-%E2%80%8Aend-to-end-steps/)
* [Named Entity Recognition on CoNLL dataset using BiLSTM+CRF implemented with Pytorch](https://pythonawesome.com/named-entity-recognition-on-conll-dataset-using-bilstm-crf-implemented-with-pytorch/)
* [Named Entity Recognition with BiLSTM-CNNs](https://medium.com/illuin/named-entity-recognition-with-bilstm-cnns-632ba83d3d41)
* [How does pytorch backprop through argmax?](https://stackoverflow.com/questions/54969646/how-does-pytorch-backprop-through-argmax)
* [Differentiable Argmax!](https://lucehe.github.io/differentiable-argmax/) - see the straight-through sketch after this list
* [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf)
* [nn.Embedding source code](https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html#Embedding)
* [What is the correct way to use OHE lookup table for a pytorch RNN?](https://stackoverflow.com/questions/57632084/what-is-the-correct-way-to-use-ohe-lookup-table-for-a-pytorch-rnn)
* Beam Search: [Andrew Ng YouTube](https://youtu.be/RLWuzLLSIgw) - [How to Implement a Beam Search Decoder for Natural Language Processing](https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/)
* [Backpropgating error to emedding matrix](https://datascience.stackexchange.com/questions/33041/backpropgating-error-to-emedding-matrix)
* [Inside–outside–beginning (tagging)](https://en.wikipedia.org/wiki/Inside–outside–beginning_(tagging))
* [Gumbel-Softmax trick vs Softmax with temperature](https://datascience.stackexchange.com/questions/58376/gumbel-softmax-trick-vs-softmax-with-temperature)
* PyTorchLightning: [github](https://github.com/PyTorchLightning/pytorch-lightning) - [docs](https://pytorch-lightning.readthedocs.io/en/stable/) - [medium](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09)
* [PyTorch Lightning vs PyTorch Ignite vs Fast.ai](https://towardsdatascience.com/pytorch-lightning-vs-pytorch-ignite-vs-fast-ai-61dc7480ad8a)

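A short sketch of the straight-through argmax discussed in the links above: the forward pass returns the hard one-hot vector while gradients flow through the softmax. This mirrors how `F.gumbel_softmax(..., hard=True)` is implemented internally, but the function name here is our own:

```python
import torch

def st_argmax(logits: torch.Tensor) -> torch.Tensor:
    """One-hot argmax in the forward pass, softmax gradients in the backward pass."""
    soft = torch.softmax(logits, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    # Value equals `hard`, but autograd differentiates through `soft`
    return (hard - soft).detach() + soft
```
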
**Architecture**:
* RNN with attention
* BERT and friends

**Hypotheses**:
* [ ] Does *gumbel-softmax* take log probabilities as input? (The PyTorch docs describe the input as unnormalized log probabilities, i.e. logits)
* [ ] Compare a basic configuration with *two-token constraints* (a matrix) against *three-token constraints* and beyond (a tensor) in terms of convergence and results (although a NN can capture difficult dependencies from data, it still struggles to capture temporal dependencies over tokens, especially in long sequence patterns)
* [ ] Check filtering of the PAD token in the Gumbel loss term - see the masking sketch after this list
* [ ] Play with the *hard* parameter of PyTorch's *F.gumbel_softmax*
* [ ] Play with *beam search* (see the links and the sketch after this list)
* [ ] Check *FP/FN* rates when using *gumbel-softmax*
* [ ] Try using *softmax* probabilities instead of *gumbel-softmax* samples, and decide what strategy to use
* [ ] Try *argmax* with backpropagation (see the straight-through sketch above)
* [ ] Since the loss is a sum of the *cross-entropy loss* and the *prior-knowledge loss*, compare their orders of magnitude and find out how to select *lambda* automatically (e.g. as the ratio of the two losses) - see the sketch after this list
* [ ] Play with the inverted matrix (swap 0s and 1s) and understand how it affects the final score
* [ ] Implement an *nn.Module* for efficient computation and describe it in the paper (see the embedding links above)
* [ ] Use two classifiers (neural networks): the first for *tokenization*, the second for *classification* (halving the number of classes)
* [ ] Play with the knowledge matrix initialization (0/1 or something else)
* [ ] (additional) Try a seq2seq architecture for sequence labelling (maybe find a relevant paper and add it to the references)
* [ ] (additional) Try seq2seq constraints (output length matches input length)

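A hypothetical sketch of the PAD-filtering hypothesis: zero out the prior-knowledge penalty for adjacent pairs that touch a PAD position. `PAD_IDX`, the toy batch, and all shapes are assumptions; `penalty` plays the role of the per-pair term from the first sketch above:

```python
import torch

PAD_IDX = 0
token_ids = torch.tensor([[5, 3, 7, PAD_IDX, PAD_IDX]])  # (batch, seq_len)

not_pad = (token_ids != PAD_IDX).float()
pair_mask = not_pad[:, :-1] * not_pad[:, 1:]  # a pair counts iff both tokens are real
penalty = torch.rand(1, 4)                    # stand-in for the per-pair penalty
prior_loss = (penalty * pair_mask).sum() / pair_mask.sum().clamp(min=1)
```
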
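One hedged option for the automatic *lambda* hypothesis: rescale the prior-knowledge loss to the current magnitude of the cross-entropy loss, detaching the ratio so it acts as a coefficient rather than a gradient path:

```python
# ce_loss and prior_loss are the two scalar loss terms from the sketches above
lam = ce_loss.detach() / prior_loss.detach().clamp(min=1e-8)
loss = ce_loss + lam * prior_loss
```
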
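And a small beam-search sketch for the decoding hypothesis, combining per-step emission log-probabilities with a transition score matrix (the function name and shapes are our own assumptions; see the beam search links above):

```python
import torch

def beam_search(emissions, transitions, beam=3):
    # emissions: (seq_len, num_tags); transitions[i, j]: score of tag i -> tag j
    num_tags = emissions.size(1)
    beams = [([t], emissions[0, t].item()) for t in range(num_tags)]
    beams = sorted(beams, key=lambda b: b[1], reverse=True)[:beam]
    for step in range(1, emissions.size(0)):
        candidates = [
            (seq + [t], score + transitions[seq[-1], t].item() + emissions[step, t].item())
            for seq, score in beams
            for t in range(num_tags)
        ]
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam]
    return beams[0][0]  # highest-scoring tag sequence found
```

Passing the (-10000)-masked transition matrix from the earlier sketch would make decoding respect the prior constraints.
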
**TODO**:
* [ ] Understand how to submit a preprint to arXiv, what the publication rules are, and other conference details
* [ ] Find datasets to compare against (NER, POS, etc.)
* [ ] Make a table on another page (or another board) for hypothesis results
* [ ] Add POS-tag links to the useful links
* [ ] Decide which papers to use for structure (follow conference restrictions)
* [ ] Compare our implementation vs cs230-code-examples (on their dataset)
* [ ] (additional) Maybe use Docker during the experiments and afterwards (final project version)

**Rules**:
* Save all references to use them later in the paper
* Create issues for hypotheses
* Use PyTorchLightning (a minimal skeleton is sketched below)
* Use a separate folder for each hypothesis

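A minimal hedged skeleton for the PyTorchLightning rule; the BiLSTM tagger, hyperparameters, and PAD index are illustrative assumptions, not the final design:

```python
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaggerModule(pl.LightningModule):
    def __init__(self, vocab_size=10000, num_tags=9, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_tags)

    def forward(self, tokens):
        out, _ = self.rnn(self.emb(tokens))
        return self.head(out)  # (batch, seq_len, num_tags)

    def training_step(self, batch, batch_idx):
        tokens, tags = batch
        logits = self(tokens)
        # cross_entropy expects (batch, classes, seq_len); PAD tags are ignored
        loss = F.cross_entropy(logits.transpose(1, 2), tags, ignore_index=0)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

The prior-knowledge term from the sketches above would be added inside `training_step` once its weighting is settled.
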
**Metrics**:

**NER Datasets**

A project to collect datasets for named entity recognition using Git LFS:
* [15 Free Datasets and Corpora for Named Entity Recognition (NER)](https://lionbridge.ai/datasets/15-free-datasets-and-corpora-for-named-entity-recognition-ner/)
* [NeuroNER](https://github.com/Franck-Dernoncourt/NeuroNER)
* [Datasets for Entity Recognition](https://github.com/juand-r/entity-recognition-datasets)