Commit 445ddeb

Merge pull request #1 from dayyass/develop
first prototype
2 parents 42cb99c + 7477e83

7 files changed (+324, -107 lines)


Diff for: .gitignore (+12, -5)

```diff
@@ -1,7 +1,14 @@
-*.ipynb_checkpoints/
 *.DS_Store
-*__pycache__/
-*.cache/
-*.pyc
-/venv/
+*.ipynb_checkpoints
+
+# cache
+*__pycache__
+**.mypy_cache
+*.cache
+
+# coverage
+*.coverage
+*coverage.xml
+
+venv
 .idea
```

Diff for: .pre-commit-config.yaml (+5, -1)

```diff
@@ -11,7 +11,11 @@ repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
   rev: v2.3.0
   hooks:
+  - id: check-yaml
+  - id: name-tests-test
+    args: ['--django']
   - id: debug-statements
+  - id: end-of-file-fixer
   - id: trailing-whitespace
   - id: check-docstring-first
   - id: requirements-txt-fixer
@@ -20,4 +24,4 @@ repos:
 - repo: https://github.com/pre-commit/mirrors-mypy
   rev: v0.790
   hooks:
-  - id: mypy
+  - id: mypy
```

Diff for: README.md (+1, -100)

```diff
@@ -1,100 +1 @@
-# prior_knowledge_matrix_for_sequence_tagging
-
-**Paper name (useful phrases)**:
-* Bring prior knowledge
-* Constraints in sequence tagging (maybe not only)
-* Use this knowledge in the training process
-* Add it to the loss
-* Prior knowledge matrix
-* Faster convergence
-* Unsupervised (we don't need labels to compute the gumbel loss)
-
-
-**Data**:
-* [Datasets for Entity Recognition](https://github.com/juand-r/entity-recognition-datasets) - use this!
-* [Annotated Corpus for Named Entity Recognition](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/kernels)
-
-
-**Links for relevant papers, articles, implementations**:
-* [Categorical Reparameterization with Gumbel-Softmax](https://arxiv.org/abs/1611.01144)
-* [The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables](https://arxiv.org/abs/1611.00712)
-* [Neural Networks gone wild! They can sample from discrete distributions now!](https://anotherdatum.com/gumbel-gan.html)
-* [ ] [Anticipation-RNN: enforcing unary constraints in sequence generation, with application to interactive music generation](https://link.springer.com/article/10.1007/s00521-018-3868-4)
-* [ ] [Enhancing Neural Sequence Labeling with Position-Aware Self-Attention](https://arxiv.org/pdf/1908.09128.pdf)
-* [ ] [Inference constraints (not in training) in allennlp](https://github.com/allenai/allennlp/blob/master/allennlp/modules/conditional_random_field.py)
-* [ ] [Prior initialization for impossible transitions (-10000)](https://github.com/threelittlemonkeys/lstm-crf-pytorch/blob/master/model.py)
-
-
-**Useful links**:
-* [Attention? Attention!](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html) - Lilian Weng Blog
-* [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/) - Jay Alammar Blog
-* [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html) - Harvard Blog
-* [15 Free Datasets and Corpora for Named Entity Recognition (NER)](https://lionbridge.ai/datasets/15-free-datasets-and-corpora-for-named-entity-recognition-ner/)
-* [PyTorch-Tutorial-to-Sequence-Labeling](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Sequence-Labeling)
-* [Sequence tagging example](http://www.cse.chalmers.se/~richajo/nlp2019/l6/Sequence%20tagging%20example.html)
-* NER: [tutorial](https://cs230.stanford.edu/blog/namedentity/) - [github](https://github.com/cs230-stanford/cs230-code-examples)
-* [Approaching a Named Entity Recognition (NER) — End to End Steps](https://mc.ai/approaching-a-named-entity-recognition-ner%E2%80%8A-%E2%80%8Aend-to-end-steps/)
-* [Named Entity Recognition on CoNLL dataset using BiLSTM+CRF implemented with Pytorch](https://pythonawesome.com/named-entity-recognition-on-conll-dataset-using-bilstm-crf-implemented-with-pytorch/)
-* [Named Entity Recognition with BiLSTM-CNNs](https://medium.com/illuin/named-entity-recognition-with-bilstm-cnns-632ba83d3d41)
-* [How does pytorch backprop through argmax?](https://stackoverflow.com/questions/54969646/how-does-pytorch-backprop-through-argmax)
-* [Differentiable Argmax!](https://lucehe.github.io/differentiable-argmax/)
-* [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf)
-* [nn.Embedding source code](https://pytorch.org/docs/stable/_modules/torch/nn/modules/sparse.html#Embedding)
-* [What is the correct way to use OHE lookup table for a pytorch RNN?](https://stackoverflow.com/questions/57632084/what-is-the-correct-way-to-use-ohe-lookup-table-for-a-pytorch-rnn)
-* Beam Search: [Andrew Ng YouTube](https://youtu.be/RLWuzLLSIgw) - [How to Implement a Beam Search Decoder for Natural Language Processing](https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/)
-* [Backpropgating error to emedding matrix](https://datascience.stackexchange.com/questions/33041/backpropgating-error-to-emedding-matrix)
-* [Inside–outside–beginning (tagging)](https://en.wikipedia.org/wiki/Inside–outside–beginning_(tagging))
-* [Gumbel-Softmax trick vs Softmax with temperature](https://datascience.stackexchange.com/questions/58376/gumbel-softmax-trick-vs-softmax-with-temperature)
-* PyTorchLightning: [github](https://github.com/PyTorchLightning/pytorch-lightning) - [docs](https://pytorch-lightning.readthedocs.io/en/stable/) - [medium](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09)
-* [PyTorch Lightning vs PyTorch Ignite vs Fast.ai](https://towardsdatascience.com/pytorch-lightning-vs-pytorch-ignite-vs-fast-ai-61dc7480ad8a)
-
-
-**Architecture**:
-* RNN with attention
-* BERT and friends
-
-
-**Hypotheses**:
-* [ ] Does *gumbel-softmax* take log probabilities as input?
-* [ ] Basic configuration for *two-token constraints* (matrix), then *three-token constraints* and more (tensor); compare convergence and results (although a NN can capture difficult dependencies from data, it still struggles to capture temporal dependencies over tokens, especially long sequence patterns)
-* [ ] Check filtering of the PAD token in the gumbel loss term
-* [ ] Play with the *"hard"* parameter of PyTorch *F.gumbel_softmax*
-* [ ] Play with *beam search* (see links)
-* [ ] Check *FP/FN* when using *gumbel-softmax*
-* [ ] Try using *softmax* probabilities instead of *gumbel-softmax* samples; decide what strategy to use
-* [ ] Try *argmax* with backpropagation (see links)
-* [ ] Since the loss is a sum of the *cross-entropy loss* and the *prior-knowledge loss*, compare their orders of magnitude and find out how to select *lambda* automatically (ratio of the two losses)
-* [ ] Play with the reversed matrix (replace 0 with 1 and 1 with 0) and understand how it affects the final score
-* [ ] Make an *nn.Module* for efficient computation and describe it in the paper (see the link about embeddings)
-* [ ] Use two classifiers (neural networks): the first for *tokenization*, the second for *classification* (halves the number of classes)
-* [ ] Play with the knowledge matrix init (0/1 or something else)
-* [ ] (additional) Try a seq2seq architecture for sequence labelling (maybe find a relevant paper and add it to the references)
-* [ ] (additional) Try seq2seq constraints (output length matches input length)
-
-
-**TODO**:
-* [ ] Understand how to submit a preprint to arXiv, the rules for publishing, and other conference details
-* [ ] Find datasets to compare with (NER, POS, etc.)
-* [ ] Make a table on another page (or another board) for hypothesis results
-* [ ] Add POS tags to useful links
-* [ ] Decide which papers to use for structure (follow conference restrictions)
-* [ ] Compare our implementation vs cs230-code-examples (on their dataset)
-* [ ] (additional) Maybe use Docker during experiments and after (final project version)
-
-
-**Rules**:
-* Save all references to use later in the paper
-* Make issues for hypotheses
-* Use PyTorchLightning
-* Use a different folder for each hypothesis
-
-
-**Metrics**:
-
-
-**NER Datasets**
-
-Project to collect datasets for named entity recognition using git lfs
-* [15 Free Datasets and Corpora for Named Entity Recognition (NER)](https://lionbridge.ai/datasets/15-free-datasets-and-corpora-for-named-entity-recognition-ner/)
-* [NeuroNER](https://github.com/Franck-Dernoncourt/NeuroNER)
-* [Datasets for Entity Recognition](https://github.com/juand-r/entity-recognition-datasets)
+# Prior Knowledge Layer for Sequence Tagging
```
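Several of the removed hypotheses concern PyTorch's `F.gumbel_softmax` (whether it takes log probabilities, and the `hard` parameter). For reference: `torch.nn.functional.gumbel_softmax` expects unnormalized log-probabilities (logits), and `hard=True` returns one-hot samples in the forward pass while differentiating through the soft relaxation (the straight-through trick). A minimal illustration, not part of this commit:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)
logits = torch.randn(8, 25, 7)  # (batch, seq_len, n_classes), as in the notebook below

# Soft, differentiable samples over the class dimension;
# gumbel_softmax takes unnormalized log-probabilities (logits) as input
soft = F.gumbel_softmax(logits, tau=1.0, hard=False, dim=-1)

# hard=True: exact one-hot samples forward, straight-through gradients backward
hard = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)

print(soft.sum(dim=-1)[0, 0])  # sums to 1 (up to float error)
print(hard[0, 0])              # exact one-hot vector
```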

Diff for: experiments/init_layer.ipynb (+217, new file)

The new notebook (Python 3.9.5, nbformat 4, run on a 64-bit 'venv' kernel) demonstrates the new layer; its code cells and outputs are:

```python
# In [1]
import sys
sys.path.append("..")

# In [2]
import torch
torch.manual_seed(42)
# Out[2]: <torch._C.Generator at 0x112b812d0>

# In [3]
from prior_knowledge_layer import PriorKnowledgeLayer, init_prior_knowledge_matrix

# In [4]
labels = ["O", "B-PER", "B-LOC", "B-ORG", "I-PER", "I-LOC", "I-ORG"]

# In [5]
prohibited_transitions = {
    "O": ("I-PER", "I-LOC", "I-ORG"),
    "B-PER": ("I-LOC", "I-ORG"),
    "B-LOC": ("I-PER", "I-ORG"),
    "B-ORG": ("I-PER", "I-LOC"),
    "I-PER": ("I-LOC", "I-ORG"),
    "I-LOC": ("I-PER", "I-ORG"),
    "I-ORG": ("I-PER", "I-LOC"),
}

# In [6]
prior_knowledge_matrix = init_prior_knowledge_matrix(
    labels=labels,
    prohibited_transitions=prohibited_transitions,
    prohibited_transition_value=1,
)

# In [7]
prior_knowledge_matrix
# Out[7]:
# tensor([[0., 0., 0., 0., 1., 1., 1.],
#         [0., 0., 0., 0., 0., 1., 1.],
#         [0., 0., 0., 0., 1., 0., 1.],
#         [0., 0., 0., 0., 1., 1., 0.],
#         [0., 0., 0., 0., 0., 1., 1.],
#         [0., 0., 0., 0., 1., 0., 1.],
#         [0., 0., 0., 0., 1., 1., 0.]])

# In [8]
layer = PriorKnowledgeLayer(prior_knowledge_matrix)

# In [9]
layer
# Out[9]: PriorKnowledgeLayer()

# In [10]
batch_size = 8
seq_len = 25
n_classes = len(labels)

logits = torch.randn(batch_size, seq_len, n_classes)

logits.shape
# Out[10]: torch.Size([8, 25, 7])

# In [11]
output = layer(logits)

output.shape
# Out[11]: torch.Size([8, 24])
```
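The implementation of `init_prior_knowledge_matrix` lives in `prior_knowledge_layer/main.py`, which this capture does not show. Based on the inputs and the `Out[7]` tensor above, it plausibly reduces to the following sketch; the name and signature come from the notebook, the body is an assumption:

```python
import torch

def init_prior_knowledge_matrix(labels, prohibited_transitions, prohibited_transition_value=1.0):
    """Sketch: M[i, j] = prohibited_transition_value if the transition
    labels[i] -> labels[j] is prohibited, else 0."""
    label2idx = {label: i for i, label in enumerate(labels)}
    matrix = torch.zeros(len(labels), len(labels))
    for src, dst_labels in prohibited_transitions.items():
        for dst in dst_labels:
            matrix[label2idx[src], label2idx[dst]] = prohibited_transition_value
    return matrix
```

Run on the notebook's `labels` and `prohibited_transitions`, this reproduces the `Out[7]` matrix exactly (e.g. row "O" has 1s in the I-PER, I-LOC, and I-ORG columns).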

Diff for: prior_knowledge_layer/__init__.py (+6, new file)

```diff
@@ -0,0 +1,6 @@
+from .main import PriorKnowledgeLayer, init_prior_knowledge_matrix
+
+__all__ = [
+    "PriorKnowledgeLayer",
+    "init_prior_knowledge_matrix",
+]
```
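`PriorKnowledgeLayer` is likewise defined in `prior_knowledge_layer/main.py`, not shown in this capture. The notebook maps logits of shape (8, 25, 7) to an output of shape (8, 24), one value per adjacent pair of positions. A layer consistent with those shapes, with the parameterless `PriorKnowledgeLayer()` repr, and with the removed README's gumbel-loss idea could look like this; the use of `gumbel_softmax` and the exact penalty formula are guesses, not the repository's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorKnowledgeLayer(nn.Module):
    """Sketch: scores adjacent label transitions against a prohibition matrix.

    Assumed behavior: for each pair of positions (t, t+1) the layer returns
    p_t^T M p_{t+1}, the relaxed probability mass on prohibited transitions,
    giving an output of shape (batch, seq_len - 1), as in the notebook.
    """

    def __init__(self, prior_knowledge_matrix: torch.Tensor):
        super().__init__()
        # buffer: moves with .to(device) but is not a trainable parameter
        self.register_buffer("prior_knowledge_matrix", prior_knowledge_matrix)

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, n_classes) -> relaxed one-hot samples
        probs = F.gumbel_softmax(logits, tau=1.0, hard=False, dim=-1)
        # contract class indices: penalty[b, t] = p_t^T M p_{t+1}
        return torch.einsum(
            "bti,ij,btj->bt",
            probs[:, :-1],                # p_t
            self.prior_knowledge_matrix,  # M
            probs[:, 1:],                 # p_{t+1}
        )
```

Summing this output over positions and batch gives a scalar penalty that can be added to the cross-entropy loss with a weight lambda, matching the plan in the removed README; whether the real layer samples with gumbel-softmax, plain softmax, or something else cannot be determined from this commit.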
