Commit 370f75a

Text Recognition: Add script to evaluate text recognition by ICDAR2003 (#71)
* update readme
* add another script
* revise details for this pr
1 parent ae1d754 commit 370f75a

File tree

9 files changed

+288
-2
lines changed

models/text_recognition_crnn/README.md

Lines changed: 14 additions & 0 deletions
@@ -2,11 +2,24 @@
 
 An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
 
+Results of accuracy evaluation with [tools/eval](../../tools/eval) on different text recognition datasets.
+
+| Model name   | ICDAR03(%) | IIIT5k(%) | CUTE80(%) |
+|--------------|------------|-----------|-----------|
+| CRNN_EN      | 81.66      | 74.33     | 52.78     |
+| CRNN_EN_FP16 | 82.01      | 74.93     | 52.34     |
+| CRNN_CH      | 71.28      | 80.90     | 67.36     |
+| CRNN_CH_FP16 | 78.63      | 80.93     | 67.01     |
+
+\*: 'FP16' stands for 'model quantized into FP16'.
+
 Note:
 - Model source:
     - `text_recognition_CRNN_EN_2021sep.onnx`: https://docs.opencv.org/4.5.2/d9/d1e/tutorial_dnn_OCR.html (CRNN_VGG_BiLSTM_CTC.onnx)
+    - `text_recognition_CRNN_CH_2021sep.onnx`: https://docs.opencv.org/4.x/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs.onnx)
     - `text_recognition_CRNN_CN_2021nov.onnx`: https://docs.opencv.org/4.5.2/d4/d43/tutorial_dnn_text_spotting.html (crnn_cs_CN.onnx)
 - `text_recognition_CRNN_EN_2021sep.onnx` can detect digits (0\~9) and letters (return lowercase letters a\~z) (view `charset_36_EN.txt` for details).
+- `text_recognition_CRNN_CH_2021sep.onnx` can detect digits (0\~9), upper/lower-case letters (a\~z and A\~Z), and some special characters (view `charset_94_CH.txt` for details).
 - `text_recognition_CRNN_CN_2021nov.onnx` can detect digits (0\~9), upper/lower-case letters (a\~z and A\~Z), some Chinese characters and some special characters (view `charset_3944_CN.txt` for details).
 - For details on training this model series, please visit https://github.com/zihaomu/deep-text-recognition-benchmark.
 

@@ -16,6 +29,7 @@ Note:
 - This demo uses [text_detection_db](../text_detection_db) as text detector.
 - Selected model must match with the charset:
     - Try `text_recognition_CRNN_EN_2021sep.onnx` with `charset_36_EN.txt`.
+    - Try `text_recognition_CRNN_CH_2021sep.onnx` with `charset_94_CH.txt`.
     - Try `text_recognition_CRNN_CN_2021nov.onnx` with `charset_3944_CN.txt`.
 
 Run the demo detecting English:

models/text_recognition_crnn/charset_94_CH.txt

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+a
+b
+c
+d
+e
+f
+g
+h
+i
+j
+k
+l
+m
+n
+o
+p
+q
+r
+s
+t
+u
+v
+w
+x
+y
+z
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+!
+"
+#
+$
+%
+&
+'
+(
+)
+*
++
+,
+-
+.
+/
+:
+;
+<
+=
+>
+?
+@
+[
+\
+]
+^
+_
+`
+{
+|
+}
+~
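
Reviewer note: a charset file like this one lists one symbol per line, and the line order fixes which class index maps to which character. The sketch below shows how such a file might be consumed by a greedy CTC decoder. It assumes, without confirmation from this diff, that class 0 is the CTC blank and that class i maps to line i - 1. `load_charset` and `ctc_greedy_decode` are hypothetical helpers, not functions from crnn.py.

```python
import io

def load_charset(stream):
    """Read one symbol per line, preserving order (line i -> class i + 1)."""
    # Strip only the line ending so symbols such as '"' or '\\' survive intact.
    return [line.rstrip("\r\n") for line in stream if line.rstrip("\r\n")]

def ctc_greedy_decode(class_ids, charset, blank=0):
    """Collapse repeated ids, then drop blanks (greedy CTC decoding)."""
    chars = []
    previous = None
    for idx in class_ids:
        if idx != previous and idx != blank:
            chars.append(charset[idx - 1])  # assumption: classes are offset by the blank
        previous = idx
    return "".join(chars)

# Tiny stand-in for a charset file: digits 0-9 followed by 'a' and 'b'.
charset = load_charset(io.StringIO("0\n1\n2\n3\n4\n5\n6\n7\n8\n9\na\nb\n"))

# Per-timestep argmax ids: 'a', 'a', blank, 'b', blank, 'b'.
decoded = ctc_greedy_decode([11, 11, 0, 12, 0, 12], charset)
print(decoded)
```

With this mapping, pairing a model with the wrong charset does not fail loudly. It either produces wrong characters or indexes past the end of the list, which is why the README asks for matching model and charset files.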

models/text_recognition_crnn/crnn.py

Lines changed: 3 additions & 1 deletion
@@ -54,7 +54,9 @@ def _preprocess(self, image, rbbox):
         rotationMatrix = cv.getPerspectiveTransform(vertices, self._targetVertices)
         cropped = cv.warpPerspective(image, rotationMatrix, self._inputSize)
 
-        if 'CN' in self._model_path:
+        # 'CH' can detect digits (0~9), upper/lower-case letters (a~z and A~Z), and some special characters
+        # 'CN' can detect digits (0~9), upper/lower-case letters (a~z and A~Z), some Chinese characters and some special characters
+        if 'CN' in self._model_path or 'CH' in self._model_path:
             pass
         else:
             cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)
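
Reviewer note: the condition above matches 'CN' or 'CH' anywhere in the model path, so a directory name containing either substring would also skip the grayscale conversion. A possible tightening is sketched below. It checks only the file name and matches the delimited tags seen in the model names of this commit. `needs_grayscale` and `COLOR_INPUT_TAGS` are hypothetical names, and the sketch assumes all CN/CH model files keep the `_CN_` / `_CH_` naming pattern.

```python
import os

# Tags taken from the model file names in this commit. Models carrying one of
# them receive the 3-channel crop; all other models get a grayscale crop.
COLOR_INPUT_TAGS = ("_CN_", "_CH_")

def needs_grayscale(model_path):
    """Return True when the cropped image should be converted to grayscale."""
    file_name = os.path.basename(model_path)  # ignore directory names
    return not any(tag in file_name for tag in COLOR_INPUT_TAGS)

en_model = "models/text_recognition_crnn/text_recognition_CRNN_EN_2021sep.onnx"
ch_model = "models/text_recognition_crnn/text_recognition_CRNN_CH_2021sep.onnx"
tricky = "/home/CHEN/models/text_recognition_CRNN_EN_2021sep.onnx"

print(needs_grayscale(en_model), needs_grayscale(ch_model), needs_grayscale(tricky))
```

The third path is the case the plain substring test would get wrong: 'CH' appears in the directory name, yet the model is the English one.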

models/text_recognition_crnn/text_recognition_CRNN_CH_2021sep.onnx

Whitespace-only changes.

tools/eval/README.md

Lines changed: 53 additions & 0 deletions
@@ -19,6 +19,8 @@ Supported datasets:
 - [ImageNet](#imagenet)
 - [WIDERFace](#widerface)
 - [LFW](#lfw)
+- [ICDAR2003](#icdar2003)
+- [IIIT5K](#iiit5k)
 
 ## ImageNet
 

@@ -137,4 +139,55 @@ Run evaluation with the following command:
 
 ```shell
 python eval.py -m sface -d lfw -dr /path/to/lfw
+```
+
+## ICDAR2003
+
+### Prepare data
+
+Please visit http://iapr-tc11.org/mediawiki/index.php/ICDAR_2003_Robust_Reading_Competitions to download the ICDAR2003 dataset and the labels.
+
+```shell
+$ tree -L 2 /path/to/icdar
+.
+├── word
+│   ├── 1
+│   │   ├── self
+│   │   ├── ...
+│   │   └── willcooks
+│   ├── ...
+│   └── 12
+└── word.xml
+
+```
+
+### Evaluation
+
+Run evaluation with the following command:
+
+```shell
+python eval.py -m crnn -d icdar -dr /path/to/icdar
+```
+
+### Example
+
+```shell
+# Download the zip file from http://www.iapr-tc11.org/dataset/ICDAR2003_RobustReading/TrialTrain/word.zip
+# Unzip the file to /path/to/icdar
+python eval.py -m crnn -d icdar -dr /path/to/icdar
+```
+
+## IIIT5K
+
+### Prepare data
+
+Please visit https://github.com/cv-small-snails/Text-Recognition-Material to download the IIIT5K dataset and the labels.
+
+### Evaluation
+
+Any dataset in LMDB format can be evaluated with this script.<br>
+Run evaluation with the following command:
+
+```shell
+python eval.py -m crnn -d iiit5k -dr /path/to/iiit5k
 ```
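
Reviewer note: the accuracy figure these commands print is a word-level exact-match rate, judging by the dataset classes added in this commit. A prediction counts only if the whole string equals the lowercased label. The sketch below restates that metric on its own. `word_accuracy` is a hypothetical helper, and the sample strings are made up for illustration.

```python
def word_accuracy(predictions, labels):
    """Fraction of samples whose predicted string equals the label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    # Case-insensitive comparison, mirroring the .lower() calls in the dataset code.
    correct = sum(p.lower() == t.lower() for p, t in zip(predictions, labels))
    return correct / len(labels)

acc = word_accuracy(["self", "wilcooks", "12"], ["Self", "willcooks", "12"])
print("Accuracy: {:.2f}%".format(acc * 100))
```

A single wrong character makes the whole word count as a miss, so this metric is stricter than a character-level error rate.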

tools/eval/datasets/__init__.py

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,8 @@
 from .imagenet import ImageNet
 from .widerface import WIDERFace
 from .lfw import LFW
+from .icdar import ICDAR
+from .iiit5k import IIIT5K
 
 class Registery:
     def __init__(self, name):
@@ -16,4 +18,6 @@ def register(self, item):
 DATASETS = Registery("Datasets")
 DATASETS.register(ImageNet)
 DATASETS.register(WIDERFace)
-DATASETS.register(LFW)
+DATASETS.register(LFW)
+DATASETS.register(ICDAR)
+DATASETS.register(IIIT5K)
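
Reviewer note: the body of `Registery` is not part of this diff, so the following is only a guess at the pattern it implements: a name-to-class map that lets eval.py pick a dataset class from a string. The class below, its `get` method and the stand-in `ICDAR` class are all hypothetical.

```python
class Registry:
    """Map class names to classes so they can be looked up from a string."""

    def __init__(self, name):
        self._name = name
        self._items = {}

    def register(self, item):
        # Key each class by its own name, e.g. "ICDAR" -> ICDAR.
        self._items[item.__name__] = item
        return item

    def get(self, key):
        try:
            return self._items[key]
        except KeyError:
            raise KeyError("{} is not registered in {}".format(key, self._name)) from None


class ICDAR:  # stand-in for the real dataset class
    def __init__(self, root):
        self.root = root


DATASETS = Registry("Datasets")
DATASETS.register(ICDAR)

dataset = DATASETS.get("ICDAR")(root="/path/to/icdar")
print(type(dataset).__name__, dataset.root)
```

Under this pattern, adding a dataset takes one import and one `register` call, which matches the shape of the change above.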

tools/eval/datasets/icdar.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+import os
+import numpy as np
+import cv2 as cv
+import xml.dom.minidom as minidom
+from tqdm import tqdm
+
+class ICDAR:
+    def __init__(self, root):
+        self.root = root
+        self.acc = -1
+        self.inputSize = [100, 32]
+        self.val_label_file = os.path.join(root, "word.xml")
+        self.val_label = self.load_label(self.val_label_file)
+
+    @property
+    def name(self):
+        return self.__class__.__name__
+
+    def load_label(self, label_file):
+        label = list()
+        dom = minidom.getDOMImplementation().createDocument(None, 'Root', None)
+        root = dom.documentElement
+        dom = minidom.parse(label_file)
+        root = dom.documentElement
+        names = root.getElementsByTagName('image')
+        for name in names:
+            key = os.path.join(self.root, name.getAttribute('file'))
+            value = name.getAttribute('tag').lower()
+            label.append([key, value])
+
+        return label
+
+    def eval(self, model):
+        right_num = 0
+        pbar = tqdm(self.val_label)
+        for fn, label in pbar:
+            pbar.set_description("Evaluating {} with {} val set".format(model.name, self.name))
+
+            img = cv.imread(fn)
+
+            rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
+            pred = model.infer(img, rbbox).lower()
+            if label == pred:
+                right_num += 1
+
+        self.acc = right_num/(len(self.val_label) * 1.0)
+
+
+    def get_result(self):
+        return self.acc
+
+    def print_result(self):
+        print("Accuracy: {:.2f}%".format(self.acc*100))
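
Reviewer note: `load_label` reads the `file` and `tag` attributes of every `image` element in word.xml. The sketch below runs the same parsing steps on an inline sample so the expected label structure is visible without the dataset. The root element name `imagelist` and the sample entries are assumptions for illustration, and `load_labels` is a hypothetical stand-alone variant of the method.

```python
import os
import xml.dom.minidom as minidom

# Made-up sample in the shape load_label expects: <image file="..." tag="..."/>.
SAMPLE_XML = """<imagelist>
  <image file="word/1/1.jpg" tag="Self" />
  <image file="word/1/2.jpg" tag="ADHESIVE" />
</imagelist>
"""

def load_labels(xml_text, root):
    """Return [image_path, lowercase_label] pairs from ICDAR-style XML text."""
    dom = minidom.parseString(xml_text)
    pairs = []
    for node in dom.documentElement.getElementsByTagName("image"):
        path = os.path.join(root, node.getAttribute("file"))
        pairs.append([path, node.getAttribute("tag").lower()])
    return pairs

labels = load_labels(SAMPLE_XML, "/path/to/icdar")
for path, text in labels:
    print(path, text)
```

Labels are lowercased at load time, so the comparison in `eval` is only fair if predictions are lowercase as well.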

tools/eval/datasets/iiit5k.py

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+import lmdb
+import os
+import numpy as np
+import cv2 as cv
+from tqdm import tqdm
+
+class IIIT5K:
+    def __init__(self, root):
+        self.root = root
+        self.acc = -1
+        self.inputSize = [100, 32]
+
+        self.val_label = self.load_label(self.root)
+
+    @property
+    def name(self):
+        return self.__class__.__name__
+
+    def load_label(self, root):
+        lmdb_file = root
+        lmdb_env = lmdb.open(lmdb_file)
+        lmdb_txn = lmdb_env.begin()
+        lmdb_cursor = lmdb_txn.cursor()
+        label = list()
+        for key, value in lmdb_cursor:
+            image_index = key.decode()
+            if image_index.split('-')[0] == 'image':
+                img = cv.imdecode(np.frombuffer(value, np.uint8), 3)
+                label_index = 'label-' + image_index.split('-')[1]
+                value = lmdb_txn.get(label_index.encode()).decode().lower()
+                label.append([img, value])
+            else:
+                break
+        return label
+
+    def eval(self, model):
+        right_num = 0
+        pbar = tqdm(self.val_label)
+        for img, value in pbar:
+            pbar.set_description("Evaluating {} with {} val set".format(model.name, self.name))
+
+
+            rbbox = np.array([0, img.shape[0], 0, 0, img.shape[1], 0, img.shape[1], img.shape[0]])
+            pred = model.infer(img, rbbox).lower()
+            if value == pred:
+                right_num += 1
+
+        self.acc = right_num/(len(self.val_label) * 1.0)
+
+
+    def get_result(self):
+        return self.acc
+
+    def print_result(self):
+        print("Accuracy: {:.2f}%".format(self.acc*100))
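
Reviewer note: the `break` in `load_label` relies on LMDB cursors visiting keys in sorted byte order, so that every `image-...` key comes before the first `label-...` key. The sketch below reproduces that traversal with a plain dict and `sorted()` instead of a real LMDB environment. The zero-padded index width and the `num-samples` key are assumptions about the dataset layout, and `iter_image_label_pairs` is a hypothetical helper.

```python
def iter_image_label_pairs(store):
    """Yield (image_bytes, label) pairs from a dict mimicking the LMDB layout."""
    # sorted() stands in for the LMDB cursor, which walks keys in byte order.
    for key in sorted(store):
        prefix, _, index = key.decode().partition("-")
        if prefix != "image":
            break  # past the image block; the remaining keys are labels or metadata
        label = store["label-{}".format(index).encode()].decode().lower()
        yield store[key], label

# Keys deliberately inserted out of order to show that sorting drives the walk.
store = {
    b"label-000000001": b"Hello",
    b"image-000000002": b"<jpeg bytes 2>",
    b"num-samples": b"2",
    b"image-000000001": b"<jpeg bytes 1>",
    b"label-000000002": b"WORLD",
}

pairs = list(iter_image_label_pairs(store))
print(pairs)
```

If a dataset used a key prefix that sorts before `image-`, the loop would stop at once and yield nothing, so the early `break` is only safe for this key layout.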

tools/eval/eval.py

Lines changed: 11 additions & 0 deletions
@@ -73,6 +73,11 @@
         name="SFace",
         topic="face_recognition",
         modelPath=os.path.join(root_dir, "models/face_recognition_sface/face_recognition_sface_2021dec-act_int8-wt_int8-quantized.onnx")),
+    crnn=dict(
+        name="CRNN",
+        topic="text_recognition",
+        modelPath=os.path.join(root_dir, "models/text_recognition_crnn/text_recognition_CRNN_EN_2021sep.onnx"),
+        charsetPath=os.path.join(root_dir, "models/text_recognition_crnn/charset_36_EN.txt")),
 )
 
 datasets = dict(
@@ -87,6 +92,12 @@
         name="LFW",
         topic="face_recognition",
         target_size=112),
+    icdar=dict(
+        name="ICDAR",
+        topic="text_recognition"),
+    iiit5k=dict(
+        name="IIIT5K",
+        topic="text_recognition"),
 )
 
 def main(args):
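
Reviewer note: both config dicts carry a `topic` field, which suggests a model should only be paired with a dataset of the same topic. Whether eval.py enforces this is not visible in the diff. The sketch below shows one way such a guard might look, using trimmed copies of the entries above. `check_compatible` is a hypothetical helper.

```python
# Trimmed copies of the config entries; paths and sizes are omitted.
models = dict(
    sface=dict(name="SFace", topic="face_recognition"),
    crnn=dict(name="CRNN", topic="text_recognition"),
)
datasets = dict(
    lfw=dict(name="LFW", topic="face_recognition"),
    icdar=dict(name="ICDAR", topic="text_recognition"),
    iiit5k=dict(name="IIIT5K", topic="text_recognition"),
)

def check_compatible(model_key, dataset_key):
    """Return the shared topic, or raise ValueError when the topics differ."""
    model_topic = models[model_key]["topic"]
    dataset_topic = datasets[dataset_key]["topic"]
    if model_topic != dataset_topic:
        raise ValueError(
            "{} ({}) cannot be evaluated on {} ({})".format(
                model_key, model_topic, dataset_key, dataset_topic))
    return model_topic

print(check_compatible("crnn", "icdar"))
```

A check like this would turn a command such as `-m crnn -d lfw` into an immediate error instead of a failure deep inside the dataset code.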
