
Commit e77e969

committed
update readme.md
1 parent 2afd7f3 commit e77e969

6 files changed

Lines changed: 230 additions & 39 deletions

File tree

README.md

Lines changed: 108 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -1,12 +1,14 @@
11
# StripeReassembly
22

3-
This repository is a C++ implementation for
4-
```
5-
Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver
6-
```
3+
This repository is a C++ implementation of the TMM 2019 paper
4+
> Liang, Yongqing, and Xin Li. "Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver." IEEE Transactions on Multimedia (2019).
5+
76
If you use this code in your research, please cite the paper.
87

9-
## Environment
8+
**Paper correction:** Equation (5) should read:
9+
![](eq_correct.png)
10+
11+
## 1. Environment
1012

1113
We build and evaluate our codes under Ubuntu 18.04 and Mac OS X 10.14.5. The following packages are used in this repository:
1214
1. OpenCV: 3.2.0
@@ -15,7 +17,7 @@ We build and evaluate our codes under Ubuntu 18.04 and Mac OS X 10.14.5. The fol
1517
4. g++: 7.4.0
1618
5. Python: 3.6.8
1719

18-
## DocDataset description
20+
## 2. DocDataset description
1921

2022
Click [here](http://t.lyq.me?d=DocDataset) to download the `DocDataset`. Unzip the package and copy the `gt` and `stripes` into the `/data/` folder of the repository.
2123

@@ -24,10 +26,106 @@ Click [here](http://t.lyq.me?d=DocDataset) to download the `DocDataset`. Unzip t
2426
2. 3 physically shredded document puzzles. They are named as `real*_*`.
2527
3. 1 randomly oriented puzzle named `doc3_36`.
2628

27-
The comparison performance results are reported in the paper.
29+
The comparison results are reported in Tables I, II, and III of the paper.
30+
31+
## 3. Usage
32+
33+
Download this repository. The source code can be compiled into `debug` and `release` executables.
34+
35+
### 3.1 Compile
36+
37+
To generate the executable in `debug` mode:
38+
```
39+
./autogen debug
40+
```
41+
To generate the executable in `release` mode:
42+
```
43+
./autogen release
44+
```
45+
46+
### 3.2 Reassemble a stripe puzzle
2847

29-
## Usage
48+
A quick example that reassembles a synthesized stripe puzzle:
49+
```
50+
./bin/release/solver --test doc0 --num 40 --comp 2 --metric 2 --samples 300
51+
```
52+
Another example that reassembles a real-world stripe puzzle:
53+
```
54+
./bin/release/solver -t real1 -n 27 -c 2 -m 2 -s 10000 -r --word_conf_thres 70 --lambda0 0.5 --lambda1 0.7 --u_a 1 --filter_rate 0.2 --candidate_factor 5
55+
```
3056

31-
### Compile
57+
Detailed documentation of the options is available via
58+
```
59+
./bin/release/solver --help
60+
```
61+
62+
### 3.3 Run benchmark
3263

33-
`CMakeLists.txt`
64+
We also provide an option to run the whole dataset instead of running each test case individually.
65+
```
66+
./benchmark.sh doc [gen]
67+
```
68+
69+
When you run the benchmark or add the `--benchmark` option to `./bin/release/solver`, the results are saved in `data/scores`.
70+
71+
72+
#### 3.3.1 Generate stripe puzzles
73+
In most cases, we recommend using the provided dataset for a fair comparison.
74+
75+
The optional `gen` argument provides an alternative when running the benchmark: it runs `./bin/release/generator` to generate the stripe puzzles from the ground truth.
76+
77+
Details about randomly generating stripes from the ground truth can be found in `src/generator/generate_puzzle.cpp`.
78+
79+
#### 3.3.2 Recommended parameters
80+
For synthetic data, default parameters are good enough.
81+
```
82+
const double word_conf_thres = 70;
83+
const double lambda0 = 0.3;
84+
const double lambda1 = 0.5;
85+
const double U_a = 2;
86+
const double filter_rate = 0.7;
87+
const int candidate_factor = 4;
88+
```
89+
We recommend setting `--samples` to at least 150, 300, 1000, and 8000 for 20-, 30-, 40-, and 60-stripe puzzles, respectively.
90+
91+
For the real-world data (`real1`, `real2`, and `real3`), we report our results with the following parameters.
92+
93+
```
94+
const double word_conf_thres {70}; // or 60
95+
const double lambda0 = 0.5;
96+
const double lambda1 = 0.7;
97+
const double U_a = 1;
98+
99+
// For Real Case 1
100+
const double filter_rate = 0.2;
101+
const int candidate_factor {5};
102+
103+
// For Real Case 2
104+
const double filter_rate = 0.5;
105+
const int candidate_factor {3};
106+
107+
// For Real Case 3
108+
const double filter_rate = 0.6;
109+
const int candidate_factor {5};
110+
```
111+
We recommend setting `--samples` to more than 8000.
112+
113+
### 3.4 Clean
114+
```
115+
./autoclean debug
116+
```
117+
or
118+
```
119+
./autoclean release
120+
```
121+
122+
## 4. Reference
123+
```
124+
@article{liang2019reassembling,
125+
title={Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver},
126+
author={Liang, Yongqing and Li, Xin},
127+
journal={IEEE Transactions on Multimedia},
128+
year={2019},
129+
publisher={IEEE}
130+
}
131+
```
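
The sample-count recommendations in section 3.3.2 and the `--candidate_factor` help text ("Seq len = Stripe num / candidate_factor") can be sketched as two small helpers. Both function names are hypothetical and not part of the repository. The README only documents 20-, 30-, 40-, and 60-stripe puzzles, so rounding other stripe counts up to the next documented size is an assumption, as is the use of integer division for the sequence length.

```cpp
// Minimum --samples value suggested by the README for synthetic puzzles.
// Only 20, 30, 40, and 60 stripes are documented; treating any other count
// as the next larger documented size is an assumption of this sketch.
int recommended_samples(int stripes_n) {
    if (stripes_n <= 20) return 150;
    if (stripes_n <= 30) return 300;
    if (stripes_n <= 40) return 1000;
    return 8000;
}

// Candidate sequence length as described in the help text:
// "Seq len = Stripe num / candidate_factor".
// Integer division is assumed here; the solver may round differently.
int candidate_seq_len(int stripes_n, int candidate_factor) {
    return stripes_n / candidate_factor;
}
```

Under these assumptions, a 40-stripe synthetic puzzle with the default `candidate_factor` of 4 would use sequences of length 10 and at least 1000 samples.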

eq_correct.png

1.32 KB

include/solve_puzzle.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22
#define SOLVE_PUZZLE
33

44
#include <unistd.h>
5+
#include <getopt.h>
56
#include <string>
67
#include <fstream>
78
#include <ctime>

include/stripes_solver.h

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@ class StripesSolver {
5454
// Path
5555
PathManager path_manager;
5656

57-
StripesSolver(const string & _puzzle_foler, int _stripes_n, int _samples_n, bool _real_flag);
57+
StripesSolver(const string & _puzzle_folder, int _stripes_n, int _samples_n, bool _real_flag, double _word_conf_thres, double _lambda0, double _lambda1, double _U_a, double _filter_rate, int _candidate_factor);
5858
~StripesSolver();
5959

6060
void m_metric();
@@ -91,12 +91,12 @@ class StripesSolver {
9191
const string tesseract_model_path {"data/tesseract_model/"};
9292

9393
// -- For synthetic cases
94-
const double word_conf_thres {70};
94+
const double word_conf_thres = 70;
9595
const double lambda0 = 0.3;
9696
const double lambda1 = 0.5;
9797
const double U_a = 2;
9898
const double filter_rate = 0.7;
99-
const int candidate_factor {4};
99+
const int candidate_factor = 4;
100100
// ---------------------
101101

102102

src/solver/solve_puzzle.cpp

Lines changed: 110 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,26 @@
11
#include <solve_puzzle.h>
22

3-
void solve_stripes( const string & stripes_folder,
4-
const string & case_name,
5-
int vertical_n,
6-
int samples_n,
7-
StripesSolver::Metric metric_mode,
8-
StripesSolver::Composition composition_mode,
9-
bool benchmark_flag,
10-
bool real_flag) {
3+
// Default parameters
4+
string case_name = "doc0";
5+
PuzzleType puzzle_type = PuzzleType::STRIPES;
6+
int vertical_n = 4;
7+
int samples_n = 10;
8+
StripesSolver::Composition composition_mode = StripesSolver::GREEDY;
9+
StripesSolver::Metric metric_mode = StripesSolver::PIXEL;
10+
bool benchmark_flag = false;
11+
bool real_flag = false;
12+
13+
// -- Default params for synthetic cases
14+
double word_conf_thres = 70;
15+
double lambda0 = 0.3;
16+
double lambda1 = 0.5;
17+
double U_a = 2;
18+
double filter_rate = 0.7;
19+
int candidate_factor = 4;
20+
21+
void solve_stripes( const string & stripes_folder) {
1122

12-
StripesSolver stripes_solver(stripes_folder, vertical_n, samples_n, real_flag);
23+
StripesSolver stripes_solver(stripes_folder, vertical_n, samples_n, real_flag, word_conf_thres, lambda0, lambda1, U_a, filter_rate, candidate_factor);
1324
stripes_solver.reassemble(metric_mode, composition_mode, case_name, benchmark_flag);
1425

1526
#ifdef DEBUG
@@ -26,24 +37,71 @@ void solve_stripes( const string & stripes_folder,
2637
#endif
2738
}
2839

29-
int main(int argc, char ** argv) {
40+
void print_help() {
41+
cout <<
42+
"--test <str> The test case to evaluate.\n"
43+
"--num <int> The number of stripes.\n"
44+
"--comp <int> The composition type: 0 Greedy, 1 GCOM,\n"
45+
" 2 Greedy and GCOM, 3 Groundtruth,\n"
46+
" 4 user defined.\n"
47+
"--metric <int> The similarity metric type: 0 pixel-level,\n"
48+
" 1 character-level, 2 word-level.\n"
49+
"--samples <int> When GCOM is set, it defines how many\n"
50+
" sequences are sampled for word-level OCR. Note\n"
51+
" that high reassembly score requires enough samples.\n"
52+
" For 20 stripes, we recommend 150 samples.\n"
53+
" For 30 stripes, we recommend 300 samples.\n"
54+
" For 40 stripes, we recommend 1000 samples.\n"
55+
" For 60 stripes, we recommend 8000 samples.\n"
56+
"--benchmark Flag. Whether to write results in files.\n"
57+
"--real Flag. Whether the test case is in real-world.\n";
58+
" "
59+
"--word_conf_thres <float> OCR score threshold [0-100]. Default 70.\n"
60+
"--lambda0 <float> Balance char-level and pixel-level metrics in Eq. 5.\n"
61+
" [0-1]. Default 0.3.\n"
62+
"--lambda1 <float> Balance word-level and low-level metrics in Eq. 13.\n"
63+
" [0-1]. Default: 0.5.\n"
64+
"--u_a <float> Scale factor in Eq. 9. [0-100]. Default: 2.\n"
65+
"--filter_rate <float> Filter out stripe pairs that have low-level scores. \n"
66+
" [0-1]. Default: 0.7.\n"
67+
"--candidate_factor <int> Seq len = Stripe num / candidate_factor. Default 4.\n";
68+
}
3069

31-
// Default parameters
32-
string case_name = "doc0";
33-
PuzzleType puzzle_type = PuzzleType::STRIPES;
34-
int vertical_n = 4;
35-
int samples_n = 10;
36-
StripesSolver::Composition composition_mode = StripesSolver::GREEDY;
37-
StripesSolver::Metric metric_mode = StripesSolver::PIXEL;
38-
bool benchmark_flag = false;
39-
bool real_flag = false;
70+
int main(int argc, char ** argv) {
4071

4172
// Parse command line parameters
42-
const string opt_str = "t:n:c:m:s:br";
43-
int opt = getopt(argc, argv, opt_str.c_str());
73+
const string short_opts = "t:n:c:m:s:brh1:2:3:4:5:6:";
74+
const option long_opts[] = {
75+
{"test", required_argument, nullptr, 't'},
76+
{"num", required_argument, nullptr, 'n'},
77+
{"comp", required_argument, nullptr, 'c'},
78+
{"metric", required_argument, nullptr, 'm'},
79+
{"samples", required_argument, nullptr, 's'},
80+
{"benchmark", no_argument, nullptr, 'b'},
81+
{"real", no_argument, nullptr, 'r'},
82+
{"help", no_argument, nullptr, 'h'},
83+
{"word_conf_thres", required_argument, nullptr, 1},
84+
{"lambda0", required_argument, nullptr, 2},
85+
{"lambda1", required_argument, nullptr, 3},
86+
{"u_a", required_argument, nullptr, 4},
87+
{"filter_rate", required_argument, nullptr, 5},
88+
{"candidate_factor", required_argument, nullptr, 6},
89+
{nullptr, no_argument, nullptr, 0}
90+
};
91+
int opt_id;
92+
93+
while (true) {
94+
const auto opt = getopt_long(argc, argv, short_opts.c_str(), long_opts, &opt_id);
95+
96+
if (opt == -1) break;
4497

45-
while (opt != -1) {
4698
switch (opt) {
99+
100+
case 0:
101+
printf("option %s", long_opts[opt_id].name);
102+
if (optarg) printf(" with arg %s", optarg);
103+
printf("\n");
104+
break;
47105
case 't':
48106
case_name = string(optarg);
49107
break;
@@ -65,12 +123,34 @@ int main(int argc, char ** argv) {
65123
case 'r':
66124
real_flag = true;
67125
break;
126+
case 'h':
127+
print_help();
128+
exit(0);
129+
130+
case 1:
131+
word_conf_thres = atof(optarg);
132+
break;
133+
case 2:
134+
lambda0 = atof(optarg);
135+
break;
136+
case 3:
137+
lambda1 = atof(optarg);
138+
break;
139+
case 4:
140+
U_a = atof(optarg);
141+
break;
142+
case 5:
143+
filter_rate = atof(optarg);
144+
break;
145+
case 6:
146+
candidate_factor = atoi(optarg);
147+
break;
68148
default:
149+
print_help();
69150
cerr << "[ ERR] Unknown option " << opt << endl;
70151
exit(-1);
71152
}
72153

73-
opt = getopt(argc, argv, opt_str.c_str());
74154
}
75155

76156
const string metric_mode_str =
@@ -94,12 +174,18 @@ int main(int argc, char ** argv) {
94174
if (composition_mode == StripesSolver::Composition::GCOM || composition_mode == StripesSolver::Composition::GREEDY_GCOM) {
95175
cout << "Samples times: \t" << samples_n << endl;
96176
}
177+
cout << "word_conf_thres: \t" << word_conf_thres << endl;
178+
cout << "lambda0: \t" << lambda0 << endl;
179+
cout << "lambda1: \t" << lambda1 << endl;
180+
cout << "u_a: \t" << U_a << endl;
181+
cout << "filter_rate: \t" << filter_rate << endl;
182+
cout << "candidate_factor: \t" << candidate_factor << endl;
97183

98184
// Import stripes
99185
if (puzzle_type == PuzzleType::STRIPES) {
100186

101187
const string puzzle_folder = "data/stripes/" + case_name + "_" + to_string(vertical_n) + "/";
102-
solve_stripes(puzzle_folder, case_name, vertical_n, samples_n, metric_mode, composition_mode, benchmark_flag, real_flag);
188+
solve_stripes(puzzle_folder);
103189

104190
}
105191
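
The option parsing in this file gives each long-only flag a small integer code and converts values with `atof`, which returns 0 on malformed input without reporting an error. The following is a minimal sketch of the same `getopt_long` pattern with stricter number parsing. `SolverOptions`, `parse_options`, `parse_double`, and the `OPT_*` codes are hypothetical names used for illustration, not part of the repository, and only a subset of the solver's options is shown.

```cpp
#include <getopt.h>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical container for a subset of the solver's options.
struct SolverOptions {
    std::string case_name = "doc0";
    int stripes_n = 4;
    double lambda0 = 0.3;
    double filter_rate = 0.7;
    bool real_flag = false;
    bool ok = true;  // false after an unknown option or a malformed number
};

// Codes for long-only options, chosen above the range of any char.
enum LongOnly { OPT_LAMBDA0 = 256, OPT_FILTER_RATE };

// Strict conversion: rejects empty input and trailing garbage.
static bool parse_double(const char* text, double& out) {
    char* end = nullptr;
    const double value = std::strtod(text, &end);
    if (end == text || *end != '\0') return false;
    out = value;
    return true;
}

SolverOptions parse_options(std::vector<std::string> args) {
    // Build a mutable, null-terminated argv from the argument strings.
    std::vector<char*> argv;
    for (auto& a : args) argv.push_back(&a[0]);
    argv.push_back(nullptr);
    const int argc = static_cast<int>(args.size());

    static const option long_opts[] = {
        {"test", required_argument, nullptr, 't'},
        {"num", required_argument, nullptr, 'n'},
        {"real", no_argument, nullptr, 'r'},
        {"lambda0", required_argument, nullptr, OPT_LAMBDA0},
        {"filter_rate", required_argument, nullptr, OPT_FILTER_RATE},
        {nullptr, 0, nullptr, 0}
    };

    SolverOptions opts;
    optind = 1;  // restart scanning so the function can be called repeatedly
    opterr = 0;  // suppress getopt's own error messages
    int opt;
    while ((opt = getopt_long(argc, argv.data(), "t:n:r", long_opts, nullptr)) != -1) {
        switch (opt) {
            case 't': opts.case_name = optarg; break;
            case 'n': opts.stripes_n = std::atoi(optarg); break;
            case 'r': opts.real_flag = true; break;
            case OPT_LAMBDA0:
                opts.ok = parse_double(optarg, opts.lambda0) && opts.ok;
                break;
            case OPT_FILTER_RATE:
                opts.ok = parse_double(optarg, opts.filter_rate) && opts.ok;
                break;
            default:  // '?' for unknown options or missing arguments
                opts.ok = false;
                break;
        }
    }
    return opts;
}
```

Calling `parse_options({"solver", "-t", "real1", "-n", "27", "-r", "--lambda0", "0.5"})` fills the corresponding fields, while a mistyped flag such as `--text` leaves `ok` set to `false` instead of being silently ignored.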

src/solver/stripes_solver.cpp

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,17 @@
11
#include <stripes_solver.h>
22

3-
StripesSolver::StripesSolver(const string & _puzzle_folder, int _stripes_n, int _samples_n, bool _real_flag) :
3+
StripesSolver::StripesSolver(const string & _puzzle_folder, int _stripes_n, int _samples_n, bool _real_flag, double _word_conf_thres, double _lambda0, double _lambda1, double _U_a, double _filter_rate, int _candidate_factor) :
44
puzzle_folder(_puzzle_folder),
55
stripes_n(_stripes_n),
66
candidate_seqs_n(_samples_n),
77
path_manager(_stripes_n, _samples_n),
8-
real_flag(_real_flag) {
8+
real_flag(_real_flag),
9+
word_conf_thres(_word_conf_thres),
10+
lambda0(_lambda0),
11+
lambda1(_lambda1),
12+
U_a(_U_a),
13+
filter_rate(_filter_rate),
14+
candidate_factor(_candidate_factor) {
915

1016
// Timestamp array
1117
ts_arr.clear();
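
The header keeps default member initializers on the `const` parameters (for example `const double lambda0 = 0.3;`), while the constructor above now sets the same members in its initializer list. In C++ a member initializer in the constructor takes precedence over the default member initializer, so the header values act only as fallbacks. A minimal sketch of that rule follows; `SolverParams` is a hypothetical, trimmed-down stand-in, not the repository's `StripesSolver`.

```cpp
// Hypothetical stand-in for the tunable constants of StripesSolver.
class SolverParams {
public:
    // Default member initializers: used only when a constructor
    // does not initialize the member itself.
    const double lambda0 = 0.3;
    const double filter_rate = 0.7;
    const int candidate_factor = 4;

    // Keeps the defaults declared above.
    SolverParams() = default;

    // Each mem-initializer below overrides the corresponding default.
    SolverParams(double _lambda0, double _filter_rate, int _candidate_factor)
        : lambda0(_lambda0),
          filter_rate(_filter_rate),
          candidate_factor(_candidate_factor) {}
};
```

One consequence of `const` data members is that the class loses its copy-assignment operator, which is usually acceptable for a solver object that is constructed once per run.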
