
Commit 8b5c5f9

committed
test
1 parent 5df9c92 commit 8b5c5f9


28 files changed (+498 / -20 lines)


Diff for: Captcha1/ReadMe.md

+17-10
@@ -1,27 +1,34 @@
+### CAPTCHA Recognition Project, Version 1: Captcha1
+
 This project uses Tesseract V3.01 (V3.02 changed the training procedure by adding a shapeclustering step).
 
-Tesseract usage:
+**Tesseract usage:**
 * Set the environment variable TESSDATA_PREFIX="D:\Tesseract-ocr\", i.e. the directory that contains tessdata; the source code looks under this path for the language data files used in recognition.
 * Command format:
 `tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]`
 * Recognize digits only:
 `tesseract imagename outputbase -l eng digits`
 * To fix the "empty page!!" error:
-**-psm N**
+**-psm N**
 
-7 = Treat the image as a single text line
-tesseract imagename outputbase -l eng -psm 7
+7 = Treat the image as a single text line
+tesseract imagename outputbase -l eng -psm 7
 * The configfile argument is the name of a file in the tessdata\configs or tessdata\tessconfigs directory:
 `tesseract imagename outputbase -l eng nobatch`
 
 
-**Using the CAPTCHA recognition project, method 1:**
-Put the downloaded images in the ./pic directory.
+**Using the CAPTCHA recognition project, method 1:**
+
+* Put the downloaded images in the ./pic directory.
 
 CAPTCHA image name: get_random.jpg
-Price image name: get_price_img.png
-Command format:
+Price image name: get_price_img.png
+
+* Command format:
 
 CAPTCHA image recognition: python tess_test.py ./pic/get_random.jpg
-Price image recognition: python tess_test.py ./pic/get_price_img.png
-The recognition result is printed. To save the result in the temporary text file temp.txt, change "cleanup_scratch_flag = True" to "cleanup_scratch_flag = False" in pytessr_pro.py.
+Price image recognition: python tess_test.py ./pic/get_price_img.png
+
+The recognition result is printed.
+
+To save the result in the temporary text file **temp.txt**, change "**cleanup_scratch_flag = True**" to "**cleanup_scratch_flag = False**" in pytessr_pro.py.
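The Tesseract command lines in the Captcha1 README can also be assembled from Java, for example when a crawler drives the OCR step instead of a person typing it. The following is a minimal sketch, not part of this commit: `TesseractCommand`, `build` and `run` are hypothetical names. It assumes the `tesseract` executable is on `PATH` with `TESSDATA_PREFIX` already set, and it uses the Tesseract 3.0x `-psm` spelling (Tesseract 4 and later expect `--psm`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds "tesseract imagename outputbase [-l lang] [-psm N] [configfile...]".
final class TesseractCommand {

    /** Builds the argument list; pass null for lang or psm to omit that option. */
    static List<String> build(String image, String outputBase, String lang,
                              Integer psm, String... configFiles) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        if (lang != null) {
            cmd.add("-l");
            cmd.add(lang);
        }
        if (psm != null) {
            cmd.add("-psm");                      // Tesseract 3.0x syntax; 4.x uses --psm
            cmd.add(psm.toString());
        }
        cmd.addAll(Arrays.asList(configFiles));   // e.g. "digits" or "nobatch"
        return cmd;
    }

    /** Runs the command, then reads the text Tesseract wrote to outputBase + ".txt". */
    static String run(List<String> cmd, String outputBase)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        Path txt = Paths.get(outputBase + ".txt");
        return new String(Files.readAllBytes(txt), StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws Exception {
        // Single text line (-psm 7), digits only, as in the README examples.
        List<String> cmd = build("./pic/get_random.jpg", "temp", "eng", 7, "digits");
        System.out.println(String.join(" ", cmd));
        // Uncomment to actually invoke Tesseract:
        // System.out.println(run(cmd, "temp"));
    }
}
```

Running `main` only prints the assembled command, `tesseract ./pic/get_random.jpg temp -l eng -psm 7 digits`; the commented-out `run` call performs the OCR.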

Diff for: NewsSpider/ReadMe.md

+4-1
@@ -1,6 +1,9 @@
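 ### Web Crawling: The Most Basic Crawler, Scraping the [NetEase News Rankings](http://news.163.com/rank/)
 
-Notes:
+**Notes:**
+
 * Fetch pages with the urllib2 or requests package.
+
 * Parse first-level pages with regular expressions and second-level pages with XPath.
+
 * Save the extracted titles and links to a local file.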
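The NewsSpider notes describe three steps: a regex pass over the first-level page, an XPath pass over second-level pages, and saving the results to a local file. A rough Java sketch of those steps follows. The class name, the link pattern and the sample HTML are hypothetical and do not reflect the actual markup of the NetEase rankings page. The XPath step uses the JDK's `javax.xml.xpath`, which only accepts well-formed XML, so real-world HTML would usually need a lenient HTML parser first.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class NewsExtractSketch {
    // Hypothetical pattern for <a href="http...">title</a>; the real page markup may differ.
    private static final Pattern LINK =
        Pattern.compile("<a\\s+href=\"(http[^\"]+)\"[^>]*>([^<]+)</a>");

    /** First-level page: regex extraction of {title, link} pairs. */
    static List<String[]> extractLinks(String html) {
        List<String[]> out = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            out.add(new String[] {m.group(2).trim(), m.group(1)});
        }
        return out;
    }

    /** Second-level page: evaluates an XPath expression; input must be well-formed XML. */
    static String xpathText(String xhtml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.STRING);
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("not well-formed XML or bad XPath", e);
        }
    }

    /** Saves one "title<TAB>link" line per entry as UTF-8. */
    static void save(List<String[]> items, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String[] item : items) {
            lines.add(item[0] + "\t" + item[1]);
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String rank = "<ul><li><a href=\"http://news.163.com/a.html\">Headline A</a></li>"
                    + "<li><a href=\"http://news.163.com/b.html\">Headline B</a></li></ul>";
        List<String[]> links = extractLinks(rank);
        Path out = Files.createTempFile("news", ".txt");
        save(links, out);
        String article = "<html><body><h1 id=\"title\">Headline A</h1></body></html>";
        System.out.println(xpathText(article, "//h1[@id='title']") + " saved to " + out);
    }
}
```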

Diff for: QunarSpider/ReadMe.md

+4-1
@@ -1,6 +1,9 @@
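 ### Web Crawling: Logging In with Selenium through a Proxy, Scraping the [Qunar](http://flight.qunar.com/) Site
 
-Notes:
+**Notes:**
+
 * Use Selenium to simulate a browser login and page through the results.
+
 * Proxies can be stored in a file, which the program reads and uses.
+
 * Supports crawling with multiple processes.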
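For the proxy note in the QunarSpider README, one way to load proxies from a file and hand them out in turn is sketched below. This is a hypothetical illustration, not the project's code: it assumes a plain-text file (the name `proxies.txt` is made up) with one `host:port` per line, builds `java.net.Proxy` objects usable with `URL.openConnection(Proxy)`, and does not show how a proxy is configured in Selenium. `nextProxy()` is `synchronized` so several worker threads can share one pool; the README's multi-process setup would instead give each process its own copy.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical proxy pool: one "host:port" per line, handed out round-robin.
final class ProxyPool {
    private final List<Proxy> proxies = new ArrayList<>();
    private int next = 0;

    /** Parses proxy lines, skipping blank lines and "#" comments. */
    static ProxyPool fromLines(List<String> lines) {
        ProxyPool pool = new ProxyPool();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.lastIndexOf(':');
            if (colon <= 0 || colon == line.length() - 1) {
                throw new IllegalArgumentException("expected host:port, got: " + line);
            }
            String host = line.substring(0, colon);
            int port = Integer.parseInt(line.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing the file
            pool.proxies.add(new Proxy(Proxy.Type.HTTP,
                                       InetSocketAddress.createUnresolved(host, port)));
        }
        return pool;
    }

    int size() {
        return proxies.size();
    }

    /** Returns the next proxy in turn; synchronized so worker threads can share the pool. */
    synchronized Proxy nextProxy() {
        if (proxies.isEmpty()) {
            throw new IllegalStateException("no proxies loaded");
        }
        Proxy p = proxies.get(next);
        next = (next + 1) % proxies.size();
        return p;
    }

    public static void main(String[] args) {
        // With a real file: fromLines(Files.readAllLines(Paths.get("proxies.txt")))
        ProxyPool pool = fromLines(Arrays.asList("# proxies", "127.0.0.1:8080", "10.0.0.2:3128"));
        for (int i = 0; i < 3; i++) {
            InetSocketAddress addr = (InetSocketAddress) pool.nextProxy().address();
            System.out.println(addr.getHostString() + ":" + addr.getPort());
        }
    }
}
```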

Diff for: ReadMe.md

+1-1
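@@ -224,7 +224,7 @@ Selenium is an automated testing tool. It can drive a browser, including
 
 You can download the CAPTCHA image and recognize it with the open-source Tesseract-OCR system, then pass the recognized characters to the crawler to simulate the login. Alternatively, you can upload the CAPTCHA image to a CAPTCHA-solving service for recognition. If the login fails, refresh the CAPTCHA and recognize it again until it succeeds.
 
-Reference project: [Captcha1](https://github.com/lining0806/PythonSpiderNotes/tree/master/Captcha1)
+Reference project: [CAPTCHA Recognition Project, Version 1: Captcha1](https://github.com/lining0806/PythonSpiderNotes/tree/master/Captcha1)
 
 **There are two issues to watch out for when crawling:**
 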
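The retry strategy in the main README (fetch a CAPTCHA, recognize it, attempt the login, and refresh on failure) can be sketched as a bounded loop. Everything here is hypothetical: `CaptchaRetry.solve` takes the fetch, recognition and login steps as functions, so Tesseract, a CAPTCHA-solving service, or a stub can be plugged in. Capping the number of attempts keeps a "retry until it succeeds" loop from hammering the server forever.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class CaptchaRetry {
    /**
     * Hypothetical retry loop: fetch a fresh CAPTCHA, recognize it, try to log in,
     * and repeat until the login succeeds or maxAttempts is reached.
     */
    static <I> Optional<String> solve(Supplier<I> fetchCaptcha,
                                      Function<I, String> recognize,
                                      Predicate<String> tryLogin,
                                      int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            I image = fetchCaptcha.get();          // refresh: a new image on every attempt
            String text = recognize.apply(image);  // e.g. OCR or a solving service
            if (text != null && !text.isEmpty() && tryLogin.test(text)) {
                return Optional.of(text);
            }
        }
        return Optional.empty();                   // give up instead of looping forever
    }

    public static void main(String[] args) {
        // Toy stand-ins: "images" are integers and the server accepts only the third CAPTCHA.
        int[] fetched = {0};
        Optional<String> result = CaptchaRetry.<Integer>solve(
            () -> ++fetched[0],
            n -> "code" + n,
            text -> text.equals("code3"),
            5);
        System.out.println(result.orElse("failed") + " after " + fetched[0] + " attempts");
    }
}
```

With the stubs above, `main` prints `code3 after 3 attempts`.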

Diff for: Spider_Java/README.md

+4-2
@@ -1,3 +1,5 @@
-# Spider
+### Spider_Java
+
 Target site: 华尔街见闻 (Wallstreetcn) http://live.wallstreetcn.com/
-Single-threaded crawling
+Single-threaded crawling: Spider_Java1
+Multithreaded crawling: Spider_Java2
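Spider_Java2 is listed as the multithreaded variant. The snippet below is a generic sketch of that idea using a fixed-size thread pool, not the code from Spider_Java2: `ParallelCrawl` and its `fetch` parameter are hypothetical, and the stub fetcher in `main` stands in for a real HTTP request to the target site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: fetch several pages concurrently on a fixed-size thread pool.
final class ParallelCrawl {

    /** Fetches every URL with `threads` workers; the result map keeps the input order. */
    static Map<String, String> crawl(List<String> urls, Function<String, String> fetch,
                                     int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetch.apply(url))); // runs on a pool thread
            }
            Map<String, String> pages = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                pages.put(urls.get(i), futures.get(i).get());     // waits for that page
            }
            return pages;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();                    // preserve interrupt status
            throw new IllegalStateException("crawl interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        } finally {
            pool.shutdown();                                       // let worker threads exit
        }
    }

    public static void main(String[] args) {
        // Stub fetcher standing in for a real HTTP request (hypothetical page names).
        Function<String, String> stub = u -> "<html>" + u + "</html>";
        List<String> urls = Arrays.asList("page-1", "page-2", "page-3");
        crawl(urls, stub, 2).forEach((u, html) ->
            System.out.println(u + " -> " + html.length() + " chars"));
    }
}
```

Compared with a single-threaded loop, the pool overlaps network waits across pages; the thread count caps how many requests hit the site at once.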
File renamed without changes.
File renamed without changes.

Diff for: Spider_Java/Spider_Java2/.classpath

+7
@@ -0,0 +1,7 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<classpath>
+	<classpathentry kind="src" path="src"/>
+	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
+	<classpathentry kind="lib" path="lib/mongo-java-driver-2.13.0-rc1.jar"/>
+	<classpathentry kind="output" path="bin"/>
+</classpath>

Diff for: Spider_Java/Spider_Java2/.project

+17
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+	<name>Spider</name>
+	<comment></comment>
+	<projects>
+	</projects>
+	<buildSpec>
+		<buildCommand>
+			<name>org.eclipse.jdt.core.javabuilder</name>
+			<arguments>
+			</arguments>
+		</buildCommand>
+	</buildSpec>
+	<natures>
+		<nature>org.eclipse.jdt.core.javanature</nature>
+	</natures>
+</projectDescription>
Binary file not shown (1.58 KB).
Binary file not shown (476 bytes).
Binary file not shown (674 bytes).
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown (590 KB).
+89
@@ -0,0 +1,89 @@
+/**
+ *
+ */
+package synchronizetest;
+
+/**
+ * @author FIRELING
+ *
+ */
+public class Test
+{
+    public static void main(String[] args)
+    {
+        Reservoir r = new Reservoir(100);
+        Booth b1 = new Booth(r);
+        Booth b2 = new Booth(r);
+        Booth b3 = new Booth(r);
+    }
+}
+/**
+ * contains the shared resource
+ */
+class Reservoir {
+    private int total;
+    public Reservoir(int t)
+    {
+        this.total = t;
+    }
+    /**
+     * Thread-safe method:
+     * serialized access to Reservoir.total
+     */
+    public synchronized boolean sellTicket() // the synchronized modifier locks the entire method
+    {
+        if(this.total > 0) {
+            this.total = this.total-1;
+            return true; // successfully sold one
+        }
+        else {
+            return false; // no more tickets
+        }
+    }
+}
+/**
+ * creates a new thread by extending Thread
+ */
+class Booth extends Thread {
+    private static int threadID = 0; // owned by the Class object
+
+    private Reservoir release; // the reservoir this booth sells from
+    private int count = 0; // owned by this thread object
+    /**
+     * constructor
+     */
+    public Booth(Reservoir r) {
+        super("ID:"+(++threadID));
+        this.release = r; // all threads share the same reservoir
+        this.start();
+    }
+    /**
+     * convert object to string
+     */
+    public String toString() {
+        return super.getName();
+    }
+    /**
+     * what does the thread do?
+     */
+    public void run() {
+        while(true) { // keep selling until the reservoir is empty
+            if(this.release.sellTicket()) {
+                this.count = this.count+1;
+                System.out.println(this.getName()+":sell 1");
+                try {
+                    sleep((int) (Math.random()*100)); // random interval of 0-99 ms
+                    // sleep(100); // with equal sleep times, each booth sells about the same number of tickets
+                }
+                catch (InterruptedException e) {
+                    throw new RuntimeException(e);
+                }
+            }
+            else {
+                break;
+            }
+        }
+        System.out.println(this.getName()+" I sold:"+count);
+    }
+}
+
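The ticket-selling demo relies on the `synchronized` `sellTicket()` to keep `total` consistent. One way to check that property is to wait for every booth with `join()` and add up the per-thread counts, which must equal the starting total. The sketch below does that and, as a swapped-in alternative to `synchronized`, decrements an `AtomicInteger` with `compareAndSet`. `TicketCheck` and `AtomicReservoir` are hypothetical names, not part of this commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class TicketCheck {
    /** Lock-free alternative to the synchronized sellTicket(): decrement only while > 0. */
    static final class AtomicReservoir {
        private final AtomicInteger total;

        AtomicReservoir(int t) {
            total = new AtomicInteger(t);
        }

        boolean sellTicket() {
            while (true) {
                int current = total.get();
                if (current <= 0) {
                    return false;                     // sold out
                }
                if (total.compareAndSet(current, current - 1)) {
                    return true;                      // this thread took the ticket
                }
                // another thread changed total in between; retry
            }
        }
    }

    /** Starts `booths` threads, waits for all of them, and returns the number of tickets sold. */
    static int sellAll(int tickets, int booths) {
        AtomicReservoir r = new AtomicReservoir(tickets);
        int[] sold = new int[booths];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < booths; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                while (r.sellTicket()) {
                    sold[id]++;                       // each thread writes only its own slot
                }
            }, "ID:" + (i + 1));
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();                             // join() also makes the sold[] writes visible
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for booths", e);
        }
        int sum = 0;
        for (int s : sold) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sellAll(100, 3));          // prints 100
    }
}
```

However the three booths interleave, the sum is always exactly 100, because each ticket is handed out by exactly one successful `compareAndSet`.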
