Tesseract Configuration

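Tesseract is an open-source OCR engine released under the Apache 2.0 license. It can be used directly from the command line or through its API for development, and it supports multiple languages.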

Installation

Install the build dependencies

sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libicu-dev # (if you plan to make the training tools)
sudo apt-get install libpango1.0-dev # (if you plan to make the training tools)
sudo apt-get install libcairo2-dev # (if you plan to make the training tools)

Install Leptonica

http://www.leptonica.org/

./configure --prefix=/usr/local/leptonica
make
sudo make install

sudo ldconfig
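
Because Leptonica is installed under the non-standard prefix /usr/local/leptonica, ldconfig only picks up its libraries if /usr/local/leptonica/lib is listed in the dynamic linker configuration. A minimal sketch of registering that directory follows. The helper name write_ld_conf and the file name leptonica.conf are illustrative assumptions, and /etc/ld.so.conf.d is the directory used on Debian and Ubuntu systems.

```shell
# Write a one-line ld.so configuration file that names a library directory.
# $1 = library directory, $2 = configuration file to create (hypothetical helper).
write_ld_conf() {
    printf '%s\n' "$1" > "$2"
}

# Typical use (the copy and ldconfig steps need root, so they are left commented out):
#   write_ld_conf /usr/local/leptonica/lib /tmp/leptonica.conf
#   sudo cp /tmp/leptonica.conf /etc/ld.so.conf.d/leptonica.conf
#   sudo ldconfig
```

Afterwards, ldconfig -p | grep lept should list the Leptonica shared library if the registration worked.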

Install Tesseract

./autogen.sh
./configure --prefix=/usr/local/tesseract --with-extra-libraries=/usr/local/leptonica/lib
make
sudo make install
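
Once the build finishes, a quick sanity check can confirm that the binary landed under the chosen prefix and can load its libraries. The sketch below assumes the /usr/local/tesseract prefix from the configure line above. The function name check_tesseract is illustrative, and --list-langs is only available in Tesseract 3.02 and later.

```shell
# Verify that a tesseract binary exists and report its version and languages.
# $1 = path to the binary (defaults to the install prefix used above).
check_tesseract() {
    bin="${1:-/usr/local/tesseract/bin/tesseract}"
    if [ ! -x "$bin" ]; then
        echo "tesseract not found at $bin" >&2
        return 1
    fi
    "$bin" --version      # prints the Tesseract and Leptonica versions
    "$bin" --list-langs   # lists the languages found in the tessdata directory
}

# Example: check_tesseract /usr/local/tesseract/bin/tesseract
```

If the binary fails with a missing shared library error, the Leptonica library directory is probably not visible to the dynamic linker.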

Chinese language data

Extract the language pack and copy the traineddata file to share/tessdata.

vagrant@aegir:~/tesseract$ sudo tar zxvf chi-tesseract-ocr.tar.gz
tesseract-ocr/tessdata/chi_sim.traineddata

cd tesseract-ocr/tessdata
sudo mv chi_sim.traineddata /usr/local/tesseract/share/tessdata/

export TESSDATA_PREFIX=/usr/local/tesseract/share
# If your tessdata path is /usr/local/share/tessdata, use: export TESSDATA_PREFIX=/usr/local/share/
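
A common failure at this point is a TESSDATA_PREFIX that does not point at the parent of the tessdata directory. The sketch below checks that a language file is where Tesseract will look for it, assuming the layout used above (prefix/tessdata/lang.traineddata). The function name has_traineddata is an illustrative assumption, not part of Tesseract.

```shell
# Succeed if <prefix>/tessdata/<lang>.traineddata exists.
# $1 = language code (e.g. chi_sim)
# $2 = prefix; defaults to $TESSDATA_PREFIX when omitted
has_traineddata() {
    lang="$1"
    prefix="${2:-${TESSDATA_PREFIX:-}}"
    # ${prefix%/} drops one trailing slash so both "share" and "share/" work
    [ -f "${prefix%/}/tessdata/${lang}.traineddata" ]
}

# Example:
#   has_traineddata chi_sim /usr/local/tesseract/share && echo "chi_sim is installed"
```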


Command line

./tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

./tesseract /vagrant/image/7.png 1.txt -l chi_sim
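
Note that Tesseract appends .txt to outputbase, so an outputbase of 1.txt produces a file named 1.txt.txt. For processing several images, the sketch below derives the output base from each image name and runs the same command in a loop. The function names output_base and ocr_dir and the TESSERACT variable (used to override the binary path) are illustrative assumptions.

```shell
# Derive an output base from an image path: drop the directory and the extension.
output_base() {
    name="${1##*/}"
    printf '%s\n' "${name%.*}"
}

# Run OCR on every PNG in a directory, writing <name>.txt into the current directory.
# $1 = image directory, $2 = language (defaults to chi_sim)
ocr_dir() {
    dir="$1"
    lang="${2:-chi_sim}"
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # skip the unexpanded pattern when no PNG exists
        "${TESSERACT:-tesseract}" "$img" "$(output_base "$img")" -l "$lang"
    done
}

# Example: TESSERACT=/usr/local/tesseract/bin/tesseract ocr_dir /vagrant/image chi_sim
```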

https://code.google.com/p/tesseract-ocr/wiki/ReadMe

https://code.google.com/p/tesseract-ocr/wiki/Compiling

Author

张巍

Published

2015-02-02

Updated

2015-02-02
