How to ignore special characters in Tesseract OCR using Java
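In Tesseract you can set TessBaseAPI.VAR_CHAR_WHITELIST
and TessBaseAPI.VAR_CHAR_BLACKLIST
(the Tesseract variables tessedit_char_whitelist and tessedit_char_blacklist)
to control which characters the engine is allowed to return, and so ignore unwanted special characters.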
The following makes Tesseract recognize only the uppercase letters A-Z and the digits 0-9:
String whiteList = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
tessBaseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, whiteList);
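Typing the whitelist by hand is easy to get wrong, so here is a minimal sketch of building the same string from character ranges. The class WhitelistBuilder and the method charRange are hypothetical names used only for illustration; they are not part of Tesseract or of TessBaseAPI. The resulting string would be passed to setVariable exactly as above.

```java
public final class WhitelistBuilder {
    private WhitelistBuilder() {}

    /**
     * Hypothetical helper (not a Tesseract API): returns every character
     * from first to last, inclusive, as a single string.
     */
    public static String charRange(char first, char last) {
        StringBuilder sb = new StringBuilder();
        // Loop on an int so the counter cannot wrap around at Character.MAX_VALUE.
        for (int c = first; c <= last; c++) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same whitelist as the hand-written one: A-Z followed by 0-9.
        String whiteList = charRange('A', 'Z') + charRange('0', '9');
        System.out.println(whiteList); // prints ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
    }
}
```

Adding lowercase letters would then be a matter of appending charRange('a', 'z').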
The next snippet lets Tesseract recognize everything except the blacklisted characters, here ~ and fl:
String blackList = "~fl";
tessBaseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, blackList);
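If unwanted characters still show up in the recognized text, one fallback is to filter the result string in plain Java after recognition. This is a different technique from the engine-level blacklist, not a replacement for it. The sketch below assumes the OCR result is already available as a String; BlacklistFilter and stripBlacklisted are hypothetical names, not part of TessBaseAPI. Note that this filter treats every character of the blacklist string individually, so "~fl" removes each ~, f and l.

```java
public final class BlacklistFilter {
    private BlacklistFilter() {}

    /**
     * Hypothetical post-processing helper (not a Tesseract API): returns text
     * with every character that occurs in blackList removed.
     */
    public static String stripBlacklisted(String text, String blackList) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Keep the character only if it is not in the blacklist.
            if (blackList.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String recognized = "~flat file~"; // stand-in for real OCR output
        System.out.println(stripBlacklisted(recognized, "~fl")); // prints "at ie"
    }
}
```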