{"id":2814,"date":"2008-06-25T14:15:04","date_gmt":"2008-06-25T14:15:04","guid":{"rendered":"http:\/\/scienceblogs.com\/gregladen\/2008\/06\/25\/optical-character-recognition\/"},"modified":"2008-06-25T14:15:04","modified_gmt":"2008-06-25T14:15:04","slug":"optical-character-recognition","status":"publish","type":"post","link":"https:\/\/gregladen.com\/blog\/2008\/06\/25\/optical-character-recognition\/","title":{"rendered":"Optical Character Recognition in Linux"},"content":{"rendered":"<p>I don&#8217;t think Optical Character Recognition (OCR) works that well, frankly.  But it can be done and it can be better than retyping piles of text.  It does seem to work nicely when the text is nice and clean on nice clean white paper with a good contrast between ink and background and no garbage on the page.  But in my experience, when I have those conditions, it is because I have an electronic version already!  When I have a PDF file that consists of scans of photocopies, OCR tends to see flecks of yeck as accents (or entire letters) and things get messy.<!--more-->  Nonetheless, it can be a useful technology and it works well in Linux.  One of the things you do in Linux that is different than, say, Windows, is to use brute force and hands-on processing with OCR.  This is better than most other solutions because it allows you to make more adjustments and have more control over the process.  It takes more mucking around but you get better results, can define a workflow for your particular needs, and have more fun.  I mean, seriously, how much more fun can you have than running OCR from the command line???  I bring all this up because I came across a reasonable overview of how to do it and wanted to share it with you.  It is <a href=\"http:\/\/www.linux.com\/feature\/138511\">here.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I don&#8217;t think Optical Character Recognition (OCR) works that well, frankly. 
But it can be done and it can be better than retyping piles of text.  It does seem to work nicely when the text is nice and clean on nice clean white paper with a good contrast between ink and background and no garbage on &hellip; <a href=\"https:\/\/gregladen.com\/blog\/2008\/06\/25\/optical-character-recognition\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Optical Character Recognition in Linux<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"1","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[164,67],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p5fhV1-Jo","jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/posts\/2814"}],"collection":[{"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/comments?post=2814"}],"version-history":[{"count":0,"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/posts\/2814\/revisions"}],"wp:attachment":[{"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/media?parent=2814"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/categories?post=2814"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gregladen.com\/blog\/wp-json\/wp\/v2\/tags?post=2814"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}