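Pdfsandwich is a tool that adds a text layer to PDF files whose text exists only as images, such as scanned books. It uses optical character recognition (OCR) to create an additional layer containing the text recognized on each original page. This makes the text available for copying and further processing.
Pdfsandwich is a command-line tool. Unlike similar software, it preprocesses the scanned images, for example by deskewing the page layout and removing dark edges.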
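A typical invocation can be sketched as below. This is a minimal sketch, not a definitive recipe: the file name scan.pdf and the helper functions ocr_pdf and ocr_output_name are hypothetical, the -lang option and the default output name (input name with _ocr appended) are assumed from the tool's documented behaviour, and the chosen Tesseract language pack is assumed to be installed.

```shell
#!/bin/sh
# Sketch: OCR a scanned PDF with pdfsandwich.
# ocr_pdf and ocr_output_name are hypothetical helpers, not part of pdfsandwich.

# Print the output name pdfsandwich is assumed to use by default:
# the input name with "_ocr" inserted before the .pdf extension.
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Run pdfsandwich on one file, if the tool is installed.
# $1 = input PDF, $2 = OCR language code (assumed default: eng)
ocr_pdf() {
    input=$1
    lang=${2:-eng}
    if ! command -v pdfsandwich >/dev/null 2>&1; then
        echo "pdfsandwich not found; skipping $input" >&2
        return 127
    fi
    pdfsandwich -lang "$lang" "$input"
}
```

With these definitions loaded, ocr_pdf scan.pdf eng would be expected to write the searchable result next to the input, under the name printed by ocr_output_name scan.pdf.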
Final recognition result
Visionaries I I7 and silver ligree ornaments ; gold and silver ower-stands, etc. ; elaborate coloured patterns of carpets in brilliant tints are not uncommon. Another peculiarity resides in the extreme restlessness of my visual objects. It is often very difficult to keep them still, as well as from changing in character. They will rapidly oscil- late or else rotate to a most perplexing degree, and when the characters change at the same time a critical examination is almost impossible. When the process is in full activity,l feel as if I were a mere spectator at a diorama of a very eccentric kind, and was in no way concerned with the getting up of the performance. When a. succession of images has been passing, I sometimes alez ermz’ne to introduce an object, say a watch. Very often it is next to impossible to succeed. There is an evident struggle. The watch, pure and simple, will not come; but some hybrid structure appears something round, perhaps but it lapses into a warming-pan or other unexpected object. This practice has brought to my mind very clearly the dis- tinction between at least one form of automatism of the brain and volition; but the strength of the former is enormous, for the visual objects, when in full career of the change, are impera- tive in their refusal to be interfered with. [… ]
SVN Checkout
svn checkout svn://svn.code.sf.net/p/pdfsandwich/code/trunk/src pdfsandwich