Printed Paper to Digital Flowing Text
Digitization is a multistage process
Turning a paper book into an e-book or flowing-text document is an involved process. In this post we look at the text aspect and leave aside questions of graphics; reproducing, recreating, or adapting imagery is a much more complex art. So far, attempts to produce flowing-text files through automation alone have led to very poor results. Even if you begin with a machine-generated text file, you bypass only steps 2 and 3 below. Assuming that you do not have such a file, this is the process:
1. Make certain that you have the rights to work on the document you intend to convert.
2. Scan the book or document and create a searchable PDF file (a PDF with a text layer generated by Optical Character Recognition) using software designed for this purpose. To create searchable PDFs, I use the pictured CzurTek ET16 and its accompanying software. All of the following steps, up through step 9, involve constant checking against the original document and going through your new text document again and again. I make it a habit to correct any error at the moment I first see it. As you go through the document page by page, again and again, hundreds or thousands of problems will have been corrected by the time you reach your final proofing of the text, and there should be very few errors left to repair.
3. Copy the searchable text from each page of the PDF into a word-processing document.
4. Strip out the hard returns (the line breaks left at the end of each printed line).
5. Recreate the paragraphing and chapter header styles.
6. Recreate other bold and italic text.
7. Move footnotes to chapter endnotes.
8. Use spell check to find obvious misspellings, remove the hyphens left by end-of-line word breaks, etc.
9. Scan, adapt, and insert the artwork and graphics to get the best possible display on small devices. Digitizing images involves many other problems and processes, which I skip over here.
10. Finally, slowly read the entire book and correct any remaining anomalies.
11. Convert the word-processing file into an EPUB file using Calibre.
12. Test the EPUB file with the Pagina ePub Checker program.
13. Correct any errors pointed out by Pagina.
14. Test the EPUB file in an e-book reader.
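
Parts of steps 4 and 8 are mechanical enough that a small script can give you a head start before the manual passes. What follows is only a minimal Python sketch, not the method used for the books described here. The function name unwrap_hard_returns is my own invention, and the sketch rests on two assumptions that will not hold for every scan: that a blank line in the copied text marks a paragraph break, and that a line ending in a letter plus a hyphen, followed by a line starting with a lowercase letter, is one word split across two lines.

```python
import re


def unwrap_hard_returns(text: str) -> str:
    """Join hard-wrapped lines into paragraphs (hypothetical helper).

    Assumption: blank lines separate paragraphs in the copied OCR text.
    Assumption: 'letter-' at a line end followed by a lowercase letter
    is a word broken across lines, so the hyphen is dropped.
    """
    # Normalize Windows and old Mac line endings to plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Drop empty lines and surrounding whitespace inside the block.
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if not lines:
            continue

        joined = lines[0]
        for line in lines[1:]:
            if re.search(r"[A-Za-z]-$", joined) and line[:1].islower():
                # Treat as a split word: remove the hyphen and join directly.
                joined = joined[:-1] + line
            else:
                # Ordinary line break: replace it with a single space.
                joined = joined + " " + line
        paragraphs.append(joined)

    # Keep one blank line between paragraphs.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "The digiti-\nzation of a book\nis slow.\n\nIt takes\nmany passes."
    print(unwrap_hard_returns(sample))
```

Note that the hyphen rule will wrongly merge genuine compounds that happen to break at the hyphen (a line ending in "e-" followed by "book" becomes "ebook"), which is one more reason the repeated checking against the original in the steps above cannot be skipped.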

