Hi, I have a question somebody here may know the answer to.
A colleague of mine is scanning back issues of a journal she edits for online publication. She is using PDF with OCR to provide full-text searchability à la JSTOR. The issue is that her files are much larger than JSTOR's for a comparable amount of material. A 30pp article from a 1920s issue of Speculum, for example, seems to come in at about 1.5-2.0 MB; 5-6 page articles from my colleague's journal are coming in at about the same size, and other files are well over 4 MB.
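To put the difference on a per-page basis, here is a rough back-of-the-envelope sketch using only the figures above. The function name `kb_per_page` is just for illustration, and 1 MB is taken as 1000 KB for simplicity.

```python
def kb_per_page(size_mb, pages):
    """Average file size per page in KB (treating 1 MB as 1000 KB)."""
    return size_mb * 1000 / pages

# JSTOR Speculum article: 30 pages at 1.5-2.0 MB
jstor_low = kb_per_page(1.5, 30)
jstor_high = kb_per_page(2.0, 30)

# Colleague's article: 5-6 pages at roughly the same total size
ours_low = kb_per_page(1.5, 6)
ours_high = kb_per_page(2.0, 5)

print(f"JSTOR: {jstor_low:.0f}-{jstor_high:.0f} KB per page")
print(f"Ours:  {ours_low:.0f}-{ours_high:.0f} KB per page")
```

On these numbers her files work out to several times the per-page size of the JSTOR one, which seems too large a gap to put down to small differences in page dimensions alone.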
I haven't seen the settings used for the scanning or OCR yet, but the JSTOR files and hers appear to be about the same resolution (eyeballing the page size when the zoom is set to 100%). Her pages look as if they were scanned in B&W, but I haven't checked (perhaps a colour channel is adding to the bulk?). Any other suggestions for things that might be causing the files to be abnormally large?
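As a rough illustration of why the colour question matters, the sketch below compares the raw (uncompressed) size of one scanned page at different bit depths. The page dimensions (6 x 9 inches) and the 300 dpi resolution are assumptions picked for the example, not her actual settings, and `raw_bytes` is a hypothetical helper. Real PDF sizes will be much smaller and depend heavily on the compression used, so only the ratios are meaningful here.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bitmap size in bytes for one scanned page."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed page: 6 x 9 inches at 300 dpi (illustrative values only)
bitonal = raw_bytes(6, 9, 300, 1)    # 1-bit black and white
grey = raw_bytes(6, 9, 300, 8)       # 8-bit greyscale
colour = raw_bytes(6, 9, 300, 24)    # 24-bit RGB

for label, size in [("1-bit", bitonal), ("8-bit", grey), ("24-bit", colour)]:
    print(f"{label:>6}: {size / 1_000_000:.1f} MB raw, {size // bitonal}x bitonal")
```

A page that merely looks black and white on screen may still be stored as greyscale or colour, so the image depth would have to be checked in the file itself. If Poppler is available, `pdfimages -list file.pdf` should report the colour space, bits per component, and encoding of each embedded image.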
-dan