With SharePoint Scanning and Capture, as with any project, planning is essential to success. If you are going to use scanning software to send scanned images to a SharePoint Content Database, you need to lay some groundwork. This is the first in a series of planning articles.
One of the key areas of planning for any scanning/capture implementation is sizing and storage. Many of the customers we work with have no real grasp of the volume of paper they deal with on a day-to-day basis, and when they move to digitizing their paper, they are often quite surprised at how much of it they push through the system. An underestimate like this can cause serious problems, starting with running out of storage. So how do you estimate the amount of paper? There are several key conversion factors used by the document management industry, as outlined below:
|Description||Number of Pages||Storage|
|1 Scanned Page – 8.5×11||1||50KB|
|1 Scanned Page – 11×17||1||100KB|
|1 File Cabinet – 4 drawers||10,000||500MB|
|1 Linear Inch||100||5MB|
|1 E Size Engineering Drawing (48×36)||16 (8.5×11 equivalents)||800KB|
This table is a basic planning tool, and can be used as a starting point. One thing to remember is that these are all standard pages: full text pages, not full image magazine pages. The other thing to keep in mind is that for bulk units such as boxes and file cabinets, the figures are the average number of pages they contain. In the imaging world, we deal with images, not pages. What is the difference? A page may have 2 sides, which are converted digitally into 2 images. So effectively, if you are scanning a box of double-sided pages, you will have to double the storage required.
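As a rough illustration, the conversion factors above can be turned into a small estimator. This is a minimal sketch, not a sizing tool: the function name `estimate_storage_mb`, the `duplex_fraction` parameter, and the use of decimal units (1 MB = 1,000 KB, which is what makes the table rows agree with each other) are assumptions made for the example.

```python
# Conversion factors taken from the table above.
KB_PER_LETTER_IMAGE = 50         # one side of an 8.5x11 text page
PAGES_PER_LINEAR_INCH = 100
PAGES_PER_FILE_CABINET = 10_000  # 4-drawer cabinet


def estimate_storage_mb(pages, duplex_fraction=0.0, kb_per_image=KB_PER_LETTER_IMAGE):
    """Estimate storage in MB for a number of physical pages.

    duplex_fraction is a hypothetical parameter: the share of pages
    printed on both sides (0.0 = all single-sided, 1.0 = all double-sided).
    Assumes 1 MB = 1,000 KB, consistent with the table.
    """
    if not 0.0 <= duplex_fraction <= 1.0:
        raise ValueError("duplex_fraction must be between 0 and 1")
    # Each double-sided page becomes two images.
    images = pages * (1 + duplex_fraction)
    return images * kb_per_image / 1000


if __name__ == "__main__":
    # Example: three single-sided file cabinets plus 24 linear inches
    # of double-sided paper.
    cabinets = estimate_storage_mb(3 * PAGES_PER_FILE_CABINET)
    shelf = estimate_storage_mb(24 * PAGES_PER_LINEAR_INCH, duplex_fraction=1.0)
    print(f"Cabinets: {cabinets:.0f} MB, shelf: {shelf:.0f} MB")
```

If only part of a collection is double-sided, a value such as `duplex_fraction=0.3` scales the image count accordingly instead of doubling everything.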
Some other key factors that can contribute to storage and sizing:
DPI Setting – one of the key questions we always receive is "What DPI should I set on my scanner?" For most basic scanning and archive applications, you can set your scanner to 200 DPI. If you are doing OCR or any type of advanced data extraction, you always want a 300 DPI image for maximum accuracy. Anything beyond that just wastes space: it slows down your process and bloats your files.
Black and White, Greyscale and Color – always use black and white scanning to keep file sizes at an absolute minimum. Greyscale and color scanning should only be used when absolutely necessary, as the files are many times larger (roughly 8 and 15 times larger, respectively, at 300 DPI in the sample below). The table below shows file sizes for the same letter, which had about 50% page coverage.
|Scanning Mode/DPI||File Size|
|Black and White – 200 DPI||26KB|
|Black and White – 300 DPI||38KB|
|Black and White – 400 DPI||51KB|
|Black and White – 600 DPI||80KB|
|Greyscale – 300 DPI||301KB|
|Color – 300 DPI||577KB|
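To see how much each setting costs relative to a baseline, the figures in this table can be compared directly. The sketch below is only an illustration: the dictionary keys and the `size_ratio` helper are hypothetical names, and the numbers come from this one sample letter, so other documents will give different ratios.

```python
# File sizes in KB for the sample letter in the table above.
SAMPLE_SIZES_KB = {
    ("bw", 200): 26,
    ("bw", 300): 38,
    ("bw", 400): 51,
    ("bw", 600): 80,
    ("grey", 300): 301,
    ("color", 300): 577,
}


def size_ratio(mode, dpi, baseline=("bw", 300)):
    """Return how many times larger a setting is than the baseline setting."""
    return SAMPLE_SIZES_KB[(mode, dpi)] / SAMPLE_SIZES_KB[baseline]


if __name__ == "__main__":
    # Print every setting alongside its size relative to 300 DPI black and white.
    for (mode, dpi), kb in SAMPLE_SIZES_KB.items():
        print(f"{mode:>5} @ {dpi} DPI: {kb:>3} KB, {size_ratio(mode, dpi):.1f}x")
```

On this sample, greyscale comes out at roughly 8 times and color at roughly 15 times the size of the 300 DPI black and white file.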
Image Processing – image cleanup can significantly reduce file sizes, so use it whenever you can. Despeckle, deshade, border removal and similar filters eliminate unnecessary noise in scanned images, and can reduce your storage requirement by 10-30% depending on the quality of your documents.
Image Format – There is a lot of misinformation on the market about TIFF versus PDF. I always hear “We want to store as TIFF because PDFs are just too big.” That is simply not the case. An image scanned to PDF is just a TIFF in PDF clothing (or, to be more exact, in a PDF wrapper). The PDF overhead is almost negligible. The de facto standard in imaging today is rapidly becoming the PDF image with hidden text. This gives you a nice little file with the pristine image, and the converted OCR text in the background. The text layer adds very little to the file size.
So now, with all this info, you can estimate volume in images, and then come up with required storage on a monthly, yearly or project basis.
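A projection along those lines might look like the sketch below. It is a minimal example under stated assumptions: the function name, the workload of 2,000 images per day, and the 21 and 250 working-day figures are all hypothetical. The 50 KB default comes from the conversion table, the cleanup reduction from the 10-30% range mentioned under Image Processing, and 1 GB is taken as 1,000,000 KB.

```python
def project_storage_gb(images_per_day, working_days, kb_per_image=50,
                       cleanup_reduction=0.0):
    """Project storage in GB for a scanning workload.

    cleanup_reduction is the fraction of file size saved by image
    cleanup (for example 0.2 for 20%). Assumes 1 GB = 1,000,000 KB.
    """
    if not 0.0 <= cleanup_reduction < 1.0:
        raise ValueError("cleanup_reduction must be in the range [0, 1)")
    total_kb = images_per_day * working_days * kb_per_image * (1 - cleanup_reduction)
    return total_kb / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2,000 images per day.
    monthly = project_storage_gb(2000, working_days=21)
    yearly = project_storage_gb(2000, working_days=250)
    yearly_clean = project_storage_gb(2000, working_days=250, cleanup_reduction=0.2)
    print(f"Monthly: {monthly:.1f} GB")
    print(f"Yearly: {yearly:.1f} GB ({yearly_clean:.1f} GB with 20% cleanup savings)")
```

Remember to count images rather than pages when filling in `images_per_day`, and to swap in a larger `kb_per_image` if any of the work is greyscale or color.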