SharePoint Scanning Planning – Part 1 – Storage and Sizing

With SharePoint scanning and capture, as with any project, planning is essential to success. If you are going to use scanning software to send scanned images to a SharePoint content database, you need to lay some groundwork. This is the first in a series of planning articles.

One of the key areas of planning for any scanning/capture implementation is sizing and storage. Many of the customers we work with have no real grasp of the volume of paper they deal with on a day-to-day basis, and when they migrate to digitizing their paper, they are often quite surprised at how much of it they push through the system. Underestimating that volume can cause serious issues on many different fronts. So how do you estimate the amount of paper? There are several key conversion factors used by the document management industry, as outlined below:

Description | Number of Pages | Storage
1 Scanned Page – 8.5 x 11 | 1 | 50 KB
1 Scanned Page – 11 x 17 | 1 | 100 KB
1 File Cabinet – 4 drawers | 10,000 | 500 MB
1 Box | 2,500 | 125 MB
1 Linear Inch | 100 | 5 MB
1 E Size Engineering Drawing (48 x 36) | 16 (8.5 x 11 equivalents) | 800 KB

This table is a basic planning tool and can be used as a starting point. One thing to remember is that these are all standard pages: full text pages, not full-image magazine pages. The other thing to keep in mind is that the figures for boxes and file cabinets are the average number of pages they contain. In the imaging world, we deal with images, not pages. What is the difference? A page may have 2 sides, which are converted digitally into 2 images. So if you are scanning a box of double-sided pages, you will have to double the storage required.
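To make the arithmetic concrete, here is a minimal Python sketch that applies the conversion factors from the table above. The function name, the dictionary keys and the duplex_fraction parameter are my own illustrative choices, and it assumes every image is a standard 50 KB letter-size page with 1 MB = 1000 KB (which is how the table's numbers line up).

```python
# Planning factors taken from the conversion table above.
KB_PER_IMAGE = 50  # one 8.5 x 11 scanned image
PAGES_PER_UNIT = {
    "file_cabinet": 10_000,  # 4 drawers
    "box": 2_500,
    "linear_inch": 100,
}


def estimate_storage_mb(unit, count, duplex_fraction=0.0):
    """Estimate storage in MB for `count` units of paper.

    duplex_fraction is the share of pages printed on both sides (0.0 to 1.0).
    Each double-sided page produces two images instead of one.
    """
    if not 0.0 <= duplex_fraction <= 1.0:
        raise ValueError("duplex_fraction must be between 0 and 1")
    pages = PAGES_PER_UNIT[unit] * count
    images = pages * (1 + duplex_fraction)
    return images * KB_PER_IMAGE / 1000  # the table uses 1 MB = 1000 KB


print(estimate_storage_mb("box", 1))                       # single sided
print(estimate_storage_mb("box", 1, duplex_fraction=1.0))  # fully double sided
print(estimate_storage_mb("file_cabinet", 3, duplex_fraction=0.5))
```

A single-sided box comes out at the table's 125 MB, and the same box fully double sided doubles to 250 MB. For mixed paper, a rough guess at the duplex share is usually good enough for a first pass.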

Some other key factors that can contribute to storage and sizing:

DPI Setting – one of the key questions we always receive is “What DPI should I set on my scanner?” For most basic scanning and archive applications, you can set your scanner to 200 DPI. If you are doing OCR or any type of advanced data extraction, you want a 300 DPI image for maximum accuracy. Anything beyond that is just a space killer: it will slow down your process and bloat your files.
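To see why higher DPI costs so much, it helps to look at the raw pixel count. The sketch below is only an illustration: the function name is hypothetical, and it computes the uncompressed bitmap size, not the size of the compressed file your scanning software will actually write.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of one scanned image.

    bits_per_pixel: 1 for black and white, 8 for greyscale, 24 for color.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (200, 300, 600):
    size = raw_image_bytes(8.5, 11, dpi)
    print(f"{dpi} DPI: {size:,} bytes uncompressed")
```

The pixel count grows with the square of the DPI, so a 600 DPI scan carries nine times the raw data of a 200 DPI scan. Compression softens that for clean text pages, but the scanner, the network and the OCR engine all still have to handle the extra pixels.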

Black and White, Greyscale and Color – always use black and white scanning to keep file sizes at an absolute minimum. Greyscale and color scanning should only be used when absolutely necessary, as the files are dramatically larger: at 300 DPI, greyscale is roughly 8 times and color roughly 15 times the size of black and white. Below is a table of file sizes for the same letter, which had about 50% page coverage.

Scanning Mode / DPI | File Size
Black and White – 200 DPI | 26 KB
Black and White – 300 DPI | 38 KB
Black and White – 400 DPI | 51 KB
Black and White – 600 DPI | 80 KB
Greyscale – 300 DPI | 301 KB
Color – 300 DPI | 577 KB
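If you want to turn the table above into planning multipliers, a quick sketch like the following works. The numbers are the measured sizes for this one sample letter only, so treat the ratios as rough guides rather than constants; the names used here are illustrative.

```python
# Measured file sizes (KB) for the same sample letter, from the table above.
SIZES_KB = {
    ("Black and White", 200): 26,
    ("Black and White", 300): 38,
    ("Black and White", 400): 51,
    ("Black and White", 600): 80,
    ("Greyscale", 300): 301,
    ("Color", 300): 577,
}


def ratio_to_baseline(mode, dpi, baseline=("Black and White", 200)):
    """How many times larger a setting is than the baseline setting."""
    return SIZES_KB[(mode, dpi)] / SIZES_KB[baseline]


for (mode, dpi), kb in SIZES_KB.items():
    ratio = ratio_to_baseline(mode, dpi)
    print(f"{mode} at {dpi} DPI: {kb} KB ({ratio:.1f}x the 200 DPI black and white file)")
```

Multiplying a black and white storage estimate by the matching ratio gives a first guess at what the same volume would cost in greyscale or color.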

Image Processing – image cleanup can significantly reduce file sizes, and it is very important to use this feature whenever you can.  Despeckle, deshade, border removal, etc. will eliminate unnecessary noise in scanned images, and reduce your storage requirement by 10-30% depending on the quality of your documents.

Image Format – There is a lot of misinformation on the market about TIFF versus PDF. I always hear “We want to store as TIFF because PDFs are just too big.” Just not the case. An image scanned to PDF is just a TIFF in PDF clothing (or a PDF wrapper, to be more exact), and the PDF overhead is almost negligible. The de facto standard in imaging today is rapidly becoming the PDF image with hidden text. This gives you a nice little file with the pristine image and the converted OCR text in the background, and the text layer adds very little to the file size.

So now, with all this info, you can estimate volume in images, and then come up with required storage on a monthly, yearly or project basis.
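As a rough illustration of that last step, here is a small Python sketch that projects monthly and yearly storage from a daily image count. Everything beyond the 50 KB per image factor is an assumption for the example: the function name, the 21 working days per month, and the optional cleanup_saving parameter (a fraction in the 10-30% range mentioned under Image Processing).

```python
def project_storage_gb(images_per_day, kb_per_image=50,
                       working_days_per_month=21, cleanup_saving=0.0):
    """Project monthly and yearly storage in GB (1 GB = 1,000,000 KB).

    cleanup_saving is the fraction saved by image cleanup, e.g. 0.2 for 20%.
    """
    if not 0.0 <= cleanup_saving < 1.0:
        raise ValueError("cleanup_saving must be between 0 and 1")
    monthly_kb = (images_per_day * working_days_per_month
                  * kb_per_image * (1 - cleanup_saving))
    monthly_gb = monthly_kb / 1_000_000
    return {"monthly_gb": monthly_gb, "yearly_gb": monthly_gb * 12}


# Example: 2,000 black and white images per day, no cleanup saving.
estimate = project_storage_gb(2000)
print(f"Monthly: {estimate['monthly_gb']:.1f} GB")
print(f"Yearly:  {estimate['yearly_gb']:.1f} GB")
```

Swap in your own image count, and use a larger kb_per_image if you plan to scan in greyscale or color.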

Posted in capture, planning, scanning, sharepoint, sharepoint 2010, sizing, storage
5 comments on “SharePoint Scanning Planning – Part 1 – Storage and Sizing”
  1. Excellent summary.
    Pedro Encinas

  2. toce says:

    Nicely done. Looking forward to Part 2 and beyond . . .

  3. […] that I have covered Sizing and Storage in Part 1, and Document Separation in Part 2, now we can start to take a look at scanning […]

  4. […] When imaging to SharePoint or Office 365, you need to make sure you plan for not only storage requirements, but also figure out the loading on your network. Scanning, if done incorrectly, can create a huge burden on your network and bloat your content databases. More info here: SharePoint Scanning Storage Planning […]

  5. […] Lack of scanning volume research – many organizations really have no idea how many pages they scan on a monthly or annual basis. This is critical in the overall scanning project planning and storage design. The average image file is 10-20 times the size of a Word file. If you really have no feel for your scan volume, work with your copier vendor to grab statistics from the hardware. If you use dedicated desktop scanners, most drivers will maintain scan counts for preventive maintenance reasons and you can access them quite easily. So why the counts? Nothing brings a SharePoint farm to its knees like an organization scanning 10x estimated volume, from both a network traffic perspective and a backup perspective. Need some info? Check out this post all about SharePoint Scanning Storage and Sizing. […]
