So you want to use Microsoft SharePoint as storage for scanned images? Take a quick breath and don’t charge in too fast, as there are many facets of this type of project that need to be considered.
What type of volume are you scanning on a daily basis?
You need to take a deep dive into departmental and end user needs, and really look at the volume of pages they need to image and capture. This brings up a point I discuss on a daily basis: Do you want to scan or capture? You may read this and wonder what in the world I am talking about, so here is an explanation:
Let’s define scanning applications and their feature set. A scanning application is just a means to quickly and easily convert paper to digital form. These applications are well suited to environments with very basic needs, and to what I call “onesie-twosie” scanning, or low volume environments. Their feature sets provide very basic functionality, and may allow basic document separation and very basic integration with SharePoint. The majority of scanning hardware vendors bundle these applications with their hardware, although some vendors have taken it to the next level and provide enhanced scanning capabilities beyond the typical bundled software.
Document Capture software can be utilized for basic scanning needs, but takes you to a whole new level from a “capture” perspective. These applications typically have a number of ways to “slice and dice” documents, and really focus on efficiency and on minimizing the time required to scan, index and capture data. Capture software provides numerous ways to automatically populate columns, including barcode reading, database lookups, OCR, and data extraction. True capture applications provide integration with scanners, folders with images, SharePoint WebDAV folders, etc. Any organization that is serious about processing paper documents, and wants to do it in the most efficient, standardized manner, should look seriously at advanced capture applications.
Capture applications are typically well suited to high volume situations, or to situations where data can be extracted automatically. Scanning applications are suited to very simple, usually low volume, operations.
What type of scanning device(s) are you going to utilize?
There are only a few applications out there that will provide you with the ability to scan from any type of device. Are you going to use network based scanning devices or direct connect scanners? Look into support in these specific areas:
• What type of drivers are supported? ISIS, TWAIN, and VRS should all be supported.
• Can hot folder functionality auto-import and process all the different image types, PDF included? Hot folder functionality should span local, network and WebDAV folders.
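To make the hot folder idea concrete, here is a minimal Python sketch of the polling step. It is an illustration only, not how any particular capture product works: the function name `find_new_files` and the extension list are assumptions, and a real application would also have to handle files that are still being written, WebDAV locations, and the hand-off to processing.

```python
from pathlib import Path

# Assumed set of image types to pick up; adjust to what your scanners produce.
IMAGE_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def find_new_files(folder, seen):
    """Return image files in `folder` that have not been returned before.

    `seen` is a set owned by the caller; it is updated in place so that
    repeated polling only reports each file once.
    """
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        if is_image and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files
```

Calling `find_new_files` on a timer against a local or network folder gives you the basic auto-import loop; everything after that (separation, indexing, release to SharePoint) is where the scanning and capture products differ.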
Beware of “panel” based applications. They are typically very static, and can cause a line to form at the MFP/copier while people enter information about their documents at the device itself.
What output format do you want in the SharePoint libraries?
Scanning and capture applications today provide a broad array of image output formats, but the standard seems to be PDF Image with Hidden Text. This provides an all-in-one container for the original image and the searchable text. Install the PDF IFilter, and you have a searchable content store. Some specialized uses may require other formats. For instance, if you are importing JPEGs with EXIF tags with your advanced capture application, you will want to keep the original JPEG file with its tags intact rather than performing a conversion.
What Scanning and Capture features will be necessary in your environment?
What features should you look for? This is the most difficult question of them all. You really need to find an application with a broad feature set, so that you can cover today’s needs and the needs of your organization in the future. This blog post is a great place to start:
Trends in Scanning and Capture
Just a few stats here to get you on your way:
• A standard scanned page can be estimated at about 50 KB in size (at 300 DPI)
• A file cabinet contains between 10,000 and 12,000 pages
This can give you a quick idea of how much storage will be required, and let you do some growth estimation over time.
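As a rough worked example, the arithmetic can be sketched in a few lines of Python. The 50 KB page size and the 10,000 to 12,000 pages per cabinet come from the stats above; the function names and the 250 working days per year are my own assumptions for illustration, so swap in your own numbers.

```python
PAGE_SIZE_KB = 50                    # estimated size of one scanned page
PAGES_PER_CABINET = (10_000, 12_000) # low and high page counts per file cabinet


def estimate_storage_mb(pages, page_size_kb=PAGE_SIZE_KB):
    """Estimated storage in MB for a given number of scanned pages."""
    return pages * page_size_kb / 1024


def cabinet_storage_range_mb(cabinets):
    """Low and high storage estimates in MB for a number of file cabinets."""
    low, high = PAGES_PER_CABINET
    return (estimate_storage_mb(cabinets * low),
            estimate_storage_mb(cabinets * high))


def yearly_growth_gb(pages_per_day, working_days=250):
    """Estimated yearly growth in GB; working_days is an assumed default."""
    return estimate_storage_mb(pages_per_day * working_days) / 1024


if __name__ == "__main__":
    low, high = cabinet_storage_range_mb(1)
    print(f"One file cabinet: {low:.0f} to {high:.0f} MB")
    print(f"1,000 pages/day: {yearly_growth_gb(1000):.1f} GB per year")
```

By this estimate a single file cabinet comes to roughly 490 to 590 MB, and a department scanning 1,000 pages a day adds about 12 GB a year.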
You should also use these numbers to decide whether to use the SharePoint database for content storage, or to utilize Remote BLOB Storage (RBS). SharePoint 2010 with SQL Server 2008 R2 allows this without the need for third-party software through the FILESTREAM provider.
How will I view images once they are in SharePoint?
Without a viewer add-on, SharePoint will require you to open an image to view its pages. This can be problematic if you are serving up large image files. Definitely take a look at some of the image viewer add-ons for SharePoint. My favorite, VizitSP SharePoint Viewer, provides the ability to view/preview, annotate, process images, search (column based and full text) and have multiple images open in a tabbed view. This is an absolute necessity if you are going to give end users the best experience possible.
These are just some questions to get the gears turning and make sure you have all the pieces of the puzzle.