PDF Wasteland: Are Your SharePoint PDFs Barren?

Metadata and PDFs in SharePoint: Rules to Follow



PDFs have become the standard in many organizations for archiving files as records. Whether you are scanning paper files to SharePoint for long-term archival or converting your Office documents to PDF/A for long-term storage, there are some key things you need to know. From a scanning perspective, most scanners produce only an image-based PDF: barren, if you will, of all metadata. Yet PDF is a rich format that can serve as a long-term "suitcase" carrying metadata along with the document. Here are some tips on how to make your PDFs complete records:

1.  Make sure your scanning software or PDF converter supports the PDF/A standard. PDF/A is an ISO standard (ISO 19005) for the long-term archiving of electronic documents, including scanned images. It is designed to keep the file viable in the long term and allows embedding of metadata; combined with a digital signature, it can also make alteration of the record detectable. This is a must for any long-term archival of documents. For a summary of the PDF archive standard, see Adobe's summary, PDFs for Long Term Archive.
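As a quick sanity check on what a scanner or converter actually produces, the sketch below looks for the PDF/A identification entry (`pdfaid:part`) that PDF/A files carry in their XMP metadata. It assumes the XMP packet is stored uncompressed so a plain byte search can find it, and the function name is made up for illustration. It only reports what the file claims; it is not a conformance validator.

```python
import re

# Matches the XMP PDF/A identification entry in either element form
# (<pdfaid:part>1</pdfaid:part>) or attribute form (pdfaid:part="1").
_PDFA_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*([1-4])")


def claimed_pdfa_part(pdf_bytes: bytes):
    """Return the PDF/A part number the file claims, or None if no claim is found."""
    match = _PDFA_PART.search(pdf_bytes)
    return int(match.group(1)) if match else None


# Tiny stand-ins for file contents; a real call would pass open(path, "rb").read().
with_claim = b"<rdf:Description><pdfaid:part>1</pdfaid:part></rdf:Description>"
without_claim = b"%PDF-1.4 plain scan with no XMP identification"

print(claimed_pdfa_part(with_claim))
print(claimed_pdfa_part(without_claim))
```

A file that returns `None` here is a good candidate for re-conversion; a file that returns a number still deserves a run through a real PDF/A validator before it is treated as an archival record.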

2.  Make sure to populate the standard PDF headers. When creating a PDF through a document capture or conversion process, make sure you populate the PDF headers (the document properties) with metadata. The standard headers include author, subject, keywords and title. Populating these fields can speed up searching and indexing, and ensures that critical information about the record is captured inside the file itself. Below is an example of an invoice that was scanned with a document capture application, where the standard headers were populated with information about the document:

[Image: PDF standard headers]
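To make the standard headers concrete, here is a minimal sketch of how the four fields map onto entries in the PDF document information dictionary (`/Title`, `/Author`, `/Subject`, `/Keywords`). The function name and the invoice values are invented for illustration; a capture product will have its own way of setting these.

```python
def standard_pdf_properties(title, author, subject, keywords):
    """Build the four standard document properties as PDF Info-dictionary entries."""
    return {
        "/Title": title,
        "/Author": author,
        "/Subject": subject,
        # Keywords is a single string; joining with commas is a common convention.
        "/Keywords": ", ".join(keywords),
    }


# Hypothetical values for a scanned invoice.
props = standard_pdf_properties(
    title="Invoice 10045",
    author="Accounts Payable",
    subject="Vendor invoice",
    keywords=["invoice", "accounts payable", "2013"],
)

for key, value in props.items():
    print(key, "=", value)
```

A dictionary in this shape is what PDF libraries typically accept when writing metadata; for example, pypdf's `PdfWriter.add_metadata` takes keys written this way.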

3.  Build complete custom headers for SharePoint metadata. Advanced conversion software can write custom PDF header information and allows you to "tag" your documents. With this, the PDF becomes a redundant container for SharePoint metadata, holding each column name together with its value. This is the ultimate in metadata packing, and creates a truly portable PDF that carries all pertinent information with it. Below is an example of custom headers, or properties, where invoice number, date, total and vendor are entered:

[Image: PDF custom metadata]
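The idea of carrying SharePoint columns as custom properties can be sketched as a simple mapping from column name to property key. The helper names, the key-naming rule (strip everything except letters and digits) and the sample values below are assumptions for illustration, not how any particular conversion product does it.

```python
import re


def column_to_pdf_key(column_name):
    """Turn a SharePoint column name into a simple custom property key."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", column_name)
    if not cleaned:
        raise ValueError(f"cannot derive a property key from {column_name!r}")
    return "/" + cleaned


def custom_pdf_properties(columns):
    """Map column names and values to custom property entries (values as text)."""
    return {column_to_pdf_key(name): str(value) for name, value in columns.items()}


# Hypothetical column values for the invoice example.
sharepoint_columns = {
    "Invoice Number": 10045,
    "Invoice Date": "2013-04-02",
    "Total": "1250.00",
    "Vendor": "Contoso",
}

custom = custom_pdf_properties(sharepoint_columns)
for key, value in custom.items():
    print(key, "=", value)
```

Note that two column names differing only in punctuation or spacing would collapse to the same key under this rule, so a real mapping would need to check for collisions.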

4.  Always create PDFs that include OCR text. An Optical Character Recognition (OCR) process recognizes the text in the scanned image and stores it in the PDF as searchable text, which SharePoint can then crawl and index. This is a must for all documents.


Did I miss anything? Please leave a comment and let me know.




