Automatically Assigning Templates
For installations that must process large volumes of documents (hundreds to thousands) every day, we must automate the allocation of the correct template to each source PDF. This article covers the options that exist to automate this part of the process. For the purposes of the article we will assume that the PDFs arrive as email attachments.
Let’s start with a scenario where PDFDataNet processes PDFs sent by a number of different partner companies (suppliers or customers such as Alpha Ltd, Beta Ltd, Gamma Ltd…) to a single company (Our Company Ltd.).
Option Using An Email Macro
We can ask our partners to email their sales orders, invoices, remittance advices, etc. to a designated email address (email@example.com), and an email macro can monitor this address for new documents.
In the initial stages of deploying and testing PDFDataNet, it will probably be our in-house customer services or accounts teams that forward messages on to the email address, but eventually we will instruct our partners to use that address when sending us their documents.
The email macro will therefore have access to the original sender’s email address (firstname.lastname@example.org, email@example.com, firstname.lastname@example.org), and we can use the sender’s email address to set the template that should be used to extract the data from the attached PDF. This can be done in two ways. The first is to simply copy the attached PDF into a folder structure such as:
//servername/PDFs to be processed/Alpha Ltd
//servername/PDFs to be processed/Beta Ltd
//servername/PDFs to be processed/Gamma Ltd
The PDFDataNet program (typically running as a Windows service) can be configured to associate any files in a particular folder with a specific document template. We now have the PDF and the correct template to use, and we can automatically extract the required data.
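The folder routing described above can be sketched in a few lines of Python. This is only an illustration of the lookup logic, not the email macro itself: the sender domains (alpha.example and so on), the PARTNER_FOLDERS table and the folder_for_sender function are all hypothetical names chosen for the example.

```python
from email.utils import parseaddr
from typing import Optional

# Base folder taken from the article; the domains below are hypothetical.
BASE_FOLDER = "//servername/PDFs to be processed"

# Hypothetical lookup table: sender domain -> partner folder name.
PARTNER_FOLDERS = {
    "alpha.example": "Alpha Ltd",
    "beta.example": "Beta Ltd",
    "gamma.example": "Gamma Ltd",
}


def folder_for_sender(from_header: str) -> Optional[str]:
    """Return the partner folder for a From header, or None if the sender is unknown."""
    # parseaddr splits 'Jane Doe <jane@alpha.example>' into a name and an address.
    _, address = parseaddr(from_header)
    # Everything after the last '@' is the domain; compare case-insensitively.
    domain = address.rpartition("@")[2].lower()
    partner = PARTNER_FOLDERS.get(domain)
    if partner is None:
        return None
    return f"{BASE_FOLDER}/{partner}"


# Example usage: an unknown sender yields None, so the message can be
# set aside for manual handling instead of being routed incorrectly.
print(folder_for_sender("Jane Doe <jane@alpha.example>"))
print(folder_for_sender("someone@unknown.example"))
```

Matching on the domain rather than the full address means any contact at a partner is routed to the same folder; matching on the full address would be the stricter alternative.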
Another option is that the email macro creates a simple XML Control File to instruct PDFDataNet on how to process the received PDF. The XML control file will have the following format.
<?xml version="1.0" encoding="utf-8"?>
<PDFSourcePath>\\servername\PDFs to be processed\INV435263.pdf</PDFSourcePath>
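As a minimal sketch, a control file with the content shown above could be produced with Python's standard library. Only the PDFSourcePath element that appears in this article is written; any further elements the control file may support are not shown here and are deliberately left out. The function name build_control_file is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_control_file(pdf_path: str) -> bytes:
    """Return UTF-8 XML containing a single PDFSourcePath element."""
    # PDFSourcePath is the only element shown in the article, so it is
    # used here as the sole element of the document.
    element = ET.Element("PDFSourcePath")
    element.text = pdf_path
    # xml_declaration=True adds the <?xml ...?> header (Python 3.8+).
    return ET.tostring(element, encoding="utf-8", xml_declaration=True)


# Example usage with the path from the article.
xml_bytes = build_control_file(r"\\servername\PDFs to be processed\INV435263.pdf")
print(xml_bytes.decode("utf-8"))
```

Building the file with an XML library rather than string concatenation ensures that characters such as & in a file name are escaped correctly.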
The advantage of the XML option is that if you have thousands of partners you do not have to maintain the folder structure, and the XML option offers more control over what happens to the PDF during and after processing. Note also that we can set specific instructions within the XML file, such as where to move the PDF on error. An error in this case would be defined within the template: for example, our PO number must be present on an invoice, or the total of the order line values must equal the total goods value.
A couple of other points to consider:
- If PDFDataNet is deployed at a group level and is processing documents for multiple companies within the group, then variables can be passed within the XML file to define the relevant company.
- If PDFDataNet will process different types of documents (sales orders, AP invoices, quote requests, etc.), then you should consider having a mailbox for each document type (email@example.com, firstname.lastname@example.org, email@example.com, etc.).
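To make the second error rule concrete, here is a short Python illustration of the arithmetic behind "the total of the order line values must equal the total goods value". In PDFDataNet the rule is defined within the template; the function name totals_match and the optional tolerance parameter are assumptions made for this sketch only.

```python
from decimal import Decimal
from typing import Iterable


def totals_match(line_values: Iterable[str],
                 total_goods_value: str,
                 tolerance: Decimal = Decimal("0.00")) -> bool:
    """Return True if the line values add up to the stated total goods value."""
    # Decimal avoids the rounding surprises of binary floats with currency.
    line_sum = sum((Decimal(v) for v in line_values), Decimal("0"))
    return abs(line_sum - Decimal(total_goods_value)) <= tolerance


# Example usage: a document that fails this check would be treated as an
# error and could be moved to the error location named in the control file.
print(totals_match(["10.50", "4.50"], "15.00"))
print(totals_match(["10.50"], "15.00"))
```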
Option Using Automatic Document Recognition Module
The email macro option offers the simplest method of determining the correct template to apply to a PDF, but if a macro is not available then we can use the PDFDataNet Preprocessor Engine to automatically identify the type and source (i.e. partner) of the PDF and assign the appropriate document template.
The PDFDataNet Preprocessor Engine uses an Automatic Document Recognition module that can be trained to recognise the document. The recognition can be trained to distinguish between an invoice and a credit note from the same supplier even when the only difference on the document is the words ‘Invoice’ or ‘Credit Note’.
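To illustrate the invoice versus credit note distinction only, the following Python sketch uses a plain keyword check on text already extracted from the PDF. This is a deliberately simplistic stand-in and not how the Automatic Document Recognition module works, which is trained rather than hard-coded; the function name classify_document is hypothetical.

```python
from typing import Optional


def classify_document(text: str) -> Optional[str]:
    """Return 'Credit Note', 'Invoice', or None based on keywords in the text."""
    lowered = text.lower()
    # Check 'credit note' first: a credit note may also mention the
    # invoice it relates to, so the word 'invoice' alone is not decisive.
    if "credit note" in lowered:
        return "Credit Note"
    if "invoice" in lowered:
        return "Invoice"
    return None


# Example usage on two snippets of extracted text.
print(classify_document("INVOICE No. 435263"))
print(classify_document("Credit Note against invoice 435263"))
```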
This is a very powerful option but does require manual configuration and therefore requires more ‘work’ long term than the email macro option.
For very small volumes of documents, a simple option is for the user to drag the PDF into the document indexing program and then select the template to use. With this option the user can view and amend the extracted data before passing it on to the downstream process for uploading or importing into your system.