PDFDataNet Configurator:
Data Zones

PDFDataNet Configurator: Data Zones Documentation

Please read carefully through the documentation below.

NOTE: To enlarge a screenshot photo – please click on the image.

Section 1

Here you select the template to configure.

1.1 Template Name

Select the template you want to configure. 

1.2 Refresh

After changing the settings, use the Refresh option to force a reload of the extracted data.

1.3 Save

Use this option to save your changes to the template.

1.4 Import

Use this option to import settings from an existing template.

1.5 Clear Entries

Use this option to clear the entries when you need to start again.

Section 2

This section lists the fields of the selected template and the settings that control how each one is extracted from the PDF.

2.1 Prompt

This is the field prompt taken from the Document Class section.

Right-clicking the prompt selects the row and presents the following options:

  • Add – if new fields have been added to the document class after the template was created, you can use this option to add them to an existing template.
  • Remove – if a field is not required for a particular template, you can remove the field.
  • Clear Data Field – this option clears all the settings from the field.
  • Set Zone – this allows you to first select the field and then draw the zone on the PDF.

2.2 Raw Text

This shows the data collected from the zone on the PDF.

2.3 Value

This shows the value of the field after any transformations, functions, regexes, conversions, etc. have been applied to the raw data.

2.4 Default Value

This option allows you to set a default value for this field for this template. Note that you can set a global field default in the document class section.

2.5 Zone

In the PDF display panel, use the mouse to lasso the area from which you want to extract the data. Then right-click and you will be presented with the following options:

  • Edit Zone – resize the zone.
  • Remove Zone – delete the zone.
  • Set Zone For – select which variable or table the zone captures.
  • Set Related Zone – for anchors where you have selected a ‘Zone’ option.
  • Set Anchor For – for fields that are not in a fixed position you can define an anchor (for example ‘Total Goods Value’) and then define how to extract the data in relation to the anchor (left, right, up, down or zone). When zone is used, you must then draw the new ‘related’ zone and use the Set Related Zone option above to link the related zone to the anchor.
  • Set Zone For Anchor – in some situations the desired anchor text is not unique within the PDF, and this option allows you to define an area within which the program will search for the anchor.
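
The idea behind an anchor with a ‘right’ direction can be illustrated with a small Python sketch. This is not how PDFDataNet is implemented; the Word class, the coordinates and the text_right_of_anchor function are hypothetical names invented for this illustration, and the sketch assumes the words arrive in reading order with a position for each.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page (assumed units)
    y: float  # vertical position of the word's line (assumed units)


def text_right_of_anchor(words, anchor, y_tolerance=2.0):
    """Return the text on the same line as the anchor, to its right.

    Hypothetical helper: finds the first run of words equal to the
    anchor text, then collects every word on the same line whose left
    edge lies beyond the last anchor word. Returns None if the anchor
    is not found.
    """
    tokens = anchor.split()
    count = len(tokens)
    for start in range(len(words) - count + 1):
        window = words[start:start + count]
        if [w.text for w in window] == tokens:
            last = window[-1]
            right = [
                w for w in words
                if abs(w.y - last.y) <= y_tolerance and w.x > last.x
            ]
            return " ".join(w.text for w in sorted(right, key=lambda w: w.x))
    return None


# Made-up page content for illustration only.
page = [
    Word("Total", 10, 100), Word("Goods", 40, 100), Word("Value", 70, 100),
    Word("1,250.00", 200, 100),
    Word("VAT", 10, 115), Word("250.00", 200, 115),
]
goods_value = text_right_of_anchor(page, "Total Goods Value")
```

Restricting the list of words to those inside a drawn area before searching would correspond to the Set Zone For Anchor option.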

2.6 Anchor

The three blue dots indicate that there is a zone setting for this field. By double-clicking on the blue dots you can view and amend the stored settings.

2.7 Advanced

The three blue dots indicate that there are advanced settings for this field. By double-clicking on the blue dots you can view and amend the stored settings. The advanced settings are as follows:

Regex Capture

This option is used to specify a regex that selects data matching a particular format. Regex is a very powerful tool that can be used, for example, to extract a post code from an address block. The post code regex would look like:

([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA|SK3 OSB)$

A simpler regex to extract a customer’s product code in the format 3 alpha 5 numeric from a line description block would be [A-Z]{3}[0-9]{5}

Explaining how to create regexes is beyond the scope of this documentation, but there are many websites on creating and testing regexes, including https://www.regular-expressions.info and https://regex101.com
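
As a quick way to try these two patterns outside the Configurator, the following Python sketch runs them against made-up sample text. It assumes Python's re module behaves the same way as the regex engine used by PDFDataNet for these particular patterns, which is not confirmed here; the capture function and the sample strings are for illustration only.

```python
import re

# Patterns copied from the documentation above.
POSTCODE = re.compile(
    r"([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}"
    r"[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA|SK3 OSB)$"
)
PRODUCT_CODE = re.compile(r"[A-Z]{3}[0-9]{5}")


def capture(pattern, raw_text):
    """Return the first match of pattern in raw_text, or None."""
    match = pattern.search(raw_text)
    return match.group(0) if match else None


# Made-up raw text, standing in for data collected from a zone.
address_block = "10 Downing Street\nLondon\nSW1A 2AA"
line_description = "Blue widget ABC12345 box of 4"

postcode = capture(POSTCODE, address_block)
product_code = capture(PRODUCT_CODE, line_description)
```

Because the post code pattern ends with $, it only matches a post code at the very end of the captured text.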

Conversion

This option is a simple method to convert strings from one format into another.

From: $|€|£
To: USD|EUR|GBP
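
One way to read the From/To pair above is as two pipe-separated lists matched by position, so that $ becomes USD, € becomes EUR and £ becomes GBP. The Python sketch below illustrates that reading only; the convert function is hypothetical, and whether PDFDataNet replaces the whole value or just the matching part is an assumption (this sketch replaces the matching part).

```python
def convert(value, from_spec, to_spec):
    """Replace each 'from' string in value with the 'to' string at the same position.

    Hypothetical helper: both specs are pipe-separated lists of equal length.
    """
    sources = from_spec.split("|")
    targets = to_spec.split("|")
    if len(sources) != len(targets):
        raise ValueError("From and To must contain the same number of entries")
    for source, target in zip(sources, targets):
        value = value.replace(source, target)
    return value


currency = convert("£", "$|€|£", "USD|EUR|GBP")
```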

Cleanup

Cleanup allows you to perform simple or complex operations on the captured data.

For a list of functions see https://pdfdatanet.com/syntax-guide/

Function: Extract(Value,"-",3)

Function: TranslateUsingCSVFile(Value,"c:\oic\ProductLookup.csv","NOT FOUND",true,1,2)
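
The exact behaviour of these functions is defined in the syntax guide linked above. The Python sketch below only illustrates one plausible reading, and every detail in it is an assumption: that Extract returns the Nth delimiter-separated part of the value (counting from 1), and that TranslateUsingCSVFile looks the value up in one column of a CSV file and returns the matching entry from another column, or the "NOT FOUND" text when there is no match. The meaning of the true argument is not stated in this documentation, so it is left out. The function names extract and translate_using_csv are hypothetical.

```python
import csv


def extract(value, delimiter, index):
    """Return the index-th part of value split on delimiter (1-based), or "" if absent."""
    parts = value.split(delimiter)
    return parts[index - 1] if 1 <= index <= len(parts) else ""


def translate_using_csv(value, csv_path, not_found, key_column, result_column):
    """Look value up in key_column of the CSV file and return result_column.

    Columns are counted from 1. Returns not_found when no row matches.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) >= max(key_column, result_column) and row[key_column - 1] == value:
                return row[result_column - 1]
    return not_found
```

Under these assumptions, extract("AB-12-XYZ", "-", 3) would give the third part of the value, XYZ.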

Validation Rule

This option is used to specify rules that ensure the data captured from within the PDF has been collected correctly.

Rule: GoodsValue=ColumnTotal("InvoiceLines","LineTotalPrice")
Error Message: The Goods Value must equal the total of the line totals

Rule: GoodsValue+VATTotal=InvoiceTotal
Error Message: Goods Value plus VAT Total does not equal the Invoice Total

This option is very important when processing the PDFs as a Windows service, without any user confirmation of the extracted data.
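
The arithmetic behind the two example rules can be sketched in Python as follows. This is an illustration rather than PDFDataNet's rule engine: the column_total and validate functions and the sample figures are hypothetical, and the field and table names are taken from the rules above.

```python
from decimal import Decimal


def column_total(table, column):
    """Sum one column over all rows of a table (a list of dicts)."""
    return sum((Decimal(row[column]) for row in table), Decimal("0"))


def validate(fields, tables):
    """Return the error messages of every rule that fails."""
    errors = []
    if fields["GoodsValue"] != column_total(tables["InvoiceLines"], "LineTotalPrice"):
        errors.append("The Goods Value must equal the total of the line totals")
    if fields["GoodsValue"] + fields["VATTotal"] != fields["InvoiceTotal"]:
        errors.append("Goods Value plus VAT Total does not equal the Invoice Total")
    return errors


# Made-up invoice data for illustration only.
fields = {
    "GoodsValue": Decimal("150.00"),
    "VATTotal": Decimal("30.00"),
    "InvoiceTotal": Decimal("180.00"),
}
tables = {
    "InvoiceLines": [
        {"LineTotalPrice": "100.00"},
        {"LineTotalPrice": "50.00"},
    ]
}
errors = validate(fields, tables)
```

Decimal is used instead of float so that currency amounts compare exactly.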

Section 3

In this section we define how data is extracted from tables.

3.1 Product

Use the PDF viewer panel to draw (lasso) the area of the table. Right-click, choose ‘Set Zone For’, select Table and then select the required table from the drop-down list. Tables are defined in the document class section.

3.2 Clear

Use this option to start again.

3.3 Detect

Use this option to automatically determine the multi-line columns.

3.4 First / Next / Last Page

These options can be used when a PDF has different page layouts on the first, continuation and final pages.

3.5 Anchor Text (Start Of Table)

The string that defines the start of the table (usually a unique column heading).

3.6 Header Height

Use this option to exclude the header rows from the collected data.

3.7 Use Zone

As an alternative to using anchor text, this option can be used to define a zone for the table.

3.8 Anchor Text (End Of Table)

The string that defines the end of the table.

3.9 Ignore Col. Footer

Don’t include the data at the base of the table.

3.10 Footer Height

Number of rows to ignore at the base of the table.

3.11 Watch Column

The variable selected here defines which column controls the start of a new line. Note that the use of a regex in the ‘Advanced’ section can also define when a new line starts.

3.12 Ignore Inter-row Data

Specifies that only the data on the row that starts a new line should be collected.
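
The effect of a watch column, and of ignoring inter-row data, can be pictured with the Python sketch below. It is an illustration under assumptions rather than the program's actual logic: rows are modelled as dictionaries of column text, a non-empty watch column is taken to start a new line, and text on the rows in between is appended to the current line unless it is ignored. The group_rows function and the sample rows are hypothetical.

```python
def group_rows(rows, watch_column, ignore_inter_row=False):
    """Group raw table rows into lines, starting a new line when watch_column has text."""
    lines = []
    for row in rows:
        if row.get(watch_column, "").strip():
            # The watch column has a value: this row starts a new line.
            lines.append(dict(row))
        elif lines and not ignore_inter_row:
            # Continuation row: append its text to the current line.
            current = lines[-1]
            for column, text in row.items():
                if text.strip():
                    current[column] = (current.get(column, "") + " " + text.strip()).strip()
    return lines


# Made-up raw rows: the second row continues the description of the first.
raw_rows = [
    {"Code": "ABC12345", "Description": "Blue widget"},
    {"Code": "", "Description": "box of 4"},
    {"Code": "XYZ00001", "Description": "Red widget"},
]
merged = group_rows(raw_rows, "Code")
first_row_only = group_rows(raw_rows, "Code", ignore_inter_row=True)
```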

3.13 Inner Watch Column

Used when there are new lines within lines.

3.14 Not Used

Specifies whether the inner watch column setting is used.

Section 4

In this section the raw collected table data is shown. 

4.1 Column Name

The prompt of the line variable being collected.

Click on the row to see the position of the column in the PDF viewer panel. You can move or adjust the width of the column using the frame controls.

A right-click on a row shows you the following options:

  • Set Column – define the variable to be collected from this column.
  • Split Column – divide the column in two.
  • Add Column – add a new column.
  • Remove Column – delete the selected column.
  • Set Columns by Default – recalculate the columns.

4.2 Ignore Inner-row

Only collect the data for this variable from the first row.

4.3 Values

Displays the collected raw data.
