Program Scripts Library

Below are some of the program scripts that can be used within our PDFDataNet Configurator program.

Regex Extraction Patterns

Regular expressions (regex) are a language used for pattern-matching text content. Using these expressions within our program allows you to narrow down the data you extract from your chosen fields. We have included some commonly used regex examples below. For further guidance please visit these links:

General

In the Regex column, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Regex < [A-Za-z ]+ >
Purpose Returns all text (with spaces)
Test Text MY TELEPHONE number is: 0123456789
Result MY TELEPHONE number is

Regex < [0-9]+ >
Purpose Returns all numbers
Test Text MY TELEPHONE number is: 0123456789
Result 0123456789

Regex < [0-9]+[.][0-9]{2} >
Purpose Returns numbers with two decimal places
Test Text The 4th cost is £14.25
Result 14.25

Regex < (?!0*(\.0+)?$)([0-9,.]+)$ >
Purpose Returns decimals from 0.01 onwards (ignores 0.00)
Test Text The unit price is 0.01
Result 0.01

Regex < ([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2}) >
Purpose Returns UK postal codes
Test Text 9 Bank Road, Kingswood, Bristol, BS15 8LS, United Kingdom
Result BS15 8LS

Regex < \d{5}(?:[- ]?\d{4})? >
Purpose Returns US zip codes
Test Text 23 N Santa Cruz Ave, Los Gatos, CA 95030, United States
Result 95030

Regex < [0-9]{2}/[0-9]{2}/[0-9]{4} >
Purpose Returns dates in this specific format: dd/mm/yyyy or mm/dd/yyyy
Test Text The date today is 02/04/2024
Result 02/04/2024

Regex < \b[\w.-]+@([\w-]+\.)+[\w-]{2,4}\b >
Purpose Returns email addresses
Test Text My email is test@test.com
Result test@test.com

Regex < \s?(?:\+?1[-.? ]?)?\(?([0-9]{3})\)?[-.? ]?([0-9]{3})[-.? ]?([0-9]{4})\s? >
Purpose Returns US phone numbers
Test Text My contact number is +14083993880
Result +14083993880
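
If you would like to try a pattern before adding it to the Configurator, you can run it against your test text first. The sketch below uses Python's `re` module purely as an illustration; this is an assumption, as the regex engine inside PDFDataNet may behave slightly differently, and the `first_match` helper is a hypothetical name.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first piece of text that the pattern matches, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


# Patterns and test text taken from the examples above.
SAMPLES = [
    (r"[0-9]+", "MY TELEPHONE number is: 0123456789"),
    (r"[0-9]+[.][0-9]{2}", "The 4th cost is £14.25"),
    (r"[0-9]{2}/[0-9]{2}/[0-9]{4}", "The date today is 02/04/2024"),
]

for pattern, text in SAMPLES:
    print(pattern, "->", first_match(pattern, text))
```

A pattern that finds nothing returns `None`, which is a quick way to spot a typo in the expression.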

Functions

Capitalise

In the Function column, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function < Capitalise(RawValue) >
Purpose Changes the first letter of each word from lowercase to uppercase.
Test Text this is our website
Result This Is Our Website
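
For comparison, the same behaviour can be sketched in Python. This is only an illustration under the assumption that Capitalise changes the first letter of each space-separated word and leaves the remaining letters untouched; `capitalise` is a hypothetical helper, not part of PDFDataNet.

```python
def capitalise(raw_value: str) -> str:
    """Uppercase the first letter of each space-separated word."""
    words = raw_value.split(" ")
    # word[:1] is safe for empty strings produced by repeated spaces.
    return " ".join(word[:1].upper() + word[1:] for word in words)


print(capitalise("this is our website"))  # This Is Our Website
```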

Date

In the Function column, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function
  1. < Date(RawValue):dd-MM-yyyy >
  2. < Date(RawValue):ddd >
  3. < Date(RawValue):dddd >
  4. < Date():HH:mm >
Purpose Sets the format of the captured date. You can remove "RawValue" from any of the examples above to use today's date instead. Example 4 returns the time (hour and minute), so it only works with today's date.
Test Text 12/08/2020
Result
  1. 12-08-2020
  2. Wed
  3. Wednesday
  4. 16:12
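
The sketch below shows roughly how these format strings behave, using Python's `datetime` module. The mapping from the Configurator's format tokens to Python's `strftime` codes is an assumption made for illustration, as is reading the test text as day/month/year; `format_date` and `FORMAT_MAP` are hypothetical names.

```python
from datetime import datetime

# Assumed mapping from the Configurator's format tokens to strftime codes.
FORMAT_MAP = {
    "dd-MM-yyyy": "%d-%m-%Y",
    "ddd": "%a",
    "dddd": "%A",
    "HH:mm": "%H:%M",
}


def format_date(raw_value, fmt):
    """Format raw_value (dd/mm/yyyy text) or, if it is None, the current date and time."""
    moment = datetime.strptime(raw_value, "%d/%m/%Y") if raw_value else datetime.now()
    return moment.strftime(FORMAT_MAP[fmt])


print(format_date("12/08/2020", "ddd"))   # Wed
print(format_date("12/08/2020", "dddd"))  # Wednesday
print(format_date(None, "HH:mm"))         # the current time, for example 16:12
```

Passing `None` here plays the role of leaving out "RawValue", so the current date and time are used.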


ExtractByRegex

In the Function column, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function < ExtractByRegex(RawValue,'([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})') >
Purpose Returns the text matched by any regex you supply; in this case it will extract all UK postal codes.
Test Text 9 Bank Road
Kingswood
Bristol
BS15 8LS
United Kingdom
Result BS15 8LS
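
As a rough illustration, the same extraction can be reproduced with Python's `re` module. This is a sketch only: `extract_by_regex` is a hypothetical stand-in for the Configurator function, it assumes the first match is the one returned, and the pattern is written with `\s?` for the optional space inside the postcode.

```python
import re

UK_POSTCODE = (
    r"([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
    r"|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})"
)


def extract_by_regex(raw_value: str, pattern: str) -> str:
    """Return the first match of pattern in raw_value, or an empty string."""
    found = re.search(pattern, raw_value)
    return found.group(0) if found else ""


address = "9 Bank Road\nKingswood\nBristol\nBS15 8LS\nUnited Kingdom"
print(extract_by_regex(address, UK_POSTCODE))  # BS15 8LS
```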

ExtractM

In the Function column, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function < ExtractM(RawValue,"|",2) >
Purpose Extracts only line 2 (the | symbol indicates a line break within PDFDataNet). If you wanted to extract lines 2, 3 and 4, you could use: < ExtractM(RawValue,"|",2,4) >
Test Text 9 Bank Road
Kingswood
Bristol
BS15 8LS
United Kingdom
Result Kingswood
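
The line selection can be pictured with the short Python sketch below. It assumes the captured value arrives with "|" between lines, that line numbers start at 1, and that a multi-line result is joined with the same symbol; `extract_m` is a hypothetical helper and PDFDataNet may format a multi-line result differently.

```python
def extract_m(raw_value, delimiter, start, end=None):
    """Return line number `start`, or lines `start` to `end` inclusive (1-based)."""
    lines = raw_value.split(delimiter)
    last = start if end is None else end
    return delimiter.join(lines[start - 1:last])


raw_value = "9 Bank Road|Kingswood|Bristol|BS15 8LS|United Kingdom"
print(extract_m(raw_value, "|", 2))     # Kingswood
print(extract_m(raw_value, "|", 2, 4))  # Kingswood|Bristol|BS15 8LS
```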

ExtractMW

In the Function field, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function < ExtractMW(RawValue,"PDFDataNet Services:","Try") >
Purpose Extracts data from one word/phrase up to another.
Test Text This is a piece of test text. PDFDataNet Services: Extracting text from documents. Try our product today.
Result Extracting text from documents.
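
A minimal Python sketch of the same idea follows. It assumes the text between the two markers is returned with the surrounding spaces trimmed, and that nothing is returned when either marker is missing; both points are assumptions, and `extract_mw` is a hypothetical helper.

```python
def extract_mw(raw_value: str, start_marker: str, end_marker: str) -> str:
    """Return the text between start_marker and the next end_marker."""
    start = raw_value.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = raw_value.find(end_marker, start)
    if end == -1:
        return ""
    return raw_value[start:end].strip()


text = (
    "This is a piece of test text. PDFDataNet Services: "
    "Extracting text from documents. Try our product today."
)
print(extract_mw(text, "PDFDataNet Services:", "Try"))  # Extracting text from documents.
```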

GetFilename

In the Function column, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function
  1. < GetFilename(SourcePath) >
  2. < GetFilenameWithoutExtension(SourcePath) >
Purpose Returns the file name of the source file path, with or without the file extension. You can also replace "SourcePath" with "OutputPath".
Test Text PDN-INVOICE-001.pdf
Result
  1. PDN-INVOICE-001.pdf
  2. PDN-INVOICE-001
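
To illustrate the difference between the two options, here is a small Python sketch using `pathlib`. The folder in `source_path` is a made-up example, and treating the source path as a Windows-style path is an assumption.

```python
from pathlib import PureWindowsPath

# Hypothetical source path; only the file name comes from the example above.
source_path = PureWindowsPath(r"C:\Scans\PDN-INVOICE-001.pdf")

file_name = source_path.name             # file name with its extension
file_name_without_ext = source_path.stem  # file name without its extension

print(file_name)              # PDN-INVOICE-001.pdf
print(file_name_without_ext)  # PDN-INVOICE-001
```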


RemoveSpaces

In the Function field, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function
  1. < RemoveSpaces(RawValue) >
  2. < RemoveExtraSpaces(RawValue) >
Purpose The first option returns the string with all spaces removed. The second option removes leading and trailing spaces, and sequences of two or more spaces, from the string.
Test Text < This is a line of text. >
Result
  1. Thisisalineoftext.
  2. This is a line of text.
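
The two options can be compared with the Python sketch below. It assumes that RemoveExtraSpaces reduces each run of two or more spaces to a single space, which is our reading of the description rather than confirmed behaviour; both function names are hypothetical.

```python
import re


def remove_spaces(raw_value: str) -> str:
    """Remove every space from the string."""
    return raw_value.replace(" ", "")


def remove_extra_spaces(raw_value: str) -> str:
    """Trim leading and trailing spaces and collapse repeated spaces (assumed)."""
    return re.sub(r" {2,}", " ", raw_value.strip(" "))


sample = " This  is a line   of text. "
print(remove_spaces(sample))        # Thisisalineoftext.
print(remove_extra_spaces(sample))  # This is a line of text.
```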

Replace

In the Function field, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function < Replace(Value,'Company','Corporation') >
Purpose Replaces the first piece of text with the second piece of text.
Test Text O Imaging Company
Result O Imaging Corporation
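
In Python terms this is close to `str.replace`, sketched below. Whether the Configurator replaces every occurrence and whether it is case-sensitive are not stated here, so treat those details as assumptions.

```python
value = "O Imaging Company"

# str.replace swaps every case-sensitive occurrence of the first text.
replaced = value.replace("Company", "Corporation")

print(replaced)  # O Imaging Corporation
```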

Ternary

In the Function field, please ignore the opening "<" and closing ">" brackets; you only need what is between them.

Function < Ternary(RawValue="OIC","OIC001",RawValue) >
Purpose Returns a string chosen by a condition. The value up to the first comma sets the condition. The value up to the second comma is returned if the condition is true, and the last value is returned if the condition is false.
Test Text
  1. OIC
  2. PDF001
Result
  1. OIC001
  2. PDF001
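
The logic of the example above can be sketched in Python as follows. `ternary` is a hypothetical helper that mirrors the three parts of the function: the condition, the value for true and the value for false.

```python
def ternary(condition: bool, if_true: str, if_false: str) -> str:
    """Return if_true when the condition holds, otherwise if_false."""
    return if_true if condition else if_false


for raw_value in ("OIC", "PDF001"):
    # Same shape as Ternary(RawValue="OIC","OIC001",RawValue).
    print(ternary(raw_value == "OIC", "OIC001", raw_value))
```

The first value prints OIC001 because the condition is true; the second is returned unchanged.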

Rules


Validation rules are used within our program, set within your chosen field, to validate a set expression. When the expression fails validation, our program flags an error for your attention.

Origami


Need support?

Visit the portal.
