ReportMiner Tutorial
Table of Contents
Overview
Creating a Report Model
Extracting Header Data
Adding Fields
Renaming Fields
Creating a Data Region
Creating a Collection Region
Saving and Testing a Report Model
Data Statistics and Summary
Exporting Data
Selecting Fields and Regions
Managing Field and Region Properties
Renaming Fields and Regions
Deleting Fields and Regions
Customizing Fields
Identifying Text Patterns for Regions
Using Dataflows
Creating Dataflows from Export Settings
Overview
In this tutorial, we will explore the features of ReportMiner. ReportMiner’s new and improved interface enables business users with little or no technical background to easily accomplish a wide range of data extraction tasks without employing expensive IT resources.
To extract data from a printed document, called data mining or report mining, you first need to create a report model that contains the definition of the report’s structure and then export it to your destination of choice. You can also use your report as a source object in a dataflow, where you can take advantage of the advanced transformations and conversion features of ReportMiner. Let’s demonstrate how this can be accomplished.
Key Features
• Extract information from documents in popular formats such as PDF, PRN, TXT, XLS, XLSX
• Map and export data to a plethora of destinations, including databases like SQL Server, Access, MySQL, PostgreSQL, and any ODBC-compatible database, as well as formats such as fixed length, delimited, Excel, and XML
• The single-click preview capability shows extracted data and any conversion or validation errors, enabling users to verify and test report models as they are being built
• Save time by reusing report models for subsequent conversions
• The Astera high-performance parallel-processing engine processes large data volumes quickly and efficiently
• Map extracted data to a dataflow and take advantage of advanced transformations and conversion features for a number of data integration and management processes
Creating a Report Model
A report model normally has a data region and fields belonging to this region. Depending on the structure of the data, you can create a separate Header and Footer, and append regions with their own fields. ReportMiner supports true hierarchical data extraction such that a data region can have child data regions and the child regions can have their own children and so on.
To create a new report layout, go to File -> New and select Report Model (Figure 1).
Figure 1
ReportMiner supports extracting unstructured data from text, EDI, Excel, PRN, and PDF files. All file types fall under the content type
Report except for Excel, which has its own content type (Figure 2).
Figure 2
Select the data file to be used as a sample file. We will use data from this file to create our report model. Depending on the content
type of your data, reading options will change. For example, if you have a PDF file, you can select the scaling factor, font, tab size, and
passwords.
We selected a sample data file for Orders as shown in the screenshot below. The selected file is loaded into the Report Definition Editor
(Figure 3).
Figure 3
Note: You can also load a different data file in the Report Definition Editor at a later time. Click the corresponding icon on the toolbar and navigate to the file you want to load.
Let’s take a look at this report. At the top of our sample is general order information, such as Company Name, Order Date and Time,
Customer Name, Account Number, and others. Following it is the detailed order information, such as the order items making up the
order.
If you are interested in extracting header data, please read through the next section, Extracting Header Data. Otherwise, to learn only
about extracting order records, you can jump to the Adding Fields section.
Extracting Header Data
Our sample report has two logical regions, the Header region and the Data region. Unlike some other common reports, this report has
no Footer region.
The Header is at the very top of the report, spanning three lines starting at the line with the order date (Figure 4).
Figure 4
So the first step in creating our report model will be to define the Header for our report.
In the Report Definition Editor, select the top three lines. This is the area that covers the Header. Right click your selection to open the context menu, which offers the options shown in Figure 5.
Figure 5
Since we are creating the Header, select Add Page Header Region.
The Report Browser on the left-hand side now shows a new node called Header (Figure 6).
Figure 6
Now, let’s take a closer look at the Header. The Header in our sample always starts with a date, shown at the very first line and in the
very first character position of the Header. We can use the date as an identifying pattern for the header.
Let’s enter the wildcard characters denoting digits, as shown in Figure 7.
Figure 7
Any time this pattern occurs inside the file, ReportMiner will treat it as the starting point of the Header.
Notice that the Report Definition Editor now highlights the header in purple. The Header spans three lines, as shown by the purple block
in the editor. The height of the Header or any other region (i.e., the number of lines that the header spans) is controlled by the Line
Count input below the Report Toolbar.
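The way a start pattern and a line count together define a region can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not ReportMiner's implementation: the regular expression stands in for the editor's digit wildcards, and the sample lines and the find_regions helper are made up for the example.

```python
import re

# Assumption: a regular expression stands in for the editor's digit wildcards.
# It matches a date such as 12/15/2011 at the very first character position.
HEADER_START = re.compile(r"^\d{2}/\d{2}/\d{4}")

def find_regions(lines, start_pattern, line_count):
    """Return every region as a list of `line_count` lines.

    A region begins at each line where `start_pattern` matches.
    """
    regions = []
    for index, line in enumerate(lines):
        if start_pattern.match(line):
            regions.append(lines[index:index + line_count])
    return regions

# Made-up sample data; the real Orders report is not reproduced here.
sample = [
    "12/15/2011  10:42   ACME TRADING CO.",
    "CUSTOMER: Jane Doe",
    "ACCOUNT: 10045",
    "ORDER ID: 10248",
    "12/16/2011  09:05   ACME TRADING CO.",
    "CUSTOMER: John Roe",
    "ACCOUNT: 10046",
]

headers = find_regions(sample, HEADER_START, line_count=3)
```

Changing line_count in this sketch plays the role of the Line Count input: the start pattern stays the same, and only the height of each captured block changes.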
The next step is to create the fields making up the Header.
Adding Fields
There are two ways to create fields.
1. Highlight a field, right click and select Add Field (Figure 8).
Figure 8
2. Right click within the header area, and select Auto Create Fields.
ReportMiner will scan the sample data and identify any changing values within any occurrences of the header. These changing values
will be marked as fields.
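The idea of marking changing values as fields can be sketched in Python. The tutorial does not describe how ReportMiner actually detects them, so the approach here (comparing whitespace-separated tokens that start at the same column in each occurrence) is an assumption chosen for illustration, and the sample lines are made up.

```python
import re

def tokens(line):
    """Map start column -> token text for every run of non-blank characters."""
    return {m.start(): m.group() for m in re.finditer(r"\S+", line)}

def auto_fields(occurrences):
    """Return (start, length) for each column whose token changes between occurrences."""
    token_maps = [tokens(line) for line in occurrences]
    starts = sorted(set().union(*token_maps))
    fields = []
    for start in starts:
        values = {token_map.get(start, "") for token_map in token_maps}
        if len(values) > 1:  # the value differs, so treat it as a field
            fields.append((start, max(len(value) for value in values)))
    return fields

# Made-up occurrences of the same region in a fixed-width report.
occurrences = [
    "ORDER ID: 10248   SHIP DATE: 12/15/2011",
    "ORDER ID: 10937   SHIP DATE: 01/03/2012",
]

# Name the detected fields the way the tutorial describes: Field_0, Field_1, ...
named_fields = {f"Field_{i}": span for i, span in enumerate(auto_fields(occurrences))}
```

In this sketch the constant labels (ORDER, ID:, SHIP, DATE:) are ignored because they are identical in every occurrence, while the two values that differ become Field_0 and Field_1.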
In our example, the Auto Create Fields feature added five fields. They are now displayed in the Report Browser under the header node.
Notice that our new fields are also highlighted in darker purple in the Report Definition Editor (Figure 9).
Figure 9
The fields created this way are assigned unique names, such as Field_0, Field_1, and so on.
Renaming Fields
You can rename a field if needed. Let’s rename our newly created fields to make them more descriptive. We can use any of the three
methods described below.
1. Select a field in the Report Browser, double click it, and enter the new name.
2. Select a field in the Report Browser, right click it, and select Rename.
3. Select a field in the Report Definition Editor, right click it, and select Rename from the context menu. The selected field is always highlighted in yellow in the Report Definition Editor.
We can also change the field’s data type, if needed. In our example, ReportMiner correctly assigned field data types from our sample
report (Figure 10).
Figure 10
Creating a Data Region
Now that we have created the definition of the Header, let’s look into the main region of the report. As we saw earlier, the main region starts
with the Customer Name and then includes Account Number, Contact Name, and, finally, specific order details. Let’s assume that we are
interested in extracting only the order details and order items for the respective orders.
Let’s select the order lines, then right click the selection and select Add Data Region from the context menu (Figure 11).
Figure 11
This will add a new Data node in the Report Browser. This new node has no fields at this point (Figure 12).
Figure 12
Now we will identify the region using an appropriate mask. In this case, it’s easy to identify orders as they always start with ORDER ID: at the same position. Place the cursor at the position where the text ORDER ID begins, as shown in the screenshot, and enter ORDER in the pattern text input (Figure 13).
Figure 13
The Report Definition Editor highlights any occurrences of the Data region in the report. Remember that we can easily adjust the height of
the region by using the Line Count input.
Let’s rename our region Order. Now our report has two regions: Header and Order.
Now, let’s identify the fields making up the Order region. The Order region has two fields – Order ID and Ship Date. Let’s add these fields
to the region. If needed, scroll back to the Adding Fields section where we talked about adding fields in the context of a Header region.
Creating a Collection Region
Next, let’s take a closer look at the Order region. Notice that each customer can have one or more orders, and each order may have
several order items in it. In ReportMiner terms, we say that the region has a collection of items, or to put it simply, a Collection.
Let’s add order items to the Order. After selecting the Order node in the model, we select a row underneath the order that represents an order item, right click it, and select Add Data Region from the context menu.
We can identify this region by the repeating pattern of item code. We are going to use a data mask in the text pattern input to match
with the item code. To that end, enter Match Any Alphabet three times, followed by a hyphen, and then Match Any Digit five times as
shown in Figure 14.
Figure 14
Whenever a node has a collection of items, we need to turn on its Is Collection property as shown in Figure 15. Notice that the appearance of the icon for the Item node in the Report Browser changes to help identify this node as a collection. When we add a Collection
Data Region via the context menu, the Is Collection property is enabled automatically.
Right click anywhere within our region, and select Auto Create Fields. This creates the Order Number field and the Ship Date field,
named Field_0 and Field_1 respectively. Let’s give these fields more user-friendly names. After assigning proper names, the model is
completed and looks as shown in Figure 15.
Figure 15
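The parent and child relationship between the Order region and its collection of items can be sketched in Python. This is a rough analogy rather than ReportMiner's internals: the Order and Item classes, the regular expressions standing in for the ORDER pattern and the item code mask, and the sample lines are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

# Assumed stand-ins for the two region patterns used in this tutorial.
ORDER_START = re.compile(r"^ORDER ID:\s*(\S+)")
# Three letters, a hyphen, then five digits, as in the item code mask.
ITEM_START = re.compile(r"^\s*([A-Za-z]{3}-\d{5})\s+(.*)$")

@dataclass
class Item:
    code: str
    description: str

@dataclass
class Order:
    order_id: str
    items: list = field(default_factory=list)  # the collection of child items

def parse_orders(lines):
    """Attach each item line to the most recent order line above it."""
    orders = []
    for line in lines:
        order_match = ORDER_START.match(line)
        if order_match:
            orders.append(Order(order_id=order_match.group(1)))
            continue
        item_match = ITEM_START.match(line)
        if item_match and orders:
            code, description = item_match.group(1), item_match.group(2).strip()
            orders[-1].items.append(Item(code, description))
    return orders

# Made-up sample lines, not taken from the actual Orders report.
sample = [
    "ORDER ID: 10248   SHIP DATE: 12/15/2011",
    "  ABC-00001  Widget",
    "  ABC-00002  Gadget",
    "ORDER ID: 10249   SHIP DATE: 12/16/2011",
    "  XYZ-12345  Sprocket",
]

orders = parse_orders(sample)
```

Each Order ends up holding zero or more Item objects, which mirrors how a region marked as a collection can repeat any number of times under its parent.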
Saving and Testing a Report Model
Report definitions are used by ReportMiner to correctly parse, interpret, and assign data as it is fed from the report source. Report definitions are assigned an *.rmd extension.
Let’s save our report model by clicking the Save icon on the main toolbar. Now we can test the model by previewing our data to see
how it is parsed by ReportMiner.
To test the model and preview the extracted data, click the preview icon on the top toolbar.
This opens the Data Preview window, showing the entire report structure with the actual values for all the fields we have defined above
(Figure 16).
Figure 16
Data Statistics and Summary
ReportMiner lets users review summary statistics for extracted data fields, such as sum, average, and count.
To view detailed statistics of extracted data, click the corresponding button in the toolbar.
The Quick Profile window will open with detailed statistics of extracted data as shown in Figure 17.
Figure 17
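As a rough picture of the kind of summary involved, the following Python sketch computes the three statistics named above for one numeric column. The quick_profile function and the sample quantities are hypothetical and are not part of ReportMiner.

```python
from statistics import mean

def quick_profile(values):
    """Summarize a numeric column with count, sum, and average."""
    return {
        "count": len(values),
        "sum": sum(values),
        "average": mean(values),
    }

# Made-up values for a hypothetical Quantity field.
quantities = [12, 10, 5, 9]
profile = quick_profile(quantities)
```

Comparing such totals against figures printed on the original report is one way to confirm that no records were missed during extraction.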
Exporting Data
ReportMiner exports data to any destination you choose. You can export data to Excel, delimited files, and fixed length files, or to a database such as Microsoft SQL Server, Access, PostgreSQL, or MySQL.
For example, if you wish to export data to Excel, click the corresponding button in the Model Layout toolbar. An export wizard will pop up and walk you through the steps to configure the export.
In the first screen, you will choose the output file location. Clicking on the next button will take you to the layout grid that shows all the
fields to be exported, their sequence, header text, and the source field used to extract data from the source file. When you click on OK,
the wizard screen will close and begin the extraction. You can see the progress in the progress window (Figure 18).
Figure 18
The Data Export Settings window is also highlighted and a reusable export setting is added to the list. You can manage your reusable
export settings in this window. You can edit existing settings, remove them, or add a new one. You can trigger a fresh transfer from this
window as well.
After the export has finished, you can see the progress and a link to the destination file as well as the log file. If your transfer encountered any errors, you can click on the hyperlink for the log file and view the error log.
In this case, the transfer was successful and the output Excel file is shown in Figure 19.
Figure 19
You can create transfer settings and export data to delimited files or databases using their respective toolbar buttons.
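To make the delimited and fixed length destinations concrete, here is a small Python sketch that writes the same records both ways. It is independent of ReportMiner: the field names, the records, and the two helper functions are assumptions for illustration, and the layout list only mimics the field sequence shown in the layout grid.

```python
import csv
import io

# Made-up extracted records and field sequence.
records = [
    {"OrderID": "10248", "ShipDate": "12/15/2011", "ItemCode": "ABC-00001"},
    {"OrderID": "10248", "ShipDate": "12/15/2011", "ItemCode": "ABC-00002"},
]
layout = ["OrderID", "ShipDate", "ItemCode"]

def to_delimited(rows, fields, delimiter=","):
    """Render rows as delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_fixed_length(rows, widths):
    """Render rows as fixed length text; `widths` maps field name -> width."""
    lines = []
    for row in rows:
        # Pad short values with spaces and truncate long ones to the width.
        cells = [str(row[name]).ljust(width)[:width] for name, width in widths.items()]
        lines.append("".join(cells))
    return "\n".join(lines)

delimited_text = to_delimited(records, layout)
fixed_text = to_fixed_length(records, {"OrderID": 8, "ItemCode": 10})
```

The delimited form relies on a separator character between values, while the fixed length form relies on every field occupying the same columns on every line.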
Let’s now take a look at some additional functionality that ReportMiner offers to help you customize your extraction.
Selecting Fields and Regions
To select a field, left click on it in the Report Browser’s tree. The field is highlighted in yellow in the Report Definition Editor. Some of the
more common field properties are displayed in the top pane of the editor (Figure 20).
Figure 20
To select a region, click on it in the Report Browser’s tree. The region is highlighted in light purple in the Report Definition Editor, and
the fields in the selected region are also highlighted in darker purple. The top pane shows the properties that are applicable for the
region.
Managing Field and Region Properties
To view and update all other properties of a field or a region, right click on a field (or region) inside the Report Browser, and select Edit
Field (or Edit Region) from the context menu.
The same functionality is also available from the corresponding icon on the top toolbar.
You can also access field properties by right clicking the field in the Report Definition Editor and selecting Field Properties from the
context menu.
Renaming Fields and Regions
To rename a field or a region, double click it on the tree in the Report Browser and enter a new name.
You can also rename a field or a region by entering the new name in the Name input on the top pane.
Deleting Fields and Regions
To delete a field, right click it in the Report Browser or Report Definition Editor and select Delete Field.
To delete a region, right click on a region (or a field inside the region) and select Delete Region from the context menu. Note that this
action will also delete any fields in that region.
Customizing Fields
After your field has been created, you can change its start position by moving it a number of characters to the left or to the right. Right
click on a field and select Move Field Marker Right One Character or Move Field Marker Left One Character from the context menu. Repeat as needed to move the field the desired number of characters.
Note that the same functionality is also accessible from the corresponding icons on the top toolbar.
You can also change the field length by selecting Decrease Field Length By One Character or Increase Field Length By One Character from the context menu. Repeat as many times as needed to change the field length by the desired number of characters.
Note that the same functionality is also accessible from the corresponding icons on the top toolbar.
To auto determine field length based on the available sample data, right click a field and select Auto Determine Field Length from the context menu, or click the corresponding icon on the top toolbar.
Alternatively, you can also move all fields within the same region left or right by a specified number of characters. To do this, right click on a region or field and select Move All Field Markers Left One Character or Move All Field Markers Right One Character. You can also use the corresponding icons on the top toolbar.
Note: To undo any action in the editor, use the Undo dropdown menu on the toolbar or press CTRL + Z.
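The field adjustments in this section can be thought of as changes to a start column and a length. The Python sketch below models that idea. It is an analogy only: the FieldMarker class, the helper names, the sample line, and in particular the rule used by auto_length (measure the text up to the first blank) are assumptions, since the tutorial does not say how ReportMiner determines the length.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FieldMarker:
    start: int   # zero-based column where the field begins
    length: int  # number of characters the field covers

    def extract(self, line):
        """Cut the field's characters out of one line of the report."""
        return line[self.start:self.start + self.length]

def move(marker, chars):
    """Shift the marker right (positive) or left (negative)."""
    return replace(marker, start=max(0, marker.start + chars))

def resize(marker, chars):
    """Increase (positive) or decrease (negative) the field length."""
    return replace(marker, length=max(1, marker.length + chars))

def auto_length(marker, sample_lines):
    """Assumed rule: use the longest run of text before the first blank."""
    longest = max(
        (len(line[marker.start:].split(" ", 1)[0]) for line in sample_lines),
        default=0,
    )
    return replace(marker, length=max(1, longest))

line = "ORDER ID: 10248   SHIP DATE: 12/15/2011"  # made-up sample line
misplaced = FieldMarker(start=11, length=3)       # captures only "024"
corrected = resize(move(misplaced, -1), 2)        # one left, two longer
```

Here corrected.extract(line) yields the full order number, which is the effect of moving the marker one character left and increasing the length by two in the editor.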
Identifying Text Patterns for Regions
The following options are available to help you create a text pattern that will identify the starting point of a field or region.
Match any alphabet
Match any digit
Match any alphabet or digit
Match any non-blank
Match any blank character
For example, to match the date 12/15/2011, you can use a pattern in which each digit is replaced by the “match any digit” wildcard and the slashes are entered as literal characters.
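One way to reason about these options is to translate each one into a regular expression character class. The Python sketch below does this under explicit assumptions: the Wildcard names are invented for the example, the chosen character classes are a plausible reading of each option rather than ReportMiner's documented behavior, and compile_mask is a hypothetical helper.

```python
import re
from enum import Enum

class Wildcard(Enum):
    """Assumed regular expression equivalents of the five options."""
    ALPHABET = r"[A-Za-z]"
    DIGIT = r"[0-9]"
    ALPHABET_OR_DIGIT = r"[A-Za-z0-9]"
    NON_BLANK = r"\S"
    BLANK = r"[ \t]"

def compile_mask(parts):
    """Build a regex from a sequence of Wildcard members and literal strings."""
    pattern = "".join(
        part.value if isinstance(part, Wildcard) else re.escape(part)
        for part in parts
    )
    return re.compile(pattern)

A, D = Wildcard.ALPHABET, Wildcard.DIGIT

# Two digits, slash, two digits, slash, four digits: matches 12/15/2011.
date_mask = compile_mask([D, D, "/", D, D, "/", D, D, D, D])

# Three letters, hyphen, five digits: the item code mask used earlier.
item_mask = compile_mask([A, A, A, "-", D, D, D, D, D])
```

Literal characters such as the slash or hyphen must appear exactly as written, while each wildcard accepts any single character from its class.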
Using Dataflows
ReportMiner enables users to build and run dataflows. A dataflow is a graphical representation for sources, destinations, transformations, and maps. Report models can be used as sources in dataflows in order to leverage the advanced transformation features in
ReportMiner. Let’s add the report model to a dataflow so we can read the entire source report and feed it to a destination object.
Go to File -> New -> Dataflow. This creates a new dataflow.
Using the Toolbox pane, expand the Sources category, and select Report Source.
Drag and drop Report Source onto the Designer.
Double click the ReportModel1 object that we just added (or right click it and select Properties) to open the Properties dialog.
Using the Properties dialog, enter the path to the report source file and the report model. The report model location should point to the
report model we created and saved earlier (Figure 21).
Figure 21
Click OK to close the dialog. The Report_Source object shows the report structure according to the report model we created (Figure 22).
Figure 22
You may need to expand the tree nodes to see all the child nodes under the root node. Our new report source is ready to feed data to the downstream objects in our dataflow.
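Conceptually, a dataflow chains a source, optional transformations, and a destination. The Python sketch below mimics that chain with generators. It is purely illustrative and does not use any ReportMiner or Centerprise API: the function names, the uppercase transformation, and the sample record are all assumptions.

```python
def report_source(records):
    """Source: yield one record per extracted row."""
    yield from records

def uppercase_item_code(rows):
    """Transformation: normalize the ItemCode value of each row."""
    for row in rows:
        yield {**row, "ItemCode": row["ItemCode"].upper()}

def destination(rows):
    """Destination: collect the rows (a real flow would write them out)."""
    return list(rows)

# Made-up record standing in for data read through a report model.
extracted = [{"OrderID": "10248", "ItemCode": "abc-00001"}]

loaded = destination(uppercase_item_code(report_source(extracted)))
```

Each stage only consumes what the previous stage produces, which is the same left-to-right flow that the mapped objects express on the Designer.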
Creating Dataflows from Export Settings
You can also create dataflows directly from the Export Settings Browser. Select an existing export setting and click the corresponding button in the Export Settings Browser toolbar. A new dataflow will be created and opened in a new window as shown in Figure 23. Please refer to the Astera Centerprise Data Integrator user manual to learn more about dataflows.
Figure 23
www.astera.com
Contact us for more information or to request a free trial
sales@astera.com
8888-77-ASTERA
Copyright © 2014 Astera Software Incorporated. All rights reserved. Astera and Centerprise are registered trademarks of
Astera Software Incorporated in the United States and / or other countries.
Other marks are the property of their respective owners.