ETD-MARC/Perl Module Design - VTechWorks

Presented by
Yan Liao (Clara) & Mary Finn
University Libraries of Virginia Tech
October 20, 2005
ETD-MARC/Perl Module Design
- Overview
- Creating ETD-MARC/Perl Module (I): General Design Procedure and Preparations
- Creating ETD-MARC/Perl Module (II): ETD-MARC/Perl Module
- Limitations
- Further Research and Applications
- References
Overview
- Life Cycle of ETD
- ETD Cataloging Before Perl Module
- ETD Cataloging After Perl Module
Overview
ETD Cataloging
ETD Cataloging Before Perl Module
Professional catalogers developed policies and procedures:
- Cataloging from submission form
- Create new records in OCLC
- Copy/paste data into MARC fields (5 minutes)
- Authority control
- Download to Addison
ETD Cataloging After Perl Module
- Cataloging from submission forms
- Generating MARC records automatically
- Uploading the records to OCLC
- Authority control
- Download to Addison
Creating ETD-MARC/Perl Module (I)
- General Design Procedure
- Preparations:
  - ETD metadata meets MARC (leader, fixed fields, etc.)
  - ETD-MARC/Perl module functions
  - Perl
  - MARC::Record package
General Design Procedure
- Catalogers cooperated with systems staff
- Match ETD metadata with MARC
- Design a working module
- Test and use the module
ETD Metadata meets MARC
- Set up templates for constant data, such as the leader, some fixed fields (006, 007, 008), and variable data fields.
- Map ETD variables to MARC tags
MARC Coding for Leader
Position | Element | Code
00-04 | Logical record length |
05 | Record status | n: New
06 | Type of record | a: Language material
07 | Bibliographic level | m: Monograph/item
08 | Type of control | _: No specific type of control
09 | Character coding scheme | _: MARC-8
10 | Indicator count |
11 | Subfield code count |
12-16 | Base address of data |
17 | Encoding level | K: Less-than-full level
18 | Descriptive cataloging form | a: AACR2
19 | Linked record requirement |
20 | Length of the length-of-field portion |
21 | Length of the starting-character-position portion |
22 | Length of the implementation-defined portion |
23 | Undefined |
(_ = blank)
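As a check on the table above, the leader can be assembled position by position in plain Perl. This is a minimal sketch, not the module's own code: the subroutine name `build_leader` and the hash keys are hypothetical, and the values for positions 10-11, 19, and 20-23 are taken from the `leader()` example on the encode_usmarc (II) slide (`2200000Ka 4500`). Positions 00-04 and 12-16 are zero placeholders because they depend on the finished record.

```perl
use strict;
use warnings;

# Assemble the 24-character leader from the position table.
# 00-04 and 12-16 are placeholders: they depend on the finished record.
sub build_leader {
    my %p = (
        record_length   => '00000',  # 00-04 logical record length
        record_status   => 'n',      # 05 new
        type_of_record  => 'a',      # 06 language material
        bib_level       => 'm',      # 07 monograph/item
        type_of_control => ' ',      # 08 no specific type of control
        char_coding     => ' ',      # 09 MARC-8
        indicator_count => '2',      # 10
        subfield_count  => '2',      # 11
        base_address    => '00000',  # 12-16 base address of data
        encoding_level  => 'K',      # 17 less-than-full level
        cat_form        => 'a',      # 18 AACR2
        linked_record   => ' ',      # 19
        entry_map       => '4500',   # 20-23
    );
    my $leader = join '', @p{qw(
        record_length record_status type_of_record bib_level type_of_control
        char_coding indicator_count subfield_count base_address
        encoding_level cat_form linked_record entry_map
    )};
    die "leader is not 24 characters\n" unless length($leader) == 24;
    return $leader;
}

print '[', build_leader(), "]\n";
```

Building the string from named parts makes it easy to confirm that each code lands at the position the table assigns it; with MARC::Record the result would simply be passed to `$marc->leader(...)`.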
MARC Coding for 006
Position | Element | Code
00 | Form of material | m: Computer file
01-04 | Undefined |
05 | Target audience |
06-08 | Undefined |
09 | Type of computer file | d: Document
10 | Undefined |
11 | Government publication | s: State, provincial, etc.
12-17 | Undefined |
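Fixed-length fields such as 006 are mostly blanks with a few coded positions, so one way to build them is to start from a string of spaces and overwrite only the positions listed in the table. The helper below (`build_fixed`, a hypothetical name) is a sketch of that idea using Perl's lvalue `substr`; the result is the kind of string that would be passed to `MARC::Field->new('006', ...)`.

```perl
use strict;
use warnings;

# Start from $length blanks and overwrite only the coded positions.
sub build_fixed {
    my ($length, %codes) = @_;
    my $field = ' ' x $length;
    for my $pos (keys %codes) {
        die "position $pos out of range\n"
            if $pos + length($codes{$pos}) > $length;
        substr($field, $pos, length $codes{$pos}) = $codes{$pos};
    }
    return $field;
}

# 006 for a computer file: 00 m, 09 d (document), 11 s (state government)
my $f006 = build_fixed(18, 0 => 'm', 9 => 'd', 11 => 's');
print "[$f006]\n";
```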
MARC Coding for 007
Position | Element | Code
00 | Category of material | c: Computer file
01 | Specific material designation | r: Remote
02 | Undefined |
03 | Color | u: Unknown
04 | Dimensions | n: Not applicable
05 | Sound on medium | u: Unknown
06-08 | Image bit depth |
09 | File formats |
10 | Quality assurance targets |
11 | Antecedent/source |
12 | Level of compression |
13 | Reformatting quality |
MARC Coding for 008
Position | Element | Code
00-05 | Date entered on file |
06 | Publication status | s: Single known date/probable date
07-10 | Date 1 (year) |
11-14 | Date 2 |
15-17 | Place of publication | vau: Virginia
18-21 | Illustrations 1-4 | a: Illustrations
22 | Target audience |
23 | Form of item | s: Electronic resource
24 | Contents 1 | b: Bibliographies
25 | Contents 2 |
26 | Contents 3 |
27 | Contents 4 |
28 | Government publication | s: State, provincial, etc.
29 | Conference publication | 0: Not a conference publication
30 | Festschrift | 0: Not a festschrift
31 | Index | 0: No index
32 | Undefined |
33 | Literary form | 0: Not fiction
34 | Biography | _: No biographical material
35-37 | Language | eng: English
38 | Modified record |
39 | Cataloging source | d: Other
(_ = blank)
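The same overwrite-a-template approach extends to the 40-character 008. In this sketch only positions 00-05 (date entered, defaulting to today as yymmdd) and 07-10 (Date 1, from the ETD `year` variable) vary per record; the subroutine name `build_008` and its arguments are assumptions, and positions with no code in the table are left blank.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# 008 for one ETD: a 40-character template with two per-record parts,
# 00-05 (date entered, yymmdd) and 07-10 (Date 1 = ETD release year).
sub build_008 {
    my ($year, $entered) = @_;
    $entered //= strftime('%y%m%d', localtime);   # default: today
    die "year must be four digits\n"    unless $year    =~ /^\d{4}$/;
    die "date entered must be yymmdd\n" unless $entered =~ /^\d{6}$/;
    my %code_at = (
        0  => $entered,   # 00-05 date entered on file
        6  => 's',        # 06 single known date/probable date
        7  => $year,      # 07-10 Date 1
        15 => 'vau',      # 15-17 Virginia
        18 => 'a',        # 18 illustrations
        23 => 's',        # 23 electronic resource
        24 => 'b',        # 24 bibliographies
        28 => 's',        # 28 state government publication
        29 => '0',        # 29 not a conference publication
        30 => '0',        # 30 not a festschrift
        31 => '0',        # 31 no index
        33 => '0',        # 33 not fiction
        35 => 'eng',      # 35-37 English
        39 => 'd',        # 39 cataloging source: other
    );
    my $f008 = ' ' x 40;  # blank wherever the table gives no code
    substr($f008, $_, length $code_at{$_}) = $code_at{$_} for keys %code_at;
    return $f008;
}

print '[', build_008('2005', '051020'), "]\n";
```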
Constant data in variable MARC fields
- 040 Cataloging source: $a VPI $c VPI
- 245 Title: $h [electronic resource]
- 260 Publication: $a [Blacksburg, Va.] $b University Libraries, Virginia Polytechnic Institute and State University
- 500 Notes: $a Title from electronic submission form
- 500 Notes: $a Vita
- 504 Bibliographies: $a Includes bibliographical references
- 538 System requirements: $a System requirements: World Wide Web browser and PDF reader.
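Keeping the constant fields as data, rather than scattering them through the code, makes the template easy to review and change. The sketch below stores each field as `[tag, ind1, ind2, code => value, ...]`; the blank indicators and the `show_field` display format are assumptions for illustration. The 245 `$h [electronic resource]` is left out because it belongs to each record's own 245.

```perl
use strict;
use warnings;

# Constant variable fields: [tag, ind1, ind2, code => value, ...].
# Blank (' ') indicators are an assumption; the slide lists only subfields.
my @constant_fields = (
    ['040', ' ', ' ', a => 'VPI', c => 'VPI'],
    ['260', ' ', ' ', a => '[Blacksburg, Va.]',
        b => 'University Libraries, Virginia Polytechnic Institute and State University'],
    ['500', ' ', ' ', a => 'Title from electronic submission form'],
    ['500', ' ', ' ', a => 'Vita'],
    ['504', ' ', ' ', a => 'Includes bibliographical references'],
    ['538', ' ', ' ', a => 'System requirements: World Wide Web browser and PDF reader.'],
);

# Render one field as "TAG I1I2 $a ... $b ..." for proofreading.
sub show_field {
    my ($tag, $i1, $i2, @sub) = @{ $_[0] };
    my @parts;
    while (my ($code, $value) = splice @sub, 0, 2) {
        push @parts, "\$$code $value";
    }
    return "$tag $i1$i2 " . join(' ', @parts);
}

print show_field($_), "\n" for @constant_fields;
```

Because each array follows the argument order of `MARC::Field->new($tag, $ind1, $ind2, code => value, ...)`, the module could append all of them at once with `$marc->append_fields(map { MARC::Field->new(@$_) } @constant_fields)`.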
Map ETD Variables to MARC
ETD variable | Description | MARC tag(s)
urn | Uniform Resource Name | 035
year | release year | 008, 099, 260, 440, 502
type | document type (text) | 099
title | title of document | 245
first_name | first name of author | 100, 245
middle_name | middle name of author | 100, 245
last_name | last name of author | 100, 245
comp_file | computer file characteristics | 256
degree | degree (M.S., M.A., Ph.D., etc.) | 440, 502
abstract | abstract of document | 520
url | URL of ETD | 856
keywords | | 653
department | | 440
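Most variables map to a single field value, but `keywords` may yield several 653 fields. The sketch below assumes, hypothetically, that keywords are stored as one string separated by commas or semicolons; the actual storage format in the ETD database is not described here. Each entry uses the same `[tag, ind1, ind2, code => value]` shape that `MARC::Field->new` accepts.

```perl
use strict;
use warnings;

# Hypothetical: assumes "keywords" is one string separated by , or ;
sub keywords_to_653 {
    my ($keywords) = @_;
    return () unless defined $keywords;
    my @fields;
    for my $raw (split /\s*[,;]\s*/, $keywords) {
        (my $kw = $raw) =~ s/^\s+|\s+$//g;   # trim leading/trailing blanks
        next unless length $kw;              # skip empty entries
        push @fields, ['653', ' ', ' ', a => $kw];
    }
    return @fields;
}

my @f653 = keywords_to_653('digital libraries; MARC, Perl ,');
print "$_->[0] \$a $_->[4]\n" for @f653;
```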
ETD-MARC Perl Module Functions
- Queries the ETD database to extract metadata (Perl script)
  - connects to the ETD database
  - fetches metadata from the database
- Creates a MARC record for each ETD (MARC::Record framework, MARC.pm)
  - places the appropriate metadata in the appropriate MARC tags
Perl
- Practical Extraction and Report Language: a general-purpose programming language created in 1987 by Larry Wall
- Facilities: text processing, database access, networking, etc.
- Perl module: a set of related functions packaged together into a library file with the extension ".pm", such as CGI.pm and MARC.pm
- CPAN: Comprehensive Perl Archive Network (www.cpan.org)
MARC::Record Package
- MARC.pm
  - a piece of open-source software developed by librarians for librarians in the summer of 1999
  - contains functions to read in USMARC data; to add, remove, and modify fields; to search through data; and to save MARC data
- MARC::Record (1.39)
  - latest, enhanced version of MARC.pm
  - contains MARC::Batch, MARC::Field, MARC::Record, MARC::File, and MARC::Lint (a separate package since Dec. 2004)
Creating ETD-MARC/Perl Module (II)
- ETD-MARC Module
  - Core part
  - Crucial subfunction: encode_usmarc
  - Cataloging problems and Perl script
    - Title problem: 245 indicators
    - Author name problem: 100 and 245
ETD-MARC Module: core part
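    our $dbh = db_connect();                  # connect to the ETD database
    my @urns = fetch_urns($limit);            # fetch the URNs of the ETDs to process
    foreach my $urn (@urns) {
        my %record = fetch_record($urn);      # fetch the metadata for one ETD
        my $marc = encode_usmarc(\%record);   # generate the MARC record
        print OUT $marc;                      # write it to the output file (OUT opened earlier)
    }
    $dbh->disconnect();                       # disconnect from the ETD database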
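To see the control flow of the core loop without a live database, the helpers can be replaced by stand-ins. In this sketch `fetch_urns`, `fetch_record`, and `encode_usmarc` are hypothetical stubs over an in-memory hash (no DBI connection), and output goes to a lexical filehandle opened on a scalar rather than the `OUT` handle, whose opening is not shown on the slide.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the ETD database, shaped like $record{main}->{...}
my %fake_db = (
    'etd-0001' => { main => { title => 'A study of widgets', url => 'http://example.org/etd-0001' } },
    'etd-0002' => { main => { title => 'Gadget design',      url => 'http://example.org/etd-0002' } },
    'etd-0003' => { main => { title => 'On sprockets',       url => 'http://example.org/etd-0003' } },
);

sub fetch_urns {                  # stub: at most $limit URNs, in sorted order
    my ($limit) = @_;
    my @urns = sort keys %fake_db;
    splice(@urns, $limit) if $limit && $limit < @urns;
    return @urns;
}

sub fetch_record {                # stub: returns a key/value list, as on the slide
    my ($urn) = @_;
    return %{ $fake_db{$urn} };
}

sub encode_usmarc {               # placeholder for the real MARC encoder
    my ($rec) = @_;
    return join('|', $rec->{main}{title}, $rec->{main}{url}) . "\n";
}

# The slide's loop, writing to an explicitly opened (in-memory) filehandle.
open(my $out, '>', \my $buffer) or die "cannot open output: $!";
foreach my $urn (fetch_urns(2)) {
    my %record = fetch_record($urn);
    print {$out} encode_usmarc(\%record);
}
close($out);
print $buffer;
```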
ETD-MARC Module: encode_usmarc(I)
- Assign data values to subfunction variables, for example:
      my $url = $record{main}->{url};
- Create the MARC record, for example (856 indicators 4 and 0: HTTP access to the resource itself):
      $marc->append_fields(MARC::Field->new('856', '4', '0', u => $url));
ETD-MARC Module: encode_usmarc(II)
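- MARC::Record: the primary class, representing a MARC record as a container for multiple MARC::Field objects
  - new(): create an empty record
      my $marc = MARC::Record->new();
  - leader(): set the leader (positions 00-04 and 12-16 are recalculated by as_usmarc())
      $marc->leader('00000nam  2200000Ka 4500');
  - append_fields(): add fields to the record
      $marc->append_fields(MARC::Field->new('006', 'm' . ' ' x 8 . 'd s' . ' ' x 6));  # 18 chars: 00 m, 09 d, 11 s
  - as_formatted(): return the record as human-readable text
      return $marc->as_formatted();
  - as_usmarc(): return the record in USMARC transmission format
      return $marc->as_usmarc();
- MARC::Field: an object representing the indicators and subfields of a single MARC field
  - new():
      MARC::Field->new('040', ' ', ' ', a => 'VPI', c => 'VPI'),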
245 indicator problem
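    my @mTitle;
    # 2nd indicator = number of nonfiling characters ("A " = 2, "An " = 3, "The " = 4);
    # ^ anchors the article to the start of the title.
    if ($title =~ /^A /) {
        @mTitle = ('245', '1', '2', a => $title);
    }
    elsif ($title =~ /^An /) {
        @mTitle = ('245', '1', '3', a => $title);
    }
    elsif ($title =~ /^The /) {
        @mTitle = ('245', '1', '4', a => $title);
    }
    else {
        @mTitle = ('245', '1', '0', a => $title);
    }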
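The article test can also be written once with a single anchored pattern, because the second indicator is just the length of the article plus its trailing space. This sketch (the name `title_indicators` is hypothetical) keeps the slide's case-sensitive matching and first indicator `1`; anchoring with `^` avoids matching an article in the middle of a title, such as "Data analysis: A case study".

```perl
use strict;
use warnings;

# Returns (ind1, ind2) for 245. ind1 is always 1, as on the slide; ind2 is
# the number of nonfiling characters: the article plus its trailing space.
sub title_indicators {
    my ($title) = @_;
    if ($title =~ /^(A|An|The) /) {
        return ('1', length($1) + 1);
    }
    return ('1', '0');
}

my $title  = 'The design of an ETD-MARC module';
my @mTitle = ('245', title_indicators($title), a => $title);
print join(' | ', @mTitle), "\n";
```

Adding another article later means changing only the alternation, not copying another `elsif` branch.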
100, 245 Author Name Problem (I)
- Middle name, extra space, and punctuation problem
  - Intended form:
      100: $a Last_Name, First_Name Middle_Name.
      245: $c First_Name Middle_Name Last_Name.
  - If the middle name is null, an extra space (shown as _) is left behind:
      100: $a Last_Name, First_Name _.
      245: $c First_Name _ Last_Name.
  - If the middle name already ends with a period, 100 ends with a double period:
      100: $a Last_Name, First_Name Middle_Name..
100, 245 Author Name Problem (II)
- Check name string code:
    if ($mName) {                              # there is a middle name: include it
        $hName = "$lName, $fName $mName";
        $cName = "$fName $mName $lName.";
        unless ($hName =~ /\.$/) {             # if 100 doesn't already end with "."
            $hName = "$hName.";
        }
    }
    else {                                     # no middle name: exclude it
        $hName = "$lName, $fName";
        $cName = "$fName $lName.";
        unless ($hName =~ /\.$/) {             # if 100 doesn't already end with "."
            $hName = "$hName.";
        }
    }
    # …
    MARC::Field->new('100', '1', ' ', a => $hName),
    MARC::Field->new(@mTitle, h => '[electronic resource]/', c => $cName),
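The two branches differ only in whether a middle name is present, so the names can also be built by joining the non-empty parts and adding the final period only when it is missing. This is a sketch under the slide's assumptions (first, middle, and last names arrive as separate strings); `author_names` is a hypothetical name, and unlike the slide it also guards 245 $c against a doubled period.

```perl
use strict;
use warnings;

# Returns (100 $a, 245 $c). Empty or undefined middle names are dropped
# before joining, so no extra space appears; a final period is added only
# if the string does not already end with one.
sub author_names {
    my ($first, $middle, $last) = @_;
    my @given = grep { defined $_ && length $_ } ($first, $middle);
    my $h = "$last, " . join(' ', @given);   # 100 $a: Last, First Middle
    my $c = join(' ', @given, $last);        # 245 $c: First Middle Last
    $h .= '.' unless $h =~ /\.$/;
    $c .= '.' unless $c =~ /\.$/;
    return ($h, $c);
}

printf "100 \$a %s\n245 \$c %s\n", author_names('Jane', 'Q.', 'Doe');
printf "100 \$a %s\n245 \$c %s\n", author_names('Jane', '',   'Doe');
```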
Limitations
- Limited by the quality of the metadata input by students
- Limited to descriptive metadata only; cannot accommodate classification, subject analysis, or name authority validation
- AACR2 9.7B22: "For remote access resources, always give the date on which the resource was viewed for description"
VT System Limitations (2004)
- System problems:
  - Systems (2004): OCLC Passport & VTLS
  - Input problems: character sets; long abstracts
  - Workflow problems: uploading local files to OCLC
  - Solution: new systems (Connexion & III)
- ETD database problems:
  - No index table in the database for the degree types, e.g. MA; M.A.; Master of Arts. The degree value is used in:
      440 -> "VPI & SU. $department. $degree $ryear"
      502 -> "Thesis ($degree)--Virginia Polytechnic Institute and State University, $ryear."
  - Solution: system maintenance
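One way to work around the missing degree index, until the database itself is fixed, is a lookup table applied before the 440 and 502 strings are built. The table and the `normalize_degree` name below are hypothetical and cover only a few variants; unrecognized values pass through unchanged so nothing is silently lost.

```perl
use strict;
use warnings;

# Hypothetical lookup table: normalized key => preferred display form.
my %degree_form = (
    'ma'                   => 'M.A.',
    'master of arts'       => 'M.A.',
    'ms'                   => 'M.S.',
    'master of science'    => 'M.S.',
    'phd'                  => 'Ph.D.',
    'doctor of philosophy' => 'Ph.D.',
);

sub normalize_degree {
    my ($raw) = @_;
    my $key = lc $raw;
    $key =~ s/\.//g;        # "m.a." -> "ma"
    $key =~ s/\s+/ /g;      # collapse runs of whitespace
    $key =~ s/^ | $//g;     # trim one leading/trailing space
    return $degree_form{$key} // $raw;   # unknown values pass through unchanged
}

my ($degree, $ryear) = (normalize_degree('Master of Arts'), 2005);
my $f502 = "Thesis ($degree)--Virginia Polytechnic Institute and State University, $ryear.";
print "$f502\n";
```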
Further Research and Applications
- Conduct further research to determine whether there are savings in human resources and cataloging time
- Applications of Perl scripts to other digital cataloging projects:
  - import MARC data into a relational database
  - perform automatic metadata crosswalks
  - manipulate vendors' digital bibliographic records
  - …
References
- Brian E. Surratt and Dustin Hill, "ETD2MARC: a semi-automated workflow for cataloging electronic theses and dissertations," Texas A&M University, 2004, http://di.tamu.edu/bsurratt/
- Anne Highsmith …[et al.], "MARC it your way: MARC.pm," Information Technology and Libraries, vol. 21, no. 1, March 2002.
- MARC/Perl: http://marcpm.sourceforge.net/
- Comprehensive Perl Archive Network (CPAN): http://www.cpan.org/
- Perl4Lib: http://perl4lib.perl.org/
- VT Electronic Theses and Dissertations: Cataloging Instructions: http://techserv.lib.vt.edu/Cataloging/CTetd.htm
- VALET for ETDs: http://www.vtls.com/Products/valet-for-ETDs.shtml
Thank You!
Contact: Yan Liao (Clara)
Phone: (540) 231-8845
Email: liaocy@vt.edu
Mary Finn
(540) 231-4980
maryfinn@vt.edu