Paper 1749-2014
Creating Define.xml v2 Using SAS® for FDA Submissions
Qinghua (Kathy) Chen, Exelixis, Inc., South San Francisco, CA
James Lenihan, Exelixis, Inc., South San Francisco, CA

ABSTRACT
When submitting clinical data to the Food and Drug Administration (FDA), we need to submit the trial results as well as information that helps the FDA understand the data. For this purpose, the FDA requires the CDISC Case Report Tabulation Data Definition Specification (Define-XML), which is based on the CDISC Operational Data Model (ODM), for submissions that use the Study Data Tabulation Model (SDTM). Electronic submission to the FDA is therefore a process of following the guidelines from CDISC and the FDA. This paper illustrates how to create a CDISC guidance compliant define.xml v2 from metadata using SAS®.

INTRODUCTION
In 1999, the FDA standardized the submission of clinical data using the SAS version 5 Transport Format (XPT) and the submission of metadata using Portable Document Format (PDF) as define.pdf. In 2005, the study data specifications published by the FDA included the recommendation that data definitions (metadata) be provided as a Define-XML file. The FDA has collaborated with the Clinical Data Interchange Standards Consortium (CDISC) ever since its founding in order to standardize the content and structure of clinical trials data for regulatory submission. The FDA includes the CDISC Case Report Tabulation Data Definition Specification (define.xml) in submission packages that use SDTM. In March 2013 the final version of define.xml 2.0.0 was released by the CDISC XML Technologies team. The Define-XML specification has been improved from v1 with better clarifications and some new features. Define.xml v2.0 improves:
• Support for CDISC Controlled Terminology
• Definition of Value Level metadata
• The documentation of the data origin or source
• The handling of comments
The initial implementation of define.xml can be a long and complicated process for statistical programmers in the pharmaceutical industry. Most SAS programmers have had little exposure to XML, which can lengthen production and submission time because they are learning the language while writing the SAS code that generates the XML document.

What is XML?
XML (Extensible Markup Language) is designed to transport or store data. It is self-descriptive, since you may define your own elements, also known as tags. XML also simplifies data sharing because the data are stored in plain text format, independent of software and hardware. XML documents form a tree structure that starts at "the root" and branches to "the leaves." An XML document contains XML elements, and an element is everything from the element's start tag to its end tag. An element can contain other elements, text, attributes, or a mix of these. Attributes provide additional information about an element. In the example below, pet is the root element, type is an attribute of pet, and name, nickname, color and age are child elements. If you have more than one pet, you would wrap the pets in a common root element and repeat the pet block with the correct information filled in.

<pet type="CAT">
  <name>Spot</name>
  <nickname>Pig Monster</nickname>
  <color>black</color>
  <age>18</age>
</pet>

In order to view define.xml in a browser, we need an XSLT stylesheet to transform the XML document into HTML. XSLT is the recommended style sheet language of XML and it is far more sophisticated than CSS. Define.xml v2 as recommended by the FDA comes with define2-0-0.xsl for us to use, so all we need to do is put define.xml together and the stylesheet will handle the display. Here is what Define.XML looks like: You can see define.xml v2 consists of different components.
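To connect the XML basics with the SAS technique used throughout the rest of this paper, here is a minimal sketch of writing a small well-formed XML document from a SAS dataset with PUT statements. The dataset PETS, its values, the `<pets>` wrapper element and the file name pets.xml are hypothetical and serve only as an illustration; the define.xml steps later in the paper follow the same pattern with real metadata.

```sas
/* Sketch only: dataset PETS and the file pets.xml are hypothetical names. */
data pets;
   length type $8 name $20 color $10;
   input type $ name $ color $ age;
   datalines;
CAT Spot black 18
DOG Rex brown 7
;
run;

filename petxml "pets.xml";

data _null_;
   set pets end=eof;
   file petxml;
   /* Write the XML declaration and the single root element once */
   if _n_ = 1 then put '<?xml version="1.0" encoding="UTF-8"?>' / '<pets>';
   /* +(-1) backs up over the trailing blank that list output adds */
   put @3 '<pet type="' type +(-1) '">' /
       @5 '<name>' name +(-1) '</name>' /
       @5 '<color>' color +(-1) '</color>' /
       @5 '<age>' age +(-1) '</age>' /
       @3 '</pet>';
   /* Close the root element after the last observation */
   if eof then put '</pets>';
run;
```

Each observation becomes one repeated element block, the attribute value is written inside the start tag, and the root element is opened and closed exactly once.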
The first 20 lines should be included in each XML header. The elements in the gray box should be implemented as needed. The Annotated Case Report Form and the supplemental documents, which may include the Reviewer's Guide and Complex Algorithms, are standalone documents. The datasets tabulation (part of the domain level metadata), Variable Level Metadata, Value Level Metadata, WhereClause, Controlled Terminology, Computational Algorithms and Comments are embedded metadata. The metadata for define.xml is kept in an Excel spreadsheet, partially derived from SDTM or ADaM data specifications when possible, with some critical information added manually. The last three lines are the closing tags for this define file.

An Annotated Case Report Form is a CRF with SDTM annotations on it. The Reviewer's Guide is a document provided to the FDA reviewer to help them understand the clinical data included, along with any other information that will help speed up the review process. For better readability, the Complex Algorithms document is used when the derivation of a variable gets too long to be displayed in the method cell, so that the entire explanation can be centralized in one place.

There are five steps to create a define file, listed below and illustrated in the "Road Map":
1. Create the metadata spreadsheet from the SDTM datasets and data spec
2. Create define.xml components using SAS
3. Construct all the define.xml supporting documents
4. Create .xpt files for submitted SAS datasets
5. Construct define.xml

[Figure: Road map. Each component of the define file (Annotated Case Report Form, Reviewer Guide, Complex Algorithms, Datasets Tabulation, TOC Metadata, Variable Metadata, Value Level Metadata, Controlled Terminology, Enumerated List/Code List, Computation Methods, Comments, Clinical Data in XPT) is labeled with the number of the step above that produces it.]

CREATING METADATA SPREADSHEET FROM SDTM SPECIFICATION
To improve programming efficiency, it makes more sense to have one program generate define.xml than to build it manually. It is therefore very important to create the metadata spreadsheet for each of the sections consistently across studies.

CREATING METADATA SPREADSHEET FROM THE SDTM DATA SPEC:
The following are the metadata we need to create define.xml:
1. Header definitions
2. Datasets (TOC) definitions
3. Dataset variable definitions
4. Value level definitions
5. Computational Method
6. Controlled terminology definitions and code list
7. Comments

Header Metadata: Header metadata contains the information about the study, the SDTM version and the version of the style sheet used.

FILEOID: XYZ123
STUDYOID: 123
STUDYNAME: XYZ123
STUDYDESCRIPTION: A PHASE IIB, DOUBLE-BLIND, MULTI-CENTER, PLACEBO CONTROLLED, PARALLEL GROUP TRIAL OF ANALGEZIA HCL FOR THE TREATMENT OF CHRONIC PAIN
PROTOCOLNAME: XYZ123
STANDARD: SDTM
VERSION: 3.1.2
SCHEMALOCATION: http://www.cdisc.org/ns/odm/v1.3 define2-0-0.xsd
STYLESHEET: define2-0-0.xsl

TOC metadata: The TOC metadata should contain definitions of all the datasets sent to the FDA. The ORDID variable is used to determine the order of domains in the display.

NAME: TA
REPEATING: Yes
ISREFERENCEDATA: Yes
PURPOSE: Tabulation
LABEL: Trial Arms
STRUCTURE: One record per planned Element per Arm
DOMAINKEYS: STUDYID, ARMCD, TAETORD
CLASS: Trial Design
ARCHIVELOCATIONID: ../export/ta
DOCUMENTATION: (blank)
ORDID: 1

Variable Level Metadata: The variable level metadata is used to describe the variables within each domain. The KEYSEQUENCE variable provides a sequence number for the key variables so the stylesheet can use it to populate the DOMAINKEYS in the TOC section of the define file automatically.

Column                  Record 1            Record 2
DOMAIN                  TA                  TA
VARNUM                  1                   3
VARIABLE                STUDYID             ARMCD
TYPE                    text                text
LENGTH                  15                  8
LABEL                   Study Identifier    Planned Arm Code
SIGNIFICANTDIGITS       (blank)             (blank)
ORIGIN                  Derived             Derived
KEYSEQUENCE             1                   (blank)
DISPLAYFORMAT           (blank)             (blank)
COMPUTATIONMETHODOID    (blank)             (blank)
CODELISTNAME            (blank)             ARMCD
MANDATORY               Yes                 Yes
ROLE                    Identifier          Topic
ROLECODELIST            ROLECODE            ROLECODE
VALUELISTOID            (blank)             (blank)

Value level metadata: The value level metadata in define.xml v2 has changed significantly from v1. With the where clause included, it not only shows the condition(s) used to subset the data, it also provides link(s) back to the source domain so you can refer to it when needed. The VALUELISTOID shows the variable for which the value level metadata is created. The ITEMOID identifies the record within the VALUELISTOID, and the WHERECLAUSEOID is used to display and link the subset condition(s). When a DESCRIPTION is needed for a particular parameter in SUPPXX, you can populate it with the source variable label; it will then be shown in parentheses after the variable in the display.

Record 1
VALUELISTOID: VL.LB.LBORRES
VALUEORDER: 1
VALUENAME: LBORRES
ITEMOID: IT.LB.LBORRES.BILI.LBCAT.CHEMISTRY.LBSPEC.BLOOD
WHERECLAUSEOID: WC.LB.LBTESTCD.BILI.LBCAT.CHEMISTRY.LBSPEC.BLOOD
TYPE: float
LENGTH: 3
ORIGIN: eDT
DISPLAYFORMAT: 4.2
MANDATORY: YES
(COMPUTATIONMETHODOID, CODELISTNAME, SIGNIFICANTDIGITS and DESCRIPTION are blank)

Record 2
VALUELISTOID: VL.SUPPDM.QVAL
VALUEORDER: 1
VALUENAME: QVAL
ITEMOID: IT.SUPPDM.QVAL.RACE1
WHERECLAUSEOID: WC.SUPPDM.QNAM.RACE1
TYPE: text
LENGTH: 5
ORIGIN: CRF Page 6
CODELISTNAME: RACE
MANDATORY: NO
DESCRIPTION: Race 1
(COMPUTATIONMETHODOID, SIGNIFICANTDIGITS and DISPLAYFORMAT are blank)

Here is how the label (Race 1) is shown in Define.xml:

Variable: QVAL
Where: QNAM EQ RACE1 (Race 1)
Type: text
Length / Display Format: 5
Controlled Terms or Format: ["BLACK OR AFRICAN AMERICAN", "OTHER", "WHITE"] <RACE>
Origin: CRF Page 6
Derivation/Comment: (blank)

Computational Methods: The Computational Methods section displays, in one place, all the derivation rules used in the submitted datasets. COMPUTATIONMETHODOID links the Derivation/Comment column to the Computational Method section. When the derivation of a variable does not fit neatly into a table cell, it should be saved in an external file called Complex Algorithms.pdf. Note that this metadata is built on prototyping data, so the DESCRIPTION of the computational method contains pseudo code. When you prepare an actual filing, the actual text needs to be written.

COMPUTATIONMETHODOID: MT.AESTDY
COMPUTATIONMETHODNAME: Algorithm to derive AESTDY
COMPUTATIONMETHODTYPE: Computation
DESCRIPTION: AESTDY = AESTDTC - RFSTDTC + 1 if AESTDTC is on or after RFSTDTC. AESTDY = AESTDTC - RFSTDTC if AESTDTC precedes RFSTDTC.

Codelist: A codelist is a unique subset of the SDTM controlled terminology. For CDISC codelists, aliases are included at both the CodeListItem and EnumeratedItem levels. When a description is needed for a coded term, the DECODED variable should be marked as 'YES' so that both the coded term and the translated term appear in the display. Rank is provided for each EnumeratedItem to indicate the temporal order in analysis computation.

Record 1: CODELISTNAME=AE; CODELISTLABEL=Domain Abbreviation (AE); CODEDVALUE=AE; TRANSLATED=Adverse Events; TYPE=text; DECODED=YES
Record 2: CODELISTNAME=DM; CODELISTLABEL=Domain Abbreviation (DM); CODEDVALUE=DM; TRANSLATED=Demographics; TYPE=text; DECODED=YES
Record 3: CODELISTNAME=ACN; CODELISTLABEL=Action Taken with Study Treatment; RANK=1; CODEDVALUE=DOSE INCREASED; TRANSLATED=DOSE INCREASED; TYPE=text; sourcevariable=aeaction; sourcevalue=3; sourcetype=number
Record 4: CODELISTNAME=ACN; CODELISTLABEL=Action Taken with Study Treatment; RANK=2; CODEDVALUE=DOSE NOT CHANGED; TRANSLATED=DOSE NOT CHANGED; TYPE=text; sourcevariable=aeaction; sourcevalue=4; sourcetype=number
Record 5: CODELISTNAME=ACN; CODELISTLABEL=Action Taken with Study Treatment; RANK=3; CODEDVALUE=DOSE REDUCED; TRANSLATED=DOSE REDUCED; TYPE=text; sourcevariable=aeaction; sourcevalue=2; sourcetype=number
(Columns not listed for a record, including CODELISTDICTIONARY and CODELISTVERSION, are blank.)

Comments: When a comment or an external document is referenced, we should create a record with a COMMENTOID and a description for it. Here is how it looks:

COMMENTOID: COM.LBREFID
DESCRIPTION: Accession number

COMMENTOID: COM.RELTYPE
DESCRIPTION: All values are null since this is used only when identifying a dataset-level relationship.

As you can see, some parts of this metadata specification can be populated programmatically, depending on how your SDTM/ADaM specifications are constructed. Other information has to be filled in manually. You can find an example of how we created this metadata specification in Appendix A. In this case, we used a macro to create the variable level metadata from the SAS datasets and the SDTM specification, and read in the metadata template for all of the other tabs. The benefit of doing this is that when the template changes, we do not have to go back and change the SAS program, which makes our programs easier to maintain.

CREATE DEFINE.XML COMPONENTS USING SAS
Once the metadata specification is complete, we can use it to populate the define file. In order to populate the define file correctly, we need to know how each variable is used in it. Section 5 of the Define-XML 2.0 specification describes how the XML file is constructed and how each element and attribute is defined.
Based on that specification, we can map the metadata into an element or attribute of ODM to form the define file in the end.

READING IN METADATA
The creation of define.xml basically involves taking data from the metadata spreadsheet one tab at a time and writing it out in XML format as a text file. The following code reads the DEFINE_HEADER_METADATA sheet from SDTM_METADATA.xls and converts the metadata to a SAS dataset called define_header. We repeat this call for each of the other sheets until all the information needed for define.xml has been read in.

%xls2sas(_datarow=2,
         _indir=../&path,
         _infile=SDTM_METADATA.xls,
         _labels=1,
         _outdata=define_header,
         _sheet=DEFINE_HEADER_METADATA,
         _subset=,
         _vars=1);

CREATING ONE .TXT FILE FOR EACH OF THE COMPONENTS
Define_header.txt contains the header information for the XML file as well as the links to each component of the define file. The variables studyoid, studyname, studydescription, protocolname, standard, version and schemalocation read in from the METADATA tab are written into the corresponding Define.xml elements. In addition, the links to the external documents blankcrf.pdf and the Reviewer's Guide are created here by def:AnnotatedCRF and def:SupplementalDoc.

filename dheader "define_header.txt";

/* The %IF statements are resolved by the macro processor, so this step */
/* must run inside a macro in which the macro variable standard is set. */
data define_header;
  set define_header;
  file dheader notitles;
  creationdate = compress(put(datetime(), IS8601DT.));
  put @1 '<?xml version="1.0" encoding="UTF-8" ?>' /
      @1 '<?xml-stylesheet type="text/xsl" href="' stylesheet +(-1) '"?>' /
      @1 '<!-- ******************************************************************************* -->' /
      @1 '<!-- File: define2-0-0.xml -->' /
      @1 "<!-- Date: &sysdate9. -->" /
      @1 '<!-- Description: Define2-0-0.xml file for ' studyname +(-1) ' -->' /
      @1 '<!-- ******************************************************************************* -->' /
      @1 '<ODM' /
      @3 'xmlns="http://www.cdisc.org/ns/odm/v1.3"' /
      @3 'xmlns:def="http://www.cdisc.org/ns/def/v2.0"' /
      @3 'xmlns:xlink="http://www.w3.org/1999/xlink"' /
      @3 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"' /
      @3 'xsi:schemaLocation="http://www.cdisc.org/ns/def/v2.0' /
      '../schema/cdisc.org-define-2.0/define2-0-0.xsd"' /
      %if "&standard" = "ADAM" %then
      @3 'xmlns:adamref="http://www.cdisc.org/ns/ADaMRes/DRAFT"' / ;
      @3 'ODMVersion="1.3.2"' /
      @3 'FileType="Snapshot"' /
      @3 'FileOID="StudyXYZ123-Define-XML_2.0.0"' /
      @3 'CreationDateTime="' creationdate +(-1) '"' /
      @3 'Originator="CDISC XML Technologies Team"' /
      /* only the last attribute closes the ODM start tag */
      @3 'SourceSystem="MH-System"' /
      @3 'SourceSystemVersion="2.0.1">' /
      @1 '<Study OID="' studyoid +(-1) '">' /
      @3 '<GlobalVariables>' /
      @5 '<StudyName>' studyname +(-1) '</StudyName>' /
      @5 '<StudyDescription>' studydescription +(-1) '</StudyDescription>' /
      @5 '<ProtocolName>' protocolname +(-1) '</ProtocolName>' /
      @3 '</GlobalVariables>' /
      @3 '<MetaDataVersion OID="CDISC.' standard +(-1) '.' version +(-1) '"' /
      @5 'Name="' studyname +(-1) ',Data Definitions"' /
      @5 'Description="' studyname +(-1) ',Data Definitions"' /
      @5 'def:DefineVersion="2.0.0"' /
      @5 'def:StandardName="' standard +(-1) '-IG"' /
      @5 'def:StandardVersion="' version +(-1) '">' /
      %if "&standard" = "SDTM" %then %do;
      @5 '<def:AnnotatedCRF>' /
      @7 '<def:DocumentRef leafID="LF.blankcrf"/>' /
      @5 '</def:AnnotatedCRF>' /
      @5 '<def:leaf ID="LF.blankcrf" xlink:href="blankcrf.pdf">' /
      @7 '<def:title>Annotated Case Report Form</def:title>' /
      @5 '</def:leaf>' /
      %end;
      @5 '<def:SupplementalDoc>' /
      @7 '<def:DocumentRef leafID="LF.ReviewersGuide"/>' /
      @7 '<def:DocumentRef leafID="LF.ComplexAlgorithms"/>' /
      @5 '</def:SupplementalDoc>' /
      @5 '<def:leaf ID="LF.ReviewersGuide" xlink:href="reviewersguide.pdf">' /
      @7 '<def:title>Reviewers Guide</def:title>' /
      @5 '</def:leaf>' /
      @5 '<def:leaf ID="LF.ComplexAlgorithms" xlink:href="complexalgorithms.pdf">' /
      @7 '<def:title>Complex Algorithms</def:title>' /
      @5 '</def:leaf>' / ;
run;

TOC OF DOMAINS
Itemgroupdef.txt is used to create the TOC part of define.xml. ItemGroupDef defines the ItemGroup, and ItemRef refers to the key variables specified in the variable metadata and to any comments needed. The dataset variables used below are read from the TOC_METADATA and VARIABLE_METADATA sheets.

filename igdef "itemgroupdef.txt";
data itemgroupdef;
  length label $ 40;
  merge toc_metadata(rename=(label=label1)) VARIABLE_METADATA;
  by oid;
  file igdef notitles;
  if first.oid then do;
    put @5 "<!-- ******************************************* -->" /
        @5 "<!-- " oid @25 "ItemGroupDef INFORMATION *** -->" /
        @5 "<!-- ******************************************* -->" /
        @5 '<ItemGroupDef OID="IG.' oid +(-1) '"' /
        @7 'Domain="' name +(-1) '"' /
        @7 'Name="' name +(-1) '"' /
        @7 'Repeating="' repeating +(-1) '"' /
        @7 'IsReferenceData="' isreferencedata +(-1) '"' /
        @7 'SASDatasetName="' name +(-1) '"' /
        @7 'Purpose="' purpose +(-1) '"' /
        @7 'def:Structure="' structure +(-1) '"' /
        @7 'def:Class="' class +(-1) '"' /
        @7 'def:CommentOID="' documentation +(-1) '"';
    /* The value written to def:ArchiveLocationID should match the ID of */
    /* the def:leaf element written at the end of this step.             */
    %if &standard=ADAM %then
      put @7 'def:ArchiveLocationID="LF.' archivelocationid +(-1) '"' /
          @7 'Comment="' documentation +(-1) '">' ;
    %else
      put @7 'def:ArchiveLocationID="LF.' archivelocationid +(-1) '">' / ;
    ;
    put @7 '<Description>' /
        @7 '<TranslatedText xml:lang="en">' label1 +(-1) '</TranslatedText>' /
        @7 '</Description>';
  end;
  put @7 '<ItemRef ItemOID="IT.' itemoid +(-1) '"' /
      @9 'OrderNumber="' varnum +(-1) '"' / ;
  if keysequence ne ' ' then
    put @9 'KeySequence="' keysequence +(-1) '"' / ;
  if computationmethodoid ne ' ' then
    put @9 'MethodOID="' computationmethodoid +(-1) '"' / ;
  put @9 'Mandatory="' mandatory +(-1) @;
  /* The CL. prefix matches the CodeList OIDs written in the codelist step */
  if role ne '' and "&standard" = "SDTM" then
    put '"' /
        @9 'Role="' role +(-1) '"' /
        @9 'RoleCodeListOID="CL.' rolecodelist +(-1) '"/>';
  else put '"/>';
  if last.oid then
    put @7 "<!-- **************************************************** -->" /
        @7 "<!-- def:leaf details for hypertext linking the dataset -->" /
        @7 "<!-- **************************************************** -->" /
        @7 '<def:leaf ID="LF.' oid +(-1) '" xlink:href="' archivelocationid +(-1) '.xpt">' /
        @9 '<def:title>' archivelocationid +(-1) '.xpt </def:title>' /
        @7 '</def:leaf>' /
        @5 '</ItemGroupDef>';
run;

VARIABLE LEVEL:
Itemdef.txt is used to create the variable level part of define.xml. ItemDef creates the variable element. CodeListRef and def:ValueListRef are used for the codelist display and the value level drill down, respectively. The variables used here are from the VARIABLE_METADATA sheet.
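One practical point applies to every PUT-based step in this paper: labels, descriptions and comments are written into the XML exactly as they appear in the spreadsheet, so a value containing &, < or a double quote would produce a file that is not well formed. The sketch below shows one possible way to guard against this with the Base SAS HTMLENCODE function. The dataset LABELS, the variable XMLLABEL and the sample values are hypothetical, and the original programs do not include this step.

```sas
/* Sketch only: dataset LABELS and variable XMLLABEL are hypothetical. */
data labels;
   length label $40;
   label = 'Weight & Height <baseline>'; output;
   label = 'Study Identifier';           output;
run;

data _null_;
   set labels;
   length xmllabel $200;
   /* Encode the characters that are special in XML text and attributes: */
   /* & becomes &amp;, < becomes &lt;, > becomes &gt;, " becomes &quot;  */
   xmllabel = htmlencode(strip(label), 'amp lt gt quot');
   /* Written to the log here; a real step would use a FILE statement */
   put '<TranslatedText xml:lang="en">' xmllabel +(-1) '</TranslatedText>';
run;
```

Applying the same encoding to each text variable before the PUT statements would leave ordinary labels unchanged while keeping special characters from breaking the document.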

**** CREATE "ITEMDEF" SECTION;
filename idef "itemdef.txt";
data itemdef;
  set VARIABLE_METADATA end=eof;
  by oid;
  file idef notitles;
  if _n_ = 1 then
    put @5 "<!-- ************************************************************ -->" /
        @5 "<!-- The details of each variable is here for all domains -->" /
        @5 "<!-- ************************************************************ -->" ;
  put @5 '<ItemDef OID="IT.' itemoid +(-1) '"' /
      @7 'Name="' variable +(-1) '"' /
      @7 'DataType="' type +(-1) '"' /
      @7 'Length="' length +(-1) '"' /
      @7 'SASFieldName="' variable +(-1) '"' ;
  if significantdigits ne '' then
    put @7 'SignificantDigits="' significantdigits +(-1) '"';
  if displayformat ne '' then
    put @7 'def:DisplayFormat="' displayformat +(-1) '"';
  if computationmethodoid ne '' then
    put @7 'def:MethodOID="' computationmethodoid +(-1) '"';
  put
    %if "&standard" = "SDTM" %then
    @7 'Origin="' origin +(-1) '"' / ;
    @7 'Comment="' comment +(-1) '"' /
    @7 'def:Label="' label +(-1) '">';
  /* The CL. prefix matches the CodeList OIDs written in the codelist step */
  if codelistname ne '' then
    put @7 '<CodeListRef CodeListOID="CL.' codelistname +(-1) '"/>';
  if valuelistoid ne '' then
    put @7 '<def:ValueListRef ValueListOID="' valuelistoid +(-1) '"/>';
  put @7 '<Description>' /
      @7 '<TranslatedText xml:lang="en">' label +(-1) '</TranslatedText>' /
      @7 '</Description>';
  put @5 '</ItemDef>';
run;

VALUE LEVEL:
Value Level Metadata should be provided when different subsets of the data within a column need to be described with different metadata attributes. It can also be used on other types of SDTM domains to provide information useful for data interpretation. However, the value level metadata approach is very complicated, so it is recommended only when absolutely necessary. The value level metadata consists of three parts: the itemdef_value, the value list and the whereclause list.
The itemdef_value defines how the value level metadata is displayed, the where clause shows the subset condition in define.xml, and the value list links each item to its whereclause definition. ItemDef is used to create the valuelist item, and def:WhereClauseDef and def:ValueListDef are used to create the whereclause and the valuelist. A conditional Description is used for variables in SUPPXX domains that need a label. The variables read from the value level metadata are used here. As you can see from the code below, we need to parse the WHERECLAUSEOID so the stylesheet can get the information it needs and display it correctly in the browser. It is also worth mentioning that this code is for prototyping and cannot yet handle all real-world situations. You may find it fun to dig through the stylesheet and figure out how to make it work.

filename idefvl "itemdef_value.txt";
data itemdefvalue;
  set valuelevel end=eof;
  by valuelistoid;
  file idefvl notitles;
  if _n_ = 1 then
    put @5 "<!-- ************************************************************ -->" /
        @5 "<!-- The details of value level items are here -->" /
        @5 "<!-- ************************************************************ -->" ;
  put @5 '<ItemDef OID="' itemoid +(-1) '"' /
      @7 'Name="' valuename +(-1) '"' /
      @7 'DataType="' type +(-1) '"' /
      @7 'Length="' length +(-1) '"';
  if significantdigits ne '' then
    put @7 'SignificantDigits="' significantdigits +(-1) '"';
  if displayformat ne '' then
    put @7 'def:DisplayFormat="' displayformat +(-1) '"';
  if computationmethodoid ne '' then
    put @7 'def:MethodOID="' computationmethodoid +(-1) '"';
  put @5 '>';
  if description ne ' ' then
    put @7 '<Description>' /
        @7 '<TranslatedText xml:lang="en">' description +(-1) '</TranslatedText>' /
        @7 '</Description>';
  if codelistname ne '' then
    put @7 '<CodeListRef CodeListOID="CL.' codelistname +(-1) '"/>';
  put @7 '<def:Origin Type="' origint +(-1) '"' /;
  if origint ^= "CRF" then put @7 '/>' ;
  else
    put @7 '>' /
        @9 '<def:DocumentRef leafID="LF.blankcrf">' /
        @11 '<def:PDFPageRef PageRefs="' originpn +(-1) '" Type="PhysicalRef"/>' /
        @9 '</def:DocumentRef>' /
        @7 '</def:Origin>' /;
  put @5 '</ItemDef>';
run;

filename whrlist "wherelist.txt";
data valuelevel;
  set valuelevel;
  by valuelistoid itemoid;
  file whrlist notitles;
  if _n_ = 1 then
    put @5 "<!-- *************************************************** -->" /
        @5 "<!-- VALUE LEVEL WHERE CLAUSE DEFINITION INFORMATION ** -->" /
        @5 "<!-- *************************************************** -->";
  if first.itemoid then
    put @5 '<def:WhereClauseDef OID="' whereclauseoid +(-1) '">';
  /* WHERECLAUSEOID has the form WC.<domain>.<variable>.<value>..., so */
  /* variable/value pairs start at the third dot-delimited word        */
  do i=3 to countw(whereclauseoid,'.')-1 by 2;
    varcat=compress('IT.' || scan(whereclauseoid,2,'.') || '.' || scan(whereclauseoid,i,'.'));
    varval=compress(scan(whereclauseoid,i+1,'.'));
    put @7 '<RangeCheck SoftHard="Soft" def:ItemOID="' varcat +(-1) '" Comparator="EQ">' /
        @11 '<CheckValue>' varval +(-1) '</CheckValue>' /
        @7 '</RangeCheck>';
  end;
  if last.itemoid then
    put @5 '</def:WhereClauseDef>';
run;

filename vallist "valuelist.txt";
data valuelevel;
  set valuelevel;
  by valuelistoid;
  file vallist notitles;
  if _n_ = 1 then
    put @5 "<!-- ******************************************* -->" /
        @5 "<!-- VALUE LEVEL LIST DEFINITION INFORMATION ** -->" /
        @5 "<!-- ******************************************* -->";
  if first.valuelistoid then
    put @5 '<def:ValueListDef OID="' valuelistoid +(-1) '">';
  put @7 '<ItemRef ItemOID="' itemoid +(-1) '"' /
      @9 'OrderNumber="' valueorder +(-1) '"' /
      @9 'Mandatory="' mandatory +(-1) '"' ;
  if computationmethodoid ne '' then
    put @9 'MethodOID="' computationmethodoid +(-1) '"';
  put @9 '>' /
      @9 '<def:WhereClauseRef WhereClauseOID="' whereclauseoid +(-1) '"/>' /
      @7 '</ItemRef>';
  if last.valuelistoid then
    put @5 '</def:ValueListDef>';
run;

It is worth mentioning that the stylesheet handles the variable label in the whereclause differently for the SUPPXX domains than for other domains. For value level metadata on supplemental qualifiers, the stylesheet displays the label description, provided a description is available. For example, RACE1, RACE2, ..., RACEOTH would display "Race 1", "Race 2", ..., "Race, Other". For other types of whereclauses, the stylesheet tries to display the decoded value if controlled terminology is attached to the variable that is part of the whereclause.

COMPUTATIONAL METHODS:
MethodDef is used to display the computation method used.

filename comp "compmethod.txt";
data compmethods;
  set compmethod;
  file comp notitles;
  if _n_ = 1 then
    put @5 "<!-- ******************************************* -->" /
        @5 "<!-- COMPUTATIONAL METHOD INFORMATION *** -->" /
        @5 "<!-- ******************************************* -->";
  put @5 '<MethodDef OID="' computationmethodoid +(-1) '"' /
      @5 'Name="' computationmethodname +(-1) '"' /
      @5 'Type="' computationmethodtype +(-1) '">' /
      @7 '<Description>' /
      @7 '<TranslatedText xml:lang="en">' description +(-1) '</TranslatedText>' /
      @7 '</Description>' /
      @5 '</MethodDef>';
run;

COMMENTS:
def:CommentDef is used to display the comments.

**** CREATE COMMENTS SECTION;
filename comp "comments.txt";
data comments;
  set comments;
  file comp notitles;
  if _n_ = 1 then
    put @5 "<!-- ******************************* -->" /
        @5 "<!-- COMMENTS INFORMATION *** -->" /
        @5 "<!-- ******************************* -->";
  put @5 '<def:CommentDef OID="' commentoid +(-1) '">' /
      @7 '<Description>' /
      @7 '<TranslatedText xml:lang="en">' description +(-1) '</TranslatedText>' /
      @7 '</Description>' /
      @5 '</def:CommentDef>';
run;

CODELIST:
The Codelist section has changed dramatically from v1 as well. Not only has the codelist CODE been added to the display, but the codelist has also been separated into EnumeratedItem and CodeListItem.
The Alias Context attribute references "nci:ExtCodeID" to indicate the NCI/CDISC SDTM Terminology standards. When the coded value is the same as the translated value, only the coded value is shown. If the codelist is saved in an external dictionary, a separate section is created just for that. Please note that I placed the CODELIST section last in the define file; the stylesheet displays the sections regardless of the order in which they appear in the file. If you choose a different order, move the last four lines of code, which write the closing tags, to the step that writes the final section. Because the Define-XML schema defines a sequence for the elements inside MetaDataVersion, it is worth running the finished file through a schema validator before submission. You can also create a separate section if you like.

/* The fileref on the FILE statement must match the FILENAME statement */
filename codes "codelist.txt";
data codelists;
  set codelists end=eof;
  by codelistname rank;
  length codelistname $200;
  file codes notitles;
  if _n_ = 1 then
    put @5 "<!-- ************************************************************ -->" /
        @5 "<!-- Codelists are presented below -->" /
        @5 "<!-- ************************************************************ -->" ;
  if first.codelistname then do;
    if index(upcase(ncipreferredterm),'DOMAIN') then
      put @5 '<CodeList OID="CL.' codelistname +(-1) '.DOMAIN"';
    else
      put @5 '<CodeList OID="CL.' codelistname +(-1) '"';
    put @7 'Name="' codelistlabel +(-1) '"' /
        @7 'DataType="text">';
  end;
  **** output codelists that are not external dictionaries;
  if codelistdictionary = '' then do;
    if decoded ne 'YES' then do;
      put @7 '<EnumeratedItem CodedValue="' codedvalue +(-1) '"' @;
      if rank ne . then put ' OrderNumber="' rank +(-1) '">';
      else put '>';
      if code ne ' ' then
        put @7 '<Alias Name="' code +(-1) '" Context="nci:ExtCodeID"/>';
      put @7 '</EnumeratedItem>';
    end;
    else do;
      put @7 '<CodeListItem CodedValue="' codedvalue +(-1) '"' @;
      if rank ne . then put ' OrderNumber="' rank +(-1) '">';
      else put '>';
      put @9 '<Decode>' /
          @11 '<TranslatedText>' translated +(-1) '</TranslatedText>' /
          @9 '</Decode>' ;
      if code ne ' ' then
        put @7 '<Alias Name="' code +(-1) '" Context="nci:ExtCodeID"/>';
      put @7 '</CodeListItem>';
    end;
    /* write the codelist-level alias once, after the last item */
    if last.codelistname and codelist ne ' ' then
      put @7 '<Alias Name="' codelist +(-1) '" Context="nci:ExtCodeID"/>';
  end;
  **** output codelists that are pointers to external codelists;
  else if codelistdictionary ne '' then
    put @7 '<ExternalCodeList Dictionary="' codelistdictionary +(-1) '" Version="' codelistversion +(-1) '"/>';
  if last.codelistname then
    put @5 '</CodeList>';
  if eof then
    put @3 '</MetaDataVersion>' /
        @1 '</Study>' /
        @1 '</ODM>';
run;

CONSTRUCT ALL THE DEFINE.XML SUPPORTING DOCUMENTS
Create the Annotated CRF based on the SDTM specifications and put lengthy data derivation rules, if any, in Complex Algorithms. Anything else that will help the FDA understand our data and speed up the review process should go in the Reviewer's Guide.

CREATING .XPT FILES FOR CLINICAL DATA
Since the FDA standardized the submission data format in 1999, converting SAS datasets to version 5 .xpt files is a must. There are different ways to convert SAS datasets to .xpt files, and the following SAS code provides one example:

libname source 'SAS-data-library';
libname xportout xport 'transport-file';
proc copy in=source out=xportout memtype=data;
run;

CONSTRUCT DEFINE.XML
Here is the command that was used in Unix to concatenate the text files together to form define.xml. The file wherelist.txt, created in the value level step, is included so that the where clause definitions referenced by the value lists are present, and codelist.txt comes last because that step writes the closing tags.

cat define_header.txt compmethod.txt valuelist.txt wherelist.txt itemgroupdef.txt itemdef.txt itemdef_value.txt comments.txt codelist.txt > define.xml

With the stylesheet's help, Define.xml v2 can be viewed in IE or Firefox. The define.xml file is listed on the right for your information.
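On systems where the Unix cat command is not available, the same concatenation can be sketched in SAS itself. The example below is an illustration under stated assumptions rather than part of the original programs: it assumes the component files sit in the current working directory, it uses the hypothetical filerefs PARTS and DEFXML, and it includes wherelist.txt, which the value level step creates. The file codelist.txt stays last because that step writes the closing MetaDataVersion, Study and ODM tags.

```sas
/* Sketch only: filerefs PARTS and DEFXML are hypothetical names, and   */
/* the component files are assumed to be in the current directory.     */
filename parts ("define_header.txt"
                "compmethod.txt"
                "valuelist.txt"
                "wherelist.txt"
                "itemgroupdef.txt"
                "itemdef.txt"
                "itemdef_value.txt"
                "comments.txt"
                "codelist.txt");
filename defxml "define.xml";

data _null_;
   /* A concatenated fileref is read as one continuous stream of records */
   infile parts lrecl=32767 truncover;
   file defxml lrecl=32767;
   input;
   /* Copy each record, including its leading blanks, unchanged */
   put _infile_;
run;
```

Keeping the list of component files in one FILENAME statement also makes it easy to reorder the sections if a validator reports that they are out of sequence.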
CONCLUSIONS Creating define.xml is difficult especially if you don’t have any knowledge about XML at the beginning. However, with the approach I have outlined above it is readily achievable. 14 Creating Define.xml v2 Using SAS for FDA Submissions APPENDIX A: /*************************************************** *H* *H* PROGRAM: mk_varmeta.sas *H* *H* USAGE: create the SDTM variable_metadata sheet from the STDM specs *H* REQUIRES anadata: *H* REQUIRES rawdata: *H* REQUIRES sdtmdata: *H* REQUIRES export: XL184XXX_SDTM_Speciifcatins_V8.0.xls *H* REQUIRES macros: %xls2sas %sas2xls *H* PRODUCES: *H* *H* *H* REVISION HISTORY: *H* 20131216 jkl Created. *H* *H* $Id:$ ***************************************************/ options mlogic mprint symbolgen validvarname=upcase ; *PN-----------------------------------------------------------------*; *PN-----------------------------------------------------------------*; *PN macro that 1) converts an individual sheet from an xls file to SAS 2) select and label variables 3) creates the dataset "final" by concatinating the datasets *PN-----------------------------------------------------------------*; *PN-----------------------------------------------------------------*; *PN------------------------------------------------*; *PN macro to convert xls sdtmdata sheet to sas data*; *PN------------------------------------------------*; %xls2sas(_datarow = 2, _indir = ../import, _infile = Def_head.xls, _labels =1, _outdata = DEFINE_HEADER_METADATA, _sheet =sheet1 , _vars = ) ; *PN --------------------------------------------------------------------*; *PN --BRING IN SDTM VARIABLES FROM SPEC SPREADSHEET -------------------*; *PN --------------------------------------------------------------------*; %macro doit(dsn=,sort=); %xls2sas(_datarow _indir _infile = = = 7, , XL184307_SDTM_Specifications_V8.0.xls, 15 Creating Define.xml v2 Using SAS for FDA Submissions _labels = _outdata = _sheet = ); 6, , &dsn data &dsn; retain domain var16 var8 var17 
          var15 var12 var14 var18 var19 ;
   set &dsn(drop=VAR1-VAR7 );

   *PN-------------------------------------*;
   *PN create the DOMAIN and SORT variables*;
   *PN-------------------------------------*;
   domain = "&dsn";
   sort   = "&sort";

   label var8   = 'VARIABLE'
         var12  = 'DISPLAYFORMAT'
         var13  = 'FLAG'
         var14  = 'COMPUTATIONALMETHODOID'
         var15  = 'COMMENTS'
         var16  = 'VARNUM'
         var17  = 'ORIGIN'
         var18  = 'MANDATORY'
         var19  = 'ROLECODELIST'
         domain = 'DOMAIN'
         ;

   if var8 = '' or var13 ne '' then delete;
   sort_var = var16*1.0;

   drop var13 var8;
run;

*PN-------------------------*;
*PN concatenate the datasets*;
*PN-------------------------*;
proc append base=specs data=&dsn force;
run;

proc datasets library=work;
   delete &dsn;
quit;

%mend;

*PN--------------------------*;
*PN import from the SDTM spec*;
*PN--------------------------*;
%doit(dsn=AE,sort=A);
%doit(dsn=CD,sort=C);
%doit(dsn=CM,sort=D);
%doit(dsn=DA,sort=E);
%doit(dsn=DD,sort=F);
%doit(dsn=DM,sort=G);

*PN---------------------------------------------------------------------*;
*PN------------ THIS SECTION EXTRACTS INFORMATION FROM SDTM DATA -------*;
*PN---------------------------------------------------------------------*;
%macro doit2(dsn=,sort=);

*PN---------------------------*;
*PN get the contents datasets *;
*PN---------------------------*;
proc contents data=sdtmdata.&dsn out=&dsn noprint;
run;

proc sort data=&dsn;
   by varnum;
run;

proc print data=&dsn;
   title "supp &dsn";
run;

data &dsn;
   length mandatory $3 type $4;
   set &dsn(drop = type rename=(memname=domain name=variable));
   by varnum;

   *PN-------------------------------------------------------*;
   *PN populate the Mandatory column in the variable_metadata*;
   *PN (varnum from PROC CONTENTS is numeric)                *;
   *PN-------------------------------------------------------*;
   if varnum in (4, 5, 10) then mandatory='No';
   else mandatory='Yes';

   *PN----------------------------------------------*;
   *PN create the type column and the sort variable *;
   *PN----------------------------------------------*;
   type='text';
   sort = "&sort";

   keep domain varnum variable type length label mandatory sort;
   label domain='DOMAIN' ;
run;

data &dsn(rename=(temp11=var11 temp16=var16));
   retain domain var16 var8 var10 var11 var9 var18 ;
   length temp11 $3 temp16 $3;
   set &dsn(rename=(varnum    = var16
                    variable  = var8
                    type      = var10
                    length    = var11
                    label     = var9
                    mandatory = var18)) ;

   *PN------------------------*;
   *PN make length and varnum *;
   *PN------------------------*;
   temp11 = left(put(var11,4.0));
   temp16 = left(put(var16,2.0));
   drop var11 var16;

   label var8   = 'VARIABLE'
         var9   = 'LABEL'
         var10  = 'TYPE'
         temp11 = 'LENGTH'
         temp16 = 'VARNUM'
         var18  = 'MANDATORY'
         ;
run;

*PN --------------------*;
*PN append the datasets *;
*PN---------------------*;
proc append base=final data=&dsn force;
run;

/*
proc print data=&dsn label;
   title "&dsn";
run;
*/

*PN------------------------------*;
*PN clean out the work directory *;
*PN------------------------------*;
proc datasets library=work;
   delete &dsn;
quit;

%mend;

*PN----------------------------------------------*;
*PN import datasets from SDTM data including SUPP*;
*PN----------------------------------------------*;
%doit2(dsn=AE,sort=A);
%doit2(dsn=CD,sort=C);
%doit2(dsn=CM,sort=D);
%doit2(dsn=DA,sort=E);
%doit2(dsn=DD,sort=F);
%doit2(dsn=DM,sort=G);
%doit2(dsn=SUPPAE,sort=B);
%doit2(dsn=SUPPDM,sort=H);

data final;
   length domain $6;
   set final;

   *PN----------------------------------------------------*;
   *PN need numeric to sort the variable numbers correctly*;
   *PN----------------------------------------------------*;
   sort_var = var16*1.0;
run;

proc sort data=final out=VARIABLE_METADATA/*(drop=sort sort_var)*/;
   by sort sort_var ;
run;

proc sort data=specs/*(drop=sort sort_var)*/;
   by sort sort_var ;
run;

*PN--------------------------------------------------------------------------------*;
*PN merge the vars from the spec spreadsheet onto data imported from SDTM datasets*;
*PN--------------------------------------------------------------------------------*;
data VARIABLE_METADATA;
   merge VARIABLE_METADATA(in=a) specs;
   by sort sort_var ;
   if a;
   drop sort sort_var;
run;

*PN--------------------------------------------------------------*;
*PN----------------- export to a spread sheet -------------------*;
*PN--------------------------------------------------------------*;
%sas2xls(_inlib   = work,
         _first   = domain var16,
         _header  = L,
         _files   = DEFINE_HEADER_METADATA VARIABLE_METADATA,
         _outfile = SDTM_METADATA);

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

ACKNOWLEDGEMENTS

Lex Jansen, SAS Institute.
Amos Shu, Medimmune Inc.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Qinghua (Kathy) Chen
Exelixis Inc.
210 East Grand Ave, South San Francisco, CA 94080
Email: [email protected]

James Lenihan
Exelixis Inc.
210 East Grand Ave, South San Francisco, CA 94080
Email: [email protected]