SAS Global Forum 2011
Pharma
Paper 205-2011
Confessions of a Clinical Programmer: Dragging and Dropping Means Never
Having to Say You’re Sorry When Creating SDTM Domains
Janet Stuelpner, SAS Institute, Inc.
Jack Shostak, Duke Clinical Research Institute
ABSTRACT
We have been clinical programmers for many years. Our first instinct has always been to write the SAS® code by
hand because that is all we had in the past. That meant knowing a great deal of syntax and always having the
manuals handy. It also meant pages and pages of code that were difficult to correct, difficult to maintain and hard to
reuse for different compounds or devices. The first level of growth came when SAS introduced various windows and
wizards (e.g., Import/Export Wizard, Report Window, Graph-n-Go) that gave us the ability to start using the wizard
and then grab the SAS code and change it as necessary. The next innovations from SAS were tools like Enterprise
Guide® and Clinical Data Integration® with their Graphical User Interface (GUI) that made programming a great deal
easier, faster and much more efficient. You can still access all of the different data sources that you need from SAS
data sets, spreadsheets and/or relational databases, but in a much easier way. You can still transform your raw data
into SDTM domains using standard or custom transformations where mapping can be automatic or manual
depending on what your input and output data requires. Data validation and compliance checks are much simpler
because all of the tools are available for you to repeat the tasks for each protocol, compound or therapeutic area.
Finally, if you do need to use legacy code or write your own routines, you can do that as well. This presentation will
show you how experienced programmers can learn novel tricks and techniques with new tools, solutions and
technology.
INTRODUCTION
As a clinical programmer, there are many paths to take. The main goal is always to access the data, manipulate and
transform the data, analyze the data and report on the data. As a programmer, one can specialize in data
management (DM) programming and spend a majority of the time cleaning the data through edit checks and the
creation of patient listings and profiles. Another task of the DM programmer is to transform the data from its raw
format into a standard format. This standard format could be the CDISC Study Data Tabulation Model (SDTM) that is
requested by regulatory agencies upon submission of a new compound or it could be a sponsor’s own standards. In
the process of transforming the data, the DM programmer must make sure that the output conforms to the standard
and is compliant as well as valid. So another part of the job is to write programs to check the data against the
standard and run the programs whenever a new study is about to be analyzed. Finally, when all of the data has been
transformed, the DM programmer must create a transport file that will be sent to the regulatory agency that will review
the submission data.
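As an illustration of that last step, a transport file can be written with the XPORT LIBNAME engine and PROC COPY. This is a minimal sketch; the paths and the XPTOUT libref are hypothetical and are not taken from the study.

```sas
/* Hypothetical locations, for illustration only */
libname target "/studies/nic001/sdtm";
libname xptout xport "/studies/nic001/xpt/dm.xpt";

/* Copy the DM data set into a SAS Version 5 transport file */
proc copy in=target out=xptout memtype=data;
   select dm;
run;
```

The Version 5 transport format limits variable names to 8 characters and labels to 40 characters, which is one reason SDTM variable names and labels are kept short.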
The statistical (STAT) programmer takes the data that is cleaned and transformed by the DM programmer and
creates tables, listings and graphs (TLG) for the clinical study report (CSR). Sometimes the data is taken from its raw
state and transformed directly into TLGs, but most often the STAT programmer creates analysis datasets from which
they can easily create the necessary output documents for the CSR. The STAT programmer is also tasked with
creating ad hoc reports when needed, yearly safety updates, DSMB reports, and integrated safety and efficacy
summaries.
The focus of this paper is how the work of the clinical programmer in data management (DM) has changed, from
writing programs in base SAS in the past to using new tools and solutions to produce the data
that is needed in a new drug application submission. What did the programmer do in the past to cleanse the data and
how has that process changed? Now that the data is requested to be in a standard format, what types of programs,
macros and formats were used to transform the data? What is done now to make the process easier, more efficient
and repeatable across protocols, compounds and therapeutic areas? From the old methodology to the new tools we
will show how the transformation process can be changed and improved.
ACCESS and EDITS and STANDARDS, OH MY
Over the years, how we access the data has changed as often as the types of data that we use have changed. Data
entry was part of the process of reading the data from paper case report forms (CRF) and creating SAS data sets
with that data. Sometimes the lab data was written into the CRF and at other times it was sent in an electronic format
that needed careful review and tricky coding to create the lab data sets. Some of the data entry systems did some
preliminary edit checking (e.g., data range checks, limiting values entered, etc.), but most often the edit checks
needed to be done after the data sets were created and systems were put in place to write the queries that were sent
back to the clinical data collection sites. With the advent of relational databases (RDBMS) and electronic data
capture, the amount and type of work needed to clean the data changed.
The way that the programmer reads the data has changed as well. There are many formats in which the data is sent
to the sponsor. These include SAS data sets, MS Excel spreadsheets, RDBMS tables, ASCII files and electronic data
capture. The mechanism for reading this data has changed along with the type of data that needs to be read. From
writing many massive DATA steps to using LIBNAME or SAS/ACCESS engines, each type of data must be reviewed to
see what the best choice is for reading it and creating the SAS data sets that will be used to process the data.
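A few of these access routes can be sketched as follows. The paths, file names and librefs are hypothetical, and the XLSX engine shown here assumes a recent SAS release with SAS/ACCESS Interface to PC Files licensed.

```sas
/* Hypothetical paths and names, for illustration only */

/* Native SAS data sets */
libname rawdata "/studies/nic001/raw";

/* Excel workbook read through a LIBNAME engine; each sheet appears as a table */
libname labxls xlsx "/studies/nic001/raw/labs.xlsx";

/* Delimited ASCII file read with PROC IMPORT */
proc import datafile="/studies/nic001/raw/vitals.csv"
            out=work.vitals
            dbms=csv
            replace;
   getnames=yes;
run;
```

Once a libref is assigned, the rest of the program can refer to the tables in the same way regardless of where the data physically lives.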
Of course, another task that was added to the process was the introduction of standards into the submission process. As
the regulatory agencies developed tools with which to review the data, the sponsors have been requested to develop
standards. At first, companies created their own standards which, in some ways, reduced the complexity of the
review process and yet in some ways introduced new issues. The Clinical Data Interchange Standards Consortium
(CDISC) has worked over the last several years to create standards for the pharmaceutical, biotechnology and
medical device companies to adopt. Now there is a whole new level of data management programming that needs to
be done during the submission process.
THE APPROACH
The examples that we will use for this paper are based on the data that you will find in the Appendix. We chose data
for the compound Nicardipine Hydrochloride (Nicardipine). Nicardipine is a calcium channel blocker, considered for
the treatment of patients who have had a particular type of stroke classified as aneurysmal subarachnoid hemorrhage
(SAH). This type of stroke occurs when an aneurysm bursts, causing bleeding inside the brain.
The study was designed to learn whether or not Nicardipine could prevent worsening of a stroke caused by narrowing
of the blood vessels in the brain or improve the outcome following a stroke. In this case, study participants were
children. This was a randomized study where some participants were assigned to a control group (placebo) and
some to the experimental group (study drug).
The data for our examples are very old legacy data. Therefore, the names and types of variables are very different
from the ones that you will find in the SDTM 3.1.2 metadata for the DM domain. The input data sources include three
datasets: ADMIN2, RANDFILE, and REGISTER. The ADMIN2 file contains data about the start and end of treatment.
The RANDFILE file contains information about which treatment was received by each subject. Lastly, the REGISTER
file contains information about each subject such as date of birth, gender and race. All of these fields are needed in
the DM domain. The target data is the last entry in the appendix. This is the resultant DM domain for the Nicardipine
study in our example.
Also included in the appendix is the metadata for the SDTM 3.1.2 DM domain as specified in the SDTM
Implementation Guide (version 3.1.2). All of the items defined in the implementation guide are included so that
you can see the structure of the target data set.
IMPLEMENTING SDTM WITH BASE SAS APPROACH
In this section, we explore how to implement the SDTM data standard with BASE SAS as your primary tool. In the
simplest form, this involves importing your source data into BASE SAS, transforming that data with DATA steps, SQL,
and SAS PROCS, and then saving your SDTM domains as permanent data sets. For this particular instance of
creating the DM file we sort the three source datasets by patient identifier and then merge them together. The
remaining activity is to define each of the SDTM DM variables in a DATA step and save that DM file to the target
LIBREF. As is the case with all legacy SAS work, all we have at our disposal is a code editor window and the SAS
documentation, perhaps in hardcopy as well as online.
The SAS code for our example problem of creating the Demographics (DM) domain from our raw source data looks like
this:
proc sort data=rawdata.admin2 out=admin2;
   by studyno;
run;
proc sort data=rawdata.randfile out=randfile;
   by studyno;
run;
proc sort data=rawdata.register out=register(rename=(sex=sexn race=racen));
   by studyno;
run;
data readata;
   merge admin2(in=a)
         randfile(in=ra)
         register(in=re);
   by studyno;
run;
data target.dm;
   set readata;
   length STUDYID SUBJID USUBJID SITEID INVID INVNAM RACE ETHNIC ARM $40
          DOMAIN $8 RFSTDTC RFENDTC BRTHDTC DMDTC $64 AGEU $10 ARMCD $20
          SEX $1 COUNTRY $3 AGE DMDY 8;
   keep STUDYID DOMAIN USUBJID SUBJID RFSTDTC RFENDTC SITEID INVID INVNAM
        BRTHDTC AGE AGEU SEX RACE ETHNIC ARMCD ARM COUNTRY DMDTC DMDY;
   STUDYID='NIC001';
   DOMAIN='DM';
   USUBJID=LEFT(PUT(STUDYNO,Z6.));
   SUBJID=SUBSTR(COMPRESS(PUT(STUDYNO,Z6.)),4,3);
   if nmiss(TXBEGDAT,TXBEGTIM)=0 then
      RFSTDTC=PUT(DHMS(TXBEGDAT,0,0,TXBEGTIM),IS8601DT.);
   if nmiss(TXENDDAT,TXENDTIM)=0 then
      RFENDTC=PUT(DHMS(TXENDDAT,0,0,TXENDTIM),IS8601DT.);
   SITEID=SUBSTR(RPTINV,1,3);
   INVID=' ';
   INVNAM=PUT(RPTINV,$INV.);
   if nmiss(DOB)=0 then BRTHDTC=PUT(DOB,IS8601DA.);
   if nmiss(DOB,ADMDAT)=0 then
      AGE=FLOOR((INTCK('month',DOB,ADMDAT) - (DAY(ADMDAT) < DAY(DOB))) / 12);
   AGEU='YEARS';
   SEX=SUBSTR(PUT(SEXN,SEX.),1,1);
   RACE=PUT(RACEN,RACE.);
   ETHNIC=' ';
   if TRT='A' then ARMCD='NIC15';
   else if TRT='B' then ARMCD='PLA';
   else ARMCD='';
   ARM=PUT(TRT,$TREAT.);
   COUNTRY='USA';
   DMDTC=PUT(ADMDAT,IS8601DA.);
   if nmiss(txbegdat,admdat)=0 then do;
      /* study day of collection: ADMDAT relative to the reference start date TXBEGDAT */
      if admdat >= txbegdat then dmdy = admdat - txbegdat + 1;
      else dmdy = admdat - txbegdat;
   end;
   label STUDYID='Study Identifier'
         DOMAIN='Domain Abbreviation'
         USUBJID='Unique Subject Identifier'
         SUBJID='Subject Identifier for the Study'
         RFSTDTC='Subject Reference Start Date/Time'
         RFENDTC='Subject Reference End Date/Time'
         SITEID='Study Site Identifier'
         INVID='Investigator Identifier'
         INVNAM='Investigator Name'
         BRTHDTC='Date/Time of Birth'
         AGE='Age in AGEU at RFSTDTC'
         AGEU='Age Units'
         SEX='Sex'
         RACE='Race'
         ETHNIC='Ethnicity'
         ARMCD='Planned Arm Code'
         ARM='Description of Planned Arm'
         COUNTRY='Country'
         DMDTC='Date/Time of Collection'
         DMDY='Study Day of Collection';
run;
As you can see, this program consists of three SORT procedure steps, a DATA step to merge the source data, and a
final DATA step to derive the SDTM DM variables that are needed and save it as the final DM file.
BASE SAS APPROACH CHALLENGES
There are a number of challenges with the reliance on BASE SAS alone to perform SDTM domain data creation. A
primary issue is the management of metadata since there is none provided with BASE SAS alone. One thing to note
about this program is that you need to type in all of the LENGTH and LABEL statements to define the SDTM
metadata for the final domain data sets. This typing of metadata is tedious, prone to error, and likely to result in
inconsistencies across SDTM domain metadata for a trial. You also have no real regulation of the target metadata
and no real-time validation that your resulting domain is valid SDTM data. When using this BASE SAS approach you
also run into logistical and strategic issues with code maintenance and reusability of the SAS code. The BASE SAS
code itself can become difficult to read, which makes maintenance difficult. This kind of coding also tends to be
“one-off” in nature, which limits future reuse of the SAS code.
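Some of the missing validation can be approximated in BASE SAS with a small check that runs after the DM program. The sketch below is an illustration rather than a full SDTM compliance check: it compares the variables present in TARGET.DM with the names in the KEEP statement above and lists any USUBJID values that occur more than once. The WORK data set names are hypothetical.

```sas
/* Expected DM variable names, copied from the KEEP statement above */
data work.expected;
   length name $32;
   input name $ @@;
   datalines;
STUDYID DOMAIN USUBJID SUBJID RFSTDTC RFENDTC SITEID INVID INVNAM BRTHDTC
AGE AGEU SEX RACE ETHNIC ARMCD ARM COUNTRY DMDTC DMDY
;
run;

/* Variable names actually present in the DM data set */
proc contents data=target.dm noprint out=work.actual(keep=name);
run;

proc sql;
   title 'Expected DM variables that are missing from TARGET.DM';
   select name from work.expected
   except
   select upcase(name) from work.actual;

   title 'USUBJID values that occur more than once in TARGET.DM';
   select usubjid, count(*) as n_records
      from target.dm
      group by usubjid
      having count(*) > 1;
quit;
title;
```

Both queries return no rows when the domain has the expected variables and one record per subject. Checks of lengths, labels and controlled terminology would need the SDTM metadata itself, which is exactly what BASE SAS alone does not manage.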
BASE SAS APPROACH BENEFITS
The primary advantage, although some might actually consider this a disadvantage, of the BASE SAS approach is
that you have no restrictions as to what you can do with your SAS code. You have the full arsenal of BASE SAS and
can utilize any SAS procedure, macro code, or SQL procedure code you wish to solve the problem of SDTM data
conversion. Some have taken the BASE SAS approach to SDTM creation work and have augmented it with
commonly available tools such as Microsoft Access or Excel as a place to store and leverage metadata. This
augmented approach is better than BASE SAS solutions alone because you have your target SDTM metadata in a
more manageable source and you can consider the effort somewhat data driven and less prone to metadata
consistency errors.
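The data-driven idea can be sketched in a few lines. Here a small metadata table, typed in-line as a stand-in for a sheet that would normally be read from Excel or Access, is turned into the text of an ATTRIB statement. The table layout, the column names (VARNAME, VARTYPE, VARLEN, VARLABEL) and the macro variable name are assumptions made for this illustration, and labels containing single quotes are not handled.

```sas
/* Hypothetical metadata table: one row per target variable (subset of DM) */
data work.dm_meta;
   infile datalines dsd truncover;
   length varname $8 vartype $4 varlen 8 varlabel $40;
   input varname $ vartype $ varlen varlabel $;
   datalines;
STUDYID,char,40,Study Identifier
DOMAIN,char,8,Domain Abbreviation
USUBJID,char,40,Unique Subject Identifier
AGE,num,8,Age in AGEU at RFSTDTC
;
run;

/* Build "NAME length=$nn label='...'" for every row and join the pieces */
proc sql noprint;
   select catx(' ', varname,
               cats('length=', ifc(vartype = 'char', '$', ''), varlen),
               cats("label='", varlabel, "'"))
      into :dm_attrib separated by ' '
      from work.dm_meta;
quit;

/* Zero-observation shell that carries the metadata-defined attributes */
data work.dm_shell;
   attrib &dm_attrib;
   call missing(of _all_);
   stop;
run;
```

The same macro variable could be used in the DATA step that derives the domain, so that lengths and labels are typed once in the metadata table rather than in every program.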
IMPLEMENTING SDTM WITH SAS ENTERPRISE GUIDE
In this section we explore how you would go about implementing the SDTM domain creation with SAS Enterprise
Guide. SAS Enterprise Guide gives you a graphical user interface and some additional tools that you can bring to
bear upon the task of SDTM file creation. The first step in this effort was to define a LIBREF called LIBRARY that
would point to the permanent format catalog associated with the source legacy datasets. Then we were able to
simply drag and drop the source datasets into the Enterprise Guide Process Flow window. With the data in the
process flow, it was trivial to apply PROC SORT Enterprise Guide Tasks to sort the data by patient identifier. At this
point the task of SDTM conversion converges on the same process used with the BASE SAS solution alone where
the data is merged, SDTM variables defined, and the permanent DM dataset saved just as before.
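The LIBRARY libref matters because SAS searches WORK.FORMATS and then LIBRARY.FORMATS by default when it resolves a user-defined format such as $INV., SEX., RACE. or $TREAT. in the code below. A minimal sketch, assuming a hypothetical path for the folder that holds the permanent format catalog:

```sas
/* Hypothetical path to the folder that holds the permanent format catalog */
libname library "/studies/nic001/formats";

/* List the stored formats to confirm that the ones used in the DM code exist */
proc format library=library fmtlib;
run;
```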
The following display is a screen shot of the Enterprise Guide project that creates our SDTM DM file:
Display 1. Enterprise Guide Screen Shot of DM Creation
This Enterprise Guide project you see in Display 1 follows a flow very similar to the one in the BASE SAS
implementation before it where you have a few PROC SORT steps with a DATA step merging and deriving the
needed variables and then saving the DM file. However in this case the whole process is presented graphically in the
Enterprise Guide Process Flow window. Here is what the “Create DM” step from Display 1 looks like:
**** MERGE SOURCE DATA THEN DEFINE AND DERIVE DM SDTM VARIABLES;
data target.dm;
   merge sasuser.admin2
         sasuser.randfile
         sasuser.register(rename=(sex=_sex race=_race));
   by studyno;
   **** CHECK FOR UNIQUENESS OF PRIMARY KEY;
   if not (first.studyno and last.studyno) then
      put "WARN" "ING: duplicate patient identifiers " studyno=;
   **** DEFINE DM VARIABLES;
   keep STUDYID DOMAIN USUBJID SUBJID RFSTDTC RFENDTC SITEID INVID INVNAM
        BRTHDTC AGE AGEU SEX RACE ETHNIC ARMCD ARM COUNTRY DMDTC DMDY;
   attrib STUDYID length = $40 label = 'Study Identifier';
   attrib DOMAIN  length = $8  label = 'Domain Abbreviation';
   attrib USUBJID length = $40 label = 'Unique Subject Identifier';
   attrib SUBJID  length = $40 label = 'Subject Identifier for the Study';
   attrib RFSTDTC length = $64 label = 'Subject Reference Start Date/Time';
   attrib RFENDTC length = $64 label = 'Subject Reference End Date/Time';
   attrib SITEID  length = $40 label = 'Study Site Identifier';
   attrib INVID   length = $40 label = 'Investigator Identifier';
   attrib INVNAM  length = $40 label = 'Investigator Name';
   attrib BRTHDTC length = $64 label = 'Date/Time of Birth';
   attrib AGE     length = 8   label = 'Age in AGEU at RFSTDTC';
   attrib AGEU    length = $10 label = 'Age Units';
   attrib SEX     length = $2  label = 'Sex';
   attrib RACE    length = $40 label = 'Race';
   attrib ETHNIC  length = $40 label = 'Ethnicity';
   attrib ARMCD   length = $20 label = 'Planned Arm Code';
   attrib ARM     length = $40 label = 'Description of Planned Arm';
   attrib COUNTRY length = $3  label = 'Country';
   attrib DMDTC   length = $64 label = 'Date/Time of Collection';
   attrib DMDY    length = 8   label = 'Study Day of Collection';
   **** DERIVE SDTM DM VARIABLES;
   studyid = "NIC001";
   domain = "DM";
   usubjid = left(put(studyno,z6.));
   subjid = substr(compress(put(studyno,z6.)),4,3);
   if txbegdat ne . and txbegtim ne . then
      rfstdtc = put(dhms(txbegdat, 0, 0, txbegtim), is8601dt.);
   if txenddat ne . and txendtim ne . then
      rfendtc = put(dhms(txenddat, 0, 0, txendtim), is8601dt.);
   siteid = substr(rptinv,1,3);
   invid = '';
   invnam = put(rptinv,$inv.);
   brthdtc = put(dob,is8601da.);
   age = floor((intck('month',dob,admdat) - (day(admdat) < day(dob))) / 12);
   ageu = 'YEARS';
   sex = substr(put(_sex,sex.),1,1);
   race = put(_race,race.);
   ethnic = '';
   if trt = 'A' then armcd = 'NIC15';
   else if trt = 'B' then armcd = 'PLA';
   arm = put(trt,$treat.);
   country = 'USA';
   dmdtc = put(admdat,is8601da.);
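   **** STUDY DAY OF COLLECTION (ADMDAT) RELATIVE TO REFERENCE START DATE (TXBEGDAT);
   if admdat ne . and txbegdat ne . then do;
      if admdat >= txbegdat then dmdy = admdat - txbegdat + 1;
      else dmdy = admdat - txbegdat;
   end;
run;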
ENTERPRISE GUIDE APPROACH CHALLENGES
Using Enterprise Guide as the primary tool to create SDTM data suffers some similar challenges to using BASE SAS
alone. You are still lacking metadata management here and as you can see above, all of the variable lengths and
labels are still manually typed into the program code. Again, you have no real regulation of the target metadata and
no real-time validation that your resulting domain is valid SDTM data. Although Enterprise Guide provides the helpful
Process Flow GUI, the “Tasks” available in Enterprise Guide that you can drop into your Process Flow are limited to
sorting, appending, and transposing the data. The SAS Clinical Data Integration product comes with more available
data management tasks in the form of what it calls “Transformations” instead of “Tasks” as we will see shortly.
ENTERPRISE GUIDE APPROACH BENEFITS
There are some advantages, over BASE SAS alone at least, to using Enterprise Guide to create SDTM data. As with
the BASE SAS approach, you once again have the full arsenal of BASE SAS and can utilize any BASE SAS PROC,
SAS MACRO, or SAS SQL code you wish to solve the problem of data conversion. However, with Enterprise Guide
you get some additional assistance in the form of automated “tasks” that you can drag and drop into your project.
You can see the PROC SORT driven “Sort” task used in Display 1 above, but there are other useful tasks for SDTM
creation such as the data splitter, data appender, and data transposing (rows to columns and columns to rows) tasks
that can be very helpful here. If we had a more difficult domain to create then these additional prepackaged “tasks”
can be leveraged in the process flow and programming. Also it is worth mentioning with Enterprise Guide 4.3 that
you get more of a true development environment in SAS than ever before. Enterprise Guide 4.3 includes code
completion facilities and interactive syntax guides found in other software development environments that you will
love as a SAS programmer. Because of the Enterprise Guide process flow view, the SDTM work lends itself to being
more manageable and reusable long term because the programming itself tends to be less spaghetti code. Finally,
just as with the BASE SAS approach, the Enterprise Guide approach could be used in conjunction with tools such as
Microsoft Access or Excel to give you a minimal way of managing your SDTM metadata.
IMPLEMENTING SDTM WITH SAS CLINICAL DATA INTEGRATION
Now that we have explored the creation of SDTM files with the BASE SAS and SAS Enterprise Guide approaches, it is
a good time to look at the “full monty” SAS approach to SDTM data creation work, which involves using SAS
Clinical Data Integration. SAS Clinical Data Integration is an ETL tool built on top of the SAS Data Integration product
and it comes with clinical trials additions that we will use in this process. To begin this process in SAS CDI, we drag
and drop the SDTM DM domain from our metadata repository and you will see that in Display 2 below as a beige box
called “DM” to the far right. That target domain already has defined for you the table and variable level metadata that
you need and it also includes appropriate integrity constraints on the data. Now that we have our target defined, we
drag and drop our source datasets and you can see those in Display 2 as the three beige boxes to the far left. The
next step is to join the three source datasets via a SQL join which is done by dragging and dropping the predefined
“SQL Join” SAS CDI transform. The “Extract” transformation step you see in Display 2 below is where the SDTM DM
variables get defined in a process analogous to the BASE SAS and SAS EG DATA step code you saw previously.
Within SAS CDI this is done within point and click driven PROC SQL code building steps. The final step is to insert
the “Table Loader” transformation which takes the SAS dataset from the “Extract” step and saves the permanent DM
dataset.
Display 2. SAS Clinical Data Integration Screen Shot of DM Creation
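For readers who think in code, the “SQL Join” step corresponds roughly to a PROC SQL join of the three source tables on the patient identifier. The sketch below is hand-written for illustration and is not the code that SAS CDI generates. It assumes that TRT comes from RANDFILE, that DOB, SEX and RACE come from REGISTER, and that the remaining source variables are in ADMIN2; the actual column locations should be checked against the Appendix. WORK.DM_JOINED is a hypothetical table name.

```sas
/* Illustrative join of the three source tables on the patient identifier */
proc sql;
   create table work.dm_joined as
   select a.*,
          ra.trt,
          re.dob,
          re.sex  as _sex,
          re.race as _race
      from sasuser.admin2 as a
           inner join sasuser.randfile as ra
              on a.studyno = ra.studyno
           inner join sasuser.register as re
              on a.studyno = re.studyno;
quit;
```

An inner join keeps only the patients present in all three tables, whereas the DATA step MERGE shown earlier keeps every patient found in any of them. Which behavior is wanted is a design decision to make when configuring the join.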
SAS CLINICAL DATA INTEGRATION APPROACH CHALLENGES
Since SAS Clinical Data Integration handles many facets of SDTM data creation, the challenges to using it are
limited. Probably the biggest challenge for a SAS programmer is learning to give up slinging BASE SAS code and
learning to rely on the tool to do the work. Also, SAS Clinical Data Integration relies on SAS SQL under the hood
quite a bit so if you are an “old-school” BASE SAS programmer then brushing up on your SQL skills is advantageous.
As with the BASE SAS and Enterprise Guide solutions, you can utilize any BASE SAS procedures you wish to use,
but the key advantage to using Clinical Data Integration is in its ability to manage your metadata. Therefore, you
want to avoid writing a bunch of custom SAS code because that limits the tool’s ability to control the work. This can
be a bit of a challenge because you have to learn to work and program largely within the confines of the transforms
available within CDI. SAS CDI also makes you think about metadata management so there is a bit of setup that
needs to be done in the tool in terms of defining your target data metadata up front.
SAS CLINICAL DATA INTEGRATION APPROACH BENEFITS
SAS Clinical Data Integration gives you the same kind of process flow view and drag and drop tasks/transforms that
Enterprise Guide gives you, but it gives you what neither BASE SAS nor Enterprise Guide can provide alone. SAS
Clinical Data Integration manages your metadata for your SDTM work. It controls the target SDTM metadata so
compliance with a defined SDTM standard is built into your workflow. It also connects the metadata across your
SDTM data creation so that you can analyze your data for changes and updates and also propagate a change across
your SDTM data creation.
If you use SAS Clinical Data Integration as intended with standard transforms, it essentially enforces a bit of
consistency of process in creating SDTM domains. This consistency along with the process view allows for SDTM
creation jobs to be more easily maintained and also allows for reuse of jobs. SAS Clinical Data Integration also
allows for “typical” SDTM generation tasks, such as study day (--DY) or ISO date (--DTC) creation, to be standardized
into user written transforms that can be dragged and dropped into future SDTM jobs.
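The logic that such a user-written transform would wrap can be sketched as two small macros. This is plain macro code used as a stand-in, not a SAS CDI transform definition; the macro names, parameters and test values are hypothetical. Unlike the DM programs above, the date/time macro falls back to a date-only value when the time is missing.

```sas
/* Hypothetical helper macros; names and parameters are illustrative */
%macro make_dtc(date=, time=, dtc=);
   /* ISO 8601 date/time when both parts are present, date only otherwise */
   if nmiss(&date, &time) = 0 then
      &dtc = put(dhms(&date, 0, 0, &time), is8601dt.);
   else if not missing(&date) then
      &dtc = put(&date, is8601da.);
%mend make_dtc;

%macro make_dy(date=, refdate=, dy=);
   /* SDTM study day: there is no day 0, and day 1 is the reference start date */
   if nmiss(&date, &refdate) = 0 then
      &dy = &date - &refdate + (&date >= &refdate);
%mend make_dy;

/* Usage with made-up dates: collection one day before the reference start */
data work.dy_demo;
   length dmdtc rfstdtc $64;
   admdat   = '14JAN1988'd;
   txbegdat = '15JAN1988'd;
   txbegtim = '08:30:00't;
   %make_dtc(date=admdat,   time=.,        dtc=dmdtc)
   %make_dtc(date=txbegdat, time=txbegtim, dtc=rfstdtc)
   %make_dy(date=admdat, refdate=txbegdat, dy=dmdy)
   put dmdtc= rfstdtc= dmdy=;
run;
```

With these values the study day is -1, because the collection date falls one day before the reference start date.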
Although SAS Enterprise Guide provides a number of common “tasks” that can be dragged and dropped into your
process flow, SAS Clinical Data Integration provides a much more expansive list of transformations for you to choose
from. Several of those are extremely handy in terms of creating SDTM domains and those include the sort,
transpose, data joiner, lookup table, data extraction, and data loader transformations.
Finally, SAS Clinical Data Integration is integrated with the SAS Clinical Standards Toolkit found with BASE SAS
software. There are preexisting CDI transformations that allow you to validate your SDTM datasets based on the
SDTM metadata and also to automatically generate your define.xml file which is a huge benefit.
CONCLUSION
In this paper we compared various SAS technologies in the task of converting clinical trials data into the CDISC
SDTM. We looked at a BASE SAS approach, a SAS Enterprise Guide approach, and a SAS Clinical Data Integration
approach. What we saw in the three approaches was that we moved from little tool support to heavy context specific
tool support in the work. It used to be that when clinical SAS programmers were confronted by a data transformation
task such as SDTM conversion, all we had was BASE SAS. Now we have a better GUI tool to use in SAS
Enterprise Guide that helps us in SAS code development. In addition to that improved GUI, we also have a new
clinical trials and CDISC friendly industry-specific tool in SAS Clinical Data Integration to use. SAS Clinical Data
Integration gives us a way to manage our metadata and process in a way that we did not have before while still
allowing us to write BASE SAS code when needed. It is a tool like this that we needed to have in order to have
largely metadata driven transformation processes that can scale up to perform numerous data conversions in a
reliable and efficient manner.
CONTACT INFORMATION
Janet Stuelpner
[email protected]
Jack Shostak
[email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are
trademarks of their respective companies.
APPENDIX
Subset of data used in example programs along with SDTM metadata definitions for DM domain.
SDTM 3.1.2 Demography Metadata