October 14th, 2019 Meeting

Past Meeting

Download Jacob’s Presentation
Download Fan’s Presentation
Download Troy’s Presentation

Monday, October 14th, 2019

12:30 PM (Registration)
1:30 – 4:30 PM (Meeting)

Gilead Sciences
Space A
309 Vintage Park Way
Foster City, CA 94404

Facility Host:
Amy Caron

Event Host:
Thomas Leung (415) 956-3611

Speaker 1

Python and R made easy for the SAS® Programmer

Janet Li, Pfizer


Many of the day-to-day tasks and responsibilities of the statistical programmer in a pharmaceutical research and development group or contract research organization (CRO) involve importing/exporting data, deriving variables and creating analysis data sets, and creating clinical study report (CSR) materials such as tables, listings, and figures (TLFs).

These outputs are used for submission to regulatory agencies and are generally programmed in SAS®. There is a growing movement to integrate artificial intelligence (AI) capabilities, such as machine learning and natural language processing, into the clinical programming world. Two very useful tools for SAS® programmers wishing to expand their AI knowledge are Python and R.

In this paper, we provide examples of common SAS® procedures and syntax that are used in the creation of analysis datasets and TLFs and translate them to Python and R to help the clinical SAS® programmer familiarize themselves with an alternative way of programming and allay their concerns related to learning a new language. Furthermore, we hope this can serve as a starting point for the clinical programmer wishing to use Python and R in conjunction with SAS® to more effectively and efficiently deliver statistical programming outputs.
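
As a rough illustration of the kind of translation the abstract describes (not the speaker's own examples), the sketch below mirrors a DATA step derivation followed by a PROC MEANS with a CLASS variable. The records, variable names, and the summarize helper are all hypothetical, and only the Python standard library is used; in practice a data-frame library such as pandas would be a more typical choice.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records standing in for a SAS data set (one dict per row).
records = [
    {"usubjid": "001", "trt": "A", "weight_kg": 70.0, "height_cm": 175.0},
    {"usubjid": "002", "trt": "A", "weight_kg": 82.0, "height_cm": 180.0},
    {"usubjid": "003", "trt": "B", "weight_kg": 65.0, "height_cm": 160.0},
    {"usubjid": "004", "trt": "B", "weight_kg": 90.0, "height_cm": 185.0},
]

# DATA step analogue: derive a new variable on every row.
for row in records:
    row["bmi"] = row["weight_kg"] / (row["height_cm"] / 100) ** 2


def summarize(rows, by, var):
    """PROC MEANS with CLASS analogue: N, mean, and SD of `var` per `by` level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[var])
    return {
        key: {"n": len(vals), "mean": mean(vals), "std": stdev(vals)}
        for key, vals in sorted(groups.items())
    }


summary = summarize(records, by="trt", var="bmi")
for trt, stats in summary.items():
    print(trt, stats["n"], round(stats["mean"], 2), round(stats["std"], 2))
```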

About the Speaker:

Janet is currently a statistical programmer at Pfizer in San Francisco, CA. Janet has over 5 years of experience working as a project manager/research coordinator for stroke-related observational studies and clinical trials at Georgetown University.

She has also been conducting research on statistical considerations for the master protocol design being implemented in oncology trials. She holds a BS in Neuroscience from Duke University and an MS in Biostatistics from Georgetown University. In her spare time, she is an aspiring metalsmith/jewelry maker and loves to read (especially Murakami novels) and explore new places.

Speaker 2

Abstracting and Automating Hierarchical Data Models: Leveraging the SAS® FORMAT Procedure CNTLIN Option To Build Dynamic Formats That Clean, Convert, and Categorize Data

Troy Hughes, Independent


The SAS® FORMAT procedure “creates user-specified formats and informats for variables.” In other words, FORMAT defines data models that transform (and sometimes bin) prescribed values (or value ranges, in the case of numeric data) into new values. SAS formats facilitate multiple objectives of data governance, including data cleaning, the identification of outliers or new values, entity resolution, and data visualization, and can even be used to query or join lookup tables.

SAS formats are often hardcoded into SAS software, but where data models are fluid, formats are best defined within control files outside of software. This modularity—the separation of data models from the programs that utilize them—allows SAS developers to build and maintain SAS software independently while domain subject matter experts (SMEs) separately build and maintain the underlying data models.

Independent data models also facilitate master data management (MDM) and software interoperability, allowing a data model to be maintained as a single instance, albeit implemented not only with SAS but also Python, R, or other languages or applications. The CNTLIN option (within the SAS FORMAT procedure) facilitates this modularity by creating SAS formats from data sets.

This text introduces the BUILD_FORMAT macro that greatly expands the utility of CNTLIN, allowing it to build formats not only from one-to-one and many-to-one format mappings but also from multitiered, hierarchical data models that are built and maintained externally in XML files. The numerous advantages of BUILD_FORMAT are demonstrated through successive SAS code examples that rely on the taxonomy of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).
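
To make the idea of an externally maintained hierarchical model concrete, here is a minimal Python sketch that flattens a two-tier XML model into many-to-one rows carrying the variable names a CNTLIN data set uses (FMTNAME, TYPE, START, LABEL). It does not reproduce the BUILD_FORMAT macro: the XML layout, the DXGROUP format name, and the cntlin_rows helper are assumptions made for illustration, and the diagnosis fragment is only a small illustrative sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for a two-tier data model; the element and
# attribute names are assumptions, not the layout used in the talk.
MODEL_XML = """
<model name="DXGROUP">
  <category label="Depressive Disorders">
    <value>Major Depressive Disorder</value>
    <value>Persistent Depressive Disorder</value>
  </category>
  <category label="Anxiety Disorders">
    <value>Generalized Anxiety Disorder</value>
    <value>Panic Disorder</value>
  </category>
</model>
"""


def cntlin_rows(xml_text):
    """Flatten the hierarchy into many-to-one rows with CNTLIN variable names."""
    root = ET.fromstring(xml_text.strip())
    fmtname = root.get("name")
    rows = []
    for category in root.findall("category"):
        for value in category.findall("value"):
            rows.append({
                "FMTNAME": fmtname,                # name of the format to build
                "TYPE": "C",                       # character format
                "START": value.text.strip(),       # raw value to be mapped
                "LABEL": category.get("label"),    # parent tier as the label
            })
    return rows


rows = cntlin_rows(MODEL_XML)

# The same single model can serve as a plain lookup outside SAS.
lookup = {row["START"]: row["LABEL"] for row in rows}
print(lookup["Panic Disorder"])
```

Rows shaped like this could be exported (for example as CSV), read into a SAS data set, and passed to the FORMAT procedure through CNTLIN, while the lookup dictionary shows the same model reused from Python.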

About the Speaker:

Troy has more than 20 years of experience leading SAS teams and projects in support of federal, state, and local government initiatives. Since 2013, he has given more than 100 presentations, trainings, and hands-on workshops at SAS conferences, including at SAS Global Forum, SAS Analytics Experience, WUSS, SCSUG, SESUG, MWSUG, PharmaSUG, and local user groups. Additionally, he has authored two groundbreaking books that model software design and development best practices:

  • SAS Data Analytic Development: Dimensions of Software Quality (2016)
  • SAS Data-Driven Development: From Abstract Design to Dynamic Functionality (2019)

Troy has an MBA in information systems management and numerous certifications including SAS Base, SAS Advanced, SAS Clinical Trials, PMP, PMI-RMP, PMI-PBA, PMI-ACP, CISSP, CSSLP, ITIL, CSM, CSD, CSPO, CSP-SM, and CSP-PO. He is a US Navy veteran with two tours of duty in Afghanistan.

Speaker 3


Fan Lin, Gilead Sciences


An ADME (Absorption, Distribution, Metabolism, and Excretion) study is usually conducted in the early clinical drug development stage to understand the route of excretion of a drug and its metabolites in the human body. It measures the concentrations of the parent/metabolite(s) and determines the amount of radioactivity in plasma, urine, and feces (Gerlie Gieser, Investigators Forum 2012).

Because the sample types include urine and feces and subjects are discharged at different times, creating CDISC-compliant SDTM and ADaM datasets is more complex than in other PK studies. In this paper we introduce the complexities of the ADME PK study and our approaches to resolving these challenges. This paper presents the ADME study process (PKMERGE/PC/ADPC/TF) in a flow chart, then describes each step in detail, followed by a list of challenges that currently exist in the industry.

Other challenges include carrying the last record forward for early-discharge subjects to the maximum discharge visit in the dataset and representing this in a cumulative graph. This paper provides detailed steps for resolving each challenge to create CDISC-compliant data modules and analysis presentations. With reference to another industry white paper [1], our process meets industry standards and produces high-quality plots using the SAS Graph Template Language (GTL).
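
The carry-forward step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' SAS implementation: the subject IDs, collection days, recovery values, and the carry_forward helper are hypothetical, and any day without a record (including a gap in the middle of a stay) is treated as one to be filled from the previous record.

```python
# Hypothetical cumulative % of dose recovered, by collection day, per subject.
cumulative = {
    "S01": {1: 20.0, 2: 55.0, 3: 80.0, 4: 88.0, 5: 90.0},
    "S02": {1: 30.0, 2: 70.0, 3: 85.0},  # discharged early, after day 3
}


def carry_forward(by_subject):
    """Extend every subject to the full set of collection days in the data,
    repeating the last observed value and flagging the imputed rows."""
    all_days = sorted({day for visits in by_subject.values() for day in visits})
    filled = {}
    for subject, visits in by_subject.items():
        rows = []
        last_value = None  # stays None until the subject's first record
        for day in all_days:
            imputed = day not in visits
            if not imputed:
                last_value = visits[day]
            rows.append({"day": day, "value": last_value, "imputed": imputed})
        filled[subject] = rows
    return filled


filled = carry_forward(cumulative)
for row in filled["S02"]:
    print(row["day"], row["value"], "carried forward" if row["imputed"] else "")
```

Keeping an explicit flag on the imputed rows lets a cumulative plot draw the carried-forward segment differently from observed data.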

About the Speaker:

Fan Lin is a senior manager of statistical programming at Gilead Sciences. She has about 20 years’ experience in industry, first as an analytical scientist at SmithKline Beecham, then as a statistical programmer at various companies. She has co-authored several publications in the Journal of Medicinal Chemistry.

Speaker 4

Proportion difference and confidence interval based on CMH test in stratified RCT with an example in pooled analysis of HIV trials

Jacob Gong, Gilead Sciences


In global clinical trials, adjustment for region and other stratification factors is recommended, so these factors need to be taken into account when estimating the common treatment effect. A method favored by biostatisticians is to obtain a weighted average of stratum-specific proportion differences.

The Cochran-Mantel-Haenszel (CMH) method is a technique that generates an estimate of an association between a treatment and an outcome after adjusting for, or taking into account, stratification factors. At Gilead, there is a company-wide SAS macro that produces the strata-adjusted proportion difference using the CMH method. In this presentation, we will demonstrate an example in which we calculate the 95% asymptotic confidence interval for the strata-adjusted proportion difference using the company macro, and discuss its limitations and potential solutions under different scenarios.
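
For readers unfamiliar with the calculation, the following Python sketch shows one common way to form a weighted average of stratum-specific proportion differences using CMH-type weights, w = n1 * n0 / (n1 + n0), with a Wald-type asymptotic confidence interval that treats the weights as fixed. It is not the company macro described above. The counts are made up, the function name is hypothetical, and other variance estimators for the Mantel-Haenszel risk difference exist that a production macro may use instead.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical strata: (responders_trt, n_trt, responders_ctl, n_ctl).
strata = [
    (45, 50, 40, 50),
    (80, 100, 70, 100),
    (27, 30, 24, 30),
]


def cmh_weighted_difference(strata, level=0.95):
    """Weighted average of stratum-specific proportion differences with a
    Wald-type asymptotic confidence interval (weights treated as fixed)."""
    num = den = var_num = 0.0
    for x1, n1, x0, n0 in strata:
        p1, p0 = x1 / n1, x0 / n0
        w = n1 * n0 / (n1 + n0)  # CMH-type stratum weight
        num += w * (p1 - p0)
        den += w
        var_num += w ** 2 * (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    estimate = num / den
    se = sqrt(var_num) / den
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate, estimate - z * se, estimate + z * se


estimate, lower, upper = cmh_weighted_difference(strata)
print(f"difference = {estimate:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The variance term uses each stratum's observed proportions, so it degenerates when a stratum has a proportion of exactly 0 or 1; sparse or extreme strata are the kind of scenario where a different interval method may be needed.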

About the Speaker:

Jacob is currently a statistical programmer at Gilead Sciences. He supports the HIV therapeutic area (TA), including long-acting regimens and HIV cure compounds. Prior to joining Gilead, he worked at Eli Lilly in multiple TAs, including diabetes and neuroscience. He holds an MS in biostatistics from Columbia University. During graduate school, he interned at Memorial Sloan Kettering Cancer Center and New York Psychiatric Institute. He was a teaching assistant for categorical data analysis at Columbia, which he enjoyed a lot, and he believes there is nothing boring about statistics.