4 About the QA Package

The purpose of this chapter is to document version 8.5.0 of the Sentinel Quality Assurance Package (QA Package). This documentation describes QA Package methods and capabilities and provides the information required to get started with running the tool.

4.1 Control Flow Module

All Level 1 checks for all tables are performed first. If any major issues with the data are detected, the package aborts (see Figure 4.1).

Figure 4.1: Control Flow: Level 1 Module Abort Logic

All Level 2 checks are then performed in a logical sequence, and the package aborts at any step in which AbortYN = Y for any flag in that step (see Figure 4.2). Level 2 checks include:

  1. Perform critical intra-table checks that would cause downstream data integrity issues
  2. Perform critical cross-table checks, for the same reason as above. Note: this is the final place where the package may abort
  3. Continue to remaining Level 2 checks and all remaining modules, regardless of the resulting data flags. Note: Prior to this step, all resulting datasets are output to the “dplocal” folder. Only log, metadata, and signature files will be in the “msoc” subfolder, until all abort checks have been successfully executed. At that time, all appropriate datasets will be moved to the “msoc” subfolder.

Figure 4.2: Control Flow: Level 2 Module Abort Logic
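
The abort logic described above can be sketched in Python as follows. This is an illustrative outline only, not the package's SAS implementation: the function name `run_level2`, the flag dictionary layout, and the flag identifiers in the example are hypothetical, and only the `AbortYN` field comes from the description above.

```python
from typing import Callable, Dict, List

# A flag is represented as a dict; "FlagID" is a hypothetical key,
# "AbortYN" mirrors the field named in the text.
Flag = Dict[str, str]


def run_level2(steps: List[Callable[[], List[Flag]]]) -> Dict[str, object]:
    """Run abort-eligible steps in order and stop at the first abort."""
    all_flags: List[Flag] = []
    for index, step in enumerate(steps, start=1):
        flags = step()
        all_flags.extend(flags)
        # Abort if any flag raised by this step has AbortYN = Y.
        if any(flag["AbortYN"] == "Y" for flag in flags):
            return {"aborted": True, "aborted_at_step": index, "flags": all_flags}
    return {"aborted": False, "aborted_at_step": None, "flags": all_flags}


def intra_table_checks() -> List[Flag]:
    # Hypothetical step 1 result: one flag that does not trigger an abort.
    return [{"FlagID": "EXAMPLE_INTRA", "AbortYN": "N"}]


def cross_table_checks() -> List[Flag]:
    # Hypothetical step 2 result: one flag that triggers an abort.
    return [{"FlagID": "EXAMPLE_CROSS", "AbortYN": "Y"}]


result = run_level2([intra_table_checks, cross_table_checks])
print(result["aborted"], result["aborted_at_step"])
```

In this sketch, only the two abort-eligible steps are passed to `run_level2`; the remaining Level 2 checks would run afterwards regardless of the flags they raise.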

After successful completion of the Level 2 module, each subsequent module is executed in the order specified in the inputfiles/control_flow.csv file.

4.2 Mother-Infant Identification Module

If Execute_MI is set to Y in the master program file, the Mother-Infant Identification (MI ID) package will execute after the QA program package has completed. Running the MI ID package is a prerequisite for creating the Mother-Infant Linkage Table. Specifications for the Mother-Infant Identification Program can be found on the Sentinel website, and a more extensive discussion of QA checks for the Mother-Infant Linkage (MIL) table can be found in Chapter 11.

Definition 4.1 (Live Birth Delivery) Live births are defined using a code list developed by the Medication Exposure in Pregnancy Risk Evaluation Program (MEPREP) and other Sentinel and non-Sentinel pregnancy-related projects [1, 3]. Identification of live births based on the original work in MEPREP has been updated to incorporate ICD-10-CM and ICD-10 Procedure Coding System (ICD-10-PCS) codes. Live birth deliveries identified in this way are eligible for the MIL table when they have the following additional characteristics:

  • the encounter occurred at least 90 days later than the start of Data Partner data availability;
  • the individual was between 10 and 54 years of age as of the admission date of the delivery encounter and has an assigned value of Female in the Sentinel Common Data Model demographic table variable Sex;
  • the individual has no evidence of delivery in the 180 days prior; and
  • the individual had 90 days of continuous medical coverage (with gaps of up to 45 days allowed) through the delivery date.

Infant records with at least one day of enrollment with medical coverage during the first three years of life are eligible for inclusion in the MIL Table.
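
As a rough illustration, the delivery eligibility criteria listed above could be checked as in the Python sketch below. The function and parameter names are hypothetical, the sex code "F" and the inclusive handling of the age and day boundaries are assumptions, and the continuous-coverage criterion is passed in as a precomputed boolean rather than derived from enrollment spans.

```python
from datetime import date, timedelta
from typing import Iterable


def age_on(birth_date: date, on_date: date) -> int:
    """Whole years of age on a given date (simple birthday-based count)."""
    years = on_date.year - birth_date.year
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def delivery_is_eligible(
    adate: date,                      # admission date of the delivery encounter
    birth_date: date,                 # mother's date of birth
    sex: str,                         # assumed code: "F" for Female
    dp_start_date: date,              # start of Data Partner data availability
    prior_delivery_dates: Iterable[date],
    has_required_coverage: bool,      # 90 days of coverage, gaps <= 45 days (precomputed)
) -> bool:
    far_enough_from_start = (adate - dp_start_date).days >= 90
    age_ok = 10 <= age_on(birth_date, adate) <= 54
    is_female = sex == "F"
    # Assumption: the 180-day lookback excludes the delivery date itself.
    window_start = adate - timedelta(days=180)
    no_recent_delivery = not any(
        window_start <= d < adate for d in prior_delivery_dates
    )
    return (
        far_enough_from_start
        and age_ok
        and is_female
        and no_recent_delivery
        and has_required_coverage
    )


print(delivery_is_eligible(
    date(2018, 6, 1), date(1990, 1, 1), "F", date(2010, 1, 1), [], True
))
```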

4.3 Definition Of Enrollment Span Comparisons

Figure 4.3: Definitions and Examples of Enrollment Date Range Relationships by PatID

4.4 Minimum And Maximum Dates Of Data Completeness

Minimum and maximum dates of data completeness are created by this package for all SCDM tables (see footnote 1) that contain at least one date variable, as defined in the input file lkp_all_minmax.sas7bdat (see footnote 2).

Definition 4.2 (Minimum date of data completeness) The minimum date of data completeness (mindate) is calculated by determining the earliest year-month (e.g., 2010-01) with a record count within an 80% threshold of the record count of the next month (e.g., 2010-02), and then assigning the first day of that month to create a SAS date, formatted as YYYY-MM-DD (e.g., 2010-01-01).

Definition 4.3 (Maximum date of data completeness) The maximum date of data completeness (maxdate) is calculated by determining the latest year-month (e.g., 2017-10) with a record count within an 80% threshold of the record count of the prior month (e.g., 2017-09), and then assigning the last day of that month to create a SAS date, formatted as YYYY-MM-DD (e.g., 2017-10-31).
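
The two definitions can be illustrated with the Python sketch below. It is not the package's SAS code. It assumes that "within an 80% threshold" means the month's record count is at least 80% of the neighboring month's count, which is one plausible reading of the definitions; the function names and the monthly counts in the example are hypothetical.

```python
import calendar
from datetime import date
from typing import Dict, Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month)
THRESHOLD = 0.80             # 80% threshold from the definitions


def _next_month(ym: YearMonth) -> YearMonth:
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def _prev_month(ym: YearMonth) -> YearMonth:
    year, month = ym
    return (year - 1, 12) if month == 1 else (year, month - 1)


def min_date(counts: Dict[YearMonth, int]) -> Optional[date]:
    """Earliest month whose count is >= 80% of the next month's count."""
    for ym in sorted(counts):
        neighbor = counts.get(_next_month(ym), 0)
        if neighbor > 0 and counts[ym] >= THRESHOLD * neighbor:
            return date(ym[0], ym[1], 1)  # first day of that month
    return None


def max_date(counts: Dict[YearMonth, int]) -> Optional[date]:
    """Latest month whose count is >= 80% of the prior month's count."""
    for ym in sorted(counts, reverse=True):
        neighbor = counts.get(_prev_month(ym), 0)
        if neighbor > 0 and counts[ym] >= THRESHOLD * neighbor:
            last_day = calendar.monthrange(ym[0], ym[1])[1]
            return date(ym[0], ym[1], last_day)  # last day of that month
    return None


# Hypothetical monthly record counts with a ramp-up and a drop-off.
monthly_counts = {
    (2017, 6): 200, (2017, 7): 1000, (2017, 8): 980,
    (2017, 9): 990, (2017, 10): 950, (2017, 11): 300,
}
print(min_date(monthly_counts), max_date(monthly_counts))
```

With these hypothetical counts, June and November fall short of the threshold relative to their neighbors, so the sketch selects July as the minimum month and October as the maximum month.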

The overall minimum and maximum dates are then calculated as follows:

Definition 4.4 (Overall Minimum date of data completeness (DP Min)) The overall dp_mindate is calculated by determining the latest mindate (i.e., the maximum of the minimums) from the SCDM Enrollment, Dispensing, Encounter, Diagnosis, and Procedure Tables.

Definition 4.5 (Overall Maximum date of data completeness (DP Max)) The overall dp_maxdate is calculated by determining the earliest maxdate (i.e., the minimum of the maximums) from the SCDM Enrollment, Dispensing, Encounter, Diagnosis, and Procedure Tables.
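
A minimal Python sketch of these two definitions follows. The function name `overall_dates` and the per-table dates in the example are hypothetical and serve only to show the "maximum of the minimums" and "minimum of the maximums" logic.

```python
from datetime import date
from typing import Dict, Tuple


def overall_dates(table_dates: Dict[str, Tuple[date, date]]) -> Tuple[date, date]:
    """Return (dp_mindate, dp_maxdate) from per-table (mindate, maxdate) pairs."""
    dp_mindate = max(mindate for mindate, _ in table_dates.values())
    dp_maxdate = min(maxdate for _, maxdate in table_dates.values())
    return dp_mindate, dp_maxdate


# Hypothetical per-table dates of data completeness.
example = {
    "Enrollment": (date(2008, 1, 1), date(2017, 12, 31)),
    "Dispensing": (date(2008, 1, 1), date(2017, 11, 30)),
    "Encounter": (date(2009, 1, 1), date(2017, 10, 31)),
    "Diagnosis": (date(2009, 1, 1), date(2017, 10, 31)),
    "Procedure": (date(2010, 1, 1), date(2017, 10, 31)),
}
print(overall_dates(example))
```

Taking the latest minimum and the earliest maximum yields the period in which all five tables are considered complete.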

These dates are stored in the SAS dataset minmax_dates. The DP Min and DP Max associated with the latest production ETL at each Data Partner site will be used by Common Components (CC) to populate the global macro variables &mindate and &maxdate for distributed request packages.

Figure 4.4: Example of Maximum date of data completeness algorithm

4.5 Age Calculation

Age in years (age_years) is calculated using the date of birth variable found in the SCDM Demographic Table and the overall maxdate calculated for the ETL under review.

Definition 4.6 (Kreuter method for age calculation) The following equation measures age in whole years. It counts the months between the two dates, subtracts one month if the day boundary has not been crossed for the last month, and then converts months to years and reports the result as an integer [4].

age_years = floor((intck('month', birth_date, &DP_MaxDate.) - (day(&DP_MaxDate.) < day(birth_date))) / 12)
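
For readers less familiar with SAS, the same calculation can be expressed in Python as in the sketch below. The helper `intck_month` is a hypothetical stand-in that mimics how the SAS INTCK function counts month boundaries between two dates; it is not part of the QA Package.

```python
from datetime import date


def intck_month(start: date, end: date) -> int:
    """Count month boundaries crossed between start and end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def age_years(birth_date: date, dp_maxdate: date) -> int:
    """Age in whole years as of dp_maxdate."""
    months = intck_month(birth_date, dp_maxdate)
    # Subtract one month if the day of month has not yet been reached.
    if dp_maxdate.day < birth_date.day:
        months -= 1
    # Floor division matches floor(months / 12), including for negatives.
    return months // 12


print(age_years(date(2000, 3, 15), date(2017, 10, 31)))   # birthday already passed
print(age_years(date(2000, 11, 15), date(2017, 10, 31)))  # birthday not yet reached
```

A date of birth later than the reference date yields a negative value, which corresponds to the Negative category in the age summary.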

Age in years is summarized based on the following categories:

00. Missing
00. Negative
01. 0-1 yrs
02. 2-4 yrs
03. 5-9 yrs
04. 10-14 yrs
05. 15-18 yrs
06. 19-21 yrs
07. 22-44 yrs
08. 45-64 yrs
09. 65-74 yrs
10. 75+ yrs
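
The mapping from age in years to these categories can be sketched in Python as follows. The function name `age_category` and the use of `None` to represent a missing age are assumptions made for illustration; the labels are copied from the list above.

```python
from typing import Optional

# (inclusive upper bound in years, category label), in ascending order.
_AGE_BANDS = [
    (1, "01. 0-1 yrs"),
    (4, "02. 2-4 yrs"),
    (9, "03. 5-9 yrs"),
    (14, "04. 10-14 yrs"),
    (18, "05. 15-18 yrs"),
    (21, "06. 19-21 yrs"),
    (44, "07. 22-44 yrs"),
    (64, "08. 45-64 yrs"),
    (74, "09. 65-74 yrs"),
]


def age_category(age_years: Optional[int]) -> str:
    """Return the summary category for an age in whole years."""
    if age_years is None:
        return "00. Missing"
    if age_years < 0:
        return "00. Negative"
    for upper_bound, label in _AGE_BANDS:
        if age_years <= upper_bound:
            return label
    return "10. 75+ yrs"


print(age_category(17), age_category(None), age_category(80))
```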

4.6 Enforcing the ICD-10 switchover for Inpatient and Institutional Encounters

The Quality Assurance (QA) Package includes two warn flags, DIA_2_07_00-0_223 and PRO_2_07_00-0_223. These flags are designed to signal when an ICD-10 code is detected prior to October 1, 2015, or when an ICD-9 code is identified on or after this date in either the Diagnosis or Procedure tables.

For the majority of encounter types, the triggering of these flags depends on the ADate variable in either the Diagnosis or Procedure table. However, there is an exception for inpatient (IP) and institutional stays (IS). During the transition from the ICD-9 to the ICD-10 era, the switchover was based on the discharge date rather than the admission date for these two types of encounters. Consequently, when the EncType is either IP or IS, the QA Package bases this validation check on the DDate variable from the Encounter Table. Because of how the QA Package operates, these inpatient and institutional stay checks must be treated as cross-table checks and are therefore implemented as DIA-ENC_2_07_00-0_223 and PRO-ENC_2_07_00-0_223.
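
The date selection and era check described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the code type values "09" and "10" are assumed to denote ICD-9 and ICD-10 codes respectively. Records whose reference date is unavailable are simply not flagged in this sketch.

```python
from datetime import date
from typing import Optional

ICD10_START = date(2015, 10, 1)
DISCHARGE_BASED_ENCTYPES = {"IP", "IS"}  # inpatient and institutional stays


def reference_date(enctype: str, adate: date, ddate: Optional[date]) -> Optional[date]:
    """Use the discharge date for IP/IS encounters, else the admission date."""
    if enctype in DISCHARGE_BASED_ENCTYPES:
        return ddate
    return adate


def violates_switchover(
    code_type: str, enctype: str, adate: date, ddate: Optional[date] = None
) -> bool:
    """True if an ICD-10 code predates the switchover or an ICD-9 code follows it."""
    ref = reference_date(enctype, adate, ddate)
    if ref is None:
        return False  # cannot evaluate without a reference date
    if code_type == "10":
        return ref < ICD10_START
    if code_type == "09":
        return ref >= ICD10_START
    return False


# An inpatient stay admitted in September 2015 and discharged in October 2015:
# the ICD-10 code is acceptable, while an ICD-9 code would be flagged.
print(violates_switchover("10", "IP", date(2015, 9, 28), date(2015, 10, 2)))
print(violates_switchover("09", "IP", date(2015, 9, 28), date(2015, 10, 2)))
```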

4.7 Data Characteristics Sign-off Report

The Data Characteristics Sign-off Report is a summary report of all flags thrown by a given run of the QA Package. Data Partners use the report to comment and sign off on the flags thrown by the QA Package, and then return it to SOC. The QA Package generates this report in the msoc folder and names it data_characteristics_sign_off_report_[dpid]_etl[#].xlsx, where [dpid] is the Data Partner callsign and [#] is the ETL under review. For further details on the output of this report, see the Data Characteristics Sign-off Report specifications within Section @ref(#qar-msoc-folder).

In generating the Data Characteristics Sign-off Report, the QA Package calculates the difference in flag counts between the ETL under review and the previous ETL. To make this comparison possible, the previous ETL's all_l1_l2_flags SAS dataset must be added to the inputfiles directory of the QA Package prior to package execution.
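
The comparison can be pictured with the short Python sketch below. It assumes, for illustration only, that each ETL's flags have already been reduced to a mapping from flag identifier to count; the function name and the counts are hypothetical, and the actual layout of the all_l1_l2_flags dataset is not shown here.

```python
from typing import Dict


def flag_count_differences(
    current: Dict[str, int], previous: Dict[str, int]
) -> Dict[str, int]:
    """Return current minus previous count for every flag seen in either ETL."""
    flag_ids = sorted(set(current) | set(previous))
    return {
        flag_id: current.get(flag_id, 0) - previous.get(flag_id, 0)
        for flag_id in flag_ids
    }


# Hypothetical flag counts for the previous ETL and the ETL under review.
previous_etl = {"DIA_2_07_00-0_223": 120, "PRO_2_07_00-0_223": 15}
current_etl = {"DIA_2_07_00-0_223": 95, "DIA-ENC_2_07_00-0_223": 4}

print(flag_count_differences(current_etl, previous_etl))
```

A flag that is absent from one ETL is treated as having a count of zero, so newly appearing and resolved flags both show up in the result.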

References

1. Alison Tse Kawai, Grace M. Lee, Azadeh Shoaibi. Mini-Sentinel CBER/PRISM Surveillance Protocol: Influenza Vaccines and Pregnancy Outcome. Version 2. 2014. https://www.sentinelinitiative.org/studies/vaccines-blood-biologics/influenza-vaccines-and-pregnancy-outcomes-prism
3. Li Q, Andrade SE, Cooper WO, et al. Validation of an algorithm to estimate gestational age in electronic health plan databases. Pharmacoepidemiology and Drug Safety. 2013;22(5):524-532. doi:10.1002/pds.3407
4. William Kreuter. Age calculation and when does the sun rise and fall code. In: SAS Conference Proceedings: Pacific Northwest SAS Users Group 1996. 1996:3. https://www.lexjansen.com/pnwsug/1996/PNWSUG96025.pdf

  1. While the r01_mother_deliveries and/or r02_infants tables, created by the separate Mother-Infant Identification Program, contain dates, these dates are derived from the core tables and are thus not used for setting minimum and maximum dates.

  2. The min/maxdate algorithm may not work well with all types of date distributions, e.g., a distribution with a large drop preceded or followed by a long, flat tail of many months. When there are at least two consecutive months at the tail end of the distribution with relatively low counts, the algorithm may sometimes pick a month with incomplete data. For example, if the year-month 2017-06 in Figure 4.4 had a count of “600” instead of “400”, it would meet the 80% threshold and be incorrectly chosen as the max date.