Algebra 2 Midterm Exam
Name: _________________________
Score: ______ / ______
Answer the questions below. Make sure to show your work and justify all of your answers.
1. Simplify: Show your work.
2. For the given quadratic equation, convert it into vertex form, find the vertex, and find the value of y for x = 6. Show your work.
y = -2x^2 + 2x + 2
3. A manufacturer of shipping boxes has a box shaped like a cube. The side length is (5a + 4b). What is the volume of the box in terms of a and b? Show your work.
4. If a function f(x) is shifted four units to the left, what will the transformed function look like?
5. Solve the problem by writing an inequality. A club decides to sell T-shirts for $12 each as a fund-raiser. It costs $20 plus $8 per T-shirt to make the T-shirts. Write and solve an inequality to find how many T-shirts the club needs to make and sell in order to make a profit of at least $100. Show your work.
6. The velocity of sound in air is given by the equation , where v is the velocity in meters per second and t is the temperature in degrees Celsius. Find the temperature when the velocity is 329 meters per second by graphing the equation. Round the answer to the nearest degree. Show your work.
7. The volume in cubic feet of a box can be expressed as , or as the product of three linear factors with integer coefficients. The width of the box is x-2.
Factor the polynomial to find linear expressions for the height and the length. Show your work.
8. What is the solution to the equation . Show your work.
9. Solve the equation. Check for extraneous solutions. Type your answers in the blanks. Show your work.
x = _____ or _____
10. Write an expression for the volume of a cylinder whose height is 7 in. greater than its radius.
11. What is the value of log_81 3? Show your work.
12. In an experiment, a petri dish with a colony of bacteria is exposed to cold temperatures and then warmed again.
| Time (hours) | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Population (1000s) | 5.1 | 3.03 | 1.72 | 1.17 | 1.38 | 2.35 | 4.08 |
Find a quadratic model for the data in the table. Type your answer below. Show your work.
13. Use the model from problem 12 to estimate the population of bacteria at 9 hours. Type your answer below. Show your work.
14. Evaluate the expression for the given value of the variable(s). Show your work.
15. Find a quadratic model for the set of values: (-2, -20), (0, -4), (4, -20). Show your work.
16. Simplify the expression. Type your answer in the blank.
17. Suppose you cut a small square from a square of fabric as shown in the diagram. Write an expression for the remaining shaded area. Factor the expression. Type your answer below.
18. Is the relation {(3, 5), (–4, 5), (–5, 0), (1, 1), (4, 0)} a function? Explain. Type your answer below.
19. Evaluate Show your work.
20. Consider the leading term of the polynomial function. What is the end behavior of the graph? Describe the end behavior and provide the leading term.
-3x^5 + 9x^4 + 5x^3 + 3
ORIGINAL RESEARCH
Demographic Factors and Hospital Size Predict Patient Satisfaction Variance—Implications for Hospital Value-Based Purchasing
Daniel C. McFarland, DO1*, Katherine A. Ornstein, PhD2, Randall F. Holcombe, MD1
1Division of Hematology/Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, Mount Sinai Medical Center, New York, New York; 2Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, Mount Sinai Medical Center, New York, New York.
BACKGROUND: Hospital Value-Based Purchasing (HVBP) incentivizes quality performance-based healthcare by linking payments directly to patient satisfaction scores obtained from Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) surveys. Lower HCAHPS scores appear to cluster in heterogeneous, population-dense areas and could bias Centers for Medicare & Medicaid Services (CMS) reimbursement.
OBJECTIVE: Assess nonrandom variation in patient satisfaction as determined by HCAHPS.
DESIGN: Multivariate regression modeling was performed for individual dimensions of HCAHPS and aggregate scores. Standardized partial regression coefficients assessed the strengths of predictors. The Weighted Individual (hospital) Patient Satisfaction Adjusted Score (WIPSAS) utilized 4 highly predictive variables, and hospitals were reranked accordingly.
SETTING: A total of 3907 HVBP-participating hospitals.
PATIENTS: There were 934,800 patient surveys by the most conservative estimate.
MEASUREMENTS: Demographic data for a total of 3144 counties (US Census) and HCAHPS surveys.
RESULTS: Hospital size and primary language (non–English speaking) most strongly predicted unfavorable HCAHPS scores, whereas education and white ethnicity most strongly predicted favorable HCAHPS scores. The average adjusted patient satisfaction scores calculated by WIPSAS approximated the national average of HCAHPS scores. However, WIPSAS changed hospital rankings by variable amounts depending on the strength of the predictive variables in the hospitals’ locations. WIPSAS accounted for the structural and demographic characteristics that predict lower scores, and it also improved the rankings of many safety-net hospitals and academic medical centers in diverse areas.
CONCLUSIONS: Demographic and structural factors (eg, hospital beds) predict patient satisfaction scores even after CMS adjustments. CMS should consider WIPSAS or a similar adjustment to account for the severity of patient satisfaction inequities that hospitals could strive to correct. Journal of Hospital Medicine 2015;10:503–509. © 2015 Society of Hospital Medicine
The Affordable Care Act of 2010 mandates that government payments to hospitals and physicians must depend, in part, on metrics that assess the quality and efficiency of the healthcare being provided, in order to encourage value-based healthcare.1 Value in healthcare is defined by the delivery of high-quality care at low cost.2,3 To this end, the Centers for Medicare & Medicaid Services (CMS) have developed the Hospital Value-Based Purchasing (HVBP) and Physician Value-Based Payment Modifier programs. HVBP is currently being phased in and affects CMS payments beginning in fiscal year (FY) 2013 for over 3000 hospitals across the United States, to incentivize value in healthcare delivery. The final phase of implementation will be in FY 2017, at which point HVBP will affect 2% of all CMS hospital reimbursement. HVBP is based on objective measures of hospital performance as well as a subjective measure of performance captured under the Patient Experience of Care domain. This subjective measure will remain at 30% of the aggregate score until FY 2016, when it will drop to 25% of the aggregate score going forward.4 The program rewards hospitals for both overall achievement and improvement in any domain, so that hospitals have multiple ways to receive financial incentives for providing quality care.5 Even so, there appears to be a nonrandom pattern of patient satisfaction scores across the country, with less favorable scores clustering in densely populated areas.6
Value-Based Purchasing and other incentive-based programs have been criticized for increasing disparities in healthcare by penalizing larger hospitals (including academic medical centers, safety-net hospitals, and others that disproportionately serve lower socioeconomic communities) and favoring physician-based specialty hospitals.7–9 Therefore, hospitals that serve indigent and elderly populations may be at a disadvantage.9,10 HVBP portends significant economic consequences for the majority of hospitals that rely heavily on Medicare and Medicaid reimbursement, because most hospitals have large revenues but low profit margins.11 Higher HVBP scores are associated with for-profit status, smaller size, and location in certain areas of the United States.12 Jha et al.6 described regional geographic variability in Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) scores, but concluded that poor satisfaction was due to poor quality.
*Address for correspondence and reprint requests: Daniel McFarland, DO, Hematology/Oncology, Mount Sinai Medical Center, One Gustave L. Levy Place, Box 1079, New York, NY 10029; Telephone: 212–659-5420; Fax: 212–241-2684; E-mail: [email protected]
Additional Supporting Information may be found in the online version of this article.
Received: November 13, 2014; Revised: March 17, 2015; Accepted: April 3, 2015 © 2015 Society of Hospital Medicine DOI 10.1002/jhm.2371 Published online in Wiley Online Library (Wileyonlinelibrary.com).
An Official Publication of the Society of Hospital Medicine Journal of Hospital Medicine Vol 10 | No 8 | August 2015 503
The Patient Experience of Care domain quantifies patient satisfaction using the validated HCAHPS survey, which is provided to a random sample of patients continuously throughout the year, 48 hours to 6 weeks after discharge. It is a publicly available, standardized survey instrument used to measure patients’ perspectives on hospital care. It assesses 8 dimensions: nurse communication, doctor communication, hospital staff responsiveness, pain management, medicine communication, discharge information, hospital cleanliness and quietness, and overall hospital rating. The last 2 dimensions each comprise 2 measures (cleanliness and quietness; rating of 9 or 10 and definitely recommend), giving a total of 10 distinct measures.
The United States is a complex network of urban, suburban, and rural demographic areas. Each hospital exists within a unique contextual and compositional meshwork that determines its caseload. The top population density decile of the United States lives within 37 counties, whereas half of the US population lives within 250 of the 3143 counties in the United States. When the 10 HCAHPS patient satisfaction measures are abstracted from hospitals and viewed according to county-level population density (separated into deciles across the United States), a trend is apparent (Figure 1): greater population density is associated with lower patient satisfaction in 9 of 10 categories. At the state level, hospitals’ composite scores of overall patient satisfaction (percentage of positive scores) show 12% variability and a significant correlation with population density (r = -0.479; Figure 2). The lowest overall satisfaction scores come from hospitals in the population-dense regions of Washington, DC, New York State, California, Maryland, and New Jersey (ie, 63%–65%), and the best scores come from Louisiana, South Dakota, Iowa, Maine, and Vermont (ie, 74%–75%). The average patient satisfaction score is 71% ± 2.9%. Lower patient satisfaction scores appear to cluster in population-dense areas and, in addition to population density, may be associated with more heterogeneous patient demographics and greater economic variability.
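The state-level association above is a bivariate Pearson correlation. As a minimal sketch of that statistic (not the authors' SPSS procedure), the function below computes r from paired values. The four density and score pairs are hypothetical, chosen only so that denser areas score lower.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration only: not the study's state-level data
density = [10.0, 50.0, 200.0, 800.0]   # people per square mile
composite = [75.0, 73.0, 69.0, 64.0]   # percent positive HCAHPS responses
print(round(pearson_r(density, composite), 3))
```

A negative r, as in Figure 2, means higher density goes with lower composite satisfaction.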
These observations are surprising considering that CMS already adjusts HCAHPS scores based on patient-mix coefficients and mode of collection.13–18
Adjustments are updated multiple times per year and account for survey collection either by telephone, email, or paper survey, because the populations that select survey forms will differ. Previous studies have shown that demographic features influence the patient evaluation process. For example, younger and more educated patients were found to provide less positive evaluations of healthcare.19
This study examined whether patients’ perceptions of healthcare (pattern of patient satisfaction), as quantified under the patient experience domain of HVBP, were affected and predicted by population density and other demographic factors that are outside the control of individual hospitals. In addition, hospital-level data (eg, number of hospital beds) and county-level data such as race, age, gender, overall population, income, time spent commuting to work, primary language, and place of birth were analyzed for correlation with patient satisfaction scores. Our study demonstrates that demographic and hospital-level data can predict patient satisfaction scores, and it suggests that CMS may need to modify its adjustment formulas to eliminate bias in HVBP-based reimbursement.
FIG. 1. Overall patient satisfaction by population density decile. Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) scores are segregated by population density deciles (representing 33 million people each). Population density increases along the grey scale. The composite score and 9 out of 10 HCAHPS dimensions demonstrate lower patient satisfaction as population density increases (darker shade). Abbreviations: Doc, doctor; Def Rec, definitely recommend.
FIG. 2. Averaged Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) scores by state correlated with state population (Pop) density. Bivariate correlation of composite HCAHPS scores predicted by state population density, excluding the District of Columbia: r = -0.479, P < 0.001 (2-tailed). This observed correlation informed the hypothesis that population density could predict lower patient satisfaction as measured by HCAHPS scores.
McFarland et al | Patient Satisfaction Variance Prediction
METHODS
Data Collection
Publicly available data were obtained from the Hospital Compare,20 American Hospital Directory,21 and US Census Bureau22 websites. Twenty US Census data categories were selected for their relevance to this study out of the 50 publicly reported US Census categories. They included the following: county population, county population density, percent of population change over 1 year, poverty level (percent), income level per capita, median household income, average household size, travel time to work, percentage of high school or college graduates, non-English primary language spoken at home, percentage of residents born outside of the United States, population percent in same residence for over 1 year, gender, race (white alone, white alone (not Hispanic or Latino), black or African American alone), population over 65 years old, and population under 18 years old.
HCAHPS Development
The HCAHPS survey consists of 32 questions covering 10 evaluative dimensions. All short-term, acute care, nonspecialty hospitals are invited to participate in the HCAHPS survey.
Data Analysis
Statistical analyses used the Statistical Package for the Social Sciences version 16.0 for Windows (SPSS Inc., Chicago, IL). Data were checked for statistical assumptions, including normality, linearity of relationships, and full range of scores. The distribution curves of categories in both the Hospital Compare (HCAHPS) and US Census datasets were examined. The category of population densities (per county) was converted to a logarithmic scale to account for a skewed distribution and a long tail in the area of low population density. Data were subsequently merged into an Excel (Microsoft, Redmond, WA) spreadsheet using the VLOOKUP function so that the relevant 2010 census county data were added to each hospital’s Hospital Compare data. Linear regression modeling was performed. Bivariate analysis was conducted (ENTER method) to determine the significant US Census data predictors for each of the 10 Hospital Compare dimensions and for the composite overall satisfaction score. Significant predictors were then analyzed in a multivariate model (BACKWARD method) for each Hospital Compare dimension and for the composite average positive score. Goodness of fit was assessed with coefficients of determination (adjusted R²). Statistically significant predictor variables for overall patient satisfaction scores were then ranked according to their partial regression coefficients (standardized β).
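The VLOOKUP step amounts to an exact-key join of county census fields onto each hospital row, followed by the log transform of density. Below is a minimal Python sketch under stated assumptions. The county keys, field names, and values are hypothetical. Unmatched hospitals are dropped. Base 10 is assumed because the text says only "logarithmic scale".

```python
import math

# Hypothetical county-level census records keyed by a county identifier.
# Keys, field names, and values are illustrative, not the study's data.
county_data = {
    "county_1": {"pop_density": 1000.0, "pct_bachelors": 35.0},
    "county_2": {"pop_density": 10.0, "pct_bachelors": 15.0},
}

# Hypothetical hospital rows, each tagged with the county it sits in.
hospitals = [
    {"name": "Hospital A", "county": "county_1", "avg_positive": 68.0},
    {"name": "Hospital B", "county": "county_2", "avg_positive": 75.0},
    {"name": "Hospital C", "county": "county_9", "avg_positive": 72.0},
]


def merge_county_data(hospitals, county_data):
    """Attach each hospital's county record by exact key match (the role
    VLOOKUP plays in a spreadsheet) and add log10 population density."""
    merged = []
    for hospital in hospitals:
        county = county_data.get(hospital["county"])
        if county is None:
            continue  # no census match: drop the row instead of guessing
        row = {**hospital, **county}
        # log10 is an assumption; the text says only "logarithmic scale"
        row["log_pop_density"] = math.log10(county["pop_density"])
        merged.append(row)
    return merged


for row in merge_county_data(hospitals, county_data):
    print(row["name"], row["log_pop_density"])
```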
A patient satisfaction predictive model was sought based on significant predictors of aggregate percent positive HCAHPS scores. Various predictor combinations were formed based on their partial coefficients (ie, standardized β coefficients). These combinations were assessed on their R² values and checked for collinearity. The combinations included the 2, 4, and 8 most predictive variables, as well as the 2 most positive and negative predictors. They were then incorporated into a multivariate analysis model (FORWARD method) and assessed on their adjusted R² values. A 4-variable combination (the 2 most predictive positive partial coefficients plus the 2 most predictive negative partial coefficients) was selected as a predictive model, and a formula predicting the composite overall satisfaction score was generated. This formula, the predicted patient satisfaction formula (PPSF), predicts hospital HCAHPS patient satisfaction scores from the 4 predictive variables for a given set of county and hospital characteristics.
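Candidate predictor sets were compared on adjusted R², which discounts R² for the number of predictors p given n observations. A sketch of the standard formula follows. The raw R² passed in is a hypothetical value, and n = 3192 is the hospital count given in the Table 1 note.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted coefficient of determination for n observations and
    p predictors (the intercept is not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


n = 3192  # hospitals in the composite model (Table 1 note)
# Hypothetical: the same raw R^2 achieved with 4 versus 12 predictors
print(round(adjusted_r2(0.20, n, 4), 4), round(adjusted_r2(0.20, n, 12), 4))
```

With the same raw R², the larger predictor set gets the smaller adjusted value, so adding predictors does not automatically win the comparison.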
$$\mathrm{PPSF} = K_{MV} + B_{HB}(HB) + B_{NE}(NE) + B_{E}(E) + B_{W}(W)$$
where $K_{MV}$ = constant coefficient (70.9), $B$ = unstandardized regression coefficient (see Table 1 for values), $HB$ = number of hospital beds, $NE$ = proportion of non-English speakers, $E$ = education (proportion with a bachelor’s degree), and $W$ = proportion identified as white race only.
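A minimal sketch of the PPSF as written above. It makes two assumptions. First, the unstandardized B values printed in Table 1 are the ones the formula uses, since the text points there. Second, NE, E, and W are entered in the same percentage units as the census predictors in that table. The function name `ppsf` and the input values are hypothetical.

```python
# Constant from the text; unstandardized B values as printed in Table 1.
# Assumption: these are the coefficients the PPSF uses.
K_MV = 70.9
B = {"HB": -0.006, "NE": -0.052, "E": 0.157, "W": 0.09}


def ppsf(hb: float, ne: float, e: float, w: float) -> float:
    """Predicted patient satisfaction: K_MV plus the sum of B_i * x_i.
    hb: hospital beds; ne, e, w: assumed to be percentages (0-100)."""
    return K_MV + B["HB"] * hb + B["NE"] * ne + B["E"] * e + B["W"] * w


# Hypothetical hospital: 100 beds, 10% non-English, 20% bachelor's, 80% white
print(round(ppsf(100, 10, 20, 80), 2))
```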
The PPSF was then modified by weighting each term with its partial coefficient (β) to remove the bias in patient satisfaction generated by demographic and structural factors over which individual hospitals have limited or no control. This formula generated a Weighted Individual (hospital) Predicted Patient Satisfaction Score (WIPPSS). Application of this formula narrowed the predicted distribution of patient satisfaction for all hospitals across the country.
$$\mathrm{WIPPSS} = K_{MV} + B_{HB}(HB)(1-\beta_{HB}) + B_{NE}(NE)(1-\beta_{NE}) + B_{E}(E)(1-\beta_{E}) + B_{W}(W)(1-\beta_{W})$$
where $\beta$ = standardized regression coefficient (see Table 1 for values).
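The WIPPSS differs from the PPSF only in multiplying each term by (1 - β). A minimal sketch follows. It again assumes the B and standardized β values printed in Table 1 and percentage units for NE, E, and W. The name `wippss` and the inputs are hypothetical.

```python
K_MV = 70.9
# Unstandardized (B) and standardized (beta) coefficients as printed in
# Table 1; their use here is an assumption based on the text.
B = {"HB": -0.006, "NE": -0.052, "E": 0.157, "W": 0.09}
BETA = {"HB": -0.213, "NE": -0.14, "E": 0.27, "W": 0.235}


def wippss(values: dict) -> float:
    """K_MV plus the sum over predictors of B_i * x_i * (1 - beta_i).
    values maps "HB", "NE", "E", "W" to a hospital's inputs."""
    return K_MV + sum(B[k] * values[k] * (1.0 - BETA[k]) for k in B)


# Hypothetical hospital inputs
print(round(wippss({"HB": 100, "NE": 10, "E": 20, "W": 80}), 4))
```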
To create an adjusted score with direct relevance to the reported patient satisfaction scores, the reported scores were multiplied by an adjustment factor that reflects the difference between each hospital’s weighted score and the national mean HCAHPS score. The resulting formula, the Weighted Individual (hospital) Patient Satisfaction Adjustment Score (WIPSAS), gives a patient satisfaction score adjusted for demographic and structural factors that can be used for interhospital comparisons across all areas of the country.
$$\mathrm{WIPSAS} = PS_{rep}\left[1 + \frac{PS_{USA} - WIPPSS_{X}}{100}\right]$$
where $PS_{rep}$ = reported patient satisfaction score, $PS_{USA}$ = mean reported score for the United States (71.84), and $WIPPSS_{X}$ = WIPPSS for an individual hospital.
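A minimal sketch of the WIPSAS adjustment as defined above. A hospital whose WIPPSS equals the national mean keeps its reported score, and one predicted to score below the mean is scaled up. The function name and example inputs are hypothetical.

```python
PS_USA = 71.84  # mean reported score for the United States (from the text)


def wipsas(ps_rep: float, wippss_x: float, ps_usa: float = PS_USA) -> float:
    """Reported score scaled by [1 + (PS_USA - WIPPSS_X) / 100]."""
    return ps_rep * (1.0 + (ps_usa - wippss_x) / 100.0)


# Hypothetical hospital: reported 65, predicted (WIPPSS) 66.84
print(round(wipsas(65.0, 66.84), 2))
```

In this example the factor is 1 + (71.84 - 66.84)/100 = 1.05, so the reported 65 becomes 68.25.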
Application of Data Analysis
PPSF, WIPPSS, and WIPSAS were calculated for all HCAHPS-participating hospitals and compared with averaged raw HCAHPS scores across the United States. WIPSAS and raw scores were specifically analyzed for New York State to demonstrate exactly how adjustments would change state-level rankings.
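To see how the adjustment reorders a state's list, hospitals can be ranked on raw scores and on WIPSAS, and the rank shifts compared. The sketch below uses three hypothetical hospitals and scores, not values from Supporting Table S1. Breaking ties alphabetically is our choice.

```python
def rank(scores: dict) -> dict:
    """Map hospital -> 1-based rank, highest score first (ties by name)."""
    ordered = sorted(scores, key=lambda h: (-scores[h], h))
    return {h: i + 1 for i, h in enumerate(ordered)}


def rank_changes(raw: dict, adjusted: dict) -> dict:
    """Positive value = hospital moved up after adjustment."""
    r0, r1 = rank(raw), rank(adjusted)
    return {h: r0[h] - r1[h] for h in raw}


# Hypothetical raw HCAHPS scores and WIPSAS values
raw = {"A": 78.0, "B": 74.0, "C": 66.0}
adjusted = {"A": 75.0, "B": 74.5, "C": 76.0}
print(rank_changes(raw, adjusted))  # {'A': -1, 'B': -1, 'C': 2}
```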
RESULTS
Complete HCAHPS scores were obtained from 3907 of the 4621 hospitals listed on the Hospital Compare website (85%). The majority of hospitals (2884) collected over 300 surveys, fewer hospitals (696) collected 100 to 299 surveys, and fewer still (333) collected <100 surveys. In total, results were available from at least 934,800 individual surveys, by the most conservative estimate. Missing HCAHPS hospital data averaged 13.4 (standard deviation [SD] 12.2) hospitals per state. County-level data were obtained from all 3144 counties or county equivalents across the United States (100%). Multivariate regression modeling across all HCAHPS dimensions found that between 10 and 16 of the 20 predictors (US Census categories) were statistically significant and predictive of individual HCAHPS dimension scores and the aggregate percent positive score, as shown in Table 2. For example, the county percentage of bachelor’s degrees positively predicts doctor communication scores, and the number of hospital beds negatively predicts the quiet dimension. The strongest positive and negative predictive variables by model regression coefficients for each HCAHPS dimension are also listed in Table 2.
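The reported floor of 934,800 surveys is reproduced exactly if each hospital is credited with only the lower bound of its survey-count group (300, 100, and 0 surveys). This reading of "most conservative estimate" is our inference, not a statement by the authors:

$$2884 \times 300 + 696 \times 100 + 333 \times 0 = 865{,}200 + 69{,}600 + 0 = 934{,}800$$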
Table 1 highlights multivariate regression modeling of the composite average positive score, which produced an adjusted R² of 0.222 (P < 0.001). All variables were significant and predicted change in the composite HCAHPS score except place of birth–foreign born (not listed in the table). Table 1 ranks variables from most positive to most negative predictors.
Other HCAHPS domains demonstrated statistically significant models (P < 0.001) and are listed with their coefficients of determination (ie, adjusted R²) in Table 2. The best-fit dimensions were help (adjusted R² = 0.304), quiet (adjusted R² = 0.299), doctor communication (adjusted R² = 0.298), nurse communication (adjusted R² = 0.245), and clean (adjusted R² = 0.232). Models that were not as strongly predictive as the composite score included pain (adjusted R² = 0.124), overall 9/10 (adjusted R² = 0.136), definitely recommend (adjusted R² = 0.150), and explained meds (adjusted R² = 0.169).
A predictive formula for average positive scores was created by determining the most predictive partial coefficients and the best-fit model. Bachelor’s degree and white only were the 2 greatest positive predictors, and number of hospital beds and non–English speaking were the 2 greatest negative predictors. The PPSF (predictive formula) was chosen from various combinations of predictors (Table 1) because its coefficient of determination (adjusted R² = 0.155) was closest to the overall model’s coefficient of determination (adjusted R² = 0.222) without demonstrating collinearity. Candidate predictive formulas were based on the predictors’ standardized β and included the following combinations:
- the 2 greatest overall predictors (adjusted R² = 0.051);
- the 2 greatest negative and positive predictors (adjusted R² = 0.098);
- the 4 greatest overall predictors (adjusted R² = 0.117);
- the 8 greatest overall predictors (adjusted R² = 0.201), which suffered from collinearity (household size with non–English speaking [Pearson r = 0.624] and with under 18 years old [Pearson r = 0.708]).
None of the correlated independent variables (eg, poverty and median income) were placed in the final model.
TABLE 1. Multivariate Regression of Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Average Positive Score by County and Hospital Demographics
| Predictor | B | SE | β | t | P |
| --- | --- | --- | --- | --- | --- |
| Educational attainment–bachelor’s degree | 0.157 | 0.018 | 0.27 | 8.612 | <0.001 |
| White alone percent 2012 | 0.09 | 0.012 | 0.235 | 7.587 | <0.001 |
| Resident population percent under 18 years | 0.404 | 0.0444 | 0.209 | 9.085 | <0.001 |
| Black or African American alone percent 2012 | 0.083 | 0.014 | 0.191 | 5.936 | <0.001 |
| Median household income 2007–2011 | -0.00003 | 0.00 | -0.062 | -2.027 | 0.043 |
| Population density (log) 2010 | -0.277 | 0.083 | -0.087 | -3.3333 | 0.001 |
| Average travel time to work | -0.107 | 0.024 | -0.088 | -4.366 | <0.001 |
| Educational attainment–high school | -0.082 | 0.026 | -0.088 | -3.147 | 0.002 |
| Average household size | -2.58 | 0.727 | -0.107 | -3.55 | <0.001 |
| Total females percent 2012 | -0.423 | 0.067 | -0.107 | -6.296 | <0.001 |
| Percent non–English speaking at home 2007–2011 | -0.052 | 0.018 | -0.14 | -2.929 | 0.003 |
| No. of hospital beds | -0.006 | 0.00 | -0.213 | -12.901 | <0.001 |
| Adjusted R² | 0.222 | | | | |
NOTE: A multivariate linear regression model of statistically significant dimensions of patient satisfaction, as determined by Hospital Consumer Assessment of Healthcare Providers and Systems scores, is provided. The dependent variable is the composite of average patient satisfaction scores by hospital (3192 hospitals). Predictors (independent variables) were collected from US Census data for counties or county equivalents. All of the listed predictors (first column) are statistically significant. They are listed in order of partial regression coefficient contribution to the model, from most positive to most negative. Adjusted R² (last row) indicates the goodness of fit. Abbreviations: β, standardized coefficient (partial coefficient); B, unstandardized coefficient; P, statistical significance; SE, standard error; t, t statistic.
The mean WIPSAS score closely matched the national average of HCAHPS scores (71.6 vs 71.84) but compressed scores into a narrower distribution (SD 5.52 vs 5.92). The greatest positive and negative changes were 8.51% and 2.25%, respectively. In other words, a smaller number of hospitals in demographically challenged areas were more strongly affected by the WIPSAS adjustment than the larger number of hospitals in demographically favorable areas. Large hospitals in demographically diverse counties (eg, in Texas, California, and New York) saw the greatest positive change, whereas smaller hospitals in demographically nondiverse areas saw comparatively small decreases in their overall WIPSAS scores. The WIPSAS had the most beneficial effect on urban and rural safety-net hospitals that serve diverse populations, including many academic medical centers. This is illustrated by the reranking of the top 10 and bottom 10 hospitals in New York State by the WIPSAS (Table 3). For example, 3 academic medical centers in New York State (Montefiore Medical Center, New York Presbyterian Hospital, and Mount Sinai Hospital) moved from 46th, 43rd, and 42nd (out of 167 hospitals), respectively, into the top 10 in patient satisfaction under the WIPSAS methodology. Reported patient satisfaction scores, PPSF, WIPPSS, and WIPSAS scores for each hospital in the United States are available online (see Supporting Table S1 in the online version of this article).
TABLE 3. Top Ten Highest-Ranked Hospitals in New York State by HCAHPS Scores Compared to WIPSAS
| Rank | Ten Highest Ranked New York State Hospitals by HCAHPS | Ten Highest Ranked New York State Hospitals After WIPSAS |
| --- | --- | --- |
| 1 | River Hospital, Inc. | River Hospital, Inc. |
| 2 | Westfield Memorial Hospital, Inc. | Westfield Memorial Hospital, Inc. |
| 3 | Clifton Fine Hospital | Clifton Fine Hospital |
| 4 | Hospital For Special Surgery | Hospital For Special Surgery |
| 5 | Delaware Valley Hospital, Inc. | New York–Presbyterian Hospital |
| 6 | Putnam Hospital Center | Delaware Valley Hospital, Inc. |
| 7 | Margaretville Memorial Hospital | Montefiore Medical Center |
| 8 | Community Memorial Hospital, Inc. | St. Francis Hospital, Roslyn |
| 9 | Lewis County General Hospital | Putnam Hospital Center |
| 10 | St. Francis Hospital, Roslyn | Mount Sinai Hospital |
NOTE: The table shows the top 10 highest-ranked hospitals in New York State by overall patient satisfaction, out of 167 evaluable hospitals. The HCAHPS column lists the top 10 hospitals in 2013 by HCAHPS overall patient satisfaction scores, and the WIPSAS column lists the top 10 hospitals after the WIPSAS adjustment. The 4 factors used to create the WIPSAS adjustment were the 2 most positive partial regression coefficients (education–bachelor’s degree, white alone percent 2012) and the 2 most negative partial regression coefficients (number of hospital beds, non–English speaking at home). Three urban academic medical centers (Montefiore Medical Center, New York Presbyterian Hospital, and Mount Sinai Hospital) were reranked from 46th, 43rd, and 42nd, respectively, into the top 10. Abbreviations: HCAHPS, Hospital Consumer Assessment of Healthcare Providers and Systems; WIPSAS, Weighted Individual (hospital) Patient Satisfaction Adjustment Score.
TABLE 2. Multivariate Regression of Hospital Consumer Assessment of Healthcare Providers and Systems by County and Hospital Demographics
| Predictor | Average Positive Scores | Nurse Communication | Doctor Communication | Help | Pain | Explain Meds | Clean | Quiet | Discharge Explain | Recommend 9/10 | Definitely Recommend |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Education–bachelor’s | 0.27 | 0.19 | 0.45 | 0.10 | 0.10 | 0.05 | 0.08 | 0.33 | 0.15 | 0.27 | 0.416 |
| Hospital beds | -0.21 | -0.16 | -0.19 | -0.26 | -0.16 | -0.17 | -0.27 | -0.26 | -0.06 | -0.11 | — |
| Population density 2010 | -0.09 | -0.07 | -0.28 | -0.20 | -0.08 | -0.23 | -0.14 | -0.19 | 0.22 | 0.07 | * |
| White alone percent | 0.24 | 0.25 | 0.09 | 0.16 | 0.23 | 0.07 | 0.16 | — | 0.17 | 0.31 | 0.317 |
| Total females percent | -0.11 | -0.05 | -0.06 | -0.07 | -0.06 | -0.03 | -0.05 | -0.09 | -0.12 | -0.09 | — |
| African American alone | 0.19 | 0.19 | — | 0.09 | 0.23 | 0.09 | 0.07 | 0.34 | * | 0.09 | 0.084 |
| Average travel time to work | -0.09 | -0.10 | * | -0.09 | -0.06 | -0.04 | -0.08 | * | -0.12 | -0.17 | -0.16 |
| Foreign-born percent | * | -0.16 | 0.14 | -0.06 | -0.12 | -0.08 | 0.06 | -0.13 | -0.18 | * | * |
| Average household size | -0.11 | -0.05 | -0.15 | -0.07 | * | -0.07 | * | -0.01 | * | -0.07 | 0.076 |
| Non–English speaking | -0.14 | -0.12 | -0.50 | -0.07 | * | * | * | * | * | -0.34 | -0.28 |
| Education–high school | -0.09 | -0.09 | -0.40 | * | — | — | — | -0.27 | 0.06 | -0.08 | * |
| Household income | -0.06 | * | -0.35 | -0.08 | * | * | -0.16 | -0.41 | — | — | -0.265 |
| Population 65 years and over | * | -0.14 | -0.14 | -0.12 | * | -0.11 | -0.15 | — | — | * | -0.10 |
| White, not Hispanic/Latino | * | * | -0.20 | * | * | * | 0.09 | 0.13 | 0.09 | -0.22 | -0.25 |
| Population under 18 | 0.21 | — | 0.15 | — | 0.08 | — | — | — | 0.11 | 0.20 | — |
| Population (county) | * | -0.06 | -0.08 | * | -0.03 | -0.05 | * | * | -0.06 | * | * |
| All ages in poverty | — | — | -0.24 | — | — | — | -0.10 | -0.22 | -0.08 | * | -0.281 |
| 1 year at same residence | * | 0.13 | 0.12 | 0.11 | — | — | 0.10 | * | -0.04 | * | * |
| Per capita income | * | -0.07 | * | * | * | * | * | 0.09 | — | — | * |
| Population percent change | * | * | * | * | * | * | -0.05 | — | — | * | * |
| Adjusted R² | 0.22 | 0.25 | 0.30 | 0.30 | 0.12 | 0.17 | 0.23 | 0.30 | 0.19 | 0.14 | 0.15 |
NOTE: Linear regression modeling results are shown for 10 dimensions of patient satisfaction (ie, Hospital Consumer Assessment of Healthcare Providers and Systems [HCAHPS]) and Average Positive Scores (top row) by county demographics and hospital size (left column). Adjusted R² (last row) indicates the goodness of fit. All models are statistically significant with P < 0.001. Partial regression coefficients (β) indicate the positive or negative contribution of each predictor to the individual models (ie, each column). The dash (—) indicates nonsignificance, and the asterisk (*) indicates a value that was statistically significant in univariate analysis but not in multivariate analysis. Independent variables (first column) are ordered from top to bottom by the number of HCAHPS dimensions to whose predictive scoring each contributes.
DISCUSSION
The HVBP program is an incentive program meant to enhance the quality of care. This study illustrates healthcare inequalities in patient satisfaction that are not accounted for by the current CMS adjustments. It also shows that education, ethnicity, primary language, and number of hospital beds predict how patients evaluate their care, as measured by patient satisfaction scores. Hospitals that treat a disproportionate percentage of non–English speaking, nonwhite, noneducated patients in large facilities are not meeting patient satisfaction standards. This inequity is not ameliorated by the adjustments currently performed by CMS, and it has financial consequences for the hospitals that are not meeting national standards in patient satisfaction. These hospitals, which often include academic medical centers in urban areas, may therefore be penalized under the existing HVBP reimbursement models.
Using only 4 demographic and hospital-specific predictors (ie, hospital beds, percent non–English speaking, percent bachelor’s degrees, percent white), it is possible to use a simple formula to predict patient satisfaction with a significant degree of correlation to the reported scores available through Hospital Compare.
Our initial hypothesis that population density predicted lower patient satisfaction scores was confirmed, but these aforementioned demographic and hospital-based factors were stronger independent predictors of HCAHPS scores. The WIPSAS is a representation of patient satisfaction and quality-of-care delivery across the country that accounts for nonrandom variation in patient satisfaction scores.
For hospitals in New York State, WIPSAS resulted in the placement of 3 urban-based academic medical centers in the top 10 in patient satisfaction, when previously, based on the raw scores, their rankings were between 42nd and 46th statewide. Prior studies have suggested that large, urban, teaching, and not-for-profit hospitals were disadvantaged based on their hospital characteristics and patient features.10–12
Under the current CMS reimbursement methodologies, these institutions are more likely to receive financial penalties.8 The WIPSAS is a simple method to assess hospitals’ performance in the area of patient satisfaction that accounts for the demographic and hospital-based factors (eg, number of beds) of the hospital. Its incorporation into CMS reimbursement calculations, or incorporation of a similar adjustment formula, should be strongly considered to account for predictive factors in patient satisfaction that could be addressed to enhance their scores.
Limitations of this study include the use of county-level data to approximate individual hospital demographic information and the exclusion of specialty hospitals, such as cancer centers and children’s hospitals, from HCAHPS surveys. Repeated multivariate analyses at different time points would also help identify how CMS-specific adjustments are recalibrated over time. Although we have primarily reported on the composite percent positive score as a surrogate for all HCAHPS dimensions, an individual adjustment formula could be generated for each dimension of the patient experience of care domain.
Although patient satisfaction is a component of how quality should be measured, further emphasis needs to be placed on nonrandom patient satisfaction variance so that HVBP can serve as an incentivizing program for at-risk hospitals. Regional variation in scoring is not altogether accounted for by the current CMS adjustment system. Because patient satisfaction scores are now directly linked to reimbursement, further evaluation is needed to enhance patient satisfaction scoring paradigms to account for demographic and hospital-specific factors.
Disclosure: Nothing to report.
References
1. Florence CS, Atherly A, Thorpe KE. Will choice-based reform work for Medicare? Evidence from the Federal Employees Health Benefits Program. Health Serv Res. 2006;41:1741–1761.
2. H.R. 3590. Patient Protection and Affordable Care Act 2010 (2010).
3. Donabedian A. The quality of care. How can it be assessed? JAMA. 1988;260(12):1743–1748.
4. Lake Superior Quality Innovation Network. FY 2017 Value-Based Purchasing domain weighting. Available at: http://www.stratishealth.org/documents/VBP-FY2017.pdf. Accessed March 13, 2015.
5. Hospital Value-Based Purchasing Program. Available at: http://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/Hospital-Value-Based-Purchasing. Accessed December 1, 2013.
6. Jha AK, Orav EJ, Zheng J, Epstein AM. Patients’ perception of hospital care in the United States. N Engl J Med. 2008;359(18):1921–1931.
7. Porter ME, Lee TH. Providers must lead the way in making value the overarching goal. Harvard Bus Rev. October 2013:3–19.
8. Jha AK, Orav EJ, Epstein AM. The effect of financial incentives on hospitals that serve poor patients. Ann Intern Med. 2010;153(5):299–306.
9. Joynt KE, Jha AK. Characteristics of hospitals receiving penalties under the Hospital Readmissions Reduction Program. JAMA. 2013;309(4):342–343.
10. Ryan AM. Will value-based purchasing increase disparities in care? N Engl J Med. 2013;369(26):2472–2474.
11. Thorpe KE, Florence CS, Seiber EE. Hospital conversions, margins, and the provision of uncompensated care. Health Aff (Millwood). 2000;19(6):187–194.
12. Borah BJ, Rock MG, Wood DL, Roellinger DL, Johnson MG, Naessens JM. Association between value-based purchasing score and hospital characteristics. BMC Health Serv Res. 2012;12:464.
13. Elliott MN, Zaslavsky AM, Goldstein E, et al. Effects of survey mode, patient mix, and nonresponse on CAHPS hospital survey scores. Health Serv Res. 2009;44(2 pt 1):501–518.
14. Burroughs TE, Waterman BM, Cira JC, Desikan R, Claiborne Dunagan W. Patient satisfaction measurement strategies: a comparison of phone and mail methods. Jt Comm J Qual Improv. 2001;27(7):349–361.
15. Fowler FJ Jr, Gallagher PM, Nederend S. Comparing telephone and mail responses to the CAHPS survey instrument. Consumer Assessment of Health Plans Study. Med Care. 1999;37(3 suppl):MS41–MS49.
16. Rodriguez HP, von Glahn T, Rogers WH, Chang H, Fanjiang G, Safran DG. Evaluating patients’ experiences with individual physicians: a randomized trial of mail, internet, and interactive voice response telephone administration of surveys. Med Care. 2006;44(2):167–174.
17. O’Malley AJ, Zaslavsky AM, Elliott MN, Zaborski L, Cleary PD. Case-mix adjustment of the CAHPS Hospital Survey. Health Serv Res. 2005;40(6 pt 2):2162–2181.
18. Mode and patient-mix adjustments of CAHPS hospital survey (HCAHPS). Available at: http://www.hcahpsonline.org/modeadjustment.aspx. Accessed December 1, 2013.
19. Zaslavsky AM, Zaborski LB, Ding L, Shaul JA, Cioffi MJ, Cleary PD. Adjusting performance measures to ensure equitable plan comparisons. Health Care Financ Rev. 2001;22(3):109–126.
20. Official Hospital Compare Data. Displaying datasets in Patient Survey Results category. Available at: https://data.medicare.gov/data/hospital-compare/Patient%20Survey%20Results. Accessed December 1, 2013.
21. Hospital statistics by state. American Hospital Directory, Inc. website. Available at: http://www.ahd.com/state_statistics.html. Accessed December 1, 2013.
22. U.S. Census Download Center. Available at: http://factfinder.census.gov/faces/nav/jsf/pages/download_center.xhtml. Accessed December 1, 2013.
• Low: Low confidence that the evidence reflects the true effect. Further evidence is likely to change our confidence in the estimate of effect and is likely to change the estimate. A low rating indicates that there is a high risk of bias and residual confounding.
• Insufficient: A lack of evidence to estimate the effect(s).
Figure 3.1. Process Used to Identify Articles for Review, Pay-for-Performance
[Flowchart. Sources: library search (n=1,891); PubMed P4P search (n=1,707); PubMed "author name" search (n=53); other, i.e., reference mining and articles the research team had in EndNote libraries from previous reviews (n=13). Articles included after title and abstract screening (n=1,314). Articles screened and categorized by research assistant and senior researcher (n=577). Articles retained after full screen (n=104). Articles added based on TEP recommendations (n=7). Articles excluded: simulations and articles that did not assess P4P effects (n=8). Final count of studies reviewed (n=103): ambulatory (n=48), hospital (n=38), other, e.g., nursing home (n=4), multiple settings (n=3), literature reviews (n=10).]
Research Questions
Measuring Performance in Value-Based Purchasing Programs
1. What goals should be set and how should success be defined for VBP programs?
As discussed in Chapter Two (environmental scan of VBP programs), P4P sponsors generally established goals that were high-level (e.g., "improved health," "bend the cost curve") and heavily emphasized clinical quality (27 out of 35 programs). Goals related to cost/affordability
Table 3.2. Summary of Studies Examining the Association Between Process and Outcome Measures
Risk-Adjusted or Standardized Outcomes
Columns (number of studies for each outcome): 30-day mortality (lower mortality; non-significant effect), in-hospital mortality (lower mortality; non-significant effect), complications (fewer complications; non-significant effect), 30-day readmissions (fewer readmissions; non-significant effect), 1-year survival (better mortality; non-significant effect)
Rows: condition-related process measures
AMI
Beta-blocker use at admission 1 1 1 4 1
Beta-blocker use at discharge 2 1 2 1
Aspirin use at admission 1 1 3 1
Aspirin use at discharge 2 2 1 1
ACE inhibitor use at discharge 2 2 1 1
Smoking cessation counseling for smokers during admission 1 1
Timely reperfusion therapy 1 1
Heparin at admission 1
Intravenous glycoprotein IIb/IIIa inhibitors at admission 1
Lipid lowering medication at discharge 1
AMI composite measures� 5 1 4:.! 1 1 1 1
CHF
CHF composite measures4 2 1 1 2 1 1 1
Pneumonia
Antibiotics timing 1 1 1 1
Pneumonia composite measures5 2 1 1 2 1 1
Orthopedic Surgery
Composites of SCIP and other process measures0 1 1 1
High Risk Surgical Procedures
Composites of SCIP measures' 18 1
1 In one study, significant results were no longer observed when hospital fixed effects were included in the model.
2 In one study, two composites with different weighting of the measures were included in the model. One composite was associated with lower inpatient mortality and one was associated with higher inpatient mortality.
3 Two different AMI process measure composites were used. One included five measures: beta-blocker use at admission, beta-blocker use at discharge, aspirin use at admission, aspirin use at discharge, and ACE inhibitor use at discharge. The other composite included these measures plus smoking cessation counseling and timely reperfusion therapy.
4 Two different CHF process measure composites were used. One included two measures: ACE inhibitor or angiotensin receptor blocker for left ventricular systolic dysfunction and assessment of left ventricular function. The other composite included these measures plus smoking cessation counseling and discharge instructions.
5 Two different pneumonia process measure composites were used. One included 3 measures: antibiotics provided within 4 hours or less, pneumococcal vaccination, and oxygenation assessment. The other included these measures plus blood culture prior to antibiotics, appropriate antibiotic, pneumococcal vaccination status, influenza vaccination status, and smoking cessation counseling.
6 Two different process-of-care composite measures were used for orthopedic surgery. One included 6 measures: metabolic complication avoidance index, hematoma avoidance index, readmission avoidance index, antibiotics administered within 1 hour before incision, antibiotics discontinued within 24 hours of surgery, and appropriate antibiotic selection. The other included 9 SCIP measures: prophylactic antibiotic received within 1 hour prior to surgery, prophylactic antibiotic selection, prophylactic antibiotic discontinuation within 24 hours after surgery, cardiac surgery patients with controlled 6 AM postoperative glucose, patients with appropriate hair removal, colorectal surgery patients with immediate postoperative normothermia, recommended venous thromboembolism prophylaxis ordered, recommended venous thromboembolism prophylaxis ordered and received, and surgery patients on beta-blocker therapy prior to admission who received a beta-blocker during the perioperative period.
7 Two different SCIP measure composites were used. One included 5 SCIP measures: receipt of prophylactic antibiotics within 2 hours of surgery, discontinuation of prophylactic antibiotics within 24 hours of surgery, selection of correct prophylactic antibiotic, ordering of venous thrombosis prophylaxis, and ordering of venous thrombosis prophylaxis within 24 hours of surgery. The other included these measures plus cardiac surgery patients with controlled 6 AM postoperative glucose, patients with appropriate hair removal, colorectal surgery patients with immediate postoperative normothermia, recommended venous thromboembolism prophylaxis ordered and received, and surgery patients on beta-blocker therapy prior to admission who received a beta-blocker during the perioperative period.
Non-significant effects except abdominal aortic aneurysm, where highest SCIP compliance had lower mortality rates.
Table 3.3. Articles Examining Relationship Between Performance on Pay-for-Performance Measures and Patient Outcomes
Reference
Bhattacharyya et al., 2009
Setting
Hospital
Study Design
Cross-sectional analysis of correlation between composite quality score for hip and knee surgery and patient outcomes among the subset of the 260 HQID hospitals that participated in the hip and knee portion of the program in 2004/2005 (actual number of hospitals not reported). Hospitals were placed into 1 of 4 tiers based on composite performance score: top 10% (tier 1); second decile (tier 2); top 50% but not in top 2 deciles (tier 3); bottom 50% (tier 4).
Program Measure(s) Patient Outcome(s)
• Composite measure capturing 3 process measures and 3 intermediate outcome measures
• Data for 4 of the 6 individual measures were only available for those hospitals with performance in top 50% of HQID hospitals
• Inpatient mortality after hip and knee arthroplasty
• Iatrogenic complications
• Urinary tract infections
Findings
• Higher-tier hospitals did not have lower complications or urinary tract infections.
• No significant difference in hip and knee arthroplasty-associated mortality across the hospital tiers, but there was a trend toward a higher rate of mortality in tier 4 hospitals (r = 0.116; p = 0.088).
• All hospitals with mortality > 2.0% were in tiers 3 and 4.
Assessment of Methodological Quality
Poor: Data on 4 of 6 measures used in composite only available for top 50% of performers. Mortality and complications not available for all hospitals. Limited variability in quality composite led to arbitrary placement into tiers. Lack of control for confounders.
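The tier rules in the study design above can be made concrete with a short sketch. The study does not report how composite scores were converted to percentile ranks or how ties and boundaries were handled. The percentile definition below (share of hospitals with a strictly lower score) and the function names are therefore illustrative assumptions:

```python
from typing import List


def percentile_ranks(scores: List[float]) -> List[float]:
    """Percent of hospitals with a strictly lower composite score (0-100 scale)."""
    n = len(scores)
    return [100.0 * sum(other < s for other in scores) / n for s in scores]


def hqid_tier(percentile: float) -> int:
    """Map a percentile rank (higher = better) to the four tiers described above."""
    if percentile >= 90:
        return 1  # top 10%
    if percentile >= 80:
        return 2  # second decile
    if percentile >= 50:
        return 3  # top 50% but not in the top two deciles
    return 4  # bottom 50%


# Ten hypothetical hospitals with composite scores 1..10
scores = list(range(1, 11))
print([hqid_tier(p) for p in percentile_ranks(scores)])  # prints [4, 4, 4, 4, 4, 3, 3, 3, 2, 1]
```

Under this definition, hospitals with tied composite scores share a percentile rank and therefore a tier. As the quality assessment notes, limited variability in the composite makes tier placement sensitive to exactly these conventions.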
Reference
Bradley et al., 2006
Setting
Hospital
Study Design
Cross-sectional analysis of the correlation between CMS/Joint Commission AMI core process measures and hospital-level, risk-standardized measures of patient outcomes using January 2002–March 2003 Medicare claims data from 962 hospitals participating in the National Registry of Myocardial Infarction. Hospital-level performance was estimated using hierarchical generalized linear models as well as crude process rates. The main analysis included patients transferred out; these were excluded in secondary analyses.
Program Measure(s) Patient Outcome(s)
• 7 AMI process measures and a composite quality score
• Risk-standardized 30-day all-cause mortality
• Risk-standardized in-hospital mortality
Findings
• Risk-standardized 30-day all-cause mortality was significantly, but weakly, correlated with beta-blocker at discharge (r=-.16, p<.001), aspirin at discharge (r=-.18, p<.001), timely reperfusion therapy (r=-.18, p<.001), and the quality composite (r=-.25, p<.001), but not with other process measures (beta-blocker at admission, aspirin at admission, ACE inhibitor at discharge, smoking cessation counseling).
• The amount of variation in 30-day mortality explained by individual process measures ranged from 0.1% to 3.3%; the measures jointly explained 6% of the variation.
• Aspirin at admission was weakly associated with risk-standardized in-hospital all-cause mortality (r=-.12, p<.05); other measures, including the composite, were not.
Assessment of Methodological Quality
Fair
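For a single predictor, the share of variance explained equals the square of the correlation coefficient. This explains why correlations of this size account for only a few percent of the variation in mortality. The sketch below is a rough check using the correlations listed above. Squaring a single r is only an approximation of what the study estimated, and the 6% joint figure comes from considering the measures together, which this calculation does not reproduce:

```python
# Correlations with risk-standardized 30-day mortality reported for Bradley et al.
reported_r = {
    "beta-blocker at discharge": -0.16,
    "aspirin at discharge": -0.18,
    "timely reperfusion therapy": -0.18,
    "quality composite": -0.25,
}


def variance_explained(r: float) -> float:
    """Share of outcome variance explained by one predictor in a simple linear fit."""
    return r ** 2


for measure, r in reported_r.items():
    print(f"{measure}: r = {r:+.2f}, r^2 = {variance_explained(r):.1%}")
```

Squaring −.16 and −.18 gives about 2.6% and 3.2%, in line with the upper end of the reported 0.1% to 3.3% range for individual measures.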
Reference
Glickman et al., 2009
139
Setting
Hospital
Study Design
Assessed the association between AMI and CHF process measures and inpatient mortality after AMI among 1,351 hospitals participating in Hospital Compare that had at least one patient eligible for AMI measures and one eligible for CHF measures, had at least 25 treatment opportunities across all measures, and could be merged with American Hospital Association data on hospital characteristics and Joint Commission data on risk-adjusted inpatient mortality after AMI. Hospital-level multivariable logistic regression assessed the association of each scoring system with inpatient survival (1 − inpatient mortality) in the subsequent year, controlling for hospital-level academic affiliation, geographic location, population density, bed size, and presence of percutaneous coronary intervention and cardiac surgery.
Program Measure(s) Patient Outcome(s)
• 8 AMI process measures
• 4 CHF process measures
• Two sets of composite adherence scores assigned different weights to individual measures.
• Opportunity model
• Principal components analysis used to place measures into one of two groups (clinical cardiac activities and administrative cardiac activities). Adherence was calculated with more weight given to measures with greater opportunity for improvement.
• Risk-adjusted inpatient mortality after AMI
Findings
• In a model with both the clinical and the administrative cardiac activities composites, higher clinical cardiac activities scores were associated with higher inpatient survival (OR=1.13, p<.001), while higher administrative cardiac activities scores were associated with worse inpatient survival (OR=0.96, p<.001).
• When separate composite measures were included for AMI and CHF, AMI performance was associated with improved survival (OR 1.09, p<.001), while the CHF composite was associated with lower inpatient survival (OR 0.98, p<.05).
Assessment of Methodological Quality
Poor: The outcome measure was risk-adjusted inpatient mortality after AMI, but analyses included quality measures for heart failure patients. In addition, analyses included quality measures for care delivered at discharge, which would not affect inpatient mortality rates.
Reference
Jha et al., 2007
140
Setting
Hospital
Study Design
Cross-sectional analyses assessed the association between condition-specific composites and mortality using Hospital Quality Alliance data from April 1, 2004–March 31, 2005, linked with American Hospital Association data on hospital characteristics and 2003 Medicare Provider and Analysis Review (MEDPAR) discharge data for calculating outcomes. Patients received in transfer or transferred to another hospital were excluded. Patient-level multivariable logistic regressions, accounting for clustering of patients within hospitals and controlling for patient demographics, comorbidities (using the Elixhauser method), and hospital characteristics, were used to estimate the probability of death stratified by each hospital's performance on Hospital Quality Alliance measures (by quartile). The number of hospitals included in analyses ranged from 1,965 for AMI to 3,270 for pneumonia.
Program Measure(s) Patient Outcome(s)
• 10 Hospital Quality Alliance process measures were used to create summary performance scores for three clinical conditions:
• 5 AMI process measures
• 2 CHF process measures
• 3 pneumonia process measures
• Risk-adjusted inpatient mortality for patients with primary diagnosis of AMI, CHF or pneumonia
Findings
• There was a significant trend of lower performance being associated with higher mortality for each condition (AMI p<.001; CHF p=.005; pneumonia p<.001).
• Compared with hospitals in the bottom quartile of performance, hospitals in the top quartile had about 1% lower mortality for AMI, 0.4% lower for CHF, and 0.8% lower for pneumonia.
• In multivariable analyses, patients discharged from a hospital in the top quartile of Hospital Quality Alliance performance for each condition had lower odds of dying than patients discharged from hospitals in the bottom quartile (AMI: OR=0.91, 95% CI 0.86–0.96; CHF: OR=0.92, 95% CI 0.88–0.98; pneumonia: OR=0.90, 95% CI 0.86–0.95).
Assessment of Methodological Quality
Poor: The data used to generate mortality rates predate the data on quality measures and therefore may not reflect the quality of care delivered at the time of the inpatient mortality data. Quality composites used in analyses included measures of care delivered at discharge, which would not affect inpatient mortality rates.
Reference
Jha et al., 2011
111
Setting
Hospital
Study Design
Cross-sectional analysis of the relationship between hospital quality on process-of-care measures, costs, and mortality using the 2007 Hospital Compare data, 2005 MEDPAR data linked with the 2005 Medicare Beneficiary file, 2007 American Hospital Association data, 2007 information on hospital-specific cost-to-charge ratios, disproportionate share hospital (DSH) index and ratio of interns and residents to beds, the 2007 Area Resource File with county-level socioeconomic information, and the 2008 Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey. Hospital-level risk-adjusted cost ratios (actual to expected costs), quality composite scores, mortality rates, and HCAHPS scores were estimated. Four groups of hospitals were identified: those in the highest quartile of performance and lowest quartile of cost (best), those in the lowest quartile of performance and highest quartile of costs (worst), those in the highest quartile of performance and highest quartile of costs, and those in the lowest quartile of performance and lowest quartile of costs.
Program Measure(s) Patient Outcome(s)
• Process-of-care measures for AMI, CHF, pneumonia and prevention of surgical complications.
• Summary scores were created for each condition using the Joint Commission's methodology for those hospitals.
• 30-day risk-adjusted mortality rate for patients hospitalized with AMI, CHF, and pneumonia.
Findings
• AMI patients admitted to low-quality hospitals had a higher probability of death than those admitted to the "best" hospitals (low cost, low quality OR=1.12; high cost, low quality OR=1.10; analysis of variance p=.005).
• Pneumonia patients also had a higher probability of death when admitted to low-quality hospitals (low cost, low quality OR=1.19; high cost, low quality OR=1.07; analysis of variance p<.001).
• No significant difference was observed for CHF.
Assessment of Methodological Quality
Fair
Reference
Krumholz et al., 2013
141
Setting
Hospital
Study Design
30-day readmissions and 30-day mortality were identified for a cohort of aged Medicare beneficiaries with an index hospitalization with a primary diagnosis of AMI, CHF, or pneumonia between July 1, 2005, and June 30, 2008. The 30-day all-cause risk-standardized readmission rate (RSRR) and risk-standardized mortality rate (RSMR) were estimated for each hospital using hierarchical logistic regression models that adjusted for patients' demographic and clinical characteristics, accounted for patient clustering within hospitals, and included hospital-specific random effects. For each condition, hospitals were considered high performers if they were in the lowest quartile for both RSMR and RSRR and low performers if they were in the highest quartile for both. Analysis included 4,506 hospitals for AMI, 4,767 hospitals for CHF, and 4,811 hospitals for pneumonia.
Program Measure(s)
Not applicable
Patient Outcome(s)
For AMI, CHF, and pneumonia
• 30-day all-cause risk-standardized mortality rates (RSMRs)
• 30-day all-cause risk-standardized readmission rates (RSRRs)
Findings
• Overall, there was no association between RSMRs and RSRRs for AMI or pneumonia.
• There was a negative association between RSMRs and RSRRs for CHF (r=-.17, 95% CI -.20 to -.14).
Assessment of Methodological Quality
Good
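Under the performer definition in the study design above, a hospital must fall in the same extreme quartile on both measures. The sketch below assumes hospital-level RSMRs and RSRRs have already been estimated; the hierarchical models are not reproduced. It uses the default "exclusive" method of Python's `statistics.quantiles` for the quartile cut points, which may differ from the convention the authors used. Hospital labels and rates are hypothetical:

```python
from statistics import quantiles
from typing import Dict


def classify_performers(rsmr: Dict[str, float], rsrr: Dict[str, float]) -> Dict[str, str]:
    """Label hospitals by quartile on both risk-standardized rates (lower is better)."""
    # For n=4, statistics.quantiles returns the three cut points [Q1, median, Q3]
    m_q1, _, m_q3 = quantiles(rsmr.values(), n=4)
    r_q1, _, r_q3 = quantiles(rsrr.values(), n=4)
    labels = {}
    for hospital in rsmr:
        mort, readm = rsmr[hospital], rsrr[hospital]
        if mort <= m_q1 and readm <= r_q1:
            labels[hospital] = "high performer"  # lowest quartile on both
        elif mort >= m_q3 and readm >= r_q3:
            labels[hospital] = "low performer"  # highest quartile on both
        else:
            labels[hospital] = "other"
    return labels


# Hypothetical rates (percent) for eight hospitals
rsmr = {"A": 10, "B": 11, "C": 12, "D": 13, "E": 14, "F": 15, "G": 16, "H": 17}
rsrr = {"A": 18, "B": 24, "C": 19, "D": 20, "E": 21, "F": 22, "G": 17, "H": 25}
print(classify_performers(rsmr, rsrr))
```

With these inputs, hospital A is labeled a high performer and hospital H a low performer. Hospital B has low mortality but high readmissions, so it falls in neither group. Only a minority of hospitals can meet such a joint criterion, especially when the two rates are unrelated or negatively associated, as reported for CHF.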
Reference
Nicholas et al., 2010
133
Setting
Hospital
Study Design
Cross-sectional analysis of SCIP measures reported in Hospital Compare data from January 1, 2005, to December 31, 2006, and patient outcomes derived from MEDPAR data for patients with 1 of 6 high-risk surgical procedures (abdominal aortic aneurysm repair, aortic valve repair, coronary artery bypass graft, esophageal resection, mitral valve repair, and pancreatic resection), using hierarchical linear models to assess associations. Models controlled for hospital-level procedure volume; patient characteristics and comorbidity using the Charlson comorbidity index; whether the admission was scheduled, emergent, or urgent; zip code-level median income; year of admission; and hospital random effects. Hospitals were placed in low (bottom quintile of performance), medium (middle three quintiles of performance), and high (top quintile of performance) compliance groups based on an opportunity composite score. Analyses included 2,189 hospitals.
Program Measure(s) Patient Outcome(s)
• 2 SCIP measures in 2005
• An additional 3 measures were included in 2006
• An opportunity composite score was created
• 30-day risk-adjusted postoperative mortality rate, venous thromboembolism, and surgical site infection.
Findings
• In univariate analyses, there were no significant associations between process measures and mortality except for aortic valve replacement, where hospitals with the highest SCIP compliance had lower mortality rates.
• In multivariate analyses, neither high- nor low-compliance hospitals were significantly different from hospitals with middle compliance, nor did high- and low-compliance hospitals have different mortality rates from one another.
• Unadjusted complication rates were lower among hospitals in the lowest compliance quintile than among those in the highest compliance quintile. Results were not significant in multivariate analyses.
Assessment of Methodological Quality
Good
Reference
Peterson et al., 2006
125
Setting
Hospital
Study Design
The association between process-of-care measures and in-hospital mortality was examined for patients presenting with symptoms consistent with acute coronary syndrome to 350 hospitals participating in the "Can Rapid Risk Stratification of Unstable Angina Patients Suppress Adverse Outcomes with Early Implementation of the American College of Cardiology/American Heart Association Guidelines" (CRUSADE) National Quality Improvement Initiative between January 1, 2001, and September 30, 2003, using Pearson correlation coefficients and the Cochran-Armitage test for trend. Adjusted mortality rates were estimated using hierarchical generalized linear mixed models adjusting for patient characteristics, comorbid conditions, and a patient's propensity to be treated at a top-quartile center.
Program Measure(s) Patient Outcome(s)
• 9 cardiac process-of-care measures
• Opportunity model composite was created
• In-hospital mortality
Findings
• Improved performance on process measures was significantly, though modestly, associated with lower in-hospital mortality (r ranging from -.12 to -.36, p<.05), except for beta-blocker within 24 hours and beta-blocker at discharge, which were not significant.
• The composite measure of quality was negatively associated with in-hospital mortality (r=-.30, p<.001).
• The adjusted in-hospital mortality rate was 6.31% for hospitals in the lowest-performing quartile versus 4.15% for hospitals in the top quartile (OR=0.81, p<.001).
Assessment of Methodological Quality
Fair
Reference
Popescu et al., 2009
142
Setting
Hospital
Study Design
The association between AMI process measures (2004–2006) and risk-adjusted 30-day mortality for 2005 was assessed for 2,761 hospitals reporting AMI measures to the Hospital Compare database. Hospitals were categorized as high adherence (top decile of performance on AMI measures for 3 consecutive years), low adherence (lowest decile of performance for 3 consecutive years), or intermediate performance (all other hospitals in the sample). 30-day mortality rates for AMI patients were estimated using multivariable mixed models controlling for patient sociodemographic characteristics and comorbidity as well as hospital random effects.
Program Measure(s) Patient Outcome(s)
• 5 AMI process measures
• Opportunity model composite was created
• 30-day mortality
Findings
• Mean AMI performance varied significantly across the three groups (p<.001).
• Low-performing hospitals had higher unadjusted 30-day mortality rates (23.6% vs. 17.8% vs. 14.9%, p<.001).
• Differences persisted after adjusting for patient characteristics (16.3% vs. 16.0% vs. 15.7%; p=.02).
Assessment of Methodological Quality
Fair
Reference
Quattromani et al., 2011
143
Setting
Hospital
Study Design
Cross-sectional analysis of 95,704 adult emergency department admissions with a principal diagnosis of pneumonia from 530 hospitals in the 2007 Healthcare Cost and Utilization Project Nationwide Inpatient Sample, linked with hospital-level data on the timely receipt of antibiotics and American Hospital Association data. Hospitals were placed in quartiles based on their performance on timely receipt of antibiotics. A population-averaged logistic regression model controlled for patient demographics, comorbid conditions, and weekend admission, and accounted for correlation of patients within hospitals.
Program Measure(s) Patient Outcome(s)
• Receipt of first dose of antibiotics within 4 hours of arrival at hospital
• All-cause inpatient mortality
Findings
• No significant associations were found; compared with the lowest-performing hospitals, the risk-adjusted OR of mortality was 0.89 (95% CI 0.77 to 1.02) in the highest-performing time-to-first-antibiotic-dose quartile, 0.94 (95% CI 0.82 to 1.08) in the second quartile, and 0.91 (95% CI 0.79 to 1.05) in the third quartile.
Assessment of Methodological Quality
Fair
Reference
Ryan et al., 2009
78
Setting
Hospital
Study Design
Medicare inpatient claims and Hospital Compare process-of-care measures for 2004–2006 were used to assess the relationship between the process measures and risk-adjusted patient outcomes. One model estimated the relationship between performance and the log of risk-adjusted mortality, controlling for hospital characteristics, year, and hospital characteristic–year interactions. The second model included hospital fixed effects to capture unobserved characteristics, as well as year and hospital characteristics interacted with year. Transfer patients and hospitals with fewer than 10 patients for each measure were excluded from analysis.
Program Measure(s) Patient Outcome(s)
• 5 AMI process measures
• 2 CHF process measures
• 3 pneumonia process measures
• Two methods for creating composites were used:
• The weighted sum of z-scores for process measures for each diagnosis
• The z-score of the unweighted sum of each process measure for each diagnosis
• Risk-adjusted 30-day mortality for AMI, CHF, and pneumonia
Findings
• Based on the models with hospital characteristics, a one-standard-deviation increase in the process measure composite was associated with a 9% reduction in mortality for AMI (p<.01), a 1.5% reduction for CHF (p<.05), and a 1.9% reduction for pneumonia (p<.01).
• Associations were no longer significant when hospital fixed effects were included in the models.
• These results are supported by the finding that, while process performance improved slightly from 2004 to 2006, there were no similar changes in mortality.
Assessment of Methodological Quality
Good
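The two composite constructions listed under Program Measure(s) differ in how much each measure can influence the result. Standardizing each measure first puts all measures on a common scale. Standardizing the raw sum instead lets the measure with the widest spread across hospitals dominate. The sketch below assumes equal weights for the weighted version, since the weights used by Ryan et al. are not stated here, and uses population standard deviations. The rates are hypothetical:

```python
from statistics import mean, pstdev
from typing import List, Optional


def zscores(values: List[float]) -> List[float]:
    """Standardize values across hospitals (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def sum_of_z(measures: List[List[float]], weights: Optional[List[float]] = None) -> List[float]:
    """Weighted sum of per-measure z-scores (equal weights unless given)."""
    k = len(measures)
    weights = weights if weights is not None else [1.0 / k] * k
    z = [zscores(m) for m in measures]
    n_hospitals = len(measures[0])
    return [sum(w * z_m[i] for w, z_m in zip(weights, z)) for i in range(n_hospitals)]


def z_of_sum(measures: List[List[float]]) -> List[float]:
    """z-score of the unweighted sum of raw measure rates for each hospital."""
    totals = [sum(col) for col in zip(*measures)]
    return zscores(totals)


# Rows are measures, columns are three hypothetical hospitals
measures = [[0.0, 0.6, 0.3], [1.0, 0.9, 0.95]]
print(sum_of_z(measures))
print(z_of_sum(measures))
```

With these inputs, the sum-of-z composite is essentially zero for all three hospitals, because gains on one measure exactly offset losses on the other in standardized units. The z-of-sum composite ranks the second hospital first, because the first measure varies much more in raw terms.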
Reference
Stefan et al., 2013
132
Setting
Hospital
Study Design
The association between Hospital Compare process quality measures and 30-day readmission for patients with AMI, CHF, or pneumonia and those undergoing major surgery in 2007 was examined using Spearman rank correlations. Data were obtained from the Quality Improvement Organization Clinical Data Warehouse. 30-day readmission rates were estimated using the same technique CMS uses for the Hospital Compare website, with hierarchical generalized linear models accounting for patient clustering within hospitals and adjusting for patient characteristics, zip code-level median income, comorbidities, discharge disposition, number of admissions in the previous year, and length of stay relative to the median length of stay for that condition. A ratio of predicted to expected readmission rate was calculated for each hospital for each condition. Hospitals were placed into quartiles based on performance score for each condition, and the absolute difference in mean risk-standardized readmission rates between hospitals in the highest and lowest quartiles of performance was calculated.
Program Measure(s)
• 8 AMI process measures
• 7 pneumonia process measures
• 4 CHF process measures
• 9 SCIP measures
• Two sets of composite adherence scores were used: (1) an opportunities composite and (2) an appropriate care composite (i.e., did patients receive all care processes for which they were eligible?)
Patient Outcome(s)
• Condition-specific 30-day risk-standardized readmission rate (only for conditions also included in the process-of-care measures)
Findings
• Higher performance scores were significantly, but weakly, correlated with lower readmission rates for pneumonia (r=-.07, p<.0001), AMI (r=-.10, p<.0001), and orthopedic surgery (r=-.06, p<.003), but not for heart failure, abdominal surgery, or cardiac and vascular surgery.
• Results were very similar whether the opportunities composite or the appropriate care composite was used.
• Multivariable models with process measures and hospital characteristics explained a very small amount of the total variation in hospital-level readmission rates.
• The difference in mean risk-standardized readmission rates between hospitals in the 1st and 4th quartiles of process performance was significant for AMI, but the difference was only 0.3 percentage points.
Assessment of Methodological Quality
Good
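The two composite definitions used in this study differ in what counts as success: the opportunities composite pools every eligible care process, while the appropriate care composite credits a patient only if every process the patient was eligible for was delivered. Below is a minimal Python sketch of both, assuming patient-level records in which each measure is marked as delivered, missed, or not applicable; the record layout and measure names are hypothetical.

```python
from typing import Dict, List, Optional

# One record per patient: measure name -> True (eligible, delivered),
# False (eligible, not delivered), or None (not eligible for that measure).
PatientRecord = Dict[str, Optional[bool]]


def opportunities_composite(patients: List[PatientRecord]) -> float:
    """Care processes delivered divided by all eligible opportunities."""
    delivered = 0
    opportunities = 0
    for record in patients:
        for received in record.values():
            if received is None:  # patient not eligible for this measure
                continue
            opportunities += 1
            delivered += int(received)
    if opportunities == 0:
        raise ValueError("no eligible opportunities")
    return delivered / opportunities


def appropriate_care_composite(patients: List[PatientRecord]) -> float:
    """Share of eligible patients who received every process they were eligible for."""
    eligible = [r for r in patients if any(v is not None for v in r.values())]
    if not eligible:
        raise ValueError("no eligible patients")
    all_met = sum(1 for r in eligible if all(v for v in r.values() if v is not None))
    return all_met / len(eligible)


patients = [
    {"aspirin": True, "ace_inhibitor": True, "smoking_advice": None},
    {"aspirin": True, "ace_inhibitor": False, "smoking_advice": True},
    {"aspirin": True, "ace_inhibitor": None, "smoking_advice": None},
]
print(opportunities_composite(patients))     # 5 of 6 opportunities met
print(appropriate_care_composite(patients))  # 2 of 3 patients received all care
```

The second patient illustrates the difference: one missed process costs a single opportunity in the first composite but removes that patient entirely from the numerator of the second.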
Reference
Werner and Bradlow, 2006
135
Setting
Hospital
Study Design
Examined the correlation between the Hospital Quality Alliance 10-measure starter set from Hospital Compare for 2004 and hospital-level patient outcomes in 3,657 hospitals. Outcomes were calculated using 2004 MEDPAR data and risk-adjusted using the Elixhauser method, patient characteristics, and whether the admission was emergent or elective. Hospitals were grouped into thirds based on average 1-year risk-adjusted mortality rate for each condition. A Bayesian approach was used to assess the relationship between composite measures, individual performance measures, and condition-specific outcomes. The relationship between hospital performance and outcomes was estimated controlling for hospital characteristics.
Program Measure(s)
• 5 AMI process measures
• 2 CHF process measures
• 3 pneumonia process measures
• Two composite measures were created:
  • An opportunity model composite
  • An "all or none" measure that identified hospitals that performed above the 75th percentile on every measure they reported and hospitals that performed below the 25th percentile on every measure they reported
Patient Outcome(s)
• Condition-specific inpatient mortality
• Condition-specific 30-day mortality
• 1-year risk-adjusted mortality rates
Findings
• Adjusting for hospital characteristics, hospitals performing at the 75th percentile had significantly lower inpatient mortality than those performing at the 25th percentile for each condition's composite measure and for most of the individual measures.
• The absolute risk reduction (ARR) was small, ranging from .001 for CHF to .005 for both AMI and pneumonia.
• Results were similar for 30-day mortality.
• Results for 1-year mortality were significant for AMI and pneumonia, but not for CHF.
• Comparing hospitals performing above the 75th percentile on all measures with those performing below the 25th percentile on all measures, the ARR for AMI ranged from 0.008 (p=.06) for inpatient mortality to 0.18 (p=.008) for 1-year mortality.
• The ARR for pneumonia was .014 (p<.001) for inpatient mortality, .003 (p=.00) for 30-day mortality, and 0.13 (p<.001) for 1-year mortality.
Assessment of Methodological Quality
Good
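The absolute risk reductions reported above are differences in mortality risk between two groups of hospitals, not relative changes. A minimal Python sketch of the calculation, using hypothetical death counts rather than the study's data:

```python
def absolute_risk_reduction(deaths_low: int, patients_low: int,
                            deaths_high: int, patients_high: int) -> float:
    """Mortality risk among lower-performing hospitals minus risk among
    higher-performing hospitals; a positive value favors the higher performers."""
    if patients_low <= 0 or patients_high <= 0:
        raise ValueError("patient counts must be positive")
    risk_low = deaths_low / patients_low
    risk_high = deaths_high / patients_high
    return risk_low - risk_high


# Hypothetical counts: 6.0% vs. 5.5% inpatient mortality.
arr = absolute_risk_reduction(120, 2000, 110, 2000)
print(f"ARR = {arr:.3f} ({arr * 100:.1f} percentage points)")
```

An ARR of .005 therefore corresponds to half a percentage point of mortality, which is why differences of this size are described as small.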
Reference
Kralewski et al., 201i
38
Setting
Ambulatory care
Study Design
Cross-sectional study of 133,703 Medicare patients with diabetes treated by 234 group practices in 2009. Patients were attributed to the practice where they received the plurality of their care. Claims data were used to assess lab testing, emergency department use, hospitalizations, and total costs. Practice structural characteristics were obtained from the 2009 practice survey of the Medical Group Management Association. Regression analysis was used to assess the association between measures and risk-adjusted outcomes.
Program Measure(s)
• LDL lab test during the past year
Patient Outcome(s)
• Inappropriate emergency department use
• Avoidable hospitalizations
• Costs per patient with diabetes
Findings
• LDL testing of an additional one percentage point of patients with diabetes in the practice was associated with $51 lower per capita costs (p<.001), fewer primary care-treatable emergency visits (p<.001), and fewer avoidable hospitalizations (p<.001).
Assessment of Methodological Quality
Fair
Reference
Ryan and Doran, 201i
37
Setting
Ambulatory care
Study Design
Retrospective analysis of how much of the improvement in incentivized intermediate outcomes resulted from improvements in incentivized process measures for diabetes, coronary heart disease, stroke, epilepsy, and hypertension, using 2004-2008 data from a panel of family practices participating in the UK's Quality and Outcomes Framework. Data on practice performance were linked to patient and practice characteristics and the community-level Index of Deprivation. The number of included practices ranged from 3,864 (epilepsy) to 6,822 (diabetes). "Opportunities model" composite measures were created separately for each year for the process and outcome measures for each condition for each practice. Longitudinal fixed effects models, controlling for composite process performance on all other conditions and for year fixed effects, were used to estimate the extent to which improvements in incentivized outcomes were due to improvements in incentivized process measures. Separate models were run for each diagnosis. Standard errors accounted for clustering at the practice level.
Program Measure(s)
• 10 diabetes process measures
• 5 coronary heart disease process measures
• 3 stroke process measures
• 2 epilepsy process measures
• 1 hypertension process measure
Patient Outcome(s)
• Intermediate outcomes
  • 4 for diabetes
  • 2 for coronary heart disease
  • 2 for stroke
  • 1 for epilepsy
  • 1 for hypertension
Findings
• A 10 percentage point increase in the process composite was associated with an increase in outcome performance of 3.16 percentage points for diabetes, 4.32 percentage points for coronary heart disease, 7.60 percentage points for stroke, 7.24 percentage points for epilepsy, and 7.16 percentage points for hypertension.
• The share of the increase in the outcome composite attributable to the change in the process composite was 29.6% for diabetes, 25.6% for coronary heart disease, 34.7% for stroke, 29.1% for epilepsy, and 17.7% for hypertension.
Assessment of Methodological Quality
Good
Reference
Sidorenkov et al., 2011
136
Setting
Multiple settings
Study Design
Systematic review of literature indexed in MEDLINE and Embase through May 1, 2010, that focused on the relationship between quality indicators and outcomes for diabetes care. Studies were classified as high, medium, or low quality. Twenty-four studies were identified, 17 of which evaluated intermediate outcomes. Of the studies assessing "hard" outcomes, 3 were cohort studies and 4 were case-control studies.
Program Measure(s)
• Adequate drug treatment
• Visits and exams
• HbA1c tests
• Other or composite tests/exams
Patient Outcome(s)
• Hospitalizations
• Treatment-related complications
• Disease-related complications
• Hospital readmissions
• Microvascular complications or lower extremity amputations
• Macrovascular complications
• Death
• Composite physical and/or mental health score
Findings
• Few associations between process measures and outcome measures were identified. One study showed that adequate drug treatment of patients hospitalized for diabetes was associated with fewer treatment-related complications, but another study¹⁴⁴ found no association with readmission rates.
• A medium-quality cohort study found that HbA1c testing was associated with decreased macrovascular complications and kidney disease, but not with microvascular complications or death.¹⁴⁵
• Lipid testing was associated with fewer lower extremity complications, while eye exams were not.
• A high-quality study showed that a composite measure capturing HbA1c testing, eye exams, LDL screening, and nephropathy monitoring was associated with better mental health status, but not physical health status, as measured by the SF-36.¹⁴⁶
Assessment of Methodological Quality
Good
Reference
Werner et al., 2013
74
Setting
Nursing home
Study Design
Assessed the extent to which changes in nursing home process measures account for changes in outcome measures among 16,623 nursing homes reporting data from 2000 to 2009 to the Online Survey, Certification, and Reporting (OSCAR) system and the nursing home Minimum Data Set. Analyses included facility fixed effects, time-varying facility characteristics, an indicator for quarter of the year to capture seasonal effects, and quarter interacted with process measures.
Program Measure(s)
• 6 process measures focused on pain management, written bladder training programs, preventive skin care, receiving tube feeds, mechanically altered diets, and assistive devices while eating
Patient Outcome(s)
• 4 outcome measures focused on long-stay residents with moderate or severe pain, a catheter inserted and left in their bladder, pressure sores, or significant weight loss
Findings
• Approximately one-third of the improvement in the percentage of nursing home residents with moderate or severe pain was due to changes in process measures.
• None of the improvements in the other outcome measures appeared to be related to improvements in process measures.
Assessment of Methodological Quality
Good
NOTE: Not all of the studies listed in the table were conducted in the context of a P4P experiment; rather, the measures that were the focus of each study are typically found within P4P programs.
a. DSH hospitals are those that receive compensation through Medicare for treating a disproportionate number of indigent patients.
Table 3.4. Evidence on Effectiveness of Physician and Physician Group Pay-for-Performance Programs
Reference
Amundson et al., 2003
30
Program Description
HealthPartners P4P program focused on tobacco "Ask" and "Advise" rates from 1996 to 1999
Study Design
Longitudinal study of participants
Incentive Structure
Bonus pool
Measures Examined
Process: Documentation and discussion of tobacco use
Findings
Process: Mean ask rate increased from 49% to 73%.
Advise rate increased from 32% to 53%.
Assessment of Methodological Quality
Poor: Regional population, no modeling to control for confounders
Reference
An et al., 2008
49
Program Description
Collaborative project between Fairview Physician Associates and multiple Minnesota health plans to encourage referrals to a health plan-sponsored quit line from 2005 to 2006
Study Design
RCT of usual care vs. P4P for quit line referrals
Incentive Structure
Clinic receives $5,000 for 50 quit line referrals
Measures Examined
Process: Rates of referral; contact and enrollment after referral; and project costs
Findings
Process: 11.4% of smokers were referred in the P4P group compared with 4.2% in the control group (p=0.001).
Assessment of Methodological Quality
Fair
Reference
Armour et al., 2004
32
Program Description
A large managed care health plan operating in the southeastern United States implemented a year-end bonus program designed, in part, to improve colorectal cancer screening among an individual practice association's PCPs over a 10-month period spanning 2001-2002
Study Design
Pre-post study of P4P cohort
Incentive Structure
Bonus payment
Measures Examined
Process: Colorectal cancer screening
Findings
Process: From 2000 to 2001, colorectal cancer screening use increased from 23.4% to 26.4% (p<0.01).
Assessment of Methodological Quality
Poor: Short study period, cross-sectional with limited controls
Reference
Bardach et al., 2013
147
Program Description
P4P experiment between April 2009 and March 2010 among small primary care practices (<10 physicians) in New York City. In addition to financial incentives, clinics were provided with EHR software with decision-support and patient registry functions, and with QI specialists who offered technical assistance.
Study Design
Cluster-RCT of 84 small primary care practices. The intervention group received incentives and quarterly performance reports, while the control group received only performance reports. One-year evaluation.
Incentive Structure
Incentive paid to the clinic/practice.
Incentive paid for every instance of patient meeting the quality criteria. Higher incentive payments given for patients who were sicker, had Medicaid insurance or were uninsured.
Bonuses were a maximum of $200/patient and $100,000/clinic
Payments to clinics ranged from $600 to $100,000 (median $9,900).
Measures Examined
Process: Aspirin or antithrombotic prescription
Smoking cessation
Outcomes: Blood pressure control
Cholesterol control
Findings
Process: The adjusted change in performance for aspirin or antithrombotic prescription among patients with ischemic vascular disease or diabetes was significantly higher in the intervention group than in controls, by 6.0% (p=0.001).
Outcomes: The adjusted change in blood pressure control was significantly higher in the intervention group than in controls by
• 5.5% (p=0.01) among patients with only hypertension
• 7.8% among patients with hypertension and diabetes
• 7.8% (p=0.01) among patients with hypertension, diabetes, and ischemic vascular disease
No difference in cholesterol control (p=0.22).
Changes were larger for uninsured or Medicaid patients in intervention clinics compared with controls, except for cholesterol control.
Assessment of Methodological Quality
Good: Randomized study design, although short study duration.
Findings may not be generalizable to larger practices or those without EHRs or QI assistance.
Reference
Beaulieu and Horrigan, 2005
Program Description
In 2001, a managed care organization in upstate New York designed and implemented a pilot program to financially reward doctors for the quality of care delivered to diabetic patients over an 8-month period.
Study Design
Pre-post with comparison group
Incentive Structure
Incentive payment equivalent to a 12% increase in PMPM reimbursement if performance goals are met
Measures Examined
Process: 6 measures of diabetes care quality
Outcome: 3 diabetes outcome measures
Findings
Process: Physicians and patients achieved significant improvement on five out of six process measures.
Outcome: Physicians and patients achieved significant improvement on two out of three outcome measures (HbA1c control and LDL control).
Assessment of Methodological Quality
Poor: Small number of study participants (n=17 physicians). Physicians self-selected; one small region, short duration, physicians not matched at baseline. Comparison patients had higher baseline performance on all measures.
Reference
Chen et al., 2010a
50
Program Description
P4P program initiated by a preferred provider organization (PPO) in Hawaii from 1998 to 2007
Study Design
Compared pre-post changes of the intervention group with a comparison group in a different state
Incentive Structure
Additional 1.5-7.5% of base salary to perform processes of care
Measures Examined
Process: ACE inhibitor use among CHF patients, mammography, cervical cancer screening, colorectal cancer screening, HbA1c testing for diabetes, the varicella vaccine, and the measles, mumps, rubella (MMR) vaccine
Findings
Process: The P4P group had significantly greater increases in quality scores than the comparison group for cervical cancer screening and HbA1c testing.
The P4P group had significantly greater increases than the non-P4P group in quality scores for mammography and varicella from the 2nd to the 3rd year.
The P4P group improved less than the non-P4P group for colorectal cancer screening every year, except from the 3rd to the 4th year.
Assessment of Methodological Quality
Fair
Reference
Chen et al., 2010b
48
Program Description
PPO in Hawaii provided incentives to physicians to improve quality and reduce hospitalizations from 1999 to 2006
Study Design
Longitudinal study comparing participating practices with nonparticipating practices
Incentive Structure
1.5-7.5% of base salary to perform processes of care
Measures Examined
Process: Diabetes processes of care
Outcome: Hospitalizations
Findings
Process: Improved diabetes quality of care compared with non-P4P physicians among patients who saw P4P providers throughout the entire study period (OR=1.20; 95% CI, 1.05-1.37, p<0.01).
Outcome: Reduction in hospitalization for patients who saw P4P providers throughout the entire study period
Assessment of Methodological Quality
Fair
Reference
Chen et al., 2011
149
Program Description
Health plan in Hawaii incentivizes participating physicians with additional payments to improve 2 cardiovascular disease quality measures from 2000 to 2006
Study Design
Longitudinal multivariate regression models comparing participants to nonparticipants
Incentive Structure
Bonus of 3.5% of professional fees
Measures Examined
Process: LDL testing, statin prescribing
Findings
Process: The P4P group improved more (32% to 70%) than the non-P4P group (40% to 61%) on the quality composite.
Assessment of Methodological Quality
Fair
Reference
Chien et al., 2010
22
Program Description
New York Medicaid nonprofit plan implemented a P4P program that incentivized immunization delivery to 2-year-olds from 2003 to 2007
Study Design
Difference-in-differences comparing participants and nonparticipants pre-post
Incentive Structure
$200 bonus payment for each fully immunized 2-year-old
Measures Examined
Process: 2-year-old immunizations
Findings
Process: Immunization rates within Hudson Health Plan rose at a significantly, albeit modestly, higher rate than the robust secular trend noted among comparison health plans.
Assessment of Methodological Quality
Good: Regional but multiple years of observation and strong difference-in-differences design
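Several studies in this table, including Chien et al., 2010, rely on a difference-in-differences design: the change among participants is compared with the change among nonparticipants over the same period, so that a shared secular trend is netted out. The sketch below shows only the unadjusted two-group, two-period estimator under the usual parallel-trends assumption; the published analyses used regression models with additional controls, and the immunization rates here are hypothetical.

```python
from statistics import mean
from typing import List


def diff_in_diff(treated_pre: List[float], treated_post: List[float],
                 control_pre: List[float], control_post: List[float]) -> float:
    """Unadjusted difference-in-differences of group means.

    Assumes the control group's change approximates what the treated group
    would have experienced without the incentive (parallel trends).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical plan-level immunization rates before and after a P4P program.
effect = diff_in_diff(
    treated_pre=[0.60], treated_post=[0.70],
    control_pre=[0.58, 0.66], control_post=[0.64, 0.72],
)
print(f"Estimated P4P effect: {effect:.3f}")
```

Here participants improved by 10 percentage points, but because nonparticipants improved by 6 points over the same period, only about 4 points are attributed to the program.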
Reference
Chien et al., 2012b
69
Program Description
New York Medicaid nonprofit plan implemented a P4P program that incentivized improvements in diabetes care and outcomes in 2003-2007
Study Design
Difference-in-differences comparing participants and nonparticipants pre-post
Incentive Structure
$100-$300 bonus payments for each patient completing all the missing care processes
Measures Examined
Process: Diabetes quality measures (HbA1c testing, lipid testing, dilated eye exams, lipid control)
Outcome: Diabetes outcome measures (e.g., BP and HbA1c and LDL levels)
Findings
Process: Between the pre- and post-intervention periods, changes on available diabetes measures were not statistically significant.
Outcome: Changes in diabetes outcome measures were not statistically significant when compared with non-Hudson plans.
Assessment of Methodological Quality
Good: Regional but multiple years of observation and strong difference-in-differences design
Reference
Chung et al., 2003
Program Description
Voluntary P4P program implemented by a health plan in Hawaii from 1997 to 2000.
Study Design
Time trend of participants
Incentive Structure
3.5% above base fees
Measures Examined
Process: Use of ACE inhibitors or angiotensin receptor blockers in CHF, measurement of HbA1c in diabetes, and rates of childhood immunizations
Findings
Process: ACE inhibitor rate increased from 40.8% to 64.2% for CHF patients (p<0.001).
HbA1c testing increased from 51.5% to 79.6% (p<0.0001).
MMR immunization rates varied, and no consistent trend could be identified.
Assessment of Methodological Quality
Poor: No contemporaneous control group, case study only
Reference
Chung et al., 2010a
103
Program Description
RCT of the effects of the frequency of a P4P bonus on performance in the Palo Alto Medical Foundation over the course of a 1-year study period.
Study Design
RCT
Incentive Structure
Bonus payment of up to 2% of base salary
Measures Examined
Process: Six process measures (prescription of asthma controller, cervical cancer screening, chlamydia screening, colon cancer screening, whether height and weight were measured and recorded, and documentation of tobacco use history)
Outcome: 3 outcome measures for diabetes control (BP <130/80 mm Hg, HbA1c <7%, and LDL <100 mg/dl)
Findings
Process: Frequency of bonus payment did not affect process or outcome measures.
Assessment of Methodological Quality
Fair
Reference
Chung et al., 2010b
33
Program Description
P4P program within a single clinic in California from 2005 to 2007
Study Design
Pre-post comparison of participants
Incentive Structure
Bonus payment of up to 2% of base salary
Measures Examined
Process: 5 measures related to screening, asthma medication prescribing, and prevention
Findings
Process: From 2006 to 2007, 8 of 9 incentivized and previously reported measures showed significant improvement (mix of process and outcome measures).
Assessment of Methodological Quality
Poor: Single practice, no comparison group
Reference
Coleman et al., 2007
27
Program Description
A large federally qualified health center implemented incentives for absolute performance and improvement on process and outcome measures in 2004.
Study Design
Pre-post comparison of single practice
Incentive Structure
Reduction in base salary coupled with bonus payments for meeting productivity goals
Measures Examined
Process: Avg. annual # of encounters per diabetic patient, % diabetic patients with any HbA1c test
Outcome: % diabetic patients with recommended number of HbA1c tests, % diabetic patients with controlled blood sugar (HbA1c <7, or HbA1c <9)
Findings
Process: From 2003 (pre-P4P) to 2004 (1st year P4P), significant increase (16.2%) in biannual HbA1c testing for diabetic patients (p<0.001).
Outcome: No significant improvement in blood sugar control (HbA1c <7 or HbA1c <9) in ACCESS Medicaid patients from NCQA dataset (OLS p=.1639).
Assessment of Methodological Quality
Poor: Single organization, no comparison group, and relatively short time frame
Reference
Collier, 2007
Program Description
A community health care system implemented a P4P program for 12 hospitalists on a range of structural, process, and utilization measures from 2003 to 2006.
Study Design
Pre-post comparing participants to nonparticipants
Incentive Structure
Bonus
Measures Examined
Structure: 24/7 access to care, maintaining at most an 18:1 physician to patient ratio, dictating medical records within 12 hours, providing discharge summaries within 24 hours, attending monthly hospital meetings, and having membership in the Society of Hospitalists
Process: CMS/Joint Commission process measures
Findings
Structure: Almost all of the measures were accomplished.
Process: Although the contracted group did not consistently meet all Joint Commission/CMS targets, compliance with most quality indicators improved to a greater extent than in a concurrent non-contracted group.
Assessment of Methodological Quality
Poor: Only a single organization, and analytic methods poorly explained
Reference
Curtin et al., 2006
3
Program Description
P4P program that was a 5-year partnership (2000-2004) between Excellus health plan and a Rochester, New York, independent practice association
Study Design
Pre-post cost analysis focused on return on investment
Incentive Structure
10% salary withhold returned when goals are met
Measures Examined
Costs: Costs PMPM; return on investment
Findings
Costs: Positive return on investment of 1.6:1.0 in 2003 and 2.5:1.0 in 2004
Assessment of Methodological Quality
Poor: Single entity and "benefit" measured simply as pre-post comparison. Little analytic work to deal with confounding factors.
Reference
Cutler et al., 2007
28
Program Description
IHA program is a statewide P4P program providing physician groups with bonuses for meeting patient experience, process, and outcome measures. This study focuses on Mercy Medical Group.
Study Design
Cross-sectional (2004) comparison of participants and nonparticipants
Incentive Structure
Bonus above base PMPM capitation payment
Measures Examined
Process: LDL testing and control for patients with diabetes
Findings
Process: A higher proportion of patients in the P4P group attained the LDL-C goal (<130 mg per dl) than those in routine care (78.2% vs. 55.7%, p<.001).
Higher rate of achieving LDL-C <100 mg per dl than those in the routine care group (46.7% vs. 35.2%, p=.004).
Assessment of Methodological Quality
Poor: Short study period, cross-sectional, no controls for confounding factors.
Reference
Fagan et al., 2010
40
Program Description
Intervention by a national managed care organization to provide P4P bonus payments to 9 PCP practices for meeting quality of care measures
Study Design
Longitudinal (2004-2006) study in which pre- and post-data from intervention practices were compared with comparison practices
Incentive Structure
Bonus payment up to 20% of the capitation fee for Medicare managed care organization patients
Measures Examined
Process: 5 incentivized quality measures (influenza vaccine, HbA1c testing, eye exam, LDL screening, and nephropathy screening), 2 non-incentivized measures (avoiding short-acting antihypertensives and prescribing an ACE/angiotensin receptor blocker medication for diabetics with renal insufficiency)
Costs: Emergency department utilization, and total paid costs
Findings
Process: Quality of care generally improved for both groups during the study period. Only slight differences were seen between the intervention and comparison group trends and changes in trends over time.
Costs: No significant differences were observed in the average total medical cost trends per member per month (p=.42) between P4P and non-P4P members with diabetes from baseline to follow-up.
Assessment of Methodological Quality
Good: Relatively large region, difference-in-differences design to control for time-invariant confounders.
Reference
Fairbrother et al., 2001
23
Program Description
RCT of 57 inner-city physicians randomized to a P4P bonus, enhanced FFS, or control group in 1997-1998
Study Design
RCT
Incentive Structure
$1,000-$7,500 bonus depending on improvement level
Measures Examined
Process: Up-to-date immunization coverage
Findings
Process: Both the bonus and the enhanced FFS groups improved significantly in documented up-to-date immunization status (Bonus: 49.7% to 55.6%, p<0.05; Enhanced FFS: 50.8% to 58.2%, p<0.01) compared with the control group.
Steady increases, but no significant difference in number of well-child visits.
Improvement was due primarily to improved documentation rather than actual vaccines given. Missed opportunities (when vaccines were due but not given) did not change.
Assessment of Methodological Quality
Fair
Reference
Felt-Lisk et al., 2007
44
Program Description
5 Medicaid health plans that implemented P4P programs from 2002 to 2005
Study Design
Pre-post changes in participants with a limited comparison to national trends
Incentive Structure
Bonus payments based on the number of patients receiving well-baby visits
Measures Examined
Process: % of plan members with 6 or more well-baby visits by age 15 months
Findings
Process: From pre-implementation (2002 to 2003) to post-implementation (2004 to 2005), 2-year average HEDIS scores improved 7.5-27 percentage points. Large effects not seen in 4 of 5 plans.
Assessment of Methodological Quality
Fair
Reference
Gavagan et al., 2010
51
Program Description
Rewarding Results Collaborative Demonstration: Physicians at 6 of 11 clinics were given incentives for achieving group targets in preventive care.
Study Design
Longitudinal analysis with comparison group
Incentive Structure
$4,000-$12,000 bonus payment depending on performance
Measures Examined
Process: Preventive care (cervical cancer screening, mammography, pediatric immunization)
Findings
Process: Found no evidence for a clinically significant effect of financial incentives on performance of preventive care.
Assessment of Methodological Quality
Fair
Reference
Gilmore et al., 2007
25
Program Description
P4P program providing bonuses to individual physicians for absolute performance on patient experience, structural, quality, and practice pattern measures from 1998 to 2003
Study Design
Compared changes over time between participating physicians and nonparticipating physicians
Incentive Structure
Bonus of 1%-5% of base professional fees
Measures Examined
Process: 11 process measures related to screening, care for diabetes, hypertension, asthma, CHF, and high cholesterol, and prevention
Findings
Process: Positive association between having seen only program-participating providers and receiving recommended care for all 6 years (OR: 1.09, 95% CI: 1.072-1.10).
Assessment of Methodological Quality
Fair
Reference
Greene et al., 2004
35
Program Description
Large, multifaceted QI intervention consisting of physician group education, profiling, and a financial incentive, to improve treatment quality for acute sinusitis in Rochester from 1999 to 2001
Study Design
Pre-post, no comparison
Incentive Structure
15% payment withhold returned based on performance
Measures Examined
Process: Overall exceptions per 1,000 episodes, acute sinusitis care pathway exceptions per 1,000 episodes, services per 1,000 episodes of acute sinusitis
Findings
Process: A statistical process control chart showed a shift toward recommended treatment patterns after the intervention.
Assessment of Methodological Quality
Poor: No comparison group and no apparent controls for confounding factors.
Reference
Hung and Green, 2012
31
Program Description
AHRQ health promotion initiative offering incentives to PCPs to improve on smoking cessation measures
Study Design
Cross-sectional comparison of participants and nonparticipants
Incentive Structure
Unclear
Measures Examined
Process: Smoking cessation counseling, linking patients to smoking cessation services in the community
Findings
Process: Practices that were involved with P4P had greater odds of offering recommended cessation counseling (OR=27.6, p<0.01).
Assessment of Methodological Quality
Poor: Single year, small sample size, and limited controls for confounding factors.
Reference
Larsen et al., 2003
29
Program Description
Health care system implemented a multifaceted diabetes care program, which included financial incentives for individual physicians for diabetes QI from 1998 to 2002
Study Design
Longitudinal analysis, no comparison group
Incentive Structure
Bonus of 0.5% to 1% of total physician compensation
Measures Examined
Process: Rates of testing of HbA1c and LDL, rate of annual eye exams
Outcome: LDL and HbA1c values
Findings
Process: HbA1c testing increased from 78.5% in 1998 to 90.5% in 2002.
LDL cholesterol screening test within the prior 2 years increased from 65.9% in 1998 to 91.7% in 2002.
Annual eye exam increased from 52% in 1998 to 62% in 2002.
Outcome: % with HbA1c less than 7.0 increased from 33.5% in 1998 to 52.8% in 2002.
Average HbA1c decreased from 8.1 in 1998 to 7.3 in 2002.
% with HbA1c greater than 9.5 decreased from 34.6% in 1998 to 21.4% in 2002.
% with LDL cholesterol less than 130 mg/dL increased from 39.9% in 1998 to 69.8% in 2002.
Assessment of Methodological Quality
Poor: Single system, no comparison group, no controls for confounders.
Reference
Leitman et al., 2010
39
Program Description
Beth Israel Medical Center implemented a P4P and shared savings program for individual physicians using patient experience, patient safety, process, outcome, and efficiency measures between 2006 and 2009.
Study Design
Pre-post analysis comparing participating and nonparticipating physicians
Incentive Structure
Gainshare
Measures Examined
Cost: Cost savings, average LOS
Process: Quality measures for AMI, CHF, pneumonia
Outcome: 30-day mortality or readmission
Findings
Cost: $7 million savings
Process: Change in quality measures not statistically significant
Outcomes: No measurable change in 30-day mortality or readmission
Assessment of Methodological Quality
Poor: Single system, compared participating physicians with nonparticipating physicians, with unclear controls for confounding factors.
Reference
Lester et al., 2010
46
Program Description
35 medical facilities participating in a P4P program through Kaiser Permanente Northern California from 1997 to 2007.
Study Design
Longitudinal analysis of participants including removal of incentives
Incentive Structure
Bonus
Measures Examined
Process: Screening for diabetic retinopathy, cervical cancer
Outcome: Control of hypertension (systolic blood pressure <140 mm Hg), Glycemic control (HbA1c <8%)
Findings
Process: After incentives were removed, diabetic retinopathy screening declined by an average of approximately 3% per year (mean change 3.1%, 95% CI, 2.4% to 3.8%), and cervical cancer screening declined by an average of approximately 2% per year (mean 1.6%, 95% CI, 1.1% to 2.1%).
Outcome: The share of hypertensive adults whose systolic BP was less than 140 mm Hg increased from 58.3% to 78.2%.
Glycemic control was incentivized, and performance improved from 47% to 69.8%.
Assessment of Methodological Quality
Poor: Pre-post only within a single system.
Reference
Levin-Scherz et al., 2006
45
Program Description
Large, heterogeneous integrated delivery network that incorporated physician quality, efficiency, and structural metrics into P4P contract
Study Design
Longitudinal analysis (2001-2003) comparing to state and national trends
Incentive Structure
Contracts included some element of withhold, often approximately 10% of hospital and/or physician fees.
Some included an opportunity for bonus payments beyond the agreed-upon fee schedule.
Withholds were returned or bonuses earned depending on regional service organization and Partners Community HealthCare, Inc. (PCHI) network performance compared with previously agreed targets.
Measures Examined
Process: Performance on adult diabetes and pediatric asthma HEDIS measures
73
Findings
Process: HbA1c: Participants improved significantly more than the statewide rate (7.0 vs. 4.9 percentage points, p < .05).
Diabetic eye exams: Participants' performance improved, while statewide performance declined slightly (18.7 vs. -0.8 percentage points, p < .05).
Diabetic LDL screening: Participants' performance improved by almost twice as much as the state average (13.2 vs. 7.4, p < .05).
Nephropathy screening: Participant rates improved more than statewide rates (15.2 vs. 12.9 percentage points, p < .05).
All four diabetes measures: PCHI's first P4P plan achieved significant improvements on all 4 diabetes measures compared with national trends (p < .05).
Pediatric asthma controller: Performance improved more than the state average on every measure except pediatric asthma controller use (1.7 vs. 3.9 percentage points, p > .05).
Assessment of Methodological Quality
Fair
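The withhold-and-bonus arrangement in the Levin-Scherz et al. entry above can be illustrated with a minimal sketch. Only the roughly 10% withhold rate comes from the entry; the all-or-nothing return rule, the optional bonus amount, and the function name `settle_withhold` are hypothetical simplifications, since the actual contracts varied by regional service organization.

```python
# Hypothetical sketch of a fee-withhold settlement like the one described for
# the Levin-Scherz et al. (2006) contracts. The ~10% rate comes from the entry;
# the all-or-nothing return rule and the optional bonus are assumptions.


def settle_withhold(fees: float, target_met: bool,
                    withhold_rate: float = 0.10,
                    bonus: float = 0.0) -> float:
    """Return the total paid to a provider for one contract period.

    fees          -- fee-schedule payments earned in the period
    target_met    -- whether network performance met the agreed target
    withhold_rate -- share of fees held back up front (about 10% per the entry)
    bonus         -- hypothetical extra payment beyond the fee schedule,
                     paid only when the target is met
    """
    withheld = round(fees * withhold_rate, 2)
    paid_up_front = fees - withheld
    if target_met:
        # Target met: the withhold is returned and any bonus is added on top.
        return paid_up_front + withheld + bonus
    # Target missed: the withhold is forfeited and no bonus is paid.
    return paid_up_front


print(settle_withhold(100_000, target_met=True))               # 100000.0
print(settle_withhold(100_000, target_met=False))              # 90000.0
print(settle_withhold(100_000, target_met=True, bonus=2_500))  # 102500.0
```

Real contracts may return withholds partially or on a sliding scale; the binary rule is used here only to keep the mechanism visible.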
Reference
Mandel and Kotagal, 2007
36
Program Description
54 pediatric practices in the greater Cincinnati area were involved in a P4P program that rewarded practices for participating in the group collaborative, achieving network- and practice-level performance thresholds, and building improvement capability related to asthma from 2003 to 2006.
Study Design
Longitudinal analysis (interrupted time series) with no comparison group
Incentive Structure
% of base pay based on reporting, network performance, and practice performance
Measures Examined
Process: Medication control, flu shots, and written self-management plans
Findings
Process: % of the network asthma population receiving "perfect care" increased from 4% to 88%. % of the network asthma population receiving the influenza vaccine increased from 22% to 41%.
Assessment of Methodological Quality
Poor: Analytic methods insufficiently explained to make a strong determination.
Reference
Mullen et al., 2010
42
Program Description
PacifiCare implemented a QI program in California in conjunction with the IHA P4P program. Study analyzed effects of implementing both programs on incentivized and non-incentivized measures from 2001 to 2005.
Study Design
Difference-in-differences
Incentive Structure
Bonus payment of $500-$5,000 based on performance
Measures Examined
Process: Measures related to screening, diabetes, and prevention
Findings
Process: Failed to find evidence that either program resulted in major improvement in quality or notable disruption in care.
Assessment of Methodological Quality
Good: Regional intervention but strong design with difference-in-differences approach and multiple years of data.
Reference
Pearson et al., 2008
5
Program Description
P4P programs introduced into physician group contracts from 2001-2003 by 5 major commercial health plans in Massachusetts
Study Design
Pre-post analysis with comparison group
Incentive Structure
Combination of bonuses and withholds ranging from $200 to a high of approximately $2,500 per PCP
Measures Examined
Process: Process measures related to screening, diabetes, and prevention
Findings
Process: Not associated with greater improvement in quality compared with a rising secular trend
Assessment of Methodological Quality
Fair
Reference
Petersen et al., 2013
148
Program Description
RCT of P4P incentives among Virginia primary care practices (n=83 physicians and 42 non-physicians in 12 study sites) for care provided to hypertensive patients. Sites were randomized into 4 groups: (1) individual clinician-level incentives, (2) practice-level incentives, (3) combined-level incentives, and (4) no incentives. Participants were provided with educational webinars regarding treatment guidelines and customized audit-and-feedback reports for 16 months starting in April 2008.
Study Design
RCT with time-trended analysis
Incentive Structure
Bonus payments
Mean payment of $4,270 in combined group, $2,672 in individual group, and $1,648 in practice group
Measures Examined
Process: Use of recommended antihypertensive medications or any medication management (start a medication, add a medication, or dose adjustment)
Outcomes: Blood pressure control or appropriate response to uncontrolled blood pressure
Findings
Process: Although use of guideline-recommended medication increased significantly during the 16-month period, the change did not differ significantly from that in controls.
The difference in the proportion of patients receiving any medication adjustment between the individual-level incentive group and the control group was 15.36% (p = 0.05).
Outcomes: Adjusted absolute difference of 8.36% in the proportion of patients achieving BP control or receiving an appropriate response to uncontrolled BP between the individual incentive group and controls (p = .005).
Follow-up for 12 months after the incentive ended found that performance gains were not sustained and declined substantially, though not back to pre-intervention levels.
Assessment of Methodological Quality
Good: RCT with strong post hoc analysis to validate results.
16-month intervention period; small number of clinic sites.
Reference
Pourat et al., 2005
34
Program Description
Study of financial incentives and sexually transmitted disease services in a cross-sectional sample of PCPs contracted with Medicaid managed care organizations in 2002 in 8 California counties
Study Design
Cross-sectional comparison using regression
Incentive Structure
Presence of unspecified financial incentives, from physician surveys
Measures Examined
Process: Five sexually transmitted disease measures
Findings
Process: Physicians reimbursed with capitation and a financial incentive for management of utilization (odds ratio [OR] = 1.63) or with salary and a financial incentive for management of utilization (OR = 2.63) were more likely than those reimbursed under other methods to prescribe chlamydia drugs for the partner.
PCPs least often reported that they annually screened females aged 15-19 years for chlamydia (OR = 0.63) if reimbursed under salary and a financial incentive for productivity, or screened females aged 20-25 years (OR = 0.43) if reimbursed under salary and a financial incentive for financial performance.
Assessment of Methodological Quality
Poor: Simple cross-sectional associations.
Reference
Rosenthal et al., 2005
10
Program Description
PacifiCare implemented a P4P program in California, incentivizing patient experience and process measures from 2001 to 2004.
Study Design
Difference-in-differences comparing participants in California to nonparticipants in the Pacific Northwest
Incentive Structure
$0.23 per member per month for each performance target that was met or exceeded
Measures Examined
Process: Cervical cancer screening, mammography, and HbA1c testing
Findings
Process: Significant improvement in cervical cancer screening relative to the control group (3.6%). No significant improvement in mammography (p=0.13) or hemoglobin A1c testing (p=0.50).
Assessment of Methodological Quality
Good: Regional intervention but strong design with difference-in-differences approach and multiple years of data
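As a rough illustration of the Rosenthal et al. (2005) incentive, the sketch below totals the $0.23 per-member-per-month payment across targets met. The panel size, period length, and the helper name `pacificare_bonus` are hypothetical, and the sketch assumes each target met earns the full rate for every month of the period.

```python
# Hypothetical sketch of the PacifiCare payment rule summarized above:
# $0.23 per member per month for each performance target met or exceeded.
# Panel size, number of months, and targets met are illustrative inputs.

RATE_PMPM_PER_TARGET = 0.23  # dollars, from the Rosenthal et al. (2005) entry


def pacificare_bonus(members: int, months: int, targets_met: int) -> float:
    """Total bonus for a medical group over a period, rounded to cents."""
    return round(RATE_PMPM_PER_TARGET * members * months * targets_met, 2)


# A hypothetical group with 10,000 enrollees meeting 3 targets for a full year.
print(pacificare_bonus(members=10_000, months=12, targets_met=3))  # 82800.0
```

The linear form makes clear why a per-member rate scales with enrollment: the same performance earns a large group proportionally more dollars than a small one.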
Reference
Rosenthal, 2008
52
Program Description
Bridges to Excellence was first implemented in Massachusetts in 2003, with 2 major physician reward components: the Physician Office Link and the Diabetes Care Link.
Study Design
Cross-sectional comparison with non-recognized physicians in Massachusetts
Incentive Structure
Up to $50 for each patient covered by a participating employer
Measures Examined
Process: Process measures related to diabetes and preventive care
Utilization: Patient resource use, number of episodes per patient, and the total resource use per episode
Findings
Process: In one cohort, better performance on measures of cervical cancer screening, mammography, and glycated hemoglobin testing. In the other cohort, significantly better performance on all 4 diabetes process measures of quality, with the largest differences observed in microalbumin screening (17.7%).
Utilization: Among recognized practices, a significantly greater % of resource use was accounted for by evaluation and management services (3.4%), and a smaller % by facility (-1.6%), inpatient ancillary (-0.1%), and non-management outpatient services (-1.0%). Recognized physicians had significantly fewer episodes per patient (0.13) and lower resource use per episode ($130).
Assessment of Methodological Quality
Fair
Reference
Rosenthal et al., 2009
70
Program Description
Culinary Health Fund, a union-sponsored health plan, offered members and providers financial incentives to seek prenatal care.
Study Design
Panel data analysis of outcomes and spending for participants and nonparticipants using instrumental variables to account for selection bias
Incentive Structure
$100 to both the pregnant member and the member's network obstetrician or midwife
Measures Examined
Cost/Utilization: NICU admissions, spending in the first year of life
Outcomes: Low birth weight
Findings
Cost/Utilization: Lowered odds of neonatal intensive care unit admission (0.45; 95% CI, 0.23-0.88). Lowered spending in the first year of life (estimated elasticity of -0.07; 95% CI, -0.12 to -0.01).
Outcome: No reduction in low birth weight (0.53; 95% CI, 0.23-1.18)
Assessment of Methodological Quality
Good: Longitudinal study with strong design, including instrumental variables to account for confounding factors.
Reference
Roski et al., 2003
47
Program Description
40 clinics of a large multispecialty medical group practice were randomly allocated to receive performance incentives related to smoking cessation from 1999 to 2000.
Study Design
RCT focused on smoking cessation, provider adherence to accepted guidelines, and associated patient outcomes. 40 clinics of a large multispecialty medical group practice were randomly allocated to control, incentive, and registry groups.
Incentive Structure
Clinics with one to seven providers that met both goals could receive a $5,000 award, and clinics with eight or more providers were eligible for a $10,000 bonus.
Clinics that reached or exceeded only one of the two performance goals were eligible for half the amount.
Measures Examined
Process: Referral to and use of counseling program
Outcomes: Quit rate
Findings
Process: Patients visiting registry clinics accessed counseling programs statistically significantly more often (P < 0.001) than patients receiving care in the control condition.
Outcomes: The quit rate (7-day sustained abstinence, not incentivized) was 22.4% for the P4P group, 21.7% for the incentive registry group, and 19.2% for the control group.
Assessment of Methodological Quality
Fair
Reference
Serumaga, 2011
24
Program Description
UK National Health Service Quality and Outcomes Framework
Study Design
Interrupted time series analysis (2000-2007)
Incentive Structure
PCPs can receive up to 25% of base salary
Measures Examined
Process: Rates of blood pressure monitoring
Outcomes: Blood pressure over time, blood pressure control, treatment intensity, hypertension-related outcomes, all-cause mortality
Findings
Process: After accounting for secular trends, no changes in blood pressure monitoring (level change 0.85, 95% confidence interval -3.04 to 4.74, P=0.669; trend change -0.01, -0.24 to 0.21, P=0.615), control (-1.19, -2.06 to 1.09, P=0.109; and -0.01, -0.06 to 0.03, P=0.569), or treatment intensity (0.67, -1.27 to 2.81, P=0.412; and 0.02, -0.23 to 0.19, P=0.706) were attributable to P4P.
Outcomes: P4P had no effect on the cumulative incidence of stroke, myocardial infarction, renal failure, CHF, or all-cause mortality in either the treatment-experienced or the newly treated subgroup.
Assessment of Methodological Quality
Fair
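The Roski et al. (2003) award rule is explicit enough to state as a small function. The sketch below encodes the two clinic-size tiers and the half award for meeting only one goal; the zero award when neither goal is met is implied rather than stated in the entry, and the name `roski_award` is illustrative.

```python
# Sketch of the Roski et al. (2003) award rule as summarized in the entry.
# Inputs: clinic size and number of goals met (out of two). The zero award
# for meeting no goal is an inference, not stated explicitly in the entry.


def roski_award(n_providers: int, goals_met: int) -> float:
    """Dollar award for one clinic under the two-goal smoking-cessation incentive."""
    if n_providers < 1:
        raise ValueError("a clinic needs at least one provider")
    if goals_met not in (0, 1, 2):
        raise ValueError("there are only two performance goals")
    # Full award: $5,000 for 1-7 providers, $10,000 for 8 or more.
    full_award = 5_000 if n_providers <= 7 else 10_000
    if goals_met == 2:
        return float(full_award)
    if goals_met == 1:
        return full_award / 2  # half the amount for meeting only one goal
    return 0.0


for size, goals in [(5, 2), (5, 1), (12, 2), (12, 1), (3, 0)]:
    print(size, goals, roski_award(size, goals))
```

Note that the award is tiered by clinic size rather than scaled per provider, so a 7-provider clinic and a 1-provider clinic are eligible for the same amount.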
Reference
Unutzer et al., 2012
37
Program Description
The state of Washington implemented a population-focused, integrated care program for safety net patients in 29 community health clinics related to depression from 2008 to 2010.
Study Design
Survival analyses, which examined the time to improvement in depression before and after implementation of the P4P program
Incentive Structure
Annual program funding to participating clinics was contingent on meeting several quality indicators
Measures Examined
Process: Timely follow-up of patients in the program, psychiatric consultation for patients who do not show clinical improvement, and regular tracking of psychotropic medications
Outcomes: Treatment response
Findings
Process: After implementation of the P4P incentive program, participants were more likely to experience timely follow-up, and the time to depression improvement was significantly reduced.
Outcome: The hazard ratio for achieving treatment response was 1.73 (95% confidence interval = 1.39, 2.14) after the P4P program implementation compared with preprogram implementation.
Assessment of Methodological Quality
Poor: Simple pre-post with no comparison group.
Reference
Young et al., 2007
Program Description
PCPs in Rochester, New York, received withheld bonuses for performance on process and patient experience measures. Focused on diabetes measures.
Study Design
Pre-post with no comparison group
Incentive Structure
5% of physician fees withheld to fund incentive pools and returned based on performance
Measures Examined
Process: 5 diabetes measures: 2 hemoglobin A1c tests, 1 LDL screening, 1 urinalysis/microalbumin, 1 flu vaccination, and 1 eye exam
Findings
Process: Post-P4P implementation, statistically significant increases were observed for all measures, with the largest increases for LDL screening and eye exams.
The interaction term was not significant for any measure, indicating that there was no difference between the post- and pre-intervention trends.
Assessment of Methodological Quality
Poor: Regional population, simple pre-post, no controls for confounding factors.
Reference
Young et al., 2010
Program Description
P4P programs in 3 safety net settings in Chicago, offering incentives to physician groups for performance on process-of-care measures
Study Design
Two case studies
Incentive Structure
Bonus of up to $4,000 based on performance
Measures Examined
Process: Program A: annual retinal eye exam, annual HbA1c testing for diabetics, prescription of controller medications for patients with asthma, and 6 well-child visits.
Program B: Annual HbA1c test, annual LDL check, and annual foot exam.
Findings
Process: No evidence that P4P led to substantial improvements in quality.
Assessment of Methodological Quality
Poor: Limited to two case studies.
Table 3.5. Evidence on Effectiveness of Hospital Pay-for-Performance Programs
Reference
Atkinson et al., 2010
154
Program Description
Case study of Long Island Health Network P4P program, implemented in 2004 and operated by 10 clinically integrated hospitals
Study Design
Longitudinal analysis (2004-2008) of single integrated system
Incentive Structure
Part of annual update at risk. Amount at risk unspecified
Measures Examined
Process: 23 core Hospital Compare measures
Utilization: Case mix-adjusted LOS
Findings
Process: Overall composite measure of quality has shown a steady increase over time, from 78 in the first quarter of 2004 to 93.3 in the first quarter of 2008
Utilization: Case mix-adjusted average LOS decreased by about 0.25 days from 2003 to 2008
Assessment of Methodological Quality
Poor: Case study within a single organization, no comparison group, no statistical testing
Reference
Berthiaume et al., 2004
156
Program Description
Hospital Quality and Service Recognition program: Implemented by the Hawaii Medical Services Association, focused on GWTG-CAD
Study Design
Single-year cross-section from 2002
Incentive Structure
Bonus payments provided based on point system consistent with GWTG-CAD
Measures Examined
Number of hospitals receiving incentives
Findings
Process: 4 of 13 hospitals attained 85% adherence to the GWTG-CAD program performance measures
Assessment of Methodological Quality
Poor: Small sample size, no comparison group, no statistical testing; results included only the proportion of hospitals meeting goals and receiving incentives
Reference
Berthiaume et al., 2006
155
Program Description
Hospital Quality and Service Recognition program: Implemented by the Hawaii Medical Services Association, with 17 hospitals, focused on GWTG-CAD
Study Design
Longitudinal analysis (2001-2004) of program participants
Incentive Structure
Bonus payments provided based on point system consistent with GWTG-CAD program
Measures Examined
Outcomes: Surgical/OB LOS and complications, patient experience
Findings
Outcomes: Significant reduction in surgical LOS, no change in OB LOS
No statistically significant change in complications
No statistically significant change in patient experience reported
Assessment of Methodological Quality
Poor: Small sample size, no comparison group
Reference
Calikoglu et al., 2012
57
Program Description
Quality-Based Reimbursement Program and the Hospital-Acquired Conditions Program sponsored by the State of Maryland studied from 2009 to 2011
Study Design
Longitudinal analysis comparing MD hospital trend with national trend
Incentive Structure
Rewards for highest performers and penalties for lowest performers.
Reallocation is the percentage of total inpatient revenue by which the hospital was penalized or rewarded, based on its performance score. The maximum penalty for the quality-based reimbursement program is set at 0.5%, and the distribution of penalties and rewards is determined based on a linear scale.
Measures Examined
Safety: 3M's 64 preventable conditions list
Process: 19 core CMS and Joint Commission process measures in 4 care domains: heart attack, CHF, pneumonia, and surgical infection prevention.
Findings
Safety: Preventable conditions declined, especially infection-related conditions (all included: -18.59%; infection-related: -27.83%; all other: -14.33%; p < 0.001).
Process: The only measure that improved faster in Maryland than nationally was influenza vaccination for pneumonia patients (+20.5% in MD vs. +15.1% nationally).
Assessment of Methodological Quality
Fair
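To make the phrase "linear scale" in the Calikoglu et al. entry concrete, the sketch below maps a performance score linearly onto a percentage of inpatient revenue. Only the 0.5% maximum penalty comes from the entry; the score range, the symmetric 0.5% maximum reward, and the function names are assumptions, so this is an illustration of the idea rather than the program's actual formula.

```python
# Hypothetical sketch of a linear reallocation scale in the spirit of the
# Maryland Quality-Based Reimbursement program. Only the 0.5% maximum penalty
# comes from the entry; the score range and the symmetric 0.5% maximum reward
# are assumptions made for illustration.


def reallocation_percent(score: float, score_min: float, score_max: float,
                         max_penalty: float = 0.5,
                         max_reward: float = 0.5) -> float:
    """Map a performance score linearly onto [-max_penalty, +max_reward] percent."""
    if score_max <= score_min:
        raise ValueError("score_max must exceed score_min")
    # Scores outside the assumed range are clamped to its endpoints.
    clamped = min(max(score, score_min), score_max)
    fraction = (clamped - score_min) / (score_max - score_min)
    return -max_penalty + fraction * (max_penalty + max_reward)


def reallocation_dollars(inpatient_revenue: float, percent: float) -> float:
    """Convert a reallocation percentage into dollars of inpatient revenue."""
    return inpatient_revenue * percent / 100


# Lowest possible score for a hypothetical hospital with $200M inpatient revenue.
pct = reallocation_percent(score=0, score_min=0, score_max=100)
print(pct, reallocation_dollars(200_000_000, pct))  # -0.5 -1000000.0
```

A linear scale avoids cliff effects at a single threshold: each increment in score moves the payment by the same amount, whether the hospital is near the bottom or the top of the distribution.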
Reference
Glickman et al., 2007
53
Program Description
CMS HQID
Study Design
Longitudinal analysis (2003-2006) comparing change in participants with nonparticipants
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Process: CMS measures:
Aspirin at arrival, aspirin at discharge, angiotensin-converting enzyme inhibitor or angiotensin receptor blocker for left ventricular systolic dysfunction, smoking cessation counseling for active or recent smokers, beta-blocker at arrival, beta-blocker at discharge
Non-CMS measures:
Glycoprotein IIb/IIIa inhibitor use, clopidogrel at discharge, any heparin use, lipid-lowering medication, dietary modification counseling, referral for cardiac rehabilitation, electrocardiogram within 10 minutes, cardiac catheterization within 48 hours
Outcomes: In-hospital death
Findings
Process: Slightly higher rate of improvement at P4P vs. control hospitals for 2 of the 6 incentivized therapies: aspirin at discharge (OR 1.31 vs. 1.17, p = .04) and smoking cessation counseling (OR 1.50 vs. 1.28, p = .05). No significant difference between groups in a composite measure of the 6 incentivized measures.
Outcomes: No evidence that in-hospital mortality improvements were incrementally greater at P4P hospitals (change in odds of in-hospital death per half-year period, 0.91 vs. 0.97, p = .21).
Assessment of Methodological Quality
Good: Solid design with a comparison group to account for fixed differences in outcomes across practices; adjusted for patient risk in mortality models
Reference
Grossbart, 2006
153
Program Description
CMS HQID
Study Design
Difference-in-differences from 2003-2004 comparing participating hospitals within Catholic Healthcare Partners with those that did not participate
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Process: Composite quality scores in 3 clinical areas: AMI, CHF, and pneumonia. Number of opportunities and % improvement for each measure of AMI, CHF, and pneumonia
Findings
Process: Participating hospitals improved their composite scores by 9.3% versus 6.7% for nonparticipating hospitals (p < .001).
For CHF, improvement from baseline to the 1st year for participating hospitals was 19.2% versus 10.9% for nonparticipating hospitals (p < .001).
For AMI, the improvement from baseline to the 1st year for participating hospitals was 3.1% versus 2.9% for nonparticipating hospitals, although this difference was not significant (p = .730).
Among pneumonia patients, nonparticipating hospitals slightly outpaced the pay-for-performance cohort (7.9% vs. 7.2%), although again the difference was not significant (p = .395).
Assessment of Methodological Quality
Fair
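The difference-in-differences comparison in the Grossbart entry reduces, in its simplest 2x2 form, to the participants' improvement minus the nonparticipants' improvement. The sketch below applies that subtraction to the reported figures; it reproduces only point differences, in the same units the entry reports, not the study's significance tests. The names `IMPROVEMENT` and `did_estimate` are illustrative.

```python
# Simplest 2x2 difference-in-differences point estimate:
# (treated post - pre) - (control post - pre).
# Inputs are the baseline-to-year-1 improvements reported in the Grossbart (2006)
# entry; significance testing from the study is not reproduced here.

IMPROVEMENT = {  # reported improvement: (participating, nonparticipating)
    "composite": (9.3, 6.7),
    "CHF": (19.2, 10.9),
    "AMI": (3.1, 2.9),
    "pneumonia": (7.2, 7.9),
}


def did_estimate(treated_change: float, control_change: float) -> float:
    """Treated group's change minus control group's change, rounded to 0.1."""
    return round(treated_change - control_change, 1)


for area, (treated, control) in IMPROVEMENT.items():
    print(f"{area}: {did_estimate(treated, control):+.1f}")
# composite: +2.6
# CHF: +8.3
# AMI: +0.2
# pneumonia: -0.7
```

Subtracting the comparison group's change removes trends common to both groups, which is why a positive raw improvement (pneumonia, +7.2) can still correspond to a negative estimated program effect.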
Reference
Herrin et al., 2008
60
Program Description
Health care system in Texas implemented a P4P program that distributed bonuses to directors/clinical managers and chief executive officers for patient experience, process, and efficiency measures.
Study Design
Longitudinal analysis (2002-2005) with comparison hospitals in Texas
Incentive Structure
Portion of salary at risk based on performance, ranging from 10% for clinical managers to 60% for the chief executive officer.
Measures Examined
Process: Quality index based on 13 core Joint Commission measures related to AMI, pneumonia, CHF, and surgical site prevention
Outcomes: Mortality
Findings
Process: On seven measures, Baylor Healthcare System (BHCS) hospitals improved compliance more rapidly.
For three of the core measures, BHCS hospitals increased compliance significantly faster: beta-blockers at admission (p = .04), beta-blockers at discharge (p = .007), and antibiotics within 4 hours (p = .014). In contrast, for the three non-exposed measures, BHCS hospitals had average changes that were smaller, or even more negative, than those of other hospitals reporting to the Joint Commission, though not significantly so.
Outcome: No significant difference in mortality rate.
Assessment of Methodological Quality
Fair
Reference
Jha et al., 2012
73
Program Description
CMS HQID
Study Design
Longitudinal analysis (2003-2009) with comparison group
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Outcome: 30-day mortality among patients who had AMI, CHF, or pneumonia or who underwent CABG in HQID and non-HQID hospitals
Findings
Outcome: At baseline, the composite 30-day mortality was similar for HQID and non-HQID hospitals.
The quarterly declines in mortality at HQID and non-HQID hospitals were similar (0.04% and 0.04%; difference, -0.01 percentage points; 95% CI, -0.02 to 0.01).
After 6 years, mortality remained similar in HQID and non-HQID hospitals (11.82% and 11.74%; difference, 0.08 percentage points; 95% CI, -0.30 to 0.46).
No evidence that HQID led to a decrease in 30-day mortality.
Assessment of Methodological Quality
Fair
Reference
Kruse et al., 2012
77
Program Description
CMS HQID
Study Design
Difference-in-differences using data from 2002 to 2005
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Costs: Hospital revenues, costs, and margins or Medicare payments (index hospitalization and 1 year after admission) for AMI patients
Findings
Costs: No significant effect of P4P on hospital revenues, costs, and margins or Medicare payments (index hospitalization and 1 year after admission) for AMI patients.
Assessment of Methodological Quality
Good: Utilized a difference-in-differences design with a strong empirical framework to also account for time-variant hospital characteristics
Reference
Lindenauer et al., 2007
59
Program Description
CMS HQID
Study Design
Longitudinal analysis (2003-2006) using an exact-match approach to match HQID hospitals with controls
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Process: 10 individual process measures of AMI, CHF, and pneumonia, and composite scores for AMI, CHF, pneumonia, and all combined
Findings
Process: Pay-for-performance hospitals showed significantly greater improvement than did control hospitals in 7 of the 10 individual measures. Pay-for-performance hospitals also achieved greater improvement in all the composite process measures, with differences ranging from 4.1% for pneumonia (P<0.001) to 5.2% for CHF (P<0.001).
Assessment of Methodological Quality
Good: Large national sample with a solid matching methodology to account for potential confounders.
Reference
Nahra et al., 2006
157
Program Description
Blue Cross Blue Shield of Michigan implemented a hospital incentive system for heart-related care involving 85 hospitals.
Study Design
Pre-post comparison among participating hospitals
Incentive Structure
% add-on to hospitals' inpatient DRG reimbursements from Blue Cross Blue Shield of Michigan (BCBSM). The maximum possible add-on for heart-related care increased from 1.2% of a hospital's BCBSM inpatient DRG reimbursements in 2000-2002 to 2% in 2003.
Measures Examined
Process: Aspirin at discharge; AMI patients receiving beta blocker at discharge; CHF patients receiving ACE inhibitor prescriptions at discharge
Outcome: Quality-adjusted life years
Findings
Process: Aspirin at discharge increased from 87% to 95%, beta blockers from 81% to 93%, and ACE inhibitors from 70% to 80%.
Outcome: Improvement in quality-adjusted life years between 733.3 and 1,701.2
Assessment of Methodological Quality
Poor: Limited to a single region, no comparison group, no controls included in calculation of "benefit"
Reference
Nicholas et al., 2011
54
Program Description
CMS HQID
Study Design
Longitudinal analysis (2003-2005) with comparison group
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Process: CMS core measures
Findings
Process: P4P hospitals did not preferentially increase efforts for easy tasks in patients with CHF or pneumonia, but they did exhibit modestly greater effort on easy tasks for heart attack admissions.
Assessment of Methodological Quality
Good: Multiple years of a large national sample, strong analytic design using fixed and random effects and hospital characteristics to control for potential confounders
Reference
Ryan et al., 2009
78
Program Description
CMS HQID
Study Design
Difference-in-differences using multiple years of data (2000-2006)
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Costs: Risk-adjusted 60-day cost for AMI, CHF, pneumonia, or CABG
Outcomes: Risk-adjusted 30-day mortality for AMI, CHF, pneumonia, or CABG
Findings
Costs: No evidence that the HQID had a significant effect on risk-adjusted 60-day cost
Outcomes: No evidence that the HQID had a significant effect on risk-adjusted 30-day mortality
Assessment of Methodological Quality
Good: Multiple years of a large national sample, strong analytic design using fixed and random effects and hospital characteristics to control for potential confounders
Reference
Ryan and Blustein, 2011
55
Program Description
MassHealth
Study Design
Longitudinal analysis (2004-2009) with comparison group
Incentive Structure
Hospitals were eligible to receive three types of rewards: "Attainment Award," given to hospitals with composite scores exceeding the median of HQID hospitals 2 years prior; and "Improvement Award," given to hospitals scoring above the median of HQID hospitals in the current year and also ranking within the top 20% in terms of QI among HQID hospitals.
Measures Examined
Process: CMS core measures for pneumonia and surgical site infections
Findings
Process: Estimates from the preferred specification found small and nonsignificant program effects for pneumonia (-0.67 percentage points, p > 0.10) and SIP (-0.12 percentage points, p > 0.10)
Assessment of Methodological Quality
Good: Multiple years of a large national sample, strong analytic design using fixed effects and hospital-specific time trends to control for potential confounders
Reference
Ryan et al., 2012a
90
Program Description
CMS HQID
Study Design
Matched difference-in-differences using multiple years of data (2004-2009)
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Process: Composite process quality scores for AMI, CHF, and pneumonia
Findings
Process: In every case, HQID hospitals improved their quality more than matched comparison hospitals in phase I.
HQID hospitals experienced a weakening of QI relative to matched comparison hospitals in phase II.
In both phases, average adjusted annual QI was greater for demonstration hospitals than for matched comparison hospitals for each diagnosis.
Overall, difference-in-differences estimates indicated that, relative to comparison hospitals, HQID hospitals improved less in phase II than in phase I; the difference was significant for HF and pneumonia but not for AMI.
Assessment of Methodological Quality
Good: Large national sample; used a matched comparison group and difference-in-differences to account for other time-invariant differences between hospitals
Reference
Sutton et al., 2012
72
Program Description
P4P program implemented in 24 hospitals in the northwest UK
Study Design
The triple-difference analysis (2007-2010) captured the effect of the program on mortality for the conditions included in the program in the northwest region, while accounting for changes over time in overall mortality in the northwest region and for differences in mortality between included and non-included conditions in the northwest region versus the rest of England
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Outcome: Changes in mortality
Findings
Outcome: Risk-adjusted absolute mortality for the conditions included in the pay-for-performance program decreased significantly.
Absolute reduction of 1.3 percentage points (95% confidence interval [CI], 0.4 to 2.1; P = 0.006).
Relative reduction of 6%, equivalent to 890 fewer deaths (95% CI, 260 to 1500) during the 18-month period. The largest reduction, for pneumonia, was significant (1.9 percentage points; 95% CI, 0.9 to 3.0; P < 0.001).
No significant reductions for acute myocardial infarction (0.6 percentage points; 95% CI, -0.4 to 1.7; P = 0.23) and CHF (0.6 percentage points; 95% CI, -0.6 to 1.8; P = 0.30).
Assessment of Methodological Quality
Good: Very strong analytic approach with multiple sensitivity checks
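The triple-difference design in the Sutton et al. entry can be written, in its simplest form using group means, as the estimator below. The notation is ours, not the study's: Y_{r,c,t} denotes mean risk-adjusted mortality in region r (NW = northwest, E = rest of England), condition group c (I = conditions included in the program, N = conditions not included), and period t (pre or post). The published analysis used a regression-based, risk-adjusted specification, so this is only a sketch of the comparison it makes.

```latex
\hat{\delta}_{\mathrm{DDD}}
  = \Big[ \big(\bar{Y}_{NW,I,\mathrm{post}} - \bar{Y}_{NW,I,\mathrm{pre}}\big)
        - \big(\bar{Y}_{NW,N,\mathrm{post}} - \bar{Y}_{NW,N,\mathrm{pre}}\big) \Big]
  - \Big[ \big(\bar{Y}_{E,I,\mathrm{post}} - \bar{Y}_{E,I,\mathrm{pre}}\big)
        - \big(\bar{Y}_{E,N,\mathrm{post}} - \bar{Y}_{E,N,\mathrm{pre}}\big) \Big]
```

The first bracket is a difference-in-differences within the northwest (included vs. non-included conditions); subtracting the same quantity for the rest of England removes changes in the included conditions that were occurring nationally regardless of the program.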
Reference
Werner et al., 2011
56
Program Description
CMS HQID
Study Design
Longitudinal analysis (2004-2008) with matched comparison group
Incentive Structure
HQID methodology (see page 48 for details)
Measures Examined
Process: CMS core measures for AMI, pneumonia, and CHF, plus composite scores for pneumonia and CHF
Findings
Process: Performance of the hospitals in the project initially improved more than that of the control group: more than half of the pay-for-performance hospitals achieved high performance scores, compared with less than a third of the control hospitals. However, after five years, the two groups' scores were virtually identical.
Assessment of Methodological Quality
Good: National sample of intervention practices over time matched to a large number of comparison practices using a number of key variables
Table 3.6. Evidence on Effectiveness of Pay-for-Performance Programs in Other Settings
Reference
Hittle et al., 2011
75
Program Description
Medicare implemented the Home Health Agency P4P demonstration
Study Design
RCT from 2007 to 2008 comparing treatment, control, and nonparticipants
Incentive Structure
Program cost savings were distributed to the highest-performing agencies and the most improved agencies, based on patient outcomes and cost savings to Medicare
Measures Examined
Outcome: 21 measures of activities of daily living; 7 incentivized, 14 not incentivized
Findings
Outcome: Only 2 measures (improvement in pain interfering with activity and improvement in urinary incontinence), both non-incentivized, showed significant differences between treatment and control participating home health agencies.
Utilization: No significant difference in change between treatment and control in hospitalization or emergent care
Assessment of Methodological Quality
Fair
Reference
Shen, 2003
76
Program Description
Maine Office of Substance Abuse incentivized nonprofit providers to care for high-priority substance abuse clients
Study Design
Office of Substance Abuse clients were compared with Medicaid patients before and after the intervention
Incentive Structure
Annual payment update dependent on previous performance
Measures Examined
Outcomes: The proportion of outpatient clients classified as being the most severely ill
Findings
Outcome: Performance-based contracting had a significantly negative marginal effect on the probability of Office of Substance Abuse clients being most severe
Assessment of Methodological Quality
Fair
Reference
Shepard et al., 2006
6
Program Description
Addiction services company offered incentives to 11 substance abuse counselors providing outpatient aftercare treatment
Study Design
RCT from 1994 to 1996
Incentive Structure
Counselor could earn a bonus of $100 for each client who completed at least five treatment sessions
Measures Examined
Process: Number of treatment sessions
Findings
Process: 59% of patients in the treatment group completed at least five sessions, whereas 33% in the comparison group did so
Assessment of Methodological Quality
Fair
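The Shepard et al. (2006) counselor incentive is a simple threshold rule. The sketch below computes a counselor's bonus and the share of clients reaching five sessions, the quantity reported in the findings. The caseload is invented for illustration, and the function names are hypothetical.

```python
# Sketch of the Shepard et al. (2006) counselor bonus: $100 for each client who
# completed at least five treatment sessions. The caseload below is hypothetical.

BONUS_PER_CLIENT = 100  # dollars, from the entry
MIN_SESSIONS = 5        # completion threshold, from the entry


def counselor_bonus(sessions_by_client: list[int]) -> int:
    """Bonus earned by one counselor given each client's completed-session count."""
    qualifying = sum(1 for n in sessions_by_client if n >= MIN_SESSIONS)
    return BONUS_PER_CLIENT * qualifying


def completion_rate(sessions_by_client: list[int]) -> float:
    """Share of clients who completed at least MIN_SESSIONS sessions."""
    if not sessions_by_client:
        return 0.0
    qualifying = sum(1 for n in sessions_by_client if n >= MIN_SESSIONS)
    return qualifying / len(sessions_by_client)


caseload = [6, 2, 5, 4, 9]  # hypothetical completed sessions per client
print(counselor_bonus(caseload), completion_rate(caseload))  # 300 0.6
```

Because the bonus depends only on crossing the threshold, a client at four sessions is worth the full $100 to the counselor if retained for one more session, while sessions beyond the fifth earn nothing extra.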
Reference
Werner, 2013
74
Program description
Medicaid's nursing home P4P from 2001 to 2009
Study design
Difference-in-differences
Incentive structure
Point system translating into a per-diem add-on
Measures examined
Resident-level indicator of clinical outcomes (e.g., falls, pressure sores, catheter insertion, and restraints) and facility level regulatory deficiencies (total number of deficiencies in a given year and the number of immediate jeopardy deficiencies).
Findings
Outcome: Three clinical quality measures (the % of residents who were physically restrained, were in moderate to severe pain, or developed pressure sores) improved; other targeted quality measures either did not change or worsened. Two structural measures (total number of deficiencies and nurse staffing) worsened slightly under P4P.
Assessment of Methodological Quality
Good: Multiple years with difference-in-differences design
Reference
An et al., 2008
49
Beard et al., 2013
80
Beaulieu and Horrigan, 2005
41
Table 3.7. Pay-for-Performance's Effect on Unmeasured Areas-Unintended and Spillover Effects
Program Description Unintended Consequences
Improvements in Areas Not Incentivized by Program (Spillover Effects)
RCT of usual care vs. P4P for smoking quit line referrals in 25 usual care clinics and 24 P4P clinics. 10-month study period from 2005-2006.
No evidence of unintended consequences. Rates of contact and subsequent enrollment in quit services after referral did not differ between usual care and P4P sites.
Not reported
Retrospective cohort study assessing measures within the VA for appropriate care and overtreatment of lipid management among a cohort of patients with diabetes. 1-year study period from 2010-2011.
Independent Health managed care plan in New York state physician P4P program (n=17 physicians). Focus on diabetes process and outcome measures. 8-month study period from 2001 to 2002.
13.7% received potential overtreatment: high-dose statins for patients with no diagnosis of ischemic heart disease either during or before the measurement period.
Not reported
Assessed performance on two non-incentivized measures for mammogram and colorectal screening. 10 physicians improved, 7 remained unchanged.
Authors concluded that physicians did not reallocate effort away from preventive screening toward diabetes care.
Assessment of Methodological Quality
Poor: Small intervention, short time period. Strength is randomization of clinic sites.
Fair: Data did not capture care provided outside of the VA. Strength is large nationally representative sample.
Poor: Small number of study participants (n=17 physicians). Physicians self-selected; one small region, short duration, physicians not matched at baseline. Comparison patients had higher baseline performance on all measures
Reference
Healy and Cromwell 2012
86
Calikoglu et al., 2012
57
Program Description
CMS identified 8 conditions for which it would no longer pay a higher DRG rate if the conditions occurred in the inpatient setting and were not present on admission. 3-year evaluation from 2008 to 2010.
Two P4P programs implemented in 2008 by the state of Maryland, one focused on process measures and one on HACs. (2007-2010)
Unintended Consequences
Across all payers, counting all secondary diagnosis codes had the greatest positive effect in raising HAC rates for Medicare and Medicaid beneficiaries. Evidence of undercoding HACs for trauma and falls, deep vein thrombosis/PE following certain orthopedic procedures, stage III or IV pressure ulcer, catheter-associated urinary tract infection, and vascular catheter-associated infection.
Highest undercoding rates found for trauma and falls and deep vein thrombosis/PE after orthopedic procedures.
No consistent pattern in coding could be found across hospital characteristics across the HACs.
No evidence of unintended consequences. Audits to guard against improper coding found that 98% of hospitals were correctly coding present-on-admission status.
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Assessed rates of decline in HACs among non-Medicare payers as a result of the Medicare HAC present-on-admission nonpayment policy. No consistent pattern in the reporting of HAC rates across 3 years, by type of payer, or by state.
Not reported
Assessment of Methodological Quality
Fair: Examined variation across 4 states in reported rates and differences in coding.
Poor: Measured change compared with base period for HACs. No accounting for secular effects and anticipatory behavior related to implementation of CMS non-payment policy going into effect in 2012. Regional effort in an all payer state. No controls for confounders. No comparison group or trends prior to implementation of program.
Reference
Program Description
Campbell and Marchildon, 2007
84
UK P4P contract for family practitioners started in 2004. Study assesses longitudinal change at three time points (1998, 2003, and 2005), before and after the introduction of P4P in 2004.
Campbell et al., 2009
159
UK P4P contract (Quality and Outcomes Framework) for PCPs started in 2004. 136 performance indicators.
Interrupted time series analysis examined longitudinal change for 42 practices at four time points before and after implementation of P4P (1998 pre-P4P, 2003 pre-P4P, 2005 post-P4P, and 2007 post-P4P)
Unintended Consequences
Not reported
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Performance on indicators with incentives for three conditions examined was substantially higher at all three time points than for those without incentives. The rate of improvement between 2003 and 2005 for clinical indicators for which financial incentives were provided, as compared with those for which they were not, did not differ significantly from the rate predicted based on the trend between 1998 and 2003. There may have been a halo effect between incentivized and non-incentivized indicators focused on the same conditions. The finding of no significant difference in the rate of improvement between clinical indicators for which financial incentives were provided and those for which they were not provided suggests that the P4P program may not necessarily have been responsible for the acceleration in improvement found between 2003 and 2005.
Study found a ceiling effect for primary care practices (2005: practices achieved 96.9% of available clinical quality payment points; 2007: practices achieved 97.8% of available clinical quality points).
Continuity of care declined after implementation of P4P in 2005.
Not reported
Assessment of Methodological Quality
Fair: Absence of a control group as P4P was implemented nationally. Small sample size to assess spillover effects. Results may not be generalizable to the US. UK program had EHRs in all clinical practices with prompts for clinical measures, national health insurance, substantial incentives, and a history of significant investments in QI efforts that had started measures on an upward trajectory prior to P4P.
Fair: Absence of a control group as P4P was implemented nationally. Small sample size to assess spillover effects. Results may not be generalizable to the US. UK program had EHRs in all clinical practices with prompts for clinical measures, national health insurance, substantial incentives, and a history of significant investments in QI efforts that had started measures on an upward trajectory prior to P4P.
Reference
Chung et al., 2010 3
Collier, 2007
38
Program Description
Palo Alto Medical Clinic physician P4P program (primary care). 9 incentivized clinical outcome and process measures during study period from 2005 to 2007.
A community health care system implemented a P4P program for 12 hospitalists regarding standards on access, timeliness of medical record dictation, participation in monthly hospitalist meetings, quality measures, and self-directed learning (pre-P4P 2003-2004 vs. post-P4P 2005-2006).
Unintended Consequences
Not reported
Not applicable
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Accelerated improvement for 1 of 5 non-incentivized measures (BP control for hypertensive patients) from 65% to 72% (p=0.01)
Average LOS for patients (not incentivized) decreased more for patients of P4P hospitalists from 2005 to 2006 (5.22 to 4.84 days, excluding outliers) than for patients of non-P4P hospitalists (4.89 to 4.87 days, excluding outliers).
Assessment of Methodological Quality
Poor: Compares 2006-2007 performance against 2005-2006 (pre-post) in the same organization. Does not match providers or patients within providers. One organization with unique characteristics (EHR, low patient turnover, high patient socioeconomic status (SES), history of physician feedback on performance); overlap of measures with the statewide IHA P4P program
Poor: Does not account for secular improvement trends in Joint Commission/CMS measures and declines in LOS. Concurrent non-contracted group and non-hospitalists (not matched). Only a single organization and analytic methods poorly explained. Unclear if results generalize.
Reference
Drake et al., 200?
160
Fagan et al., 2010
40
Program Description
CMS HQID incentivized hospital performance on 5 clinical conditions.
Evaluated 130 top-performing hospitals on the pneumonia antibiotic timing measure in the 1st year of the HQID (2003-2004) and changes in antibiotic prescription rates for other clinical conditions.
Longitudinal study analyzing claims files of 20,943 adults aged ≥65 with diabetes receiving care from 9 primary care practices in Alabama, Tennessee, and Texas. Evaluated performance on 5 incentivized measures, 2 non-incentivized measures, and 2 resource-use measures (1,587 intervention patients and 19,356 patients in comparison practices). (2004-2007)
Unintended Consequences
Increased rate of meeting the pneumonia antibiotic timing measure was correlated with an increase in inappropriate pneumonia antibiotic use among patients with CHF, asthma, and chronic obstructive pulmonary disease. There was insufficient data to assess antibiotic use rates for pulmonary embolism, pulmonary edema and respiratory failure, and bronchiolitis and respiratory syncytial virus.
Not applicable
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Not reported
No evidence of spillover effect of P4P on non-incentivized measures: short-acting antihypertensive medication (OR=1.11, 95% CI 0.58-2.13) or prescribing an ACE inhibitor for those with renal insufficiency (OR=0.76, 95% CI 0.54-1.06).
Assessment of Methodological Quality
Poor: No multivariate analysis, simply demonstrated that better performance on antibiotic timing was correlated with inappropriate prescribing in some circumstances
Good: Quasi-experimental longitudinal study (pre-post data). Relatively large region, difference-in-differences-like design to control for time-invariant confounders
Reference
Glickman et al., 2007
53
Herrin et al., 2008
60
Hittle et al., 2011
75
Program Description
Patients with non-ST-segment elevation myocardial infarction enrolled in CRUSADE at hospitals exposed to the CMS HQID demonstration. Evaluation period from 2003-2006.
Baylor Health Care System in Texas implemented a P4P program in 2001 at 5 hospitals. Bonuses to director/clinical managers and chief executive officers for patient experience, process, and efficiency measures. Study period from 2001-2005.
Medicare Home Health Agency P4P demo. Incentivized improvements in outcomes and cost savings to Medicare. Evaluation of demo from 2007-2008.
Unintended Consequences
No deleterious effect on other aspects of clinical care given simultaneous hospital participation in a QI registry not involving financial incentives.
Not reported
Not reported
Improvements in Areas Not Incentivized by Program (Spillover Effects)
For composite measures of AMI treatments not subject to incentives, rates of improvement were not significantly different between P4P hospitals and controls (P4P hospital composite OR =1.09 vs. 1.08 for controls, p=.49), except lipid lowering medication, which was significantly higher at P4P hospitals (OR=1.23 vs. 1.13, p=.02)
No evidence of spillover effects.
Compared 3 measures not exposed to P4P (percutaneous coronary intervention within 120 minutes, thrombolytic therapy within 30 minutes for AMI, and discharge instructions for CHF). P4P hospitals had smaller average increases or larger average decreases than comparison hospitals, but differences were not significant. No significant difference in mortality rate.
Among the non-incentivized measures, treatment sites performed slightly better than the control group, though the differences were not significant. Two non-incentivized measures (improvement in pain interfering with activity and improvement in urinary incontinence) showed significant differences, with the treatment group outperforming controls.
Assessment of Methodological Quality
Good: Observational, patient level analysis. Large sample, multiple years of data. Solid design with a comparison group to account for fixed difference in outcomes across practices, adjusted for patient risk in mortality models
Fair: Weak study design (pre-post), though some attempt to control for confounds. Comparison hospitals may differ substantially from the 5 exposed to this intervention. Does not control for selection effects in measures reported to the Joint Commission (which were voluntary)
Fair
Reference
Jha et al., 2012
73
Kerr et al., 2012
82
Program Description
CMS HQID incentivized hospital performance on 5 clinical conditions. Study examined association between performance on incentivized measures and inpatient mortality for AMI, pneumonia, and CHF. Program evaluation from 2003-2009.
Retrospective cohort study assessing measures within the VA for appropriate care and overtreatment of high blood pressure among a cohort of patients with diabetes. 1-year study period from 2009 to 2010.
Unintended Consequences
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Not reported
No difference in trends in mortality rates between HQID and non-HQID hospitals (p=0.36) for outcomes that were not linked to incentives (CHF and pneumonia)
~8% had potential overtreatment. Patients with potential overtreatment tended to be older and male, to have ischemic heart disease, and to have lower mean index BP.
Among patients older than 76 with diabetes, ~12% were potentially overtreated.
Not reported
Assessment of Methodological Quality
Fair
Fair: Retrospective cohort design shows that rates of overtreatment are approaching rates of undertreatment, though only within the VA. Strength of the study is a very large sample of clinics and patients.
Reference
McDonald and Roland 2009
161
Program Description
Comparison of providers exposed to UK Quality and Outcomes Framework P4P program and medical groups in California exposed to IHA P4P program.
Qualitative interviews with 40 physicians to assess physician perspective on unintended consequences of P4P programs.
Unintended Consequences
UK physicians reported P4P changed the nature of the office visit (due to large number of performance measures (n= 80) and heavy reliance on EHRs to prompt delivery of services), while California physicians expressed resentment about P4P and less motivation to act on incentives. California physicians were less aware of targets and witnessed less change in the nature of office visits. California physicians reported frustration with the inability to exclude patients from performance calculations, with some reporting undesirable behaviors such as dropping non-compliant patients. California physicians in the medical group with the largest incentives reported accusing patients of damaging their performance rating or lying to patients about the financial consequences of their refusing to comply.
Most California physicians expressed concern that performance targets diminished clinical autonomy, while English physicians did not feel the same.
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Not reported
Assessment of Methodological Quality
Poor: Difficult to generalize more broadly to other US P4P programs. California physician sample drawn from 4 organizations that ranged in size from 600 to 3,000 physicians, with various percentages of payment linked to P4P. The 4 U.S. groups may not be representative of the broader experience in the IHA program or nationally. All physicians in UK sample use EHR with prompts for quality indicators, while only 7 of the physicians in U.S. sample used EHR
Reference
Mullen et al., 2010
42
Nicholas et al., 2011
54
Program Description
PacifiCare implemented a QI program in California in conjunction with the IHA P4P program. Study analyzed effects of implementing both programs on incentivized and non-incentivized measures (2001-2005).
Examined whether hospitals increase efforts on easy tasks relative to difficult tasks to improve scores under P4P, using the HQID demonstration data. Measures were classified as easy or difficult to improve based on whether they introduce additional per-patient costs, and process compliance on easy and difficult tasks was compared at hospitals eligible for HQID bonuses relative to hospitals engaged in public reporting. Study period from 2003 to 2005.
Unintended Consequences
No evidence of disruptions in care
Study found little evidence that hospitals changed allocation of efforts across tasks to maximize performance scores at lowest cost.
P4P hospitals did not preferentially increase efforts for easy tasks in patients with CHF or pneumonia, but they did exhibit modestly greater effort on easy tasks for heart attack admissions.
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Unclear effects on non-incentivized measures
No real gains associated with diabetic eye exam rates, despite other diabetic measures being rewarded by QI program and IHA.
No changes found for non-incentivized heart-related measures relative to control group.
Non-incentivized appropriate antibiotic use declined slightly.
Despite the presence of 2 other incentivized measures for women's health (breast cancer screening and cervical cancer screening), the non-incentivized Chlamydia screening rates decreased by roughly 2-5 percentage points relative to their time trend and the Northwest control group.
Not reported
Assessment of Methodological Quality
Good: Regional intervention but strong design with difference-in-differences approach and multiple years of data
Good: Multiple years of a large national sample, strong analytic design using fixed and random effects and hospital characteristics to control for potential confounders
Reference
Program Description
Shen, 2003
76
Maine Office of Substance Abuse incentivized nonprofit providers to care for high-priority substance abuse clients through performance-based contracting. Study period from 2001 to 2005.
Young et al., 2010
Analyzed P4P programs in 3 safety net settings in Chicago, offering incentives to physician groups for performance on process-of-care measures. Study period from 2005 to 2007.
Unintended Consequences
Improvements in Areas Not Incentivized by Program (Spillover Effects)
Found selection effects, with the most severely ill group declining significantly in treatment under the performance-based contract by 7% (p≤0.001), compared with 2% among the Medicaid comparison groups.
Not reported
No evidence that P4P compromised quality on unmeasured areas. Survey responses indicated that participating physicians did not have strong concerns about unintended consequences.
Performance on non-incentivized measures (adolescent well-child visits, LDL screening, and nephropathy) increased during study period.
Assessment of Methodological Quality
Poor: Simple pre-post, small region
Poor: Limited to two case studies
Reference
Chien et al., 2010
22
Table 3.8. Unexpected Effects on Access and Disparities of Pay-for-Performance Programs
Program Description
Hudson Health Plan (Medicaid) implemented a P4P program that incentivized immunization delivery to 2-year-olds according to the recommended series. $200 bonus/child (15-25% above base reimbursement) (2003-2007)
# of Providers or Patients Studied
115 Hudson primary care practices; 16 comparison health plans
Effect on Access to Care
Not reported
Effect on Disparities
No exacerbation in preexisting disparities. Racial/ethnic disparities fluctuated, but remained essentially unchanged.
Assessment of Methodological Quality
Good: Regional but multiple years of observation. Case comparison and strong difference-in-differences design
Reference
Doran et al., 2008
91
Program Description
UK National Health Service Quality and Outcomes Framework P4P program. Bonus payments to PCPs achieving threshold quality targets for various clinical and patient experience quality measures. (2004-2007).
# of Providers or Patients Studied
7367 general primary care practices
Effect on Access to Care
Not reported
Effect on Disparities
Practices in the most deprived quintile improved at the fastest rate (a 7.6% increase, compared with a 4.4% increase in the least deprived quintile). The gap in median achievement between the highest and lowest deprivation quintiles narrowed from 4.0% (year 1) to 1.5% (year 2) to 0.8% (year 3).
The variation in achievement decreased at faster rate for practices in most deprived areas. Patterns were consistent across all 48 indicators.
By year 3, the SES gradient had almost disappeared, though the poorest-performing practices remained concentrated in most deprived areas.
Assessment of Methodological Quality
Good: Compared a large number of practices before and after intervention. Concern about generalizability from the UK to the United States due to different characteristics of the delivery system (national health insurance with universal access, national health IT system). Only practices with stable populations and complete data collection were included; only indicators that remained fairly unchanged could be analyzed; analyses were at the practice rather than patient level (comorbidity will have led to some patients being counted twice); deprivation was summarized at the level of super-output areas.
Reference
Jha et al., 2010
88
Program Description
CMS Premier HQID incentivized hospital performance on 5 clinical conditions. Evaluation examined the association between the DSH index and changes in performance for AMI, CHF, and pneumonia (2003 4th quarter and July 2006-June 2007).
# of Providers or Patients Studied
251 of 255 HQID hospitals compared with a national sample of 3017 hospitals
Effect on Access to Care
Not reported
Effect on Disparities
By 2007, after 3 years of incentives, the DSH index was no longer associated with terminal performance for the three conditions; for non-incentivized hospitals (national sample), a higher DSH index was associated with lower terminal performance for the three conditions. Hospitals with more poor patients caught up to hospitals with fewer poor patients in the incentivized sample of hospitals; this did not occur in the national comparison sample.
At baseline, among HQID hospitals, a 10-point increase in the DSH index was associated with 0.8% lower performance on AMI (95% CI, -1.3% to -0.3%) and 1.1% lower performance on pneumonia (95% CI, -1.7% to -0.5%). Non-incentivized hospitals' performance was also negatively associated with the DSH index for all 3 measures at baseline.
For HQID hospitals, a 10-point increase in the DSH index was associated with 0.1% lower terminal performance on AMI (p=0.23), 0.07% higher terminal performance on pneumonia (p=0.72), and no significant difference in terminal performance on CHF (p=0.81). A higher DSH index was still associated with lower terminal performance in the national sample for each of the 3 conditions. In 2007, the interaction term between the DSH index and change in performance for HQID vs. non-HQID hospitals was significant and negative for AMI (-0.6, p=0.045) and pneumonia (-0.2, p=0.009), but not for CHF (p=0.65). The interaction term between the DSH index and terminal performance for HQID vs. non-HQID hospitals was statistically significant for pneumonia (-0.8, p<0.001), borderline significant for AMI (-0.4, p=0.064), and not significant for CHF (p=0.174).
Assessment of Methodological Quality
Poor: Two separate pre-post analyses with different data sets (HQA data for national sample and HQID data for P4P hospitals). Limited adjustments for hospital characteristics. Did not adjust for differences in patient characteristics or match hospitals at baseline. Possible selection effects with HQID hospitals; they may differ in ways that are not observed. Results are not generalizable to other hospitals.
Reference
Ryan, 2010
Program Description
CMS Premier HQID P4P program that incentivized hospital performance on 5 clinical conditions. (2000-2006)
# of Providers or Patients Studied
3,981,516 Medicare beneficiaries studied
Effect on Access to Care
Little evidence that the HQID P4P reduced access for minority patients. No significant pre-post differences in adjusted admission rates to HQID hospitals for any diagnosis. "Other race" beneficiaries had a significant reduction in adjusted admissions in the post period for AMI, but there was a secular reduction in AMI admissions pre-intervention. There was no evidence that hospitals close to thresholds for quality bonuses were more likely to avoid minority patients.
Effect on Disparities
Reductions in CABG rates for each racial and ethnic cohort between the pre and post periods reflected substitution of percutaneous transluminal coronary angioplasty for CABG during that period (a change in clinical practice). Marginally significant (p<0.10) evidence of a reduction in the probability of receiving CABG was found for minority patients and other race beneficiaries. Minimal evidence of minority patient avoidance, which may be due to the practice of exception reporting (hospitals were allowed to exclude patients from counting toward quality performance).
Assessment of Methodological Quality
Good: National sample, pre/post implementation of P4P. Strong estimation procedure including difference-in-differences and time-variant patient characteristics (comorbidity, admission type) and hospital characteristics. Results may not generalize to nonelderly patients.
Reference
Ryan et al., 2012b
58
Program Description
CMS Premier HQID P4P program that incentivized hospital performance on 5 clinical conditions, Phases I and II of intervention (2000-2008).
Between Phase I and Phase II, CMS shifted the incentive structure from only providing incentive payments to hospitals in the top 2 deciles of performance to paying hospitals that improved or had high absolute performance.
# of Providers or Patients Studied
266 hospitals (250 HQID hospitals and 250 comparison hospitals)
Effect on Access to Care Effect on Disparities
In Phase I, there were substantial gaps across the DSH quartiles in receipt of any incentive payment (hospitals in the highest DSH quartile were 32.8 percentage points less likely (p<0.01) to receive any payment than hospitals in the lowest DSH quartile), total incentive payment (hospitals in the highest DSH quartile received $26.84/discharge less than those in the lowest DSH quartile), and incentive payment per discharge.
In Phase II, the gap was not significant for the receipt of any incentive payment. Gap was reduced but remained significant for incentive payment per discharge: payments per discharge increased for hospitals in the two highest quartiles of DSH, but decreased for hospitals in the lowest DSH quartile. There were no significant reductions in the gap for total payments.
From Phase I to Phase II, the median change in incentive payments per discharge was -$2.58 for Quartile 1 (lowest DSH), $0.43 for Quartile 2, $6.99 for Quartile 3, and $14.85 for Quartile 4 (highest DSH), indicating that hospitals serving disadvantaged patients received more incentive payments per discharge.
Authors caution that the narrowing of the gap in incentive payments was not the result of lower-performing hospitals improving more in response to Phase II incentives; changes in the distribution of payments were likely the result of the change in incentive scheme.
Assessment of
Methodological Quality
Good: Large national sample; used a matched comparison group and difference-in-differences to account for other time-invariant differences between hospitals
Reference
An et al., 2008
49
Chien et al., 2012
43
Table 3.9. Factors Associated with Performance on Incentivized Measures
Program Description and # of Providers Studied
RCT of usual care vs. P4P for quit line referrals from 2005 to 2006. The study compared rates of referral; contact and enrollment after referral; and project costs in 25 usual care clinics with 24 P4P clinics.
Cross-sectional study of IHA P4P program. Examined the association between physician organizations (POs) located in lower SES areas and performance on P4P measures.
11,718 practice sites within 160 physician organizations (2009).
Metric Assessed
% of smokers referred to quit line services: number of unique individuals referred divided by the estimated number of smokers seen in the clinic. Costs: Fixed clinic costs were divided equally across both groups. Development costs: time of physicians and staff of project, Fairview Physicians Associates, and health plan. Implementation costs: information packages to clinics, feedback efforts to intervention clinics, including triage fees, staff time, and incentive payments. Pay rates based on annual salaries for participating staff. Costs were from an insurer's perspective.
IHA composite performance score and PO area-based SES measure based on Krieger's area-based measure.
Characteristics of High Performers
No associations between the % of smokers referred and clinic specialty type, number of physicians, and presence of EHR. No difference in mean referral rates observed in highly engaged clinics between P4P vs. control clinics (15.1% vs. 14.1%, p=0.85). Differences observed for engaged clinics (10.1% vs. 3%, p=0.001) and less engaged clinics (10.1% vs. 1.1%, p=0.02) for P4P vs. control.
Largest physician groups had a higher likelihood of being ranked in the top 40% of performance than the smallest POs (RR=2.55; 95% CI 1.67-3.90, p<0.001), as did medical groups compared with independent practice associations (RR=2.93; 95% CI 2.00-4.28, p<0.001).
Characteristics of Low Performers
Not applicable
Significant positive relationship between PO SES and P4P performance (trend test p<0.001). POs in higher SES areas had higher performance scores. Median performance score of POs in the highest SES quintile was almost 20 points higher than that of POs in the lowest quintile.
POs with higher percentages of Medicaid revenue were less likely to be in the highest 2 performance quintiles (RR=0.68, 95% CI 0.50-0.93, p=0.017).
Reference
Coleman et al., 2007
27
Program Description and # of Providers Studied
Access Community Health Network, a large system of federally qualified health centers, implemented P4P incentives in 2004 for absolute performance and improvement on a large set of process and outcome measures. This study examines effects on HbA1c testing and control. Evaluated 1,166 patients treated by 46 PCPs (out of 266 who treated diabetic patients in the federally qualified health centers) (2002-2004).
Metric Assessed
Avg. annual # of encounters per diabetic patient, % diabetic patients with any HbA1c test, % diabetic patients with recommended number of HbA1c tests, % diabetic patients with controlled blood sugar (HbA1c <7, HbA1c <9).
Characteristics of High Performers
High performers remain at the top of the performance distribution.
Characteristics of Low Performers
Low performers showed the greatest improvement.
Reference
Damberg et al., 2010
Doran et al., 2008
91
Program Description and # of Providers Studied
IHA program is a statewide P4P program in California for physician groups. Bonuses for meeting patient experience, process and outcome measures, and health information technology infrastructure. Study examined relationship between performance on P4P measures and use of care management processes.
180 physician groups.
UK National Health Service P4P program (2004-2007). Bonus payments to PCPs that achieve a threshold proportion of patients meeting quality targets for various clinical and patient experience measures.
7367 general primary care practices.
Metric Assessed
Effect of care management processes on P4P composite performance measure (clinical processes of care).
48 clinical activity indicators.
Characteristics of High Performers
The Care Management Process (CMP) index demonstrated significant positive associations with performance on 2 of the composite measures, namely diabetes management and intermediate outcomes. Higher performance in diabetes management (3.2 points higher on a 0-100 performance scale) was associated with substantial investments in CMPs (>5 CMPs on a 0-6 scale); each 1.0-point increase on the CMP index translated into a 1.0-point gain for the intermediate outcomes composite (P<.001).
Higher engagement in external QI initiatives was significantly positively associated with the processes-of-care component; a 1.0-point increase on the QI index translated into a 1.4-point gain on the CMP index (P=.02). Among the control variables, medical group organization type was significantly associated with higher performance for 2 of the composite measures (3.0-4.6 points higher for medical groups compared with independent practice associations). Physician organization size was positively associated with higher performance on the processes-of-care composite (1.5 points) (P=.002). The net effect of increasing the number of physicians within a PO from 10 to 100 physicians on the log scale would translate into a 3.5-point gain for the processes-of-care composite, with an effect size of 1.5. The authors observed no relationship between Medicaid revenue and performance.
A characteristic positively associated with achievement was the exclusion rate (a 1% higher rate of exclusions was associated with a 0.35% higher rate of achievement in year 2 and a 0.16% higher rate in year 3; p<0.01). Other positive (though modest) associations were the number of PCPs/10,000, the percentage of female PCPs, and the percentage of PCPs medically educated in the UK. Area deprivation scores were significantly associated with reported achievement, but the association was very modest. Prior practice performance was associated with the increase in achievement over time (the lower the prior achievement, the greater the increase).
Characteristics of Low Performers
None reported
Larger practice size, population density, the percentage of PCPs >50 years of age, and percentage of patients >65 of age were negatively associated with achievement (p<0.01 ).
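For the IHA physician-group entry (Damberg et al., 2010), the reported 3.5-point gain from growing a physician organization from 10 to 100 physicians can be reproduced if the 1.5-point effect size is read as applying per unit of natural-log physician count. The source does not state the log base, so that reading is an assumption; under it, the arithmetic is:

```latex
\[
\Delta_{\text{processes-of-care}}
  = 1.5 \times (\ln 100 - \ln 10)
  = 1.5 \times \ln 10
  \approx 1.5 \times 2.303
  \approx 3.45 \approx 3.5 \text{ points}
\]
```

With a base-10 logarithm the same tenfold increase would be a change of only 1.0 in log size, giving a 1.5-point gain rather than 3.5, so the natural-log reading is the one consistent with the reported figure.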
Reference
Doran et al., 2006 [164]
Program Description and # of Providers Studied
In 2004 the National Health Service provided $3.2 billion in funding for bonus payments to PCPs that achieve a threshold proportion of patients meeting quality targets.
8,105 practices with 1 or more family practitioners.
Metric Assessed
2004-2005 performance on 10 clinical quality indicators.
Characteristics of High Performers
Achievement was higher in practices with a high ratio of family practitioners to patients (p<.01). However, the multiple regression model explained only 20% of the variation between practices, and all of these effects were small.
Characteristics of Low Performers
Achievement was lower in larger practices; in practices with a high proportion of family practitioners who received their medical education outside the United Kingdom or who were 50 years of age or older; and in practices on the Primary Medical Services contract (p<.01).

Reference
Jha et al., 2010 [88]
Program Description and # of Providers Studied
The CMS Premier Hospital Quality Incentive Demonstration (HQID) incentivized hospital performance on 5 clinical conditions. The study examined the association between the disproportionate share (DSH) index and changes in performance for AMI, CHF, and pneumonia.
251 of 255 HQID hospitals, compared with a national sample of 3,017 hospitals (4th quarter of 2003 and July 2006-June 2007).
Metric Assessed
Association between the disproportionate share index and baseline quality performance, changes in performance, and terminal performance for AMI, CHF, and pneumonia.
Characteristics of High Performers
A higher DSH index was associated with greater improvements for AMI and pneumonia.
Characteristics of Low Performers
A higher DSH index was associated with lower baseline performance for AMI, CHF, and pneumonia.

Reference
Lindenauer et al., 2007 [59]
Program Description and # of Providers Studied
The HQID incentivized hospital performance on 5 clinical conditions. The study examined performance on 10 AMI, pneumonia, and CHF measures in HQID and control hospitals.
613 hospitals participating in a national public reporting initiative, 207 of which participated in HQID.
Metric Assessed
10 individual process measures for AMI, CHF, and pneumonia, plus composite scores for AMI, CHF, pneumonia, and all conditions combined, in HQID and control hospitals.
Characteristics of High Performers
The largest improvements occurred among hospitals with the poorest baseline performance for CHF. In HQID hospitals, improvement on the composite of the 10 process measures was 16.1% for hospitals in the lowest baseline quintile and 1.9% for those in the highest (p<0.001).
Characteristics of Low Performers
Not reported
Reference
Nicholas et al., 2011 [54]
Program Description and # of Providers Studied
The HQID incentivized hospital process measures for 5 clinical conditions. The study classified HQID process measures as easy or difficult to improve based on whether they introduce additional per-patient costs, and compared process compliance on easy and difficult tasks at hospitals eligible for HQID bonuses with that at hospitals engaged in public reporting.
145 of the 255 hospitals completing the 3-year HQID (those with sufficient data); 1,089 control hospitals publicly reporting to Hospital Compare (2002-2005).
Metric Assessed
Process-of-care measures. Incentivized tasks were classified as easy or difficult to improve by considering additional per-patient costs. Hospitals were categorized into quintiles based on their performance on the process composite score in year 1.
Characteristics of High Performers
No statistically significant effects were found for P4P hospitals at either end of the initial quality distribution relative to hospitals with average scores.
Characteristics of Low Performers
Not reported

Reference
Rosenthal et al., 2005 [10]
Program Description and # of Providers Studied
PacifiCare implemented a P4P program in California that incentivized patient experience and process measures, but did not implement one in the Pacific Northwest. Medical group performance in California was compared with that in the Pacific Northwest.
A sample of 167 medical groups contracting with PacifiCare in California that were exposed to the financial incentive and 42 medical groups in the Pacific Northwest that were not.
Metric Assessed
Cervical cancer screening, mammography, and hemoglobin A1c testing. Total potential dollars that could have been distributed in each quarter, and the total, average, and maximum payouts. Number of groups in each quarter that received any bonus and the number that reached at least half of the targets.
Characteristics of High Performers
75% of the dollars were earned by groups that had achieved the benchmarks before the incentive program. Physician groups with baseline performance at or above the target improved the least: their mammography rates improved by only 0.7%, whereas groups more than 10% below the target at baseline improved by 6.6% (p=0.07). For cervical cancer screening, the results for groups below but within 10% of the target (p=0.03) and for groups more than 10% below the target (p=0.02) were statistically significant.
Characteristics of Low Performers
Not reported
Reference
Werner et al., 2011 [56]
Program Description and # of Providers Studied
The HQID incentivized hospital performance on 5 clinical conditions. Evaluated performance compared with control group.
260 out of 267 hospitals that joined in FY 2004; 780 control hospitals.
Metric Assessed
Hospital Compare data on AMI, pneumonia, and CHF, used to calculate composite scores for pneumonia and CHF (the AMI composite was excluded because the data lacked the mortality measure) for HQID and control hospitals. Performance was compared between the 2 groups, along with the change in distribution over time (the cumulative % of hospitals meeting the performance thresholds after P4P implementation). Hospitals were stratified by: a proxy for bonuses received, calculated as Medicare revenue for incentivized conditions divided by total hospital Medicare revenue; market competition, measured by the Herfindahl-Hirschman Index score of the Hospital Service Area; and baseline financial status, measured by the average total margin over the 4 years before P4P implementation.
Characteristics of High Performers
Improvements were largest among hospitals that were eligible for larger bonuses, were well financed, or operated in less competitive markets.
Characteristics of Low Performers
Not applicable
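The three stratification variables in the Werner et al., 2011 entry are simple ratios and averages. The Python sketch below shows one way to compute them under stated assumptions: the function names and example figures are hypothetical, market shares are assumed to come from discharge counts within a Hospital Service Area, and shares are expressed as fractions (HHI between 0 and 1) rather than on the 0-10,000 scale. It is an illustration, not the study's code.

```python
"""Hypothetical sketch of the stratification variables described for
Werner et al., 2011. Names, inputs, and discharge-based market shares are
illustrative assumptions, not the study's actual code (Python 3.9+)."""

from statistics import mean


def bonus_exposure_proxy(incentivized_medicare_revenue: float,
                         total_medicare_revenue: float) -> float:
    """Medicare revenue from incentivized conditions divided by total hospital
    Medicare revenue: a proxy for how large a P4P bonus could be."""
    if total_medicare_revenue <= 0:
        raise ValueError("total Medicare revenue must be positive")
    return incentivized_medicare_revenue / total_medicare_revenue


def herfindahl_hirschman_index(discharges: list[float]) -> float:
    """Sum of squared market shares for the hospitals in one Hospital
    Service Area (shares assumed to come from non-negative discharge counts).

    Shares are fractions, so the result lies in (0, 1]: values near 0 mean
    a fragmented, competitive market; 1.0 means a single hospital.
    """
    total = sum(discharges)
    if total <= 0:
        raise ValueError("discharge counts must sum to a positive number")
    return sum((d / total) ** 2 for d in discharges)


def baseline_total_margin(annual_margins: list[float]) -> float:
    """Average total margin over the pre-P4P years (4 years in the study)."""
    if not annual_margins:
        raise ValueError("need at least one year of margins")
    return mean(annual_margins)


if __name__ == "__main__":
    # Made-up figures for illustration only.
    print(bonus_exposure_proxy(12.0, 48.0))                  # 0.25
    print(herfindahl_hirschman_index([50, 30, 20]))          # approximately 0.38
    print(baseline_total_margin([0.02, 0.03, -0.01, 0.04]))  # approximately 0.02
```

Keeping the HHI on a 0-1 scale makes it read directly as a concentration share; multiplying by 10,000 gives the convention used in U.S. merger guidelines.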
