Address correspondence to Ariel S. Frey-Vogel, MD, MAT, Department of Pediatrics, MassGeneral Hospital for Children, 175 Cambridge St, fifth floor, Boston, MA 02114
Affiliations
Department of Pediatrics, MassGeneral Hospital for Children (AS Frey-Vogel and K Dzara), Boston, Mass; Harvard Medical School (AS Frey-Vogel and K Dzara), Boston, Mass
Department of Pediatrics, Hasbro Children's Hospital (EY Chung), Providence, RI; The Warren Alpert Medical School of Brown University (EY Chung), Providence, RI
The Accreditation Council for Graduate Medical Education requires residents to teach, and many residency programs assess resident teaching competency. While much formal resident-led teaching is for large groups, no corresponding published assessment instrument with validity evidence exists. We developed an instrument for faculty to assess pediatric resident-led large group teaching and gathered preliminary validity evidence.
Methods
Literature review and our experience leading resident-as-teacher curricula informed initial instrument content. Resident focus groups from 3 northeastern pediatric residency programs provided stakeholder input. A modified Delphi panel of international experts provided iterative feedback. Three investigators piloted the instrument in 2018; each assessed 8 video recordings of resident-led teaching. We calculated Cronbach's alpha for internal consistency and intraclass correlation (ICC) for inter-rater reliability.
Results
The instrument has 6 elements: learning climate, goals/objectives, content, promotion of understanding/retention, session management, and closure. Each element contains behavioral subelements. Cronbach's alpha was .844. ICC was excellent for 6 subelements, good for 1, fair for 1, and poor for 3.
Conclusions
We developed an instrument for faculty assessment of resident-led large group teaching. Pilot data showed that the assessed behaviors had good internal consistency but inconsistent inter-rater reliability. With further development, this instrument has potential to assess resident teaching competency.
We describe the development and pilot study of an instrument for faculty to assess resident-led large group teaching. The instrument was developed to allow residents to receive structured actionable feedback and residency leadership to assess resident teaching competency.
Residents teach frequently and desire more feedback on their teaching, and residency programs may benefit from data about resident teaching competency. According to Miller's pyramid of learner assessment, assessing actual resident teaching allows for assessment at the highest level of "does," which best predicts resident teaching competency.
While residents often teach in the large group setting, there is a gap in the literature on the assessment of large group teaching. Multiple assessment instruments have been developed to broadly assess the teaching of adult learners.
Of those designed for medical teaching, the instruments were largely developed for Objective Structured Clinical Examinations, 1:1 teaching, or medical student assessment of resident teaching. We identified no instruments with validity evidence for faculty evaluation of resident-led large group teaching. When programs assessed resident-led large group teaching to evaluate their resident-as-teacher (RAT) curricula, they utilized instruments without rigorous validity evidence.
There are recurring themes among published instruments, suggesting a shared understanding of the key aspects of resident teaching, but no single instrument uses these themes to allow faculty assessment of resident-led large group teaching. These themes are: establishing learning climate, teacher enthusiasm, initial learner assessment, goal communication, session control, instructional techniques, teacher knowledge, active learner involvement, learner evaluation and feedback, and application of concepts taught.
This study aims to systematically develop an instrument, building on these themes, for faculty to assess resident-led large group teaching. The instrument serves 2 purposes: to provide 1) residents with objective assessment and feedback and 2) residency program directors with data on individual resident teaching competency.
Methods
Study Sites
The study authors are pediatricians (A.F.V., K.D., and E.C.) at 3 academic teaching hospitals who lead RAT curricula for pediatric residents and a medical education researcher (K.D.). These sites are MassGeneral Hospital for Children and Brown, both medium-sized urban residency programs, and Dartmouth, a smaller rural program, all in the northeastern United States. At each site, all postgraduate year (PGY) 3 pediatric residents (and PGY4 medicine-pediatrics residents at one site) participate in a RAT curriculum, leading 2 to 8 case-based teaching conferences per year for a group of 10 or more mixed learners including medical students, residents, and faculty. Small groups require face-to-face interaction among, and active involvement by, all participants; because these requirements are difficult to achieve with an audience of 10 or more, we define an audience of this size as a large group. At each site, the specific requirements for conference vary, but most residents present a case. Residents receive varying degrees of mentorship, observation, and feedback on their teaching, and the emphasis placed on large group teaching varies across sites. The Institutional Review Boards at Partners Healthcare, Dartmouth College, and Rhode Island Hospital exempted the instrument development from review and approved the pilot study.
Instrument Development
We developed a list of core instrument content based on our experiences and a literature review (Figure, Step 1).
We revised the instrument with each subsequent round of input (2017–2018). First, a 1-hour focus group of 4 to 11 residents from all levels of training was held at each site to identify areas of assessment and feedback that would improve teaching. All residents were invited to participate via email; focus group sizes were determined by resident interest and availability. The focus groups were led by K.D., who has no evaluative role in any residency program (Figure, Step 2).
Figure. Steps taken by the study authors to develop the assessment instrument during 2017 to 2018.
We then recruited a group of 14 international faculty with expertise in RAT curricula, assessment, and instrument development to create a modified Delphi panel; the study authors did not participate in this panel. The faculty were chosen based on our review of the RAT literature and personal knowledge of their work, with the goal of recruiting a group diverse in geographic representation, medical specialty, training (MD and PhD), and area of expertise. All panel members were asked for input simultaneously via survey in each round. Through 3 rounds of instrument review, the panel provided input on the importance of the subelements, the relationships between the subelements, and appropriate anchors for rating the behaviors (Figure, Steps 3–5).
Pilot Study
We invited all senior pediatric residents scheduled to lead upcoming conferences at each site to participate in a pilot (Spring 2018). Ten residents across the 3 sites volunteered to have 1 to 2 of their self-designed teaching sessions video recorded. Six of the teaching sessions were case-based discussions, 3 were lessons learned from personal experiences, and 1 was a gamified case series. Teaching sessions were 30 to 60 minutes, depending on site requirements. Three study team members (A.F.V., E.C., and K.G.) independently assessed each video using the instrument, without prior discussion about instrument use. We subsequently discussed our assessments of 2 videos together, revised the instrument to increase the clarity and objectivity of its behaviors (Figure, Step 6 and Supplemental Content 1), and developed a guidebook for instrument use. We then independently reassessed the remaining 8 pilot teaching sessions using the guidebook.
Data Analysis
To determine internal consistency, Cronbach's alpha was calculated for the overall instrument as well as for each subelement. To determine inter-rater reliability, intraclass correlation (ICC) was calculated for each subelement by averaging the ICCs of each behavior within the subelement on the instrument. Calculations were completed using SPSS statistical package version 25 (SPSS Inc, Chicago, Ill).
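For readers who wish to reproduce these calculations outside SPSS, the sketch below is a minimal illustration, not the authors' analysis code: it computes Cronbach's alpha from a matrix of item scores and a two-way random-effects, average-measures ICC (Shrout and Fleiss ICC(2,k)) for a single behavior rated by 3 raters across 8 sessions. The data, variable names, and choice of ICC model are illustrative assumptions; SPSS's default ICC settings may differ.

```python
# Illustrative sketch only (hypothetical data); not the study's SPSS workflow.
import numpy as np
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: rows = assessments, columns = instrument behaviors (items)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


def icc_2k(ratings: np.ndarray) -> float:
    """ICC(2,k): two-way random effects, average measures (Shrout & Fleiss).
    Rows = teaching sessions (targets), columns = raters."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand_mean) ** 2).sum()   # between sessions
    ss_cols = n * ((ratings.mean(axis=0) - grand_mean) ** 2).sum()   # between raters
    ss_total = ((ratings - grand_mean) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical example: 3 raters scoring one behavior (0-3 scale) across 8 sessions.
behavior = np.array([[2, 2, 3],
                     [1, 1, 2],
                     [3, 3, 3],
                     [0, 1, 1],
                     [2, 3, 3],
                     [1, 2, 2],
                     [3, 2, 3],
                     [2, 2, 2]], dtype=float)
print(f"ICC(2,k) = {icc_2k(behavior):.3f}")
# A subelement's ICC is the mean of icc_2k() across its behaviors; Cicchetti's cut
# points (<0.40 poor, 0.40-0.59 fair, 0.60-0.74 good, >=0.75 excellent) label the result.
```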
Results
Instrument Development
The preliminary instrument encompassed a comprehensive list of potential content, including 6 elements, 56 subelements, and an overall global rating scale (Figure, Step 1). From the resident focus groups (Figure, Step 2), we found that residents preferred comments over numerical scores because comments more easily translate into behavioral change. Residents indicated that the instrument would be helpful when preparing for teaching and could act as a scaffold for debriefing with faculty. To meet resident needs, comment sections were added.
Fourteen Delphi panelists participated in round 1 (Figure, Step 3), rating the subelements' importance with the goal of decreasing the number of subelements. The instrument was subsequently revised based on this feedback. In round 2, the panelists determined which subelements were actually behaviors describing other subelements, further decreasing the number of subelements; 11 of the invited 14 Delphi panelists participated in this round (Figure, Step 4). In round 3, 8 of the invited 14 Delphi panelists provided input on instrument structure (Figure, Step 5; Supplemental Content 2). After this round, the instrument had 6 elements, 12 subelements, and 37 behaviors with the anchors "not at all," "partially," "consistently," and "as a role model" (Supplemental Content 3).
Pilot Study
After applying the instrument independently to 2 of the pilot videos, we found that some behaviors were too subjective to allow for rater agreement. We eliminated 1 subelement and 4 behaviors, leaving a final tool with 6 elements, 11 subelements, and 33 behaviors. The remaining 8 teaching sessions, taught by 6 different residents representing all 3 sites, were used for the pilot study. The physicians on our team (A.F.V., E.C., and K.G.) independently assessed all 8 sessions using the finalized instrument. From these 24 assessments of 8 teaching sessions, the overall internal consistency of the instrument was excellent (Cronbach's alpha .844; Table). Four behaviors and their corresponding subelements were not analyzed because they showed no variance across assessor or teaching session. The ICC was excellent for 6 subelements, good for 1, fair for 1, and poor for 3 (Table).
Table. Intraclass Correlation of the Items on the Tool and Inter-rater Reliability of the 3 Raters Who Assessed 8 Resident Teaching Videos From All 3 Sites During the Pilot Study in 2018. Measures were calculated using Cronbach's alpha and intraclass correlation (ICC), respectively, where a subelement's ICC is the average of the ICCs for all of the behaviors within that subelement. ICC indicates intraclass correlation. [Full table not reproduced; for example, "Had participants articulate further learning goals for themselves on the topic" had an ICC of 0.984, indicating excellent reliability.]
Categories of reliability are derived from Cicchetti,28 where ICC <0.40 is poor reliability, 0.40 to 0.59 is fair reliability, 0.60 to 0.74 is good reliability, and 0.75 to 1.00 is excellent reliability.
There was no variability across learner or assessor for several behaviors, and the internal consistency of the subelements to which they belonged was not calculated for this reason. Those behaviors were:
a. Used respectful and inviting verbal and nonverbal language.
b. Started and ended session on time.
c. Determined if the learning objectives were met.
d. Had participants articulate further learning goals for themselves on the topic.
Discussion
We created an instrument with potential for faculty to assess resident-led large group teaching. It was developed based on input from faculty who lead RAT curricula, a literature review, resident focus groups, a modified Delphi process with RAT experts, and an instrument pilot. It incorporates themes from previously developed instruments in the literature with the specific goal of allowing faculty to assess resident-led large group teaching, which was lacking in previously published tools. Further validation studies are needed before use in high-stakes assessment; in the meantime, the instrument provides a framework to help residents plan their teaching and to guide feedback.
When utilized in the pilot study, the instrument had high overall internal consistency. Some subelements had lower internal consistency or were excluded from the calculations, in part because of the lack of variability in scores across assessors and observed teaching sessions. For example, the section on closure included several behaviors that no residents performed in the pilot ("determined if the learning objectives were met" and "had participants articulate further learning goals for themselves on the topic"). Conversely, other behaviors ("used respectful and inviting verbal and nonverbal language" and "started and ended session on time") were excluded from the calculations because they were always accomplished by the residents in the pilot study. These behaviors should still be assessed because they are fundamental aspects of teaching, yet their inclusion makes calculating internal consistency difficult. These issues will be examined in future validity studies. We did not analyze the qualitative comments in the pilot because they are intended as formative feedback, which was not the purpose of the pilot.
Inter-rater reliability was excellent or good for 7 of the 11 subelements. The low inter-rater reliability for the other 4 subelements may be due to: 1) the assessors not undergoing a formal process to develop a shared mental model for instrument use and 2) some instrument behaviors having little variability across assessors or teaching episodes. One instrument with rigorous validity evidence assesses faculty-led large group teaching; for that instrument, novice assessor reliability improved through frame-of-reference (FOR) training, in which experts explained their shared mental model and used it to give novices feedback. FOR training would be a useful technique in training assessors for our instrument.
Our study has several limitations. The assessors developed the instrument, which may have inflated the instrument's internal consistency and ICC in the pilot study. The instrument was not tested outside of academic pediatric settings. However, the teaching behaviors assessed should be applicable to any resident-led large group teaching setting, as evidenced by the recurring themes found in teaching assessment instruments across disciplines.
The instrument was piloted only on senior residents, who may have less variation in teaching competency than residents overall. We also did not conduct a formal qualitative analysis of the focus group data. Furthermore, we did not follow a strict Delphi panel method, but rather summarized the experts' perceptions. The number of Delphi panel participants decreased with each round, likely because of the labor required to review the large number of behaviors over work spanning several months.
Our next step is to develop a shared mental model and update our instrument guidebook. Subsequently, we will use a FOR approach to train faculty at our 3 sites. These faculty will assess teaching sessions recorded over the course of a full academic year to determine the instrument's ICC, internal consistency, and agreement with clinical competency committee assessment of senior residents on the nonreported ACGME pediatric teaching milestone, "Develop the necessary skills to be an effective teacher."
Because the instrument remains lengthy, we hope to use factor analysis to determine which behaviors are duplicative and could be removed. After refining the tool, we will consider whether a shorter version could assess less formal teaching, such as on rounds. It will also be important to examine resident perceptions of the instrument after its use.
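As a rough sketch of how such a factor analysis might look (hypothetical, simulated data; not an analysis performed in this study), exploratory factor analysis of behavior-level scores can flag behaviors whose loadings cluster on the same factor as candidates for consolidation:

```python
# Hypothetical sketch of a planned factor analysis; data are simulated, not study data.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Simulated matrix: rows = completed assessments, columns = the 33 instrument behaviors.
scores = pd.DataFrame(rng.integers(0, 4, size=(200, 33)).astype(float),
                      columns=[f"behavior_{i + 1}" for i in range(33)])

# One factor per instrument element is a plausible starting structure (an assumption).
fa = FactorAnalysis(n_components=6, random_state=0).fit(scores)
loadings = pd.DataFrame(fa.components_.T,
                        index=scores.columns,
                        columns=[f"factor_{j + 1}" for j in range(6)])

# Behaviors whose largest (absolute) loadings fall on the same factor, and that are
# highly correlated with one another, are candidates for merging or removal.
print(loadings.abs().idxmax(axis=1).sort_values())
```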
In conclusion, we created an instrument for faculty to assess resident-led large group teaching. Our pilot data demonstrate that the instrument has good internal consistency when used by its developers. Without formal rater training, inter-rater reliability was excellent for over half of the instrument's subelements. With formal rater training, our instrument may enable faculty to assess resident teaching at Miller's highest level, improve their ability to give feedback on resident teaching, and give residents a guide for planning their teaching.
Acknowledgments
The authors wish to thank Virginia Reed, PhD, MS, for her statistical support, the residents who volunteered to participate in our focus groups and to be video recorded for the pilot study, the experts who participated in our modified Delphi panel, the chief residents who video recorded the resident teaching sessions, and our residency program directors for their support. We would also like to thank Jacob Johnson, MD, and Daniel Saddawi-Konefka, MD, for their feedback on an earlier version of this work.
Financial statement: This study was funded by an Association of American Medical Colleges’ Northeastern Group on Educational Affairs collaborative research grant. The PI for this grant was Ariel Frey-Vogel, MD, MAT. The funder had no role in study design, data analysis, collection, or interpretation, in the writing of the manuscript, or in the decision to submit the manuscript for publication.
References
Accreditation Council for Graduate Medical Education. ACGME Common Program Requirements. Section IV.A.5.c)(8). https://www.acgme.org/Portals/0/PFAssets/ProgramRequirements/CPRs_2017-07-01.pdf. Published 2017. Accessed January 10, 2019.
Medical student evaluation of teaching quality between obstetrics and gynecology residents and faculty as clinical preceptors in ambulatory gynecology.