Skip to main content

Work with DiMe on an upcoming project

Lessons Learned Implementing Apple ResearchKit for a Study at Mount Sinai

Noah Zimmerman

Apple ResearchKitĀ is already accelerating research by attractingĀ huge numbersĀ of people to participate in medical research for the first time. In this blog we share our experience using Active Tasks included in ResearchKit to assess cognitive function, and offer recommendations for how this critical resource can continue to evolve to further advance research and contribute to new clinical insights.

As a member of DiMeā€™s Scientific Leadership Board, I am excited about DiMeā€™s commitment to supporting the field of digital medicine through advancing best practice, collaboration, and knowledge sharing. In that spirit, it is an honor to be our first guest blogger, sharing a post originally written withĀ Matt JohnsonĀ for Mount Sinaiā€™sĀ Health Data and Design Innovation Center blog.

ā€˜Big Techā€™ Comes for Healthcare

Advancements inĀ digital medicineĀ have created a shift in the flow of health data and information, placing a stronger emphasis on the technologies used for measurement and intervention. Tech giants are taking notice, as both hardware and software are central to accelerating this transformation. The participation of technology-native companies into the healthcare ecosystem stands to benefit all stakeholders. For example, their resources and power for positive change have played a role inĀ helping consumers get more portable access to medical recordsĀ from entrenched interests like the dominant electronic medical record vendor, Epic.

Just in the past year, Apple has:

With these moves, Apple has the potential to become a digital platform for health, allowing 3rd party developers to build an app ecosystem for healthcare the way they have already done for mobile (amongĀ other possible goals).

In this post we focus onĀ Appleā€™s ResearchKit, one component of the Apple Health platform. We share some benefits and challenges we encountered while buildingĀ an app for N-of-1 trialsĀ using ResearchKit, and use this experience to frame specific recommendations for technology organizations involved in theĀ digital transformation of medicine.

ResearchKit Background

ResearchKit has proven to be a valuable resource in leveraging mobile technology for medical research. Not only does it allow researchers to develop and implement digital studies more rapidly, but it also moves studies out of the clinic and into the pocket of every iOS user. Early examples of studies built using ResearchKit, like theĀ Parkinsonā€™s mPower study, have demonstrated new opportunities for researchers to engage far broader audiences than previously possible. People that want to participate in research, but are unable to travel to an academic medical center, have new avenues for volunteering.

So, why is ResearchKit important for researchers?

ResearchKit makes it easier to collect data and information for research studies in a standardized way. Tools for electronic consent and administering questionnaires allow researchers to implement several of the most basic tasks of any study. The myriad sensors and functionality already baked into our smartphones are also accessible through ResearchKit, enabling novel measurement instruments. For example, theĀ Parkinsonā€™s mPower appĀ uses the microphone to record and assess signs of tremor in the voice of participants, while the multitouch screen assesses the manual dexterity and speed of participantsā€™ left and right hands during a tapping exercise.

ResearchKit includes 19 predefined sensor-based assessment instruments, calledĀ ActiveTasks, for domains including motor activity, fitness, cognition, speech, hearing, vision and dexterity. These tasks often mirror analog counterparts, such as the 9-hole peg test for manual dexterity, or the Amsler Grid test used to detect vision problems. The tasks can be used as-is, or as building blocks to create customized tasks.

A sample of Active tasks in ResearchKit and the sensors they use: Timed Walk (GPS, Gyroscope), Spatial Memory (multi-touch)

A sample ofĀ Active tasks in ResearchKit

Our experience implementing the ResearchKit Stroop Test

The Stroop Test is one of the ActiveTasks available for researchers using ResearchKit for digital studies. This test was originally developed in 1935 and continues to be used to evaluate executive functions in conditions ranging from Dementia to ADHD. Like many of the tests in Appleā€™s Research Kit, the validated paper-based test needed to be adapted for use on iOS devices. As is typical during conversion processes, some decisions need to be made ā€” size and placement of buttons, how to deal with user mistakes, choosing appropriate default or customizable parameters, etc.

The word blue, written in the color green, with 4 buttons to select the first letter of the name of the COLOR that is show

An example trial of Stroop task in ResearchKit

There are two key differences between the ResearchKit Active Task and the original Stroop test that render the straightforward analysis originally described by Stroop (a difference in means) impossible to calculate (details and history of the Stroop Interference Test). Instead, we had to choose a new metric to evaluate executive function. There are two measures collected by the ResearchKit variation of the Stroop test:

  1. AccuracyĀ ā€” the percentage of correct answers

Evaluating either of the measures individually do not capture the intent of the task (e.g. you can trivially get perfect accuracy by going very slowly; alternatively, you can respond very quickly and have poor accuracy). A good measure of executive function from data collected from this ActiveTask must integrate accuracy and reaction time into a single measure. As it turns out, there isĀ a whole literatureĀ on methods to combine speed and accuracy measures of performance in cognitive research!

In our application, we decided to use the rate correct score (RCS) fromĀ Wolf & Was 2006. The score is calculated by dividing the number of correct responses by the total time to complete the test. This seems reasonable, and it performed well across a variety of conditions whenĀ compared to other metrics.

Importantly, there is noĀ rightĀ way to implement and score the Stroop Test on a mobile device. The assessment called the Stroop Test in ResearchKit is different from the original paper-and-pencil version of the test. Although it departs from a validated test design, it also collects additional measurements which may broaden research findings. The Stroop Test can be used in different contexts ā€” different conditions, patient populations, and settings. While there is no single implementation across the board, there may be better ways in specific contexts of use. What is desirable, is that if we use the Apple ResearchKit Stroop implementation, another researcher can compare their results to ours; and an important part of that consideration is how they choose to score the test.


  1. The Apple ResearchKit team should provide a list of recommended or suggested analyses for each of the predefined ActiveTasks.

Rationale:Ā To facilitate comparison, which is an important component to achieving meaningful research conclusions, and help to prevent developers or researchers from analyzing results based on incomplete evaluation metrics. By including recommended or suggested analyses with ActiveTasks, Apple could go even further by promoting standardization and interoperability of research results.

2. Standardize the predefined ActiveTasks developed by Apple

Rationale: To limit unnecessary variability, parameters across all predefined active tasks should be standardized. For example, the duration of the countdown timer and whether the device vibrates before the start of a task should be consistent for all predefined tasks developed by Apple. Default values that are chosen for task-specific parameters, should also be well-thought through and vetted with supporting evidence or rationale. This would likely encourage adoption as it would provide researchers greater confidence in the outcome of the tasks and limit variability in various task-specific parameters. With more researchers using the same predefined active tasks, these tasks are more likely to become a standard and provide a strong foundation for further research.

Accelerating Research Through Collaboration

For ResearchKit, and any emergent frameworks, better guides for implementation of certain features and transparency regarding design choices and analytic approaches are critical to driving adoption by developers and to strengthening the science of researchers.

Where vendors and manufacturers are unable to make recommendations on analytic approaches or support knowledge sharing across study teams, multi-disciplinary, disinterested organizations could champion knowledge sharing and transparency initiatives. For example, theĀ Digital Medicine Society (DiMe), a 501c3 that serves as a professional society for those who practice in the digital era of medicine has established aĀ crowdsourced library of digital measuresĀ used as study endpoints in industry sponsored studies to promote knowledge sharing and collaboration.

We are excited to be using ResearchKit and are grateful to the developers that have contributed to the open-source platform. Weā€™d welcome feedback and new ideas from others in the research community that are early adopters of ResearchKit (tweet usĀ @zimmeeeĀ andĀ @immadbananas, or send email to info@hd2i.org).

Join our next project

Help streamline the path to regulatory and commercial success to optimize health outcomes for the greatest number of patients

Join the Integrated Evidence Plans project

Join us
Not today