Attachment A: Human Subjects Research Implications of “Big Data” Studies

Specific recommendations include the following, the background and rationale for each of which is set forth in this SACHRP report:

Recommendation One:

Given the apparent regulatory anxieties of many researchers and IRBs in the context of big data studies, and the likelihood that this anxiety leads to delayed or deferred studies, it would be useful for OHRP to clarify the application of the minimal risk and practicability consent waiver standards to such research. The guidance should suggest that compliance with existing rigorous data privacy and security regulations and best practices may be consistent with a finding of minimal risk to privacy. Guidance also should indicate that the practicability standard would be met by a finding that it is not reasonably possible to contact a sufficient number of study participants, due to the predictable unavailability of many of those subjects, without whose consent the database would be inappropriate and inadequate for the additional proposed research purposes.

Recommendation Two:

OHRP guidance should advise IRBs that in applying waiver standards, it is appropriate to consider risk not as static, but instead to suggest to researchers possible mechanisms under which risks – especially risks to privacy – could be reduced to minimal levels. The guidance could suggest procedures through which researchers and IRBs might collaborate with institutional information technology officers and related experts to identify privacy and security risk reduction measures in proposed studies.

Recommendation Three:

OHRP guidance should suggest possible methods by which investigators planning or institutions hosting “big data” studies might solicit or otherwise identify the concerns of populations involved in such studies. Such mechanisms would serve to respect principles of autonomy and beneficence, and if undertaken, would ameliorate IRB concerns regarding proposals for waiver of consent. Such measures might include – depending upon the nature, scope and siting of the research – holding focus groups, or consulting (or empaneling) community advisory boards.

Recommendation Four:

OHRP should reiterate the application of the “engaged in research” standard to institutions whose only involvement in big data studies is to provide identified data to external researchers. Further, use of central or single IRBs to consider applications for waiver of consent across institutions engaged in big data studies should be described in OHRP guidance. Central or single IRBs should be characterized as an appropriate means to ease consideration and analysis of proposed big data studies.

Recommendation Five:

SACHRP recommends that OHRP consider a change to the regulatory structure, which could include a new or revised exemption category structured to account for research involving big data. Such an exemption would need to address the standards for application of the exemption, such as the types and categories of data to which it applies, and a general description of the required types or levels of privacy protections. Such an exemption might also be conditioned on the researcher’s meeting standards for transparency of the study activity and for soliciting some level of opinions and concerns about a proposed study from the affected populations. Finally, any such exemption should be written with an awareness that technology changes rapidly, and the regulatory language therefore should not address specific technology features.

Recommendation Six:

Institutions that conduct quality assurance, benchmarking and similar studies using real world big data should consider mechanisms to minimize risk in those studies, even when they do not represent human subjects research. In guidance, HHS might suggest ways in which institutions could undertake such a process, as part of an overall program of considering how studies – both human subjects research and non-research – might pose privacy or other risks to patients and clients, and how those risks might be reduced.

Recommendation Seven:

FDA should clarify, through interpretation or guidance, or through exercise of enforcement discretion, that big data studies not involving prospective interventions do not constitute “clinical investigations,” thus allowing these studies to be conducted without specific consent from participants and to be submitted in support of FDA applications. Such studies can include post-marketing surveillance, as well as research use of previous data sets as historical controls, and new interpretations by FDA or its exercise of enforcement discretion could, in fact, encourage the use of these historical, existing data to yield valuable safety information and make some otherwise unduly large studies reasonable to perform.

Recommendation Eight:

OCR should clarify the extent to which HIPAA applies to big data research. It should engage in dialogue with the research community about mechanisms to assure appropriate privacy and security safeguards while minimizing impediments to the conduct of big data research. Guidance may be in the form of interpretation or modified regulations. In particular, the accounting of disclosures requirement should be eliminated in this category of research; challenges posed by respecting the various rights of patients (e.g., right to access records, right to challenge records) should be addressed, with institutions instructed how to avoid these complications through, for example, consideration and adoption of institutional definitions of “designated record sets.”

Introduction

Studies of already-collected data relating to human beings are undertaken with increasing frequency in industry, government and academia. These types of “real world” data studies[1] hold tremendous promise for improving human health and welfare, but can also present a risk to privacy. This is complicated by the fact that, typically, there is no consent process to collect these data, under which future uses of the data have been defined and permission for those uses has been obtained; and even when data are collected in research studies under IRB-approved consents, those consents are often lacking in information about, and permission for, future use in other studies. When multiple real world datasets are combined into “big data,” then the promise of useful knowledge grows, even while privacy risk increases as personal facts from many sources are identified to individuals. The questions that emerge from “real world” and “big data” studies relate both to compliance with human subjects research regulations and to more basic social and research ethics, particularly the appropriate protection of human privacy and the scope of consent when the data were originally collected – or even whether any consent was obtained at all.

In early 2014, OHRP asked the SACHRP subcommittees to consider whether research involving “real world data” and “big data” is intrinsically different than other types of research, and whether there should be interpretation or alteration of current regulations such that studies using these data should be treated differently, or held to different standards, than other research. The concern has been that regulatory requirements often confuse sponsors, IRBs, sites and investigators seeking to conduct big data research and that some potentially valuable research utilizing these data may not be conducted, or may be unnecessarily impeded, by current interpretations of human research regulations. SACHRP believes that research involving these types of data provides a significant opportunity for generating beneficial knowledge for society, that research using already-collected data is often of minimal risk, and that concerns about protection of personal privacy are best dealt with by emphasizing stringent conditions on security and use, rather than by disapproving and preventing the research. In general, the use of real world data (and of previously-collected research data) for research should be encouraged based on this risk-benefit profile, with appropriate attention to concerns about privacy and respect for autonomy.

Recent NIH Guidance Relating to Genomic Data: Re-Use of Data Previously Collected for Research

Big data studies most often are conducted with “real world” data that have been collected in the course of ordinary commercial transactions, routine information collection for government, delivery of standard health and social services, and administration of benefits programs. In addition, data originally collected for pure research uses can also be aggregated and re-used for big data studies. A good example of the potential use of big data originally collected for research, and then planned for re-use many times for other research, is described in a 2012 White House announcement regarding a large NIH genomic data project:

National Institutes of Health – 1000 Genomes Project Data Available on Cloud: The National Institutes of Health is announcing that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publically available data set for free and researchers only will pay for the computing services that they use.[2]

The 1000 Genomes Project is regarded by NIH as “open access” data, freely available on the web and usable by anyone, for any purpose. More recently, in August 2014, NIH issued a general policy on the sharing of genomic data,[3] under which research institutions receiving NIH funds for projects generating genomic data must agree to share those data through NIH platforms and must, prospectively, obtain informed consent from research subjects for the sharing of even their de-identified data. In imposing this requirement of obtaining informed consent for use of de-identified data, NIH has gone beyond what is strictly required for use of these data, either under the Common Rule or, when applicable, HIPAA.

In any event, institutions must certify to the NIH that the de-identified data, which have been collected under such a broad consent, may be used either on an unrestricted basis or on a “controlled access” basis. Under “controlled access,” those researchers who seek to access and use the data must agree to a number of terms and conditions, including:

Using the data only for the approved research;
Protecting data confidentiality;
Following, as appropriate, all applicable national, tribal, and state laws and regulations, as well as relevant institutional policies and procedures for handling genomic data;
Not attempting to identify individual participants from whom the data were obtained;
Not selling any of the data obtained from NIH-designated data repositories;
Not sharing any of the data obtained from controlled-access NIH-designated data repositories with individuals other than those listed in the data access request;
Agreeing to the listing of a summary of approved research uses in dbGaP along with the investigator’s name and organizational affiliation;
Agreeing to report any violation of the GDS Policy to the appropriate DAC(s) as soon as it is discovered;
Reporting research progress using controlled-access datasets through annual access renewal requests or project close-out reports;
Acknowledging in all oral or written presentations, disclosures, or publications the contributing investigator(s) who conducted the original study, the funding organization(s) that supported the work, the specific dataset(s) and applicable accession number(s), and the NIH-designated data repositories through which the investigator accessed any data.

The NIH guidance contemplates that much data will already have been collected in studies in which consents did not anticipate future uses either of identified or de-identified data. The guidance therefore calls for “an assessment by an IRB, privacy board, or equivalent body … to ensure that data submission is not inconsistent with the informed consent provided by the research participant.” NIH has indicated that it will accept data derived from de-identified cell lines or clinical specimens lacking consent for research use that were created or collected before the effective date of its policy.

Origins of “Real World” Data Used for “Big Data” Research; Defining “Big Data”

Real world data are collected outside of a clinical trial, and often through the process of health care delivery (e.g., electronic medical records, health insurance claims). For purposes of this analysis, “big data” refers to large, often aggregated real world data sets that are beyond the ability of commonly available stand-alone personal computers and software to handle. “Big data studies” are not those studies that are undertaken as primary data collection from or about individual persons, but rather are studies that are undertaken on data that have already been (or are being, in a prospective manner) collected from or about individual persons in the course of other organized research, clinical care, and ordinary commercial and civic activities. “Big data studies” are distinguished from real world data research using already-collected data (such as retrospective medical records research) by the massive quantities of data aggregated and analyzed, the wide variety of data sources, locations (e.g., local servers, network servers, the cloud) and forms (e.g., numbers, addresses, images, text, emails, video, genetic sequences) of those data, and the increased velocity of analysis, primarily through the use of algorithmic electronic programs designed to identify unique features or patterns in aggregated data. It is also possible to theorize that “big data” studies combining myriad sources of information about identified persons may pose a greater aggregate risk to privacy than studies using single sources of data, such as a set of individual medical records.

A subset of “big data research” uses ongoing and constantly replenished and revised data systems, with analysis updated in real time as new information becomes available. In some instances these may be ongoing “longitudinal” studies; may involve Bayesian designs for data collection and analysis; and can involve “adaptive” study designs that change as new information becomes available and is added to the data being analyzed. Increasingly in the social and behavioral research context, longitudinal data systems link multiple ongoing data streams (e.g., student records, employment, social welfare services, health records, police encounters, arrest records), and these study designs can, over time, create risks of re-identification and misuse that are not present in studies using static data sets.

“Research” that uses real world data or big data can be undertaken for governmental, non-profit, academic, social and/or commercial purposes. Federal and state law (or law of other sovereign nations) may touch upon real world data research, mostly by controlling or limiting the sources of data that are intended to be aggregated for research use. For example, entities and persons that are not health care providers or insurers may seek to obtain identifiable data from health care providers or insurers. In these situations, these researchers, though often themselves not bound directly by HIPAA and other privacy laws, are seeking to extract data from entities that are bound by these privacy laws, and such extraction may not occur without the health care and insurer entities complying with those laws. Public and private entities may be governed by other laws that regulate how they may – and may not – allow researcher access to personal data they collect, such as the Fair Credit Reporting Act, Children’s Online Privacy Protection Act, the federal Privacy Act, Family Educational Rights and Privacy Act, and federal drug and alcohol treatment privacy regulations. Moreover, federal law governing human subjects research can directly govern real world, big data research activities when they are conducted using federal funds, occur in institutions that have agreed to subject all their activities to the Common Rule, or are conducted under FDA jurisdiction, including studies done with an intent for results to be used to support FDA submissions.

Current Situation regarding Big Data Study Use of De-Identified or Anonymized Information

Much of the use of real world data does not involve “human subjects” because data used have been stripped of identifiers (or for federally-funded research, because the research activity meets the requirements of the OHRP human research “coded data” guidance[4]). Data may have been de-identified or anonymized by various standards, such as those of the Common Rule, HIPAA, or EU member state laws. If de-identification or anonymization eases the legal and ethical restraints on the acquisition, processing and use of real world data for research, then a primary question becomes: what constitutes de-identification or anonymization that is sufficient and appropriate to allow the researcher to bypass consent and waiver of consent?[5]

If data retain any variables that could identify an individual, then use of the data could constitute human subjects research that would fall under either 45 CFR 46 (if federally funded) or under FDA regulations (if intended to be used for an FDA submission). If, however, the entities and persons undertaking a real world data study are neither using federal funds nor operating under the jurisdiction of the FDA, then the primary federal laws governing human subjects research are not applicable, and studies can be undertaken without, for example, IRB approval, adverse events reporting, or informed consent.

An exemption from the requirements of the Common Rule may be available for some big data research, under 46.101(b)(4), which covers research using existing data, if the researchers record those data in a way that “subjects cannot be identified, directly or through identifiers linked to the subjects.” This exemption would allow researchers using already-compiled data to combine data sets and conduct research with those combined data, as long as the data recorded are not identifiable. Using this exemption may prove difficult in practice, however, because after obtaining, combining and using data sets, researchers would need to destroy all identifiable information, including source database copies they may have obtained; and this may not be preferable or even adequate for additional research studies, or for assuring research integrity.[6] For much of the research with real world data and big data that does meet the definition of human subjects research and is federally funded, a waiver of informed consent may be applicable under 45 CFR 46.116(d), as described more fully below.

HHS Regulatory Considerations Regarding Classification of Projects involving Real World Data

When federally-funded research is analyzed as to the applicability of the Common Rule, the process generally involves a triage of the applicable regulatory categorization and the subsequent requirements. The order is:

1. Is it research? (no, if the activity is not designed in a systematic way to produce generalizable knowledge)
2. Is it research involving human subjects? (no, if the data have been de-identified or are anonymous, having been collected through no interaction or intervention with living humans)
3. Is it exempt research? (categories listed in 46.101(b)(1) through (6))
4. Is the institution engaged in research?
5. Is it research that can be expedited?
6. Is it research that requires convened IRB review?
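The ordering of these questions can be illustrated with a short sketch. The following Python fragment is illustrative only and is not part of any regulation or guidance: the `Project` fields and the outcome labels are hypothetical names chosen for this example, and each field stands in for a determination that an institution or IRB would make on the facts of a given study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical answers to the triage questions (field names are assumptions)."""
    systematic_and_generalizable: bool    # question 1
    involves_human_subjects: bool         # question 2
    exemption_category: Optional[int]     # question 3: 1-6 under 46.101(b), or None
    institution_engaged: bool             # question 4
    eligible_for_expedited_review: bool   # question 5


def triage(p: Project) -> str:
    """Walk the questions in the order listed and return the first outcome reached."""
    if not p.systematic_and_generalizable:
        return "not research"
    if not p.involves_human_subjects:
        return "research not involving human subjects"
    if p.exemption_category is not None:
        return f"exempt research (category {p.exemption_category})"
    if not p.institution_engaged:
        return "institution not engaged in research"
    if p.eligible_for_expedited_review:
        return "expedited review"
    return "convened IRB review"
```

The point of the sketch is the sequence: a later question is reached only if every earlier question fails to resolve the project's status.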

Common Rule Analysis: Is it research involving human subjects?

In some cases, even if there is a possible way to link the individual to the data, it might be acceptable to find that there are no human subjects involved because the identity of the individuals cannot “readily be ascertained by the investigator or associated with the information.”

Under 45 CFR 46.102, a human subject is defined as follows:

(f) Human subject means a living individual about whom an investigator (whether professional or student) conducting research obtains

(1) Data through intervention or interaction with the individual, or
(2) Identifiable private information.

Intervention, in turn, includes both physical procedures by which data are gathered (for example, venipuncture) and manipulations of the subject or the subject's environment that are performed for research purposes. Interaction includes communication or interpersonal contact between investigator and subject. Private information includes information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information which has been provided for specific purposes by an individual and which the individual can reasonably expect will not be made public (for example, a medical record). Private information must be individually identifiable (i.e., the identity of the subject is or may readily be ascertained by the investigator or associated with the information) in order for obtaining the information to constitute research involving human subjects.

Therefore, under the Common Rule, often there will not be human subjects involved, simply based on 46.102(f) standards; and even if the research activity does involve human subjects, with the identity of the subjects being “readily … ascertain[able] by the investigator or associated with the information,” a waiver of consent under 46.116(d) would be available. This indeed appears to be consistent with the ways in which many IRBs already are considering research using big data.

Engagement in Research

Often research involving real world data relies on data gathered at many different and distinct institutions or entities. There needs to be an assessment of whether each of the institutions is “engaged in the research.” As a practical matter the same finding will often apply to multiple institutions. However, the issue is complicated by variation in interpretation of regulations and guidance at individual institutions, as well as unique local policies that in some cases exceed regulatory requirements. One issue is whether an entity that provides identifiable data to another entity for the other entity’s research purposes would itself be engaged in the research activity. In most cases, the entity providing the data would not be engaged in the research – even though, for other regulatory and legal reasons, the entity should disclose the information to the research entity only under color of law (e.g., as allowed by informed consent, as permitted by HIPAA, as permitted by a HIPAA exception).[7] If the research activity is federally funded, the entity seeking and obtaining the identifiable data would, however, be engaged in the research and would be required to comply with the Common Rule and relevant guidance.

An exception defined by OHRP guidance allows an institution that would otherwise be engaged in research by virtue of its obtaining identifiable data from another institution to avoid this status, by accepting only coded information, with the key held solely by the institution providing the data.[8]
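As an illustration of what “coded information, with the key held solely by the institution providing the data” might look like in practice, the following Python sketch replaces a direct identifier with a random code and returns the code-to-identifier key separately. It is a minimal sketch under stated assumptions: the function name `code_records` and the field name `patient_id` are hypothetical, and replacing a single identifier field does not by itself establish that the remaining fields are non-identifiable.

```python
import secrets


def code_records(records, id_field="patient_id"):
    """Return (coded_records, key).

    coded_records: copies of the input records with the identifier field
        replaced by a random code; these are what the recipient would receive.
    key: mapping of code -> original identifier; in this sketch it is assumed
        to remain solely with the institution providing the data.
    """
    key = {}        # code -> original identifier (kept by the provider)
    code_for = {}   # original identifier -> code (so repeat records share a code)
    coded = []
    for rec in records:
        original = rec[id_field]
        if original not in code_for:
            code = secrets.token_hex(8)
            while code in key:  # regenerate in the unlikely event of a collision
                code = secrets.token_hex(8)
            code_for[original] = code
            key[code] = original
        out = {k: v for k, v in rec.items() if k != id_field}
        out["code"] = code_for[original]
        coded.append(out)
    return coded, key
```

Because the codes are random rather than derived from the identifier, the recipient cannot reverse them without the key; records belonging to the same individual still share a code, so they remain linkable for analysis.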

FDA Regulatory Considerations Regarding Classification of Projects involving Real World Data

For FDA analysis, the triage categories are slightly different:

1. Is it a clinical investigation?
2. Is it exempt? (categories at 56.104(a) through (d))
3. Is it research that can be expedited?
4. Is it research that requires convened IRB review?

Under the FDA definition of a human subject and a clinical investigation, there is a distinct regulatory analysis. FDA regulations that require informed consent from subjects for all “clinical investigations,” and that contain no exceptions through a waiver of consent process, present particular difficulties, especially if an industry entity seeks to aggregate data from many previous studies in order to understand efficacy and safety. (Such studies would presumably be facilitated by the EMA policy requiring sponsors to make subject-level data publicly available for research use by others.[9]) Would, therefore, a real world or big data study using these aggregated previous study data be allowed to be used for FDA submissions when the terms of initial consents do not clearly allow this future use? And would it matter to the FDA that the data aggregated and used have been anonymized or de-identified before research use?

A human subject under 56.102(e) is:

[A]n individual who is or becomes a participant in research, either as a recipient of the test article or as a control. A subject may be either a healthy individual or a patient.

The definition of a clinical investigation under 56.102(c) is:

(c) Clinical investigation means any experiment that involves a test article and one or more human subjects, and that either must meet the requirements for prior submission to the Food and Drug Administration under section 505(i) or 520(g) of the act, or need not meet the requirements for prior submission to the Food and Drug Administration under these sections of the act, but the results of which are intended to be later submitted to, or held for inspection by, the Food and Drug Administration as part of an application for a research or marketing permit. The term does not include experiments that must meet the provisions of part 58, regarding nonclinical laboratory studies. The terms research, clinical research, clinical study, study, and clinical investigation are deemed to be synonymous for purposes of this part.

Under FDA regulations, one therefore must ask, for a real world data or big data study, whether the research involves human subjects who “participa[te] in research, either as a recipient of the test article or as a control.” To ease regulatory concerns about real world or big data research, FDA could clarify that human beings are not “recipients of a test article” if the purpose of the proposed big data study is to look at records that involve the use of test articles that were given to individuals for the purposes of the primary study, but not originally given to those individuals for the purpose of obtaining data for a later big data study. One must also ask whether FDA would consider a big data study (using either previously collected clinical trial data or data collected during standard of care) to be a clinical investigation simply because the data will be submitted to FDA for safety and efficacy claims. If FDA takes the stance that these projects do not involve human subjects, then by default the project is not a clinical investigation, and FDA could accept the data as having been gathered in a context other than that of a clinical investigation. If, on the other hand, FDA does consider these to be clinical investigations, then consent is necessary – unless, that is, FDA elects to use enforcement discretion to tolerate IRB waiver of consent in such big data research studies, just as FDA currently allows for certain investigations of In Vitro Diagnostics with anonymous tissue.

Problems with current interpretations of “clinical investigations” tend not to arise in situations in which one industry entity seeks to combine its own clinical trials data sets, simply because each industry entity tends to use consent forms that allow that sponsor to use and re-use data for all the sponsor’s regulatory submissions. Instead, this problem tends to arise in contexts in which industry entities, for the purpose of post-marketing surveillance or post-marketing studies, or for use as historical controls, might seek to use another entity’s clinical trials data, or to use real world data gathered in standard of care treatment; in those contexts, there will either have been no consent form, or the form used would not have included permission for other industry entities to use the data for their own regulatory purposes. Without that consent, the FDA requirements have not been met. More recently, largely as a result of the adoption by the European Medicines Agency (EMA) of standards for clinical trials data sharing,[10] industry sponsors and research institutions have begun to use consent forms that describe a broader scope of future research uses. One example of language that might be used in this regard is attached as Appendix A. As time progresses and as more academic and industry entities seek to comply with EMA and other data sharing requirements and adopt broader consent language, some of these consent form issues under FDA regulations may abate. Nevertheless, as described above, the most useful approach at the present time that FDA could adopt would be to interpret big data studies as not constituting “clinical investigations,” with the result that the results of those studies could be submitted to FDA in support of applications. In any event, given the promise of better science offered by these studies, it would be in FDA’s interest to encourage them, and one means to do so is to cease regarding these studies, in a regulatory sense, as “clinical investigations” for which individual consent is required.[11]

Consent for Secondary Use of Data in the Context of Real World Data Research

Research with real world data most often involves secondary use of data. Frequently, consent has not been obtained for the secondary use because such use was not contemplated at the time of the original collection of the data. For this reason, any researcher considering a real world or big data study must consider the terms of consent gained for the primary acquisition of data that are to be used for the big data aggregation and study – or whether there was no such consent gained at all.

In many cases of data collection, there is no consent process, as when one pays cash for an item at the grocery store, with a camera recording data about the purchaser, including his or her apparent race, gender and age. In such a case, as when the data collection occurs in a public setting, there may be a theory of “implied consent” – that the individual impliedly consented not only to the primary data collection but also to the “downstream” data uses, simply by virtue of the individual’s having sought a good or service.

In other cases, however, such as in complex medical procedures, significant commercial transactions, and clinical research, consent forms, either for medical care, research or consumer transactions, may make various sets of promises – or be silent – on sharing and use of data for future research and other studies, including those with commercial or policy (and not academic) motivation. In the research context, most learned observers tend to believe that explicit promises must be honored, but it remains unclear how this would apply to data once they have been de-identified or anonymized. As described above, NIH has now taken a position that prospectively, in using NIH funds for genomic data studies, consent must be gained, at the primary study point, for future use even of de-identified data. Commercial consents, on the other hand, have long contained broad permissions for use of personal data, and only that law that applies to contracts and to financial institutions may govern the interpretation of the rights and obligations of the parties in regard to sharing and using those data.

In determining whether to grant a waiver of consent, an IRB typically would inquire after the terms and conditions under which the data were originally collected. If the original consent did not contemplate future research uses of the primary data, or if the informed consent was silent on these issues, or if there was no informed consent gathered at time of original data collection, then, for research subject to the Common Rule, IRBs could consider an application for waiver of informed consent. On the other hand, any request for a waiver of consent would likely be disapproved by an IRB if the original consent had expressly promised, for example, that other than the primary study, there would be no later research uses of the data.

When IRBs receive research proposals seeking waiver of consent for research use of big data, they generally proceed as with research involving smaller data sets, and apply the standards of 46.116(d):

(d) An IRB may approve a consent procedure which does not include, or which alters, some or all of the elements of informed consent set forth in this section, or waive the requirements to obtain informed consent provided the IRB finds and documents that:

(1) The research involves no more than minimal risk to the subjects;

(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;

(3) The research could not practicably be carried out without the waiver or alteration; and

(4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.

Under these criteria, IRBs are allowed to consider the issue of practicability and waive consent when it is impracticable to obtain it, which may have a more straightforward application in the big data context. For example, if the investigator submits a request to review 2,000 medical records, or to combine vast datasets from public benefit programs, an IRB should readily be able to find that it is impracticable to conduct the research without a waiver of consent, simply because of the vast number and dispersion of the research subjects. From a practical perspective, the usual “practicability” measure used by IRBs is administrative burden or cost, and these would both seem excessive, and thus “impracticable,” in most big data studies for which IRB approval is required.[12]

Further, in a previous communication to the HHS Secretary, SACHRP has recommended that “practicability” for the purposes of analyzing waiver applications should include considerations of scientific validity of studies, and of how studies involving huge subject populations, a significant portion of whom would be lost to follow up and not reachable, might be compromised if consent were required.[13] Real world big data studies would be prime examples of studies whose validity would be undermined by a requirement that consent be obtained, when obtaining consent from even a major portion of subjects is not reasonably possible or feasible.

A more difficult obstacle to obtaining IRB approval of big data studies – at least as has been related to SACHRP by various respondents in public sessions – has been IRBs’ unwillingness to grant waivers of consent out of fear of some harm coming to subjects. The main risk to human subjects in these contexts, as identified by IRBs, has been breach of confidentiality and any resulting psychological, social, legal, or financial harm. Unfortunately, as described by some “big data” researchers who have sought waivers of consent under these standards, IRBs often are reluctant to grant waivers, finding it easier to deny a waiver altogether than to approve a waiver in which there is any risk at all – however small and theoretical – of harm to participants. Some researchers have asserted that even when IRBs do grant waivers of consent for these studies, they delay in doing so, perseverating over possible risks to participants; and reportedly, inordinate delays in approving waivers either discourage researchers or disincentivize them from seeking additional waivers in other possible studies, thus handicapping science.

Indeed, IRBs considering proposals for real world or big data studies might focus less on potential harms to subjects – most of which are speculative and very few of which have ever materialized as demonstrable harms – and focus more on additional measures to protect subjects once a waiver has been granted and the research proceeds. IRBs might also focus on the existing protections for personal health and other data that are legally required of investigators and their institutions, such as HIPAA, federal drug and alcohol treatment confidentiality regulations, consumer privacy laws, and the Federal Information Security Management Act (FISMA), among others. If an institution and its research are already required by law to adhere to rigorous privacy protections and precautions, and if their past compliance record with those requirements is acceptable, then IRBs considering a waiver of consent application should be encouraged to accept such existing compliance efforts as presumptively sufficient to protect the privacy of subjects. Further, even if those existing protections are deemed insufficient in particular cases, IRBs might ask whether, for example, the researchers can deploy additional security and privacy procedures and technology, including strictly limiting access to the aggregated study databases, and what screening measures have been used to select the research team members who would have access to identified data. Given the use of the “cloud” for hosting databases and for analyzing big data, IRBs might usefully set expectations in the terms of use agreed to by researchers in this context – for example, have the researchers agree to a stipulation that the provider may change terms of service or security at any time; ask whether the “cloud” and the software used for the research create access markers and audit trails; and determine whether the research institution, the IRB and the researchers have the ability to access those trails for audits.

Although IRBs may well feel that these sorts of information technology standards are beyond their expertise, IRBs and researchers are nevertheless most often part of research institutions or entities that receive federal funds, and that, even if not covered by the requirements of HIPAA’s Privacy and Security Rules, nevertheless have often signed pledges to abide by the Federal Information Security Management Act (FISMA), thus requiring assessment of all risk of electronic and other disclosure – and of all information in federally funded research projects, not limited to personal health information. In short, even though IRBs most often lack expertise in these areas, they nevertheless are required by the Common Rule to consider privacy risks, and are part of institutions that have other compliance obligations and thus other compliance expertise. Joining IRBs with other institutional sources of information technology and security expertise so that IRBs may rely on such expertise seems, at this point, not only indicated, but necessary, as IRBs will review, consider, approve and perform continuing review on big data studies that typically must use advanced technology for data aggregation, storage, and analysis.

For IRBs to focus more on protective measures that would reduce risk in big data research, instead of meditating primarily on unmitigated risk scenarios, might ease consideration of waiver requests, and result not only in more rapid and better science, but also in improved and appropriate human subjects protections.

If waiver of consent will be used widely in these big data studies, then some method by which investigators might solicit or otherwise identify the opinions of the affected populations about a proposed study might be considered, not as a means of gaining “community consent,” but rather as a method of identifying, and thereafter addressing, issues of concern and of assuring some degree of transparency about the study itself. Establishing community advisory boards, or engaging focus groups from the affected populations, are examples of how this might be accomplished. Such consultations presumably should continue over the course of a lengthy big data study, particularly in studies that are prospectively obtaining new data as they are generated in the course of real world activities.

Finally, in the event that multiple institutions are “engaged in research,” then use of a central or single IRB, on whose actions the engaged institutions may all rely, would obviate the possibility of conflicting or inconsistent grants of waivers of consent.

HIPAA/HITECH and Other Federal and State Privacy Laws

HIPAA and other federal and state medical privacy laws may directly restrict the use of medical and mental health data for real world data research undertaken by health care providers, insurers or other employee benefit programs. In general, health care providers and insurers (because they are covered entities under HIPAA) cannot use identifiable data for real world research, or share the data with others to conduct their own research, unless there is either consent/authorization of the data subjects, waiver of consent from an IRB or a privacy board, or the research can be considered analysis for management, quality assurance or quality improvement, as addressed below.

While HIPAA permits data-sharing for these purposes, it also creates requirements, questions and challenges for researchers and their HIPAA covered entities that receive, use, or disclose data for real world and big data research. For example, an IRB-approved waiver satisfies OHRP regulations, yet for HIPAA covered entities, such a waiver means that the researcher (or research institution) must track disclosures of each person's data so that a full HIPAA "accounting of disclosures" can be provided upon request. This tracking responsibility (which cannot be done manually for big data sets) is challenging, and may be nearly impossible for researchers who collaborate on big data analyses with other institutions. Even the alternative allowed by HIPAA that a covered entity maintain a list of studies involving the records of 50 or more individuals[14] may be difficult to implement, given the velocity of big data studies, the multitudinous numbers of big data studies in operation at any one time, and the electronic nature of much of this research. Further, any increased use of central IRBs to approve and oversee big data studies eliminates day-to-day familiarity that an institution-based IRB has in regard to the data studies it has directly approved, and this represents a possible information gap between an institution’s medical records office (which typically administers the accounting of disclosures compliance) and the big data researchers who use the institution’s data for their studies.

Another challenge under HIPAA is whether, or when, big data sets become the "protected health information" (PHI) of the researcher's covered entity. This question is important because covered entities have numerous obligations with respect to their PHI (such as allowing individuals access to PHI, complying with other individual rights, and analyzing security incidents or breaches, including of limited data sets when the identity of the individuals is not even known to the research institution). When a researcher accesses big data – which by definition is too large to "reside" at the home institution – one must ask whether that access to and "receipt" of data means that the covered entity now must treat the data as the entity's own PHI in all respects. Alternatively, it may mean that the researcher must adhere to core HIPAA privacy and security safeguards (such as not sharing the data beyond the authorized research team), but not automatically treat the big data as the covered entity's PHI for all purposes. It is critical in this regard that institutions proactively define their “designated record set” for HIPAA purposes as not including such research databases, as they presumably are used solely for research purposes, and not for treatment, operations or payment. However, defining a designated record set to exclude such research databases may result in those databases not being eligible for parallel use for operations such as quality assurance, benchmarking and billing analysis.

Questions also arise in the big data context about HIPAA entities' inherent need to rely on outside organizations in order to make this research possible. For example, if a group of medical centers accesses big data hosted by an outside company (such as a cloud service provider), would each center need to negotiate a HIPAA business associate agreement with the cloud company, even if the company maintains the data in encrypted form and lacks the encryption key to access the data? While OCR has provided business associate agreement templates, large companies (such as cloud providers) typically have their own agreements, as does each medical center, and negotiating these can be more time-consuming and complicated even than a set of IRBs resolving differences in their informed consent forms or their grants of waivers of consent. These and related questions would benefit from OCR guidance, particularly since "research" itself is not a HIPAA covered function and since developments in big data research continue to evolve rapidly and to present novel challenges to existing regulatory frameworks.

One of the privacy laws most important for real world or big data research that is commercial in nature and not otherwise covered by the Common Rule or by FDA regulations is the federal Fair Credit Reporting Act[15] (and various similar state laws), which establishes certain rights of individual persons to review the employment and financial information kept on them by consumer credit reporting entities and specialty agencies (such as agencies that sell information about check writing histories, medical records, and rental history records). These laws give individuals the right to obtain their credit files, review the data contained therein, and seek corrections to inaccurate information. Further, these data-collecting agencies may provide identified information only to people with a valid need – usually, for example, to consider an application with a creditor, insurer, employer, landlord, or other business. For “real world data” and “big data” research activities, the agencies covered by these laws may be prohibited from serving as a real world data source in some cases, and when they are a source, there may be an established method for identified individuals to challenge database information about them. These laws are important in the big data context because much big data use is for commercial and industry research and development, and the sources for these data may include these FCRA-covered entities.

European Union Data Protection Standards

The EU privacy laws, either in their current form (as an EU directive) or their impending form (as a directly applied EU regulation), apply to consumer and personal financial information, as well as health and mental health information, and essentially make it illegal for identifiable data to be processed or shared without individual consent, which is most often gained at the time the person begins to receive services or do business with an entity in the EU. Further, EU laws do not allow the sharing of information with entities outside the EU, even if those entities have a relationship with the EU entity that initially obtained the data and have a business reason for obtaining and processing the data, unless certain conditions are met, such as an “unambiguous consent” from the data subject to the sending of personal information outside the EU, or unless the entity outside the EU has entered into a contract with the EU entity under which the non-EU entity essentially subjects itself to EU standards and jurisdiction in regard to the shared data. In the big data research context, this regime results in significant potential limits on the ability of researchers to draw data from EU member states. Other countries outside the EU also have laws that may limit the export of data and the “downstream” uses of data for any research, including big data studies.

Real World, Big Data Studies as Quality Improvement; “Benchmarking” as a Research Activity

Identified data may be used in real world, big data research without consent if the “research” can legitimately be classified as conducted for purposes of quality assurance, quality improvement or management and administration oversight. In these cases, however, research and the other activities may not be mutually exclusive: an activity could be both at the same time, with elements of both quality improvement and research, or a management analysis project or quality assurance program could evolve into research, depending on what is found, how the project unfolds and how and whether the intent behind it broadens or changes. OHRP has recently posted two letters that describe in some detail OHRP’s application of human subjects research regulations in the context of “big data” quality improvement studies, registry studies, and studies using de-identified data collected as part of standard of care.[16] This OHRP correspondence reiterates that institutions and facilities whose sole involvement is providing data – even identified data – for studies are not “engaged in research,” and that central or single IRBs may be useful for studies in which multiple sites contribute data. In general, SACHRP views these letters as expressing reasonable, appropriate and useful application of the Common Rule to this group of “big data” studies.

“Benchmarking” is sometimes cited as an activity that straddles the line between human subjects research and QA or management analysis, but in reality, “benchmarking” may refer to a number of distinct activities, each of which may have its own risk of crossing over into research. Benchmarking may, for example, denote such distinct activities as performance benchmarking, process benchmarking, and “best practices” benchmarking – meaning, respectively, across organizations and/or within one organization: collecting and analyzing data about performance and outcomes measures; collecting and analyzing various processes in place for production of goods and services and their various levels of success; and collecting and comparing (and analyzing for possible adoption) the practices and policies of the overall best-performing organizations within a specific economic, academic or service activity. In many cases, benchmarking requires the aggregation and systematic analysis of massive data sets, as for example, in an effort to understand and compare health outcomes as they may be influenced by various clinical and laboratory procedures.

A recent national emphasis on “learning health systems” – in which health care providers conduct real-time, ongoing analysis of performance data and concurrently identify best practices and integrate them into daily care – itself depends on analysis of millions of clinical data points that are replenished, renewed and revised continuously. Other programs, such as the Qualified Clinical Data Reporting System and the Physician Quality Reporting System, compile massive quantities of data relating to provider performance, for the purpose of analyzing and understanding “best practices” in clinical care. Participation in these systems by providers, and indirectly by patients, whose anonymized data and outcomes information are the source for information relating to provider performance, is mandatory to receive CMS supplemental payments. Although the primary purpose of these programs is quality improvement, generalizable knowledge may be derived from analysis of the data.

One of the primary questions in the constellation of “learning health systems” activities – which although intended to improve care in specific settings, also result in generalizable information – has been whether these systematic analyses constitute human subjects research, or not. If these activities do qualify under applicable regulations as human subjects research, then, of course, IRB review and approval of protocols, and continuing review, would be required, and would presumably include either the consent of each patient in the learning health system, or a waiver of consent granted by the IRB. In terms of practicability, an IRB may well believe that as each patient in a health system must go through an admission process already, in the course of normal patient flow, then it would be entirely practicable for the health system to obtain informed consent for use of personal data for research. Yet this also, of course, means that individual patients may choose to opt out of the use of their data for learning health systems research – even though those same patients receive ongoing benefit from their fellow patients’ participation in the “research,” and even though if a substantial number of patients refuse their consent, this would undermine the effectiveness of the “research” itself, as study populations would become unrepresentative of the whole.

Some commentators, including Nancy Kass at the recent IOM-convened meeting on learning health systems, have opined that our current regulatory categories of “research” and “non-research” are outmoded, and have proposed that these big data activities in learning health systems instead be analyzed and regulated according to evaluation of risk, with pure data analysis of standard of care delivery being classified as a very low risk activity, certainly not meriting full IRB review and approval and specific informed consent.[17] But given that the regulatory categories of what is and is not “research” and the requirements for waiver of consent have been set, and given that regulatory inertia may make them effectively immutable for the time being, there are two possible approaches here. One approach is to choose to regard these activities, if their intended goal is to improve care within a health system, as non-research, even if the process within a health system produces, as a useful by-product, generalizable knowledge. The other option would be for IRBs to exercise aggressively their discretion to grant waivers of consent, finding that gaining consent from each patient for use of his or her data for these purposes is impracticable, given the marginal cost when applied over an entire patient population, and that the risks of these data analysis activities are minimal or less.[18] Either approach would be facilitated if OHRP were to adopt guidance that contemplated their appropriateness, in defined circumstances. The primary consideration in both approaches, however, would be to assure that the activity presents no more than minimal risk, or that risk mitigation mechanisms be embedded in the activity so that risk is reduced to a minimal level.

Some commentators have asserted that quality assurance and similar studies, even if designed and conducted not to yield generalizable knowledge, nevertheless should be subject to institutional risk analysis, to assure that risk to patients or clients is minimal and has been minimized, to the extent possible.[19] In big data studies that do not qualify as “human subjects research,” the same considerations may pertain, with institutions, in an optimal world, analyzing and minimizing risk to protect patients. Further, if such a study later became research because intentions of those conducting the study have veered toward or have begun to include the seeking of generalizable knowledge, then at least risk will have been minimized in the pre-IRB protocol period, which, in turn, may make it somewhat easier and less anxiety-provoking for an IRB, when considering a new application for approval of a study in such a context, to grant a waiver of consent.

Registry Studies

Registry studies are those in which subjects (who are patients) typically receive standard of care clinical services and whose data, collected in the course of regular clinical care, are compiled and analyzed in order to understand some feature of illness or of treatment – for example, the natural history of a disease, treatment efficacy, or factors that may contribute to adverse effects of standard therapies. Data from registry studies, especially if collected across treatment sites or systems, can also be used to compare efficacy and efficiency of providers, at the level of individuals, sites or systems. Other than any risk inherent in the collection, storage and handling of personal data, registry studies in which patient-subjects receive standard of care pose no demonstrable additional risk to those patient-subjects; and if a registry study is done in the context of U.S. health care delivery, all risks inherent in the handling of personal data have already been mitigated – in almost all institutions and physician practices – by the requirements of federal and state medical privacy laws and regulations. In such cases, behavioral standards – through legislative and regulatory measures – have already been set, outside of the research context, so that risks to privacy have been mitigated and reduced to reasonable levels, assuming parties’ compliance with law. Although registry studies are, if undertaken as systematic attempts to derive generalizable knowledge, human subjects research requiring IRB approval, there should be little concern in most cases with IRBs granting waiver of consent on the basis of the minimal risk criterion.

However, the question remains, under current regulatory standards, whether it is impracticable in these studies for researchers to gain informed consent from patients/subjects, who are by definition presenting themselves in clinical settings for care and therefore are available to give informed consent. In order to meet the impracticability standard, researchers would need to demonstrate some actual hardship in gaining informed consent, and this would seem difficult to do in limited studies, in which the cost and trouble of seeking consent would be minimal. A much more compelling case might be made if the “registry study” were to be conducted over large populations, in many different institutions or systems, for which the transaction cost of gaining individual consent (and training clinical staff and interacting with them to assure that they obtained consent) became an actual burden on the research. Moreover, in such huge studies, it is likely not even possible to reach the individuals whose data would be used; and failure to be able to reach any significant portion of these individuals could lead to unreliability of the database formed with the data from those subjects who could be reached and consented. In those circumstances, registry studies may be so comprehensive and massive that they effectively become “big data” studies, and IRBs should readily allow waiver – especially given that actual study risk to subjects would be extraordinarily low.

There have been recent proposals that registry studies that track delivery of standard of care and that involve entities covered by HIPAA (including the institution in which the research is conducted) are so low risk, and their potential benefit so great, that a new exemption should be adopted, allowing these studies to proceed without any IRB review. Establishing such an exemption would eliminate transaction costs and delays for these studies, and would pose risk to subjects that would be vanishingly small, as long as the protections of HIPAA are applied to the data. OHRP should consider crafting such a new exemption, with standards that include HIPAA protections for the data, as well as transparency in regard to the research methods and results and, if feasible, some modicum of consultation with subject populations, perhaps through, for example, focus groups or consulting established community advisory boards.[20]

Conclusions

Studies using “real world big data” are increasingly common in government, academia and industry, although they are regulated only in specific domains (e.g., federal funding, FDA jurisdiction) and/or with specific restraints on the sharing and use of personal data (e.g., HIPAA, COPPA, FERPA). For the purposes of SACHRP’s own remit, we can only address that which falls within federal, and specifically HHS, jurisdiction. Specific recommendations therefore have been limited to those that could be meaningfully implemented by OHRP and other parts of HHS. There are two sets of countervailing concerns here: first, that real world, big data studies are somehow of such a size and complexity that they present unusual and enhanced risks to subjects and therefore must be regulated in more stringent ways; and second, that these studies are not fundamentally different from other data studies, do not present enhanced or peculiar risks, and yet are being impeded by IRBs’ fears and their resulting reluctance or delay in granting the waivers of consent that would allow these valuable studies to proceed. Based on our review and on the various presentations offered to SACHRP, there likely should be, in most cases, more concern about issues expressed in the second view than in the first. We do know, and there appears to be general agreement, that the primary risk of these studies lies in violations of privacy; but to the extent that these studies fall under federal jurisdiction, they most often occur in settings in which privacy and security of personal information (especially health and mental health information) are already well protected by national regulatory standards. In such studies, wherein privacy risks are already reduced to minimal levels by uniform standards that apply to all use and disclosure of personal information, far beyond research activities alone, IRBs – except in peculiar cases, as when there is some evidence that institutions and researchers have a history of non-compliance – should be satisfied that risk is already being managed appropriately. On the other hand, when regulated research activities occur in settings and among institutions and researchers that are not already subject to such privacy practice standards, there ought to be much more concern and more searching inquiry by IRBs into levels of risk; although even in these cases, IRBs might concentrate on mechanisms and controls to reduce risk to acceptable levels, rather than perseverating on risks as though they are unavoidable or unalterable.

In short, real world big data studies – using data collected in real world activities of daily living or in previous or ongoing research – have enormous promise for advancing knowledge, in clinical care and in behavioral and social sciences. It is essential that such studies be allowed and encouraged, but when they are subject to federal regulation, the applicable regulations and interpretive guidance must adequately and appropriately protect the privacy and welfare of subjects. As electronic data systems multiply and become ever more complex, it will be essential that SACHRP, OHRP and other offices within HHS track new issues and trends regarding this category of research, in order that regulations and guidance optimally both encourage and facilitate this research, while simultaneously protecting the privacy and welfare of subjects.

APPENDIX A

INSTRUCTIONS: This template was developed by a multi-stakeholder group led by the Multi-Regional Clinical Trials (MRCT) Center at Harvard and is intended to provide language that can be used in consent forms in order to describe to participants how their data are protected and how they may then be used or shared. Such language anticipates research practices and/or regulations requiring that participant-level clinical trials data be made available by sponsors to third-party researchers and to the public. The following language can be inserted as the “privacy” section of an informed consent form and has been drafted to enable broad use of the coded data for downstream research purposes. While this language seeks to provide “best practice” guidance in this area, each study site will need to consider whether customization of this language is required based on applicable national law and the research and privacy policies of individual study sites.

INFORMED CONSENT LANGUAGE FOR CONFIDENTIALITY AND DATA SHARING

What information about me will be used in the study?

If you join the study, information about you will be used for the study. This information may include your name, address, or birth date. It may also include information from your medical record. As part of the study, new information about you will be collected, such as heart rate, blood pressure and results of study tests, for example tests on your blood and other samples.

Who may see and use information about you and your health?

Information that directly identifies you is held at the study site. Study doctors and other people at the site who are assisting with the study or your care will be able to see this information. In certain cases other persons may need to see this information, for example, the ethics review committee (sometimes called an institutional review board) that reviews the study to ensure that it meets scientific and ethical standards. In addition, people from regulatory agencies overseeing the study, and persons engaged by the study site to help with the study, such as the site’s attorneys and data storage companies, may also need to see your information. The study site will use care to protect your privacy when sharing information and data, as described in the next paragraph. Some people or groups who receive your health information might not be required by law to follow the same privacy rules that the study doctors and study site must follow.

As part of the study, information and data will need to be transferred from the study site to SPONSOR and other researchers working with SPONSOR. Before this transfer takes place, researchers at the study site (the “Site Study Team”) will give your information a unique study number. This number will then be used in place of your name and other information that directly identifies you. We will call this information “Your Coded Information.” The Site Study Team will keep the link between your directly identifiable information and Your Coded Information. The Site Study Team gives only coded information to SPONSOR unless there is a regulatory reason that SPONSOR needs to see information that directly identifies you.

How will my Coded Information be used and protected?

SPONSOR will protect Your Coded Information as described here and will follow laws that protect the use of health information. SPONSOR and those working with SPONSOR will use Your Coded Information for health research purposes only. SPONSOR and those working with SPONSOR may use Your Coded Information in the following ways:

Keep it electronically and analyze it to understand the study and the study results.
Share it with regulatory agencies that approve new medicines and others as required by law. For example, it is possible that as part of efforts to make research data more widely available to researchers, regulatory agencies in some countries may require that Your Coded Information be made publicly available on the internet or in other ways.
Combine it with data from this study or other studies to learn more about [DISEASE/CONDITION] or other conditions, to develop new commercial products, medicines or devices, and to advance science and public health.
Use it to improve the quality of this study or other studies.
Publish summaries of the study results in medical journals, on the internet or at meetings so that other researchers may learn about this study. Your name or other data that directly and easily identifies you will not appear in any of these publications without your specific permission.

SPONSOR may also share Your Coded Information with other researchers for the purposes of researching [DISEASE/CONDITION] or other conditions, to develop new commercial products, medicines or devices, and to advance science and public health. If SPONSOR makes Your Coded Information available to other researchers, SPONSOR will take additional steps to safeguard your privacy. For example, before allowing other researchers to access Your Coded Information, SPONSOR will replace the unique code assigned to the information by the Site Study Team with a new unique code and may remove other information that may indirectly identify you. Despite these and other precautions, however, your privacy cannot be guaranteed.

Your Coded Information may be sent to another country where SPONSOR or researchers with whom SPONSOR shares Your Coded Information are located. This may include countries where the data protection laws are not as strict as the rules in the country where you live. In such cases, Your Coded Information may be protected less strongly and securely by the data protection laws of these foreign countries, as compared to those of your own country.

What other general information about this clinical study is shared?

A general description of this clinical study will be available on the SPONSOR Clinical Study Register: <insert web address>. Information about the study may also appear in clinical trial/study registries in countries in which the clinical study is conducted or in other countries where regulatory agencies require information about the study to be made available on a website known as a clinical trial registry. These websites may contain Your Coded Information, but they will not include information that can directly identify you.

[IF THE STUDY IS SUBJECT TO CLINICALTRIALS.GOV REGISTRATION REQUIREMENTS, INCLUDE THIS LANGUAGE:][“A description of this clinical trial will be available on http://ClinicalTrials.gov/, as required by U.S. Law. This Web site will not include information that can identify you. At most, the Web site will include a summary of the results. You can search this Web site at any time.”]

Do I have to participate in this study?

No. You do not have to participate. You have the right not to sign this form. If you do not sign it, you cannot take part in this research study. It is your choice to sign this form. You should feel that all your questions about the study and the use of your data have been answered before you sign.

For how long will my data be used?

Your data may be used and shared as described in this form for as long as they remain useful for research purposes.

Can I change my mind about participating in this study?

Yes. You have the right to withdraw your permission for us to use or share your information for this research study. If you withdraw your permission, you will no longer be able to participate in the study and no further information will be collected about you. However, all of the information collected before you withdraw your permission will still be used. We will not be able to take back information, including Your Coded Information, that has already been used or has been shared with others. We cannot remove your information that is already part of larger data sets that have been and are being shared for further research. If you wish to withdraw your permission, you must notify your study doctor.

[Note to researcher and SPONSOR: This section on withdrawal of permission to share information relates to the United States; other jurisdictions may not allow the retention of information once a subject withdraws from the study. Please review the laws of the jurisdiction in which the study takes place before using this language.]

OPTIONAL: The below table may be included in addition to the above text to provide a summary of how information will be used and protected.

Type of information	How will it be used	How will it be protected
Information that directly identifies you e.g. name, address and date of birth.	Information that directly identifies you will be used: For study purposes. To ensure the study is being conducted correctly.	Access will be granted only to the following persons who need access to complete the research: Doctors and other people who are assisting with the study or your care. People who ensure the study is done correctly such as the ethics committee and regulatory agencies.
Coded information (Your name and other information that directly identifies you are removed and replaced with a code).	Coded information: Will be transferred from the study site to people who are working with SPONSOR. Will be used to understand the study and study results. May be used by SPONSOR and other researchers for health research purposes including to learn more about other diseases/conditions and to improve the conduct of clinical trials in general. May be used by SPONSOR and other parties to develop new commercial products, medicines, or devices, and to advance science and public health. May be sent to other countries where the data protection laws are not as strong as those in your country of residence.	SPONSOR will protect coded information by: Following laws that protect the use of information stored electronically. Taking additional steps when sharing data with third-party researchers, for example, removing data fields that are not needed by the third party and/or assigning a new code number.
Results Summaries (combined statistical data and results of the study)	Results summaries may: Help other researchers learn more about the study. Be provided in medical journals, on the internet or at meetings.	SPONSOR will: Not include information in the results summaries that directly identifies you without first obtaining your explicit permission to do so.

[1] For purposes of these recommendations, “real world” data may also include data that are collected in “virtual worlds,” such as gaming data from electronic games or workplace or educational simulations. The point here is that these “big data” studies most often are using masses of data already generated and collected in the course of the daily activities of individuals.
[2] Executive Office of the President, Big Data across the Federal Government (March 2012).
[3] NIH Genomic Data Sharing Policy
[4] OHRP, Guidance on Research Involving Coded Private Information or Biological Specimens (October 16, 2008). SACHRP notes that this guidance document refers to the ability of the investigator to re-identify the individual as the applicable standard, but emerging requirements that de-identified research data be shared widely would make the standard more broadly applicable, so that the ability of anyone to re-identify the individual would become the appropriate measure.
[5] As discussed above in regard to the NIH’s recent policy on the aggregation and use of genomics research data, which would often include associated phenotypic data, de-identification in some quarters may no longer be presumed to allow big data studies to be undertaken with previously-collected research data, unless those previous collections included a broad consent for future data use. But this rigid approach remains the exception, and not the rule, in this area.
[6] The 46.101(b)(4) exemption is also available under Subparts B and D (pregnant women and neonates, and children, respectively), as set forth in 46.201(b) and 46.401(b), but is not available for research involving prisoners, as set forth in 46.101(i), footnote 1.
[7] See Engagement of Institutions in Human Subjects Research (2008)
[8] Id.
[9] See Policy on Publication of Clinical Data for Medicinal Products for Human Use, Policy/0070 (Oct. 2, 2014). This EMA policy will apply to all clinical trials whose results are used to support EMA marketing applications, regardless of the national jurisdiction in which a trial was conducted. U.S.-based trials, therefore, will be subject to these EMA requirements, if their results will – as is likely – be used to support EMA applications.
[10] European Medicines Agency, Policy on Publication of Clinical Data for Medicinal Products for Human Use, Policy/0070 (Oct. 2, 2014)
[11] This is consistent with a previous SACHRP recommendation that the FDA interpret its definition of “clinical investigation” in a more circumscribed and limited way, so as to exclude data studies. SACHRP letter to HHS Secretary, Attachment A (“Recommendations for Applicability of FDA Regulations to IRBs”), March 30, 2012.
[12] Because “big data” research typically examines the data of very large numbers of subjects, it would be expected that such studies would sweep in subjects who are pregnant women, children and persons in correctional custody – even though in most cases such studies would not be limited to those groups and would not focus on specific issues related to those groups. Waiver of consent is available under Subpart B for research involving pregnant women and neonates, by OHRP interpretation, and under Subpart D for research involving children, 45 C.F.R. 46.408(c), assuming that the other requirements of Subpart D are met in regard to level of risk. (In fact, the discussion within this report relating to an IRB’s determination of level of risk in “big data” studies is in large part directly relevant to the IRB’s determination of minimal risk under Subpart D in “big data” studies involving children.) Subpart C is a more complex analysis, as approval of any study under Subpart C (regardless of the separate requirements for waiver of consent) requires that the research fall within one or more specific categories. For an IRB (and the HHS Secretary) to approve any “big data” study involving prisoners as subjects, the applicable and allowable approval category would likely be “research on practices, both innovative and accepted, which have the intent and reasonable probability of improving the health or well-being of the subject.” 45 C.F.R. 46.306(a)(2)(iv). This position would rest on the assumption that such studies, although not involving any specific interventional benefit to the subjects, would add to the general knowledge affecting all subjects, including those in correctional custody. See Report of SACHRP Subcommittee on Subpart C, Section VIII (“Control Group v. Placebo”), April 18, 2005, available at http://www.hhs.gov/ohrp/sachrp/sachrpltrtohhssecapda.html.
Another category of research involving big data that might be allowable under Subpart C is research which falls under the HHS Secretarial waiver for epidemiological research conducted or supported by HHS functions. The criteria for this category are that the research must have as its sole purpose (i) to describe the prevalence or incidence of a disease by identifying all cases, or (ii) to study potential risk factor associations for a disease. See OHRP Prisoner Research FAQs.
[13] SACHRP, letter to HHS Secretary Michael Leavitt, January 31, 2008
[14] For disclosures of protected health information for research purposes without the individual’s authorization pursuant to 45 CFR 164.512(i), and that involve at least 50 records, the Privacy Rule allows for a simplified accounting of such disclosures by covered entities. Under this simplified accounting provision, covered entities may provide individuals with a list of all protocols for which the patient’s protected health information may have been disclosed under 45 CFR 164.512(i), as well as the researcher’s name and contact information. Other requirements related to this simplified accounting provision are found in 45 CFR 164.528(b)(4). In the context of “big data” studies, however, with multiple, continuing disclosures of PHI in rapid electronic programs, even this exception becomes difficult to administer.
See HIPAA for Professionals - Research
[15] See A Summary of Your Rights Under the Fair Credit Reporting Act
[16] OHRP, Correspondence Regarding the Application of 45 CFR part 46 to the Activities Related to a National Health Registry (letters dated August 11 and December 29, 2011)
[17] IOM (Institute of Medicine). 2014. Integrating research and practice: Health system leaders working toward high-value care: Workshop summary, Chapter 6 (“Addressing Issues of Regulatory Oversight”). Washington, DC: The National Academies Press.
[18] Because learning health systems in the U.S. are almost invariably covered entities under HIPAA, all of the regulatory requirements of privacy and security already attach to their activities, thus a priori conforming to uniform regulatory expectations of appropriate risk management and risk reduction. The risk analysis, and the confirmation that big data study activities are minimal risk, would need to be more precise in regard to entities that are not required to comply with HIPAA privacy and security standards. The various risk mitigation and reduction factors that IRBs might consider – or that IRBs might call upon their own institutions to confirm – have been discussed above.
[19] See, e.g., E. Bellin and N. Dubler, The Quality Improvement-Research Divide and the Need for External Oversight, 91 Am. J. Pub. Health 1512-17 (September 2001).
[20] In regard to randomized cluster studies, for example, SACHRP recently recommended that:
When IRBs approve a waiver of consent for a CRT, the IRB, institution and investigator may wish to consider whether it would be appropriate to perform community outreach to provide knowledge to the affected population of existence of research. This does not substitute for informed consent from individuals, but may be respectful of autonomy in those cases where the IRB has made the finding that the research meets the regulatory criteria for waiver of approval.

Attachment C: Recommendations on Regulatory Issues in Cluster Studies