Trusted Research Environments

Trusted Research Environments (TREs), also known as “Data Safe Havens” or “Secure Data Environments”, are highly secure and controlled computing environments that give approved researchers from authorised organisations a safe way to access, store, and analyse sensitive data remotely. TREs keep data safe and secure while still making it accessible, which promotes collaboration and drives innovation.

TREs are being widely adopted across the world by organisations such as biobanks, governments and health providers, to provide both data accessibility and security. In the UK, for example, the government’s public sector research endeavour, Genomics England, has benefited from the adoption of TREs during the COVID-19 pandemic, where approved researchers could securely access the clinical and genomic data of over 135,000 patients in a secure, cloud-based TRE.

This article focuses on use cases of TREs and potential opportunities for TREs within the biobanking, research and health sectors of countries in the Nordic region as a means to achieve both data accessibility and security. It also highlights the guidance and regulation in establishing these critical data access and compute spaces.

There is an increasing need for secure data sharing within the Nordic region

The countries of the Nordic region, composed primarily of Denmark, Estonia, Finland, Iceland, Norway and Sweden, have all made significant contributions to population sequencing and genomics research. Shared ancestry and highly homogeneous populations make it easier to study disease-associated genetic variants and rare genetic conditions in these populations. As a result, there has been a high appetite for large-scale population sequencing to inform personalised medicine in the Nordic countries. Most importantly, there are national strategies, funding initiatives and collaboration initiatives to foster sharing of this data within the Nordics.

Trusted Research Environments (TREs) in the Nordic region

An example of the Nordic region’s aim to position itself as a leader in research cooperation and infrastructure was the creation of NordForsk by the Nordic Council of Ministers in 2005. One outcome was NordForsk’s investment of NOK 165 million (about USD 15 million) in research efforts in personalised medicine, made in collaboration with national funding agencies from Sweden, Iceland, Denmark, Finland and Norway. Awarded projects include using personalised medicine in a range of diseases, such as prostate cancer, sleep apnea, inflammatory bowel disease, ischemic heart disease, severe infectious diseases and rheumatoid arthritis, as well as generating new health economic evidence to address important health care decisions.

Being at the forefront of precision medicine is a shared goal of the Nordic countries, and has resulted in the formation of the Nordic Precision Medicine Initiative (NPMI) and development of a roadmap for the precision medicine initiative in this region. These initiatives and strategies have fostered collaboration and data sharing between governments, universities, research organisations and the private sectors. However, sharing of this data poses challenges and considerations, such as having the infrastructure to securely store and share information safely. The combination of wanting to foster collaboration and sharing of data, while keeping data safe and secure, has resulted in the adoption of Trusted Research Environments (TREs) across the Nordics.

Examples of use of TREs/Data Safe Havens within the Nordics

With the increase in genomic research and data, some Nordic countries have already adopted Trusted Research Environments and similar models to be able to securely store the data, while promoting research collaboration. Listed below are some examples: 

Danish National Genome Center (DNGC)

The Danish National Genome Center (DNGC), a government agency and authority established in 2019, was created to implement the Danish Government’s National Personalised Medicine Strategy. The DNGC is deploying a fully on-premise TRE and Data Analysis Platform for managing whole genome and clinical data, within the Center’s national supercomputing cluster. This TRE will ensure a scalable and secure data management and analysis platform for Denmark’s national researchers, clinical scientists and international collaborators. During the first phase of the strategy, the Danish National Genome Center and its collaborators will recruit and sequence whole genomes of 60,000 patients diagnosed with cancer, autoimmune disorders and rare diseases by 2024.

FinnGen

The FinnGen project is a large public-private partnership in Finland launched in 2017, with the aim to collect 500,000 biological samples from participants, which equates to about 10% of the Finnish population. This is not a TRE model, but it is similar in that it is a data environment, where Finnish universities, hospitals and hospital districts, biobanks, international pharmaceutical companies and Finnish citizens can collaborate and progress precision medicine.

Genomic Medicine Sweden (GMS)

Genomic Medicine Sweden was founded in 2018 and receives funding from the Swedish Innovation Agency, Vinnova, and from regional university hospitals and medical faculties. It has established a national genomics infrastructure comprising seven regional centres and aims to have sequenced 65,000 samples by 2025. GMS collaborates with the National Genomics Infrastructure (NGI) platform, which provides access to next generation DNA sequencing, genotyping and associated bioinformatics support.

Future-looking Nordic initiatives that will require secure data sharing

The Nordic countries have well-established healthcare systems, biobanks, universities and support from private companies to drive population-level studies. Listed below are national strategies and funding initiatives to further promote personalised medicine in the Nordic regions, resulting in generation of more genomics data that will need to be hosted and shared safely and securely, for example in a TRE. 

  • The Nordic Council of Ministers’ Action plan for Vision 2030 includes the goal to enable health data to be shared securely and quickly between Nordic countries. It also plans to further contribute to the existing goal of turning Nordic companies into world-leaders in the global life sciences sector. 
  • Denmark’s governmental agency, Innovation Fund Denmark (IFD), has stated that it will have an investment strategy within personalised medicine as part of its overall health strategy from 2023-25, on top of already funded ongoing projects in personalised medicine.
  • Business Finland’s programme on Personalised Health has provided funding of up to EUR 80 million for innovation to various organisations such as start-ups, SMEs, large companies, universities, research organisations, hospital districts and other health care organisations, with the aim to create new businesses around individualised healthcare platforms.
  • Genomic Medicine Sweden’s Strategic Plan 2021–2030 includes long-term goals of GMS-analysis of complex diseases and integrated omics in clinical routine, the genomics platform integrated in national clinical studies and advanced AI-based interpretation support on the National Genomics Platform.
  • Norway’s national 2023–2030 eHealth strategy includes long-term goals of further developing personalised medicine, which is supported by advanced data analysis. This includes using artificial intelligence and personalised medicine to make clinical and administrative processes more efficient. 

Considerations for establishing TREs within the Nordic region

When establishing a TRE within the Nordic region, there are a number of key considerations to ensure that data is safely stored and utilised. Providers of TREs should also be aware of, and compliant with, region-specific and national legislation relating to health and genomic data.

Data privacy laws in Nordic countries

  • The Nordic countries are subject to the General Data Protection Regulation (GDPR), being EU or European Economic Area (EEA) member countries. GDPR lays out how personal information must be used by organisations, businesses and the government. Therefore, establishment of a TRE housing health data in any of the Nordic countries must comply with GDPR.
  • European Health Data Space (EHDS) – The European Commission is proposing a regulation for the European Health Data Space – which builds on GDPR and emerges from the European Strategy for Data. It will consist of rules, common standards and practices, infrastructures and a governance framework, with an aim to maximise safe and secure exchange, use and reuse of health data in the EU.
  • EU Data Act – in its final stages, the Act proposes to make data more available for use and sets rules on who can use and access data. This Act differs from GDPR (which regulates personal data) as it also regulates non-personal data. It also introduces an obligation for private sector data to be made available to public bodies.
  • NIS 2 Directive (Directive (EU) 2022/2555, known as NIS2, replacing Directive (EU) 2016/1148) – which aims to improve the existing cyber security status across the EU.

Relevant certifications

ISO 27001 – an internationally recognised specification for an Information Security Management System (ISMS)

Data standardisation and interoperability

The European Health Data Space (EHDS) proposes standardisation of electronic health records and health data exchanges. As part of secure data sharing, data needs to be interoperable, particularly when combining data from disparate sources, so that data can be easily used. Therefore, data should be standardised to a common data model, such as the Observational Medical Outcomes Partnership (OMOP) common data model for clinical data. Accordingly, a key consideration would be to standardise data in the common data model that the EHDS implements.
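As a rough illustration of what standardising to a common data model can involve, the sketch below maps a few locally formatted patient records onto a small subset of the columns of the OMOP `person` table. The source record layout (`patient_ref`, `sex`, `birth_date`) and the helper name `to_omop_person` are assumptions made for this example only; a real mapping would cover many more tables and vocabularies, and the model eventually adopted by the EHDS may differ.

```python
# Hypothetical source records; these field names are assumptions for illustration.
SOURCE_RECORDS = [
    {"patient_ref": "A-001", "sex": "F", "birth_date": "1975-03-14"},
    {"patient_ref": "A-002", "sex": "M", "birth_date": "1982-11-02"},
    {"patient_ref": "A-003", "sex": "unknown", "birth_date": "1990-07-30"},
]

# OMOP standard concept ids for gender. 0 is the OMOP convention for
# "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_omop_person(record, person_id):
    """Map one source record to a subset of the OMOP `person` table columns."""
    year, month, day = (int(part) for part in record["birth_date"].split("-"))
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(record["sex"], 0),
        "year_of_birth": year,
        "month_of_birth": month,
        "day_of_birth": day,
        # The original values are kept so the mapping can be audited later.
        "person_source_value": record["patient_ref"],
        "gender_source_value": record["sex"],
    }


person_rows = [to_omop_person(rec, i) for i, rec in enumerate(SOURCE_RECORDS, start=1)]
for row in person_rows:
    print(row)
```

Once every contributing source is expressed in the same columns and concept ids, the same analysis code can run across datasets from different custodians without per-source adjustments.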

Conclusion

As described above, the use of TREs and Data Safe Havens in the Nordic region has increased in recent years. However, there is scope for continued adoption with the rollout of several precision medicine initiatives and funding. Increased data access and usage of health data from Nordic populations will not only help people and the health and care system in these countries, but will also benefit international research collaborations. Virtually connecting these sensitive datasets, enabling research studies without compromising security, can lead to greater research insights for the benefit of local, national and international populations.

Moving forward, if an organisation is endeavouring to establish a TRE for efficient and secure data access, they will need a well-defined security-by-design and governance framework in place to ensure compliance, in addition to wide-ranging technology capabilities. 

Lifebit works proactively with clients, including Genomics England, the Danish National Genome Centre, Boehringer Ingelheim, NIHR Cambridge Biomedical Research Centre, and others to comply with sensitive data requirements. We ensure that organisations can meet and exceed industry standards amidst the changing regulatory and regional landscape – enabling valuable research at scale to improve patients’ lives.

In the last twenty years, there has been an explosion in the production of patient-derived biomedical data. This includes datasets derived from clinical-genomic, Electronic Health Records (EHRs), and real-world data (RWD) sources, which, when utilised together, can hold the answers to the underlying causes of disease. Unfortunately, the transformative potential of this health data has yet to be realised. To preserve patient privacy, much of the world’s health data is stored within institutional siloed environments that are unavailable to researchers or are difficult to access. To support research and innovation through the power of data, solutions are needed to enable data access and linkage while maintaining security.

Trusted Research Environments (TREs) are highly secure and controlled computing environments that solve this problem. Also known as “Data Safe Havens” or “Secure Data Environments”, TREs allow approved researchers from authorised organisations a safe way to access, store, and analyse sensitive data remotely. Here, we focus on how TREs are being utilised within the biobanking, research and health sectors of the United Kingdom as a means to achieve both data accessibility and security. We also highlight the guidance and regulation in establishing these critical data access and compute spaces. 

Trusted research environments are being widely adopted across the UK

Across biobanking, government and health providers, TREs are being increasingly adopted as a means to achieve both data accessibility and security. The UK is a global leader in genomic research that has actively and significantly invested in health data science, and several TREs have been implemented across the country. This article highlights several use cases of TREs within the UK health research sector:

Genomics England

The UK government’s public sector research endeavour, Genomics England, currently hosts the data from over 135,000 NHS patients within a TRE for approved research use. The TRE is a cloud-based tool (powered by AWS and Lifebit) that approved researchers can use to access the clinical and genomic data from participants with cancer, rare disease, and COVID-19. Data access processes differ between the public and private sectors: researchers who want to access data must apply to become a member of either the Genomics England Clinical Interpretation Partnership (academics, students, and clinicians) or the Discovery Forum (industry partners).

Research enabled: Approved research use of Genomics England’s data has resulted in over 200 publications and 560 collaborative research projects. These research studies span a variety of topics, including COVID-19, rare disease, cancer genetics and more. With the recent implementation of Genomics England’s TRE, the collaborative potential for research using this data will continue to grow.

National Health Service England (NHS England)

Recently, NHS Digital, in partnership with Health Data Research UK, developed a TRE that provides academic researchers access to cardiovascular and cancer data for COVID-19 research. Published in the British Medical Journal, the partnership with national health data custodians provides linked, nationally collated electronic health records for approved research within secure, privacy-protecting environments.

Research enabled: By combining individual-level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population in England. This resource has already proven essential for accurate recording and research on cardiovascular disease and COVID-19, providing researchers across the UK with rapid access to data.

Moving forward, the NHS has committed  to establishing a Federated Data Platform and a network of sub-national SDEs for NHS data sources across England – this will allow hospital trusts to safely link the secure environments that house NHS data for more efficient access, without having to physically move the data. 

Honest Broker Service (Northern Ireland)

The Honest Broker Service is the TRE for health data within Northern Ireland. Here, a variety of health data sources including general medical, dental, maternity, cancer, COVID-19 and other data types can all be accessed by the Department of Health, approved Health and Social Care (HSC – Northern Ireland’s public health care system) affiliates, and approved researchers. The TRE provides access to linked, de-identified data for approved research projects. Users can also collaborate on projects and access a range of analytical tools to support their work. 

Research enabled: Access to this rich health data source has led to numerous studies focused on the Northern Irish population covering a variety of areas, from mental health and dementia to maternity studies. In particular, a policy report on the routes to cancer diagnoses within Northern Ireland offers several recommendations to promote earlier cancer detection in order to help benefit patients.

Secure Anonymised Information Linkage (SAIL) Databank (Wales)

SAIL is a rich population databank, whose TRE provides global researchers secure remote access to datasets with anonymised health and social care data records for the population of Wales. In operation since 2007, the SAIL Databank operates on the UK Secure Research Platform, a private research cloud with customisable technology. 

Research enabled: Research publications resulting from the databank are in the hundreds – a recent example, in the largest study of its kind, found that COVID-19 vaccines offer effective protection against infection for high-risk healthcare workers.

The Scottish National Safe Haven (Scotland)

The Scottish National Safe Haven was established in 2013 by what is now Public Health Scotland. It is the single point of entry for access to national health data held by NHS Scotland, and can be accessed on computers physically located in safe settings across the country. Numerous data types are available, including hospitalisations, prescribing data, COVID-19 vaccinations, census and medical imaging data, all of which are listed on the HDR UK Innovation Gateway.

Research enabled: Access to this data directly powers the outputs of Public Health Scotland, with a wide range of research, guidance and statistical analyses available on cancer diagnoses, immunizations, and more. Further, this TRE has been linked to the Outbreak Data Analysis Platform to help power research efforts in the studying of COVID-19. From this, hundreds of researchers have been able to securely access this data, resulting in 101 research publications. 

Establishing TREs within the UK

When establishing a TRE within the UK, there are a number of key considerations to ensure that data is safely stored and utilised:

Key Principles of TREs

With the increasing adoption of TREs across the UK, there are emerging data governance standards that outline how TREs should be operated. At a UK-national level, the UK Health Data Research Alliance, convened by HDR UK, has adopted a set of principles to ensure that data services, including TRE owners, provide safe research access to data. These are based upon the Five Safes Framework, initially established by the Office for National Statistics, and now broadly adopted across the international research community. Similarly, the NHS has also published a clear public guide to Secure Data Environments and their policy guidelines, which are also based on the Five Safes.

The Five Safes framework for safe research access to data:

01 — Safe people
Only authorised analysts or researchers can access the data and only on approved projects. Data Custodians need a process to verify the authorisation status of these individuals and need to be able to segregate data access between users. All user access and activities performed over the data management platforms must be recorded and logged to enable full auditability.

02 — Safe projects
TREs need a transparent application process for data access, e.g. applicants need to state clearly what they will use the data for.

03 — Safe settings
TREs must hold data securely and have industry-standard security controls such as data encryption, no export of individual-level data and the ability to track researcher/user activity.

04 — Safe data
Data needs to be de-identified and encrypted both at rest and in transit.

05 — Safe outputs
TREs need a robust and transparent process to support the export of data results, which prevents unauthorised removal of data, known as an Airlock.
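To make the interplay of these principles more concrete, here is a minimal Python sketch of how a TRE might gate access and exports. It is not a description of any particular TRE: the class `TreGate`, its fields and its method names are all hypothetical. It covers only three ideas from the list above: access limited to authorised people on approved projects, logging of every action for auditability, and an Airlock step in which results are held until a reviewer releases them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TreGate:
    """Toy access gate; every name here is hypothetical."""
    authorised_users: set
    project_members: dict           # project id -> set of user ids
    project_datasets: dict          # project id -> set of dataset ids
    audit_log: list = field(default_factory=list)
    pending_exports: dict = field(default_factory=dict)

    def _log(self, user, action, allowed):
        # Every decision is recorded, whether it was allowed or refused.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "allowed": allowed,
        })

    def can_access(self, user, project, dataset):
        # Safe people and safe projects: the user must be authorised, be a
        # member of the project, and the dataset must belong to that project.
        allowed = (
            user in self.authorised_users
            and user in self.project_members.get(project, set())
            and dataset in self.project_datasets.get(project, set())
        )
        self._log(user, f"access {dataset} for {project}", allowed)
        return allowed

    def request_export(self, user, export_id, summary):
        # Safe outputs: results wait in the airlock instead of leaving directly.
        self.pending_exports[export_id] = summary
        self._log(user, f"export request {export_id}", True)

    def review_export(self, reviewer, export_id, approve):
        # A reviewer releases or rejects the held result.
        summary = self.pending_exports.pop(export_id)
        self._log(reviewer, f"export review {export_id}", approve)
        return summary if approve else None


tre = TreGate(
    authorised_users={"alice"},
    project_members={"proj-1": {"alice"}},
    project_datasets={"proj-1": {"cohort-a"}},
)
print(tre.can_access("alice", "proj-1", "cohort-a"))
print(tre.can_access("bob", "proj-1", "cohort-a"))
```

In a real deployment these checks would be enforced by the platform's identity, storage and network layers instead of a single in-memory object, but the separation of concerns is the same.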

The UK Data Protection Act and UK GDPR

The Data Protection Act (DPA) 2018 is the UK’s implementation of the General Data Protection Regulation (GDPR); both lay out how personal information must be used by organisations, businesses and the government. When establishing a TRE that will house health data within the UK, an organisation must comply with the DPA/GDPR. This will ensure that data is used fairly, in a way that is relevant and limited to what is necessary, with the appropriate security measures in place. Given the increasing use of genomic data in research, the PHG Foundation, a non-profit policy think tank based out of the University of Cambridge, has recently published a policy report about how genomic research in healthcare is impacted by the GDPR.

Relevant certifications

When establishing a TRE within the UK, certain industry-recognised regulatory frameworks require certification. This means that a TRE owner may have to undergo an external audit to confirm compliance with the requirements. These certifications include the following:

  1. ISO 27001 is a world-recognised industry standard and represents the foundation of numerous countries’ compliance programs for information security management systems, including TREs. 
  2. Cyber Essentials Plus is a UK-specific certification that covers the basics of cybersecurity within an organisation’s corporate IT system, including rigorous vulnerability testing to ensure that an organisation is protected against hacking attacks. 
  3. If the data within the TRE includes NHS patient data, the TRE must comply with the standards of the NHS Data Security and Protection Toolkit to provide assurance that the data will remain secure.
  4. TREs that house sensitive health data for approved research must ensure that the data is anonymised to maintain patient security. The UK Statistics Authority has developed an accreditation scheme for data processors to anonymise the patient data. Once the de-identified data is within the TRE, the accredited processor will also ensure that all data is safeguarded to minimise the risk of data subjects being re-identified. Should a TRE owner be unable to become an accredited data processor, they may work with an accredited external partner. 
  5. Finally, if the TRE includes a cloud-based component, there are specific certifications including ISO 27017/27018, and if using NHS data a cloud security good practice guide, to ensure best practices for cloud services.

In summary, working with TRE providers can significantly simplify the process in establishing a TRE — providers already comply with national and regional data governance frameworks, including having the required certifications.

TRE Accreditation

Within the UK, there is an increasing prevalence of accreditation schemes to audit and certify TREs – thus further defining a clear set of standards that align with national data protection laws and frameworks to regulate how TREs operate. Examples include the NHS Secure Data Environment and the Our Future Health TRE accreditation processes.

These rigorous processes will review TRE owners and providers to ensure TREs meet the necessary standards across information governance, cyber security, operational, privacy, and technical requirements. In the case of Our Future Health TRE Accreditation, this includes an audit and review of internal policies and documentation of the Data Custodian and TRE provider against over 200 specific requirements. With TRE accreditation granted by such organisations, TRE owners and providers can then host and utilise expanded data sources in a controlled manner — furthering the potential for research progress whilst minimising security risks.

Closing remarks

TREs are emerging as essential entities across the UK that can scale with increasing volumes of patient data and ensure its protection, all while enabling secure access for approved research. While guidance exists highlighting their key principles, implementing accreditation frameworks and bodies that regulate the use of data will ultimately support a safer TRE ecosystem, help foster trust from the broader public, and ensure the best interests of the public and patients are protected. 

In this article:

  1. What is a Trusted Research Environment
  2. Why do we need Trusted Research Environments
  3. Defining the Key Features of a Trusted Research Environment
  4. Advantages of Trusted Research Environments
  5. How are Trusted Research Environments Being Used in Healthcare Data Management
  6. Challenges and Priorities for Trusted Research Environments in the Future

What is a Trusted Research Environment?

Trusted Research Environments (TREs) are highly secure and controlled computing environments that allow researchers to gain access to data in a safe way. Also known as “Data Safe Havens” or “Secure Data Environments”, these secure digital environments enable approved researchers to remotely access, store, and analyse sensitive data in a single location.
Designed to protect the privacy and security of sensitive data, trusted research environments have been supporting the secure sharing of sensitive data in the UK since 2013. TREs are used by a range of organisations and industries, including research institutions, universities, health systems, charities and government bodies. [1][2][3][4] These can be fully open-source (e.g. OpenSAFELY), built in-house, or built by commercial companies, with diverse benefits and features across these varied approaches.

TREs support the highest level of data governance by removing the need to share data physically among researchers and organisations.
Data instead remains in a secure environment and is analysed in situ by authorised researchers with tools available in the TRE.

With clear evidence that the health, care and research and development sectors require deeply linked health-related data, trusted research environments are increasingly recognised as a solution that can provide secure access and analytics functionality to authorised researchers, while also increasing public trust in data use. As such, the trusted research environment landscape and associated technology are evolving rapidly in the UK and further afield.

Why do we need Trusted Research Environments?

Making use of large-scale health data

The opportunities for data-driven research and innovation have never been larger, and the volume of large-scale health data available for research is immense. In the genomics field, for example, roughly 2 to 40 billion gigabytes of data are now generated each year. This health data holds huge potential to accelerate society’s understanding of how to detect, prevent, and treat disease.

Studying larger sample datasets can lead to increased insights, as shown in numerous genetic association studies. For example, the first schizophrenia-associated variant was identified using a cohort of 3000 individuals, yet subsequent analysis of a cohort 10x larger uncovered over 100x the variants. [5]

Traditional data sharing models are no longer secure or scalable

However, the potential of health data is far from being realised. To preserve patient privacy, much of the world’s health data is stored within institutional siloed environments that are unavailable to researchers or difficult to access. [6] Agreements to enable data sharing between organisations are complex, and even where researchers are eligible for access, approval can typically take organisations six months or longer. [7]

Traditional modes of data access and sharing rely on sensitive datasets being copied, moved, or downloaded into personal/organisational devices or centralised platforms. With the sensitive nature and sheer scale of health and genomic data, this mode of data access is inefficient or unsustainable.

Further, with an alarming rise in reports of large-scale data breaches and data mining activities, and a long-overdue shift in public awareness towards personal data sovereignty, maintaining public trust in health data research is critical. [8][9][10]

Trusted research environments are a scalable, long-term solution for health data access

TREs can address some of the concerns around data security and patient privacy – with multi-layered security controls and robust monitoring and auditing capabilities. Importantly, trusted research environments represent a shift in data access from a ‘lending library’ to a ‘reading library’ approach. In the TRE model, approved researchers can use the data within the library, but this information never leaves the library.

Further, trusted research environments provide the functionality and infrastructure to support research on sensitive health data at scale. They are solving the problem of authorised data sharing by enabling research progress without sacrificing data security – ensuring data are handled in a secure and responsible manner.

Defining the Key Features of a Trusted Research Environment

In order to power research and progress therapeutic development while maintaining public trust, trusted research environments must strike the delicate balance between usability and security. As trusted research environments are built and procured across industries, there are several important features needed to ensure safe data access:

The Five Safes framework

The Five Safes framework is recommended as a central feature of trusted research environments. Originating from the UK’s Office for National Statistics, it consists of five pillars: safe people, safe projects, safe settings, safe data and safe outputs. The framework’s pillars span all stages of data management to make data available for research, while protecting confidentiality at all times. This set of principles is widely regarded as the gold standard for sensitive data protection.

A recent white paper from the UK Health Data Research Alliance, convened by Health Data Research UK (HDR UK), built upon this framework to establish guidelines and best practices for building trusted research environments, ensuring data services (like trusted research environment providers) provide safe access to data.[2] 

Beyond the Five Safes, there are several key features and best practices of trusted research environments that are needed to enable researchers to safely and effectively access and analyse data – both in terms of safeguarding sensitive data and providing the analytics and infrastructure to support research at scale.

How data is safeguarded in a Trusted research environment

Custodians (e.g., biobanks and healthcare providers) of health data cohorts have been tasked with a critical role of safeguarding participants’ data. As part of an organisational-level data governance framework, trusted research environments need a multi-layered approach to safeguarding sensitive data, to ensure data are handled in a secure and responsible manner. Alongside ethical approval for data access that involves patients and the public in decision making, this governance framework can help to build public trust. 

Well-defined governance frameworks lay out the roles and responsibilities of different stakeholders, including researchers, institutional review boards, and information security teams, to ensure that patient data is handled responsibly. However, this can become increasingly complex, with data governance standards rapidly changing across regions and between institutions. Working with a trusted research environment provider can alleviate these complications. When choosing a provider, certifications in industry-recognised standards including ISO 27001 and Cyber Essentials Plus signify that the provider is well equipped to manage private and sensitive data.

 

What measures can be taken to safeguard participant data within a Trusted research environment?

Encryption

Data encryption is the process of converting plaintext (unencrypted) information into an unreadable ciphertext (encrypted) format, using an encryption algorithm and a secret key, with the purpose of maintaining the confidentiality and privacy of the information. The encrypted data can only be decrypted and read by someone with access to the correct decryption key.

Pseudonymisation

Data pseudonymisation is a privacy-enhancing technique that replaces identifiable information, such as personal names and addresses, with a pseudonym, or a fake name, that cannot be traced back to the original information without additional information. Pseudonymisation reduces the risk of a data breach and protects the privacy of individuals by making the data less easily linkable to specific individuals.
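
As a minimal sketch of the idea above, one common way to derive pseudonyms is a keyed hash (HMAC), where the secret key plays the role of the "additional information" needed to link a pseudonym back to a person. The key, the field names and the `P-` prefix below are illustrative assumptions, not a description of any particular TRE.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (an assumption for this sketch).
SECRET_KEY = b"example-key-held-by-the-data-custodian"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using keyed HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:16]  # shortened only for readability


def pseudonymise_record(record: dict, direct_identifiers: set) -> dict:
    """Replace the direct identifiers with a single pseudonym; keep other fields."""
    # Combine the identifying fields in a fixed order so the pseudonym is stable.
    identity = "|".join(str(record[field]) for field in sorted(direct_identifiers))
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["pseudonym"] = pseudonymise(identity)
    return cleaned


record = {"name": "Jane Doe", "address": "1 Example Street", "diagnosis": "asthma"}
safe_record = pseudonymise_record(record, {"name", "address"})
print(safe_record)
```

The same person always maps to the same pseudonym, so records can still be linked for analysis, while anyone without the key cannot easily recompute the mapping.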

Role-based access controls

Role-based access control (RBAC) is a method of restricting access to a computer or network based on the roles of individual users within an organisation. In RBAC, users are assigned to specific roles, and each role is granted certain permissions, such as access to specific files or applications, or the ability to perform specific actions. The permissions are determined based on the responsibilities and duties associated with each role. This type of access control provides a flexible and scalable way of managing and organising user access.
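
The role-to-permission mapping described above can be sketched in a few lines. The role names, user names and permissions here are hypothetical and chosen only for illustration; a real TRE would define its own.

```python
# Hypothetical roles and the permissions granted to each (assumptions for this sketch).
ROLE_PERMISSIONS = {
    "researcher": {"read_data", "run_analysis"},
    "data_manager": {"read_data", "ingest_data"},
    "airlock_reviewer": {"approve_export"},
}

# Users are assigned roles, never individual permissions.
USER_ROLES = {
    "alice": {"researcher"},
    "bob": {"data_manager", "airlock_reviewer"},
}


def permissions_for(user: str) -> set:
    """Collect the permissions of every role assigned to the user."""
    permissions = set()
    for role in USER_ROLES.get(user, set()):
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


def is_allowed(user: str, action: str) -> bool:
    """An action is allowed only if one of the user's roles grants it."""
    return action in permissions_for(user)


print(is_allowed("alice", "run_analysis"))    # True
print(is_allowed("alice", "approve_export"))  # False
```

Because permissions attach to roles rather than to people, onboarding a new researcher is a single role assignment, which is where the scalability of the approach comes from.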

Data export control (Airlock)

Data export controls stop data from being exported or downloaded to external environments without first obtaining approval. An example of this is an ‘Airlock’. Genomics England has a world-renowned Airlock process, which means only the results of an analysis can be exported by users, and authorised personnel must approve and validate the purpose of any data download from the TRE.
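
To illustrate the approval step, the following is a simplified sketch of an airlock-style export queue. It is not Genomics England's implementation: the class names, statuses and the rule that requesters cannot review their own exports are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ExportRequest:
    requester: str
    filename: str
    purpose: str
    status: str = "pending"  # becomes "approved" or "rejected" after review
    reviewer: str = ""


class Airlock:
    """Holds export requests until an authorised reviewer has decided on them."""

    def __init__(self, reviewers):
        self.reviewers = set(reviewers)
        self.requests = []

    def submit(self, requester, filename, purpose):
        request = ExportRequest(requester, filename, purpose)
        self.requests.append(request)
        return request

    def review(self, request, reviewer, approve):
        if reviewer not in self.reviewers:
            raise PermissionError(f"{reviewer} is not an authorised reviewer")
        if reviewer == request.requester:
            raise PermissionError("requesters cannot review their own exports")
        request.status = "approved" if approve else "rejected"
        request.reviewer = reviewer
        return request

    def release(self, request):
        # Nothing leaves the environment unless the request was approved.
        if request.status != "approved":
            raise PermissionError(f"export of {request.filename} is {request.status}")
        return f"released {request.filename} to {request.requester}"


airlock = Airlock(reviewers={"carol"})
request = airlock.submit("alice", "summary_table.csv", "figure for publication")
airlock.review(request, reviewer="carol", approve=True)
print(airlock.release(request))
```

The key design point is that `release` is the only path out of the environment and it refuses anything that has not been explicitly approved.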

Monitoring, logging and auditing

Data activity monitoring is needed so that TRE owners have visibility of who is doing what with the data and for which purposes. TREs should have monitoring capabilities that track and audit analyses and datasets. The TRE must have systems in place that proactively monitor the security of their data in real-time, to identify suspected unauthorised data access, data leaks or anomalous activity and automatically alert the TRE owner.
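
A very small sketch of this idea is an append-only audit log plus a rule that flags suspicious patterns. The event fields and the threshold of three denied attempts are assumptions for illustration; production systems typically use dedicated logging and alerting services.

```python
from collections import Counter
from datetime import datetime, timezone

# In-memory audit log for the sketch; a real TRE would use tamper-evident storage.
audit_log = []


def log_event(user, action, dataset, allowed):
    """Record who did what, to which dataset, and whether it was permitted."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "dataset": dataset,
        "allowed": allowed,
    })


def users_to_alert(log, max_denied=3):
    """Flag users with at least `max_denied` denied access attempts."""
    denied = Counter(event["user"] for event in log if not event["allowed"])
    return sorted(user for user, count in denied.items() if count >= max_denied)


log_event("alice", "read", "cohort_a", allowed=True)
for _ in range(3):
    log_event("mallory", "read", "cohort_b", allowed=False)

print(users_to_alert(audit_log))  # ['mallory']
```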

Data access committee

Data Access Committees are groups of individuals whose responsibility is to review and assess data access requests. [11] Such review can promote the benefits of sharing data while reducing potential harm from making data openly available without restrictions. Examples from biobanks include Genomics England’s Access Review Committee and Our Future Health’s Access Board.

User authentication

TREs should have industry-standard user authentication in place to verify the identity of a user attempting to gain access. Some examples include Okta, OAuth and Active Directory.

Segregated Workspaces

To enforce restrictions on data access, the TRE should establish segregated workspaces that apply to users, projects, tools and data. Within workspaces, authorised users can only view the subset of data corresponding to their approved research project.

Data Minimisation

In line with the EU General Data Protection Regulation (GDPR), TREs should support data minimisation approaches. This means reducing the information shared about each patient to the minimum needed to conduct the analysis. As an example, a TRE may have one-way ingestion to create analysis-ready data, yet this data does not persist beyond purposes directly relevant to the research.
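
In practice, data minimisation often comes down to releasing only the fields a project has been approved to use. The sketch below assumes hypothetical field names and an approved-field list agreed during the data access review.

```python
def minimise(records, approved_fields):
    """Keep only the fields approved for this project and drop everything else."""
    return [
        {field: record[field] for field in approved_fields if field in record}
        for record in records
    ]


# Hypothetical participant records held by the custodian.
participants = [
    {"pseudonym": "P-001", "postcode": "AB1 2CD", "birth_year": 1970, "hba1c": 48},
    {"pseudonym": "P-002", "postcode": "EF3 4GH", "birth_year": 1985, "hba1c": 39},
]

# Fields the (hypothetical) project was approved to use; postcode is not needed.
approved = ["pseudonym", "birth_year", "hba1c"]
analysis_ready = minimise(participants, approved)
print(analysis_ready)
```

The source records are left untouched; only the reduced view is made available inside the research workspace.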


Future-proofing TRE capabilities

The vast majority of existing data management platforms are secure yet largely siloed, with limited ability to combine datasets and effectively pool research resources for analysis. [12][13] There are several key features trusted research environments must have to maximise research utility when working with large-scale data.

Scalability

Biobanks with hundreds of thousands of datasets quickly scale to housing petabytes of data, which creates challenges with cost, computational resources and storage. Cloud-based Trusted Research Environments can form part of the solution: with the “elastic” nature of cloud computing, TRE owners only pay for the resources they need.

Integration

As data will be ingested into the TRE from a range of sources (e.g., electronic medical records and laboratory information management systems), TREs should be able to integrate with diverse sources and systems.

Federation

When integrating data from various sources, it is important to consider the risk and financial costs associated with physically moving data. Federation capabilities simplify the linking of disparate data sources without physically having to move the data itself. Within a federated architecture, data will remain within appropriate jurisdictional boundaries, while metadata is centralised and searchable. 
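
The core pattern of federated analysis can be sketched as follows: the query travels to each site, each site computes a local aggregate, and only those aggregates are combined centrally. The site names and records are invented for illustration, and in a real deployment `local_count` would run inside each site's own environment rather than in one process.

```python
# Hypothetical record-level data; in reality each list never leaves its own site.
SITES = {
    "site_a": [{"age": 54, "variant": True}, {"age": 61, "variant": False}],
    "site_b": [{"age": 47, "variant": True}, {"age": 70, "variant": True}],
}


def local_count(records, predicate):
    """Runs at a site: returns aggregate counts only, never individual rows."""
    matching = sum(1 for record in records if predicate(record))
    return {"matching": matching, "total": len(records)}


def federated_count(sites, predicate):
    """Runs at the coordinator: sends the query out and sums the aggregates."""
    per_site = {name: local_count(records, predicate) for name, records in sites.items()}
    combined = {
        "matching": sum(result["matching"] for result in per_site.values()),
        "total": sum(result["total"] for result in per_site.values()),
    }
    return combined, per_site


combined, per_site = federated_count(SITES, lambda record: record["variant"])
print(combined)  # {'matching': 3, 'total': 4}
```

Real federated platforms usually add further safeguards, such as suppressing small counts, which this sketch leaves out.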

Automated data transformation

Health data comes from a wide range of sources. With this diversity comes wide variability in how data are described and stored, which creates challenges for researchers preparing data for analyses.

TREs need automated systems within the platform to efficiently convert raw data to standardised analysis-ready data. This includes established ETL (Extract, Transform, Load) pipelines and APIs for interfacing between TREs and the data source. FAIRification of data within the trusted research environment further makes data Findable, Accessible, Interoperable, and Reusable with the incorporation of unique identifiers for data and metadata management.
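
As a rough sketch of the transform step in such a pipeline, the example below maps records from two hypothetical sources onto one common schema, harmonises units and coded values, and attaches a unique identifier to each record. The source names, field names and target schema are all assumptions; real pipelines would typically target an established data model.

```python
import uuid

# Hypothetical mappings from each source's field names to a common schema.
FIELD_MAPS = {
    "emr": {"patient_ref": "participant_id", "sex": "sex", "height_cm": "height_cm"},
    "lims": {"subject": "participant_id", "gender": "sex", "height_m": "height_m"},
}
SEX_CODES = {"f": "female", "female": "female", "m": "male", "male": "male"}


def transform(raw, source):
    """Convert one raw record from a named source into the common schema."""
    mapping = FIELD_MAPS[source]
    row = {target: raw[field] for field, target in mapping.items() if field in raw}
    # Harmonise units: store height in centimetres regardless of the source.
    if "height_m" in row:
        row["height_cm"] = round(row.pop("height_m") * 100, 1)
    # Harmonise coded values to a single vocabulary.
    row["sex"] = SEX_CODES.get(str(row.get("sex", "")).lower(), "unknown")
    row["source"] = source
    # Deterministic unique identifier, so re-running the pipeline gives the same ID.
    row["record_id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}/{row['participant_id']}"))
    return row


raw_inputs = [
    ("emr", {"patient_ref": "P-001", "sex": "F", "height_cm": 165.0}),
    ("lims", {"subject": "P-002", "gender": "male", "height_m": 1.8}),
]
analysis_ready = [transform(raw, source) for source, raw in raw_inputs]
for row in analysis_ready:
    print(row)
```

After the transform, both records share the same field names, units and vocabulary, which is what makes them usable together in a single analysis.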

End-to-end solution

Once the data is in a usable format, trusted research environments should incorporate built-in analytics to transform the analysis-ready data into insights. Genomics England’s Trusted Research Environment includes integrated, open-source tools to enable researchers to analyse the data that is housed within the Trusted Research Environment.

Featured Resource: Key Features of a Trusted Research Environment

Advantages of Trusted Research Environments

Health and multi-omics data are of high value for research, yet the scale and sensitivity of this data bring unique challenges for enabling secure data access. Trusted research environments can solve many issues surrounding secure data access in healthcare settings. There are numerous benefits for researchers, organisations, and patients, compared to traditional methods where data is copied and moved.


Key advantages of using a trusted research environment in health data research and management:

  1. Improve collaboration between organisations: TREs enable data access in a secure and controlled environment, supporting collaboration between researchers across different institutions or even countries. With increased access to a wider range of data, researchers can gain new insights and perspectives on the issues they are studying.
  2. Facilitate population-scale studies: Population-scale data is critical to understanding the drivers of disease and identifying patterns and trends in health and illness. TREs can be used to store and process large amounts of patient data, making it possible to conduct research on a much larger scale than would be possible with traditional methods.
  3. Improve clinical trial management: TREs can streamline the process of collecting, storing and sharing sensitive data from clinical trial participants, making it easier for researchers to access, analyse, and share data in a controlled and secure environment. This can lead to more accurate, reliable and efficient clinical trials.
  4. Improve patient outcomes: Better research and more accurate data can enable healthcare professionals to make more informed decisions and provide better patient care. Using a TRE, researchers can uncover new insights into the causes of diseases and develop more effective treatments.
  5. Improve data security: TREs allow approved researchers to securely conduct their work while keeping patient data safe from unauthorised access and potential security breaches. This is particularly important when working with sensitive information, such as genomic data, which can be used to identify individuals. Additionally, TREs provide increased oversight on what data is being used for.
  6. Compliance with regulations: The healthcare industry is heavily regulated, and organisations must comply with laws and guidelines to protect patient data, such as HIPAA and GDPR, and security standards like ISO 27001. A TRE supports organisations in meeting these requirements by providing the necessary controls and oversight to ensure compliance with regulations.
  7. Cost-effective way to provide secure data access: By consolidating data storage and analysis in a single environment, researchers can reduce the costs of maintaining multiple systems and performing data migrations. Additionally, a TRE can help approved users avoid costly data breaches and non-compliance penalties.
  8. Sustainability: In traditional methods of data sharing, data is copied and moved, which requires significant consumption of resources. Using a TRE minimises data duplication and eliminates transfers of files, reducing resource consumption.

 

Featured Resource: 8 Advantages of Using a Trusted Research Environment in Healthcare Research & Data Management

To preserve patient privacy, much of the world’s health data is stored within institutional siloed environments that are unavailable to researchers or difficult to access

How are Trusted Research Environments Being Used in Healthcare and Research Today?

Across biobanking, governments and health providers, trusted research environments are being increasingly adopted as a means to achieve both data accessibility and security.
 
We highlight some case studies of how trusted research environments are being used across the life sciences industry:


National Health Service England (NHS England)
Recently, NHS Digital, in partnership with Health Data Research UK, developed a TRE that provides academic researchers access to cardiovascular and cancer data for COVID-19 research. As described in the British Medical Journal, the partnership with national health data custodians provides linked, nationally collated electronic health records for approved research within secure, privacy-protecting environments. [14]

By combining individual-level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population. This resource has already proven essential for accurate recording and thus research on cardiovascular disease, providing researchers across the UK with rapid access to data.

Secure Anonymised Information Linkage (SAIL) Databank
The SAIL Databank is a rich population databank whose TRE provides global researchers with secure remote access to datasets of anonymised health and social care records for the population of Wales. [1] In operation since 2007, the SAIL Databank runs on the UK Secure Research Platform, a private research cloud with customisable technology.

Research publications resulting from the databank are in the hundreds – a recent example, in the largest study of its kind, found that COVID-19 vaccines offer effective protection against infection for high-risk healthcare workers. [15]

Genomics England
Genomics England, the UK government’s public sector research endeavour, currently hosts the data from over 135,000 NHS patients within a TRE for approved research use. The TRE is a cloud-based tool (powered by AWS and Lifebit) that approved researchers can use to access the clinical and genomic data from participants with cancer, rare disease, and COVID-19. With separate data access processes for the public and private sectors, researchers who want to access data must apply to become a member of either the Genomics England Clinical Interpretation Partnership (academics, students, and clinicians) or the Discovery Forum (industry partners).

Danish National Genome Center
A federated TRE deployed within the Danish National Genome Center’s supercomputing cluster will serve as the scalable and secure data management and analysis platform for Denmark’s national researchers, clinical scientists, and international collaborators. Powered by the Lifebit Platform, the TRE will deliver a next-generation computational infrastructure. The Danish National Genome Center and its collaborators will recruit and sequence whole genomes of 60,000 patients diagnosed with cancer, autoimmune disorders, and rare diseases by 2024.

Challenges and Priorities for Trusted Research Environments in the Future

Looking to the future, many governments, health systems, and biobanks see TREs as a secure long-term solution for research and clinical use of sensitive health data. 

This is most apparent in the UK, as set out in recent national policy guidance. An independent review by Professor Ben Goldacre, commissioned by the UK government and published in 2022, examined the use of National Health Service (NHS) health data for research and analysis. This review, and others, have recommended that TREs, or ‘Secure Data Environments’, should be the default way to access health and social care data for R&D going forward.

Yet with a rapidly changing data, regulatory, legal, and technology landscape, TRE owners and suppliers must keep pace with developments to ensure TREs are sustainable into the future. We explore some key priorities and challenges for the future that relate to TREs for health data.

Trusted Research Environment accreditation policies

Countries are increasingly taking measures to protect and retain sovereignty over their national data. Strict national data protection laws and regulatory frameworks govern the movement of patient data and limit its transfer between national jurisdictions. [16]

In line with this, there is an increasing prevalence of accreditation schemes to audit and certify TREs – examples in the UK include the NHS Secure Data Environment and the Our Future Health Trusted Research Environment accreditation processes. The processes will review trusted research environment owners and suppliers to ensure trusted research environments meet the necessary standards across information governance, cyber security, operational, privacy, and technical requirements.

Implementing accreditation frameworks and regulatory bodies that regulate the use of data can support a safer trusted research environment ecosystem, help foster trust from the broader public, and ensure that the best interests of the public and patients are protected.

Keeping public involvement at the forefront

Conducting meaningful Patient and Public Involvement and Engagement (PPIE) in the design and use of trusted research environments is becoming a best practice to minimise the risks of data misuse and focus research on studies where there is a demonstrable public benefit.
 
There are widespread examples demonstrating how patient and public involvement in decision-making on trusted research environments can lead to improved research output. Maintaining transparency on trusted research environment design and governance procedures is vital to ensure that public trust is maintained to allow long-term success and growth of population health initiatives that will ultimately save lives.

Technologies of the future

Amongst the widespread push for greater data protection and patient privacy, there is a need to factor in the knock-on effects for the flow of data access in research. This is where innovative technologies and approaches can bridge this gap and create trusted research environments that are sustainable into the longer term:

  • Federation is widely regarded as a key technology enabler for linking up disparate datasets, including data stored in TREs. [17] Federation across TREs means data can be virtually linked for combined analysis whilst remaining at its source. This means researchers can easily access, collaborate on, and analyse disparate datasets without data movement.
  • No code/low code tools are part of a wider industry shift towards software that supports a wider range of end-users. As the majority of TREs in use today serve a research context, extending them to clinical, health system, and private sector use will require a step-change in usability for a more diverse range of end-users.
  • Cloud computing with enterprise infrastructure providers like AWS and Microsoft Azure can not only provide state-of-the-art capabilities in security and storage, but also support the increasing scale of multi-omics and clinical datasets available today. The ‘elastic’ nature of cloud computing means researchers only pay for what they need.

Conclusion

With the ability to scale with increasing volumes of data, ensure data privacy and protection, and enable secure access for approved research, trusted research environments can serve all ends of the health research community. Enabling valuable research at scale to improve the lives of patients, trusted research environments represent a sustainable and secure long-term solution for managing and using big data.

Editor’s note: This post was originally published on March 28, 2023 and may be occasionally updated for accuracy and comprehensiveness.

Further reading

Read Lifebit’s white paper on best practices for building a Trusted Research Environment
Read Lifebit’s white paper on security and data governance

References

1. Lyons, R. A. et al. The SAIL databank: linking multiple health and social care datasets. BMC Med. Inform. Decis. Mak. 9, 3 (2009).

2. UK Health Data Research Alliance & NHSX. Building Trusted Research Environments – Principles and Best Practices; Towards TRE ecosystems. https://zenodo.org/record/5767586 (2021) doi:10.5281/ZENODO.5767586.

3. Nik-Zainal, P. S. et al. Multi-party trusted research environment federation: Establishing infrastructure for secure analysis across different clinical-genomic datasets. https://zenodo.org/record/7085536 (2022) doi:10.5281/ZENODO.7085536

4. Trusted Research Environment service for England. NHS Digital (2022).

5. Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).

6. 4 ways data is improving healthcare. World Economic Forum (2019).

7. Learned, K. et al. Barriers to accessing public cancer genomic data. Sci. Data 6, 98 (2019).

8. Kilzi, Michel. The Anatomy Of Personal Data Sovereignty. Forbes (2021).

9. Thousands of patients hit by NHS data breaches. Independent https://www.independent.co.uk/news/health/data-nhs-patient-breaches-privacy-b1877154.html (2021).

10. Google reportedly mining millions of Americans personal health data. CBS News https://www.cbsnews.com/news/google-mining-millions-of-americans-personal-health-data-report-says/ (2019).

11. Cheah, P. Y. & Piasecki, J. Data Access Committees. BMC Med. Ethics 21, 12 (2020).

12. Denton, N. et al. Data silos are undermining drug development and failing rare disease patients. Orphanet J. Rare Dis. 16, 161 (2021).

13. Koutkias, V. From Data Silos to Standardized, Linked, and FAIR Data for Pharmacovigilance: Current Advances and Challenges with Observational Healthcare Data. Drug Saf. 42, 583–586 (2019).

14. Wood, A. et al. Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource. BMJ n826 (2021) doi:10.1136/bmj.n826.

15. Bedston, S. et al. COVID-19 vaccine uptake, effectiveness, and waning in 82,959 health care workers: A national prospective cohort study in Wales. Vaccine 40, 1180–1189 (2022).

16. Mitchell, C., Ordish, J., Johnson, E., Brigden, T. & Hall, A. The GDPR and genomic data. (2020).

17. Thorogood, A. et al. International federation of genomic medicine databases using GA4GH standards. Cell Genomics 1, 100032 (2021).