DataSHIELD is an R-based software solution for federated analysis, that is, the remote analysis of multiple data sources.
It allows sophisticated analyses to be run without the user being able to view or copy individual-level data. Instead, only non-disclosive
summary statistics are returned. This makes it an effective solution for secure data science collaborations.
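The idea of returning only summary statistics can be illustrated with a small local simulation in base R. This is a minimal sketch and not DataSHIELD's actual implementation: the functions `site_summary()` and `pooled_mean()`, the threshold `min_n` and the example values are all hypothetical and exist only to show the principle.

```r
# "Server side": each site reduces its private data to a summary.
# min_n is a hypothetical disclosure threshold: very small groups are refused.
site_summary <- function(x, min_n = 3) {
  x <- x[!is.na(x)]
  if (length(x) < min_n) {
    stop("Too few valid observations to return a summary")
  }
  list(sum = sum(x), n = length(x))
}

# "Client side": pool the summaries without seeing any individual value.
pooled_mean <- function(summaries) {
  total_sum <- sum(vapply(summaries, function(s) s$sum, numeric(1)))
  total_n <- sum(vapply(summaries, function(s) s$n, numeric(1)))
  total_sum / total_n
}

# Simulated individual-level data held at two sites (it never leaves the site).
site_a <- c(34, 41, 29, NA, 52)
site_b <- c(45, 38, 61)

summaries <- list(site_summary(site_a), site_summary(site_b))
result <- pooled_mean(summaries)
print(result)
```

Only the per-site sum and count cross the boundary between "server" and "client", yet the pooled mean equals the mean of all valid observations combined.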
To use DataSHIELD, additional software is required to store data and manage user interaction. There are currently two solutions for this, Armadillo and Opal, and both can be used together within the same network. Below is an example of a simple setup:
[Figure: Example setup 1]
An alternative setup involves the user first connecting to a Central Analysis Server (CAS), which is an online RStudio environment:
[Figure: Example setup 2]
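From the researcher's side, connecting to a network that mixes Armadillo and Opal servers might look roughly like the sketch below. The same code should run from a local R session or from a CAS. The server URLs, server names, credentials, table paths and the variable `age` are placeholders, and the use of the DSOpal package for the Opal connection is an assumption not covered elsewhere in this documentation.

```r
library(DSI)
library(DSMolgenisArmadillo) # provides the "ArmadilloDriver"
library(DSOpal)              # provides the "OpalDriver" (assumed to be installed)
library(dsBaseClient)

# Hypothetical server addresses
armadillo_url <- "https://armadillo.example.org"
opal_url <- "https://opal.example.org"

# Obtain a token for the Armadillo server (opens a browser to authenticate)
token <- armadillo.get_token(armadillo_url)

# Describe each server in the network; all values below are placeholders
builder <- DSI::newDSLoginBuilder()
builder$append(
  server = "cohort_a", url = armadillo_url, token = token,
  table = "project/folder/table", driver = "ArmadilloDriver"
)
builder$append(
  server = "cohort_b", url = opal_url,
  user = "researcher", password = "password",
  table = "PROJECT.table", driver = "OpalDriver"
)
logindata <- builder$build()

# Log in to both servers and assign each table to the symbol "D"
conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Request a pooled mean; only summary statistics are returned to the client
ds.mean(x = "D$age", type = "combine", datasources = conns)

DSI::datashield.logout(conns)
```

Because both drivers implement the DSI interface, the analysis call is the same regardless of which software hosts the data.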
Resources
An optional feature of DataSHIELD is the ability to host files elsewhere (e.g. on computer clusters) and link them to
the data held on Armadillo or Opal servers. This is implemented using the resourcer package. External resources can
be used alongside data stored in Armadillo itself, and resources may be hosted in different locations and formats.
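As a rough illustration, a resource can be assigned in a DataSHIELD session and then coerced to a data frame on the server. The sketch below assumes that a resource has already been registered on the server under the placeholder path `project/folder/resource`, that the server has the resourcer package available, and that the URL is hypothetical.

```r
library(DSI)
library(DSMolgenisArmadillo)
library(dsBaseClient)

# Hypothetical server address and resource path
armadillo_url <- "https://armadillo.example.org"
token <- armadillo.get_token(armadillo_url)

builder <- DSI::newDSLoginBuilder()
builder$append(
  server = "cohort_a", url = armadillo_url, token = token,
  driver = "ArmadilloDriver"
)
conns <- DSI::datashield.login(logins = builder$build(), assign = FALSE)

# Assign the external resource to the server-side symbol "res"
DSI::datashield.assign.resource(
  conns, symbol = "res", resource = "project/folder/resource"
)

# Coerce the resource to a data frame on the server, stored as "D"
DSI::datashield.assign.expr(
  conns, symbol = "D",
  expr = quote(as.resource.data.frame(res, strict = TRUE))
)

# Check what kind of object now exists on the server
ds.class("D", datasources = conns)

DSI::datashield.logout(conns)
```

Once assigned, `D` can be analysed like any table stored on the server itself.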
DataSHIELD packages and their use
Finally, here is a brief summary of the core Armadillo and DataSHIELD packages described in this documentation.
| Name | Type of application | User type | Description |
|---|---|---|---|
| armadilloService | Java | - | The software that facilitates hosting and analysing data using DataSHIELD. |
| DSI | R package | - | This package defines the API that is to be implemented by DataSHIELD compliant data repositories. |
| DSMolgenisArmadillo | R package | Researchers | To communicate between the client-side packages and armadilloService to perform analysis. |
| MolgenisArmadillo | R package | Datamanager | To upload data and manage projects in Armadillo in R. |
| User Interface | JavaScript / VueJS | Datamanager | To manage users, data, configurations and view logs. |
| dsUpload | R package | Datamanager | To upload data according to a specific format into Armadillo/Opal using R. |
| dsBaseClient | R package | Researchers | Core DataSHIELD client-side R package required for executing basic DataSHIELD commands. |
| dsTidyverseClient | R package | Researchers | Client-side R package which implements efficient data-manipulation using selected Tidyverse functions. |
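For the data manager role, a typical session with the MolgenisArmadillo package might look like the following sketch. The server URL, project name and folder name are placeholders, and the built-in `iris` dataset stands in for real study data.

```r
library(MolgenisArmadillo)

# Authenticate against a hypothetical Armadillo server (opens a browser)
armadillo.login("https://armadillo.example.org")

# Create a project and upload a data frame as a table inside a folder
armadillo.create_project("project")
armadillo.upload_table(
  project = "project", folder = "folder", table = iris, name = "iris"
)

# List the tables now available in the project
armadillo.list_tables("project")
```

Researchers would then refer to the uploaded table as `project/folder/iris` when logging in through DSMolgenisArmadillo.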