Distributed system for data security and data privacy – Part 2

In the first part, I described how one can design a distributed system for maximum security and data privacy. Now it's time to provide a sample technology stack for such a system as well as a basic functioning prototype.

Technology Stack
Our architecture will make use of the following technologies:

Purpose | Technology
Frontend | AngularJS, HTML5, JavaScript, CSS(3)
Backend | Java 1.8+
Application Server | Apache Tomcat
Application Framework | Spring Framework (Spring MVC, Spring Core, Spring Security, Spring Test)
REST | Spring Framework (Spring MVC), Jackson
ORM | Hibernate, Spring ORM
Database Encryption | Jasypt, Bouncy Castle
RDBMS | MySQL (MariaDB or Oracle MySQL)
Testing | JUnit, Spring Test
Build Automation | Gradle

The modules will not each have their own GUI. Instead, they will be steered from a single GUI (e.g. index.html) that the APP module serves to the user’s browser. The basic Tier & Technology stack for each module will look as follows:

[Figure: Architecture & Technology Stack]

In all modules (APP, IDAT, PSNS and VDAT) the REST Service layer can be thought of as the public API of the module. That is, it will consist of a collection of RestControllers that handle HTTP requests and execute the appropriate service functionality. The View is available only in the APP since the APP will provide the GUI for the application to the end user in addition to the regular API functionality.

REST API
The REST service will be used by the frontend (browser) using AngularJS. It provides functions for authentication, fetching and resolving pseudonyms, etc.

Below are listed some of the available RESTful web service operations. For a full list of available operations please refer to the available API. In many cases we make use of the HTTP POST method instead of GET, to avoid exposing sensitive information via the URL.
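To make the point about POST concrete, here is a minimal sketch that builds (but does not send) a request to the resolve operation with the JDK's java.net.http client (Java 11+). The host name psns.example.org and the JSON field names are assumptions made for illustration only; in the prototype these calls are issued by AngularJS in the browser.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: the host and the JSON field names are hypothetical.
class PostRequestSketch {

    // Builds a POST request whose pseudonym travels in the body, not in the URL.
    static HttpRequest resolveRequest(String baseUrl, String psnJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rest/resolve"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(psnJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = resolveRequest(
                "https://psns.example.org", "{\"type\":\"PSN1\",\"value\":\"a\"}");
        // The URL carries no query string, so the pseudonym does not end up
        // in access logs or the browser history.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With a GET request the same pseudonym would have to be appended to the URL as a query parameter, which is exactly what the API tries to avoid.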

Common
POST Authenticate with user credentials
Authenticate using username and password.

Resource: /services/authenticate

Method: POST

Parameter: TXLUserCredentialsDTO the username and password

Returns: TXLUserDTO The authenticated user.

Example:

IDAT
POST Fetch EID for user
Get EID(PSN1) for user.

Resource: /rest/client/eidForUser

Method: POST

Parameter: TXLUserDTO the user whose EID is requested

Returns: TXLPsnDTO The EID of the user.

Example:

PSNS
POST Resolve pseudonym
Resolve a given pseudonym. From PSN1 -> PSN2 or PSN2 -> PSN1, depending on where we come from. For instance a TXLCase (VDAT) holds a client profile pseudonym of type PSN1 that needs to be resolved to a pseudonym of type PSN2, which is the pseudonym of the client profile in IDAT. Here we come from VDAT and want to fetch something in IDAT (PSN1 -> PSN2).

Another example would be fetching the cases of a client with EID (PSN1) in IDAT. The EID (PSN1) needs to be resolved to PSN2(s), which are the pseudonyms of TXLCases in VDAT. Here we come from IDAT and want to fetch something in VDAT. Again this is a PSN1 -> PSN2 resolving.

Another example is the case where we have an existing client profile (IDAT) with a pseudonym (PSN2) that we want to assign to a new case. Here we need to know the PSN1 counterpart so that we can assign it to the case.clientProfilePsnToBeResolved property of the TXLCase in VDAT. For that we ask the PSNS to resolve a pseudonym of type PSN2 and give us the PSN1 counterpart. The PSNS will return an array of counterparts, but ideally it will contain only one pseudonym of type PSN1. This is a PSN2 -> PSN1 resolving.

Resource: /rest/resolve

Method: POST

Parameter: TXLPsnDTO the pseudonym to resolve

Returns: TXLPsnDTO[] Array of counterpart pseudonyms corresponding to the pseudonym to resolve.

Example:
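The resolve semantics described above can be sketched with an in-memory stand-in for the PSNS. This is an illustration under assumptions, not the prototype's implementation: the class and method names are hypothetical, and the second row for 'XA10' is invented here to show that a PSN1 may have several counterparts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: an in-memory stand-in for the PSNS mapping table.
class PsnsResolveSketch {

    enum PsnType { PSN1, PSN2 }

    // One row of the mapping table: a PSN1 and its PSN2 counterpart.
    static final class Mapping {
        final String psn1;
        final String psn2;

        Mapping(String psn1, String psn2) {
            this.psn1 = psn1;
            this.psn2 = psn2;
        }
    }

    private final List<Mapping> table;

    PsnsResolveSketch(List<Mapping> table) {
        this.table = table;
    }

    // Returns all counterparts of the given pseudonym from the opposite column.
    List<String> resolve(PsnType type, String value) {
        List<String> counterparts = new ArrayList<>();
        for (Mapping m : table) {
            if (type == PsnType.PSN1 && m.psn1.equals(value)) {
                counterparts.add(m.psn2);
            }
            if (type == PsnType.PSN2 && m.psn2.equals(value)) {
                counterparts.add(m.psn1);
            }
        }
        return counterparts;
    }

    public static void main(String[] args) {
        PsnsResolveSketch psns = new PsnsResolveSketch(Arrays.asList(
                new Mapping("a", "1"),
                new Mapping("XA10", "x"),
                new Mapping("XA10", "y"))); // hypothetical second case
        // PSN1 -> PSN2: all case pseudonyms of the client with EID 'XA10'.
        System.out.println(psns.resolve(PsnType.PSN1, "XA10"));
        // PSN2 -> PSN1: the reference to store in a new case.
        System.out.println(psns.resolve(PsnType.PSN2, "1"));
    }
}
```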

VDAT

POST Fetch case by pseudonym

Fetch a case that has the given pseudonym.

Resource: /rest/case/caseWithPsn

Method: POST

Parameter: TXLPsnDTO the pseudonym of the case

Returns: TXLCaseDTO the case object

Example:

Prototype
The full working prototype along with IDE setup instructions is available here.

 

Distributed system for data security and data privacy – Part 1

Introduction
Suppose you are building a platform that integrates data of a sensitive nature. Such a platform can be an IT solution, like a web application for clinical studies, that enables patients and doctors to log in and access data such as findings and demographic data, or to enter new information. Another example would be a platform that gives a penetration testing company a solution for tracking penetration test results and system vulnerabilities for given clients, while also enabling those clients to access the system to view the related findings. Obviously, in both cases the data integrated in such a system is sensitive, and the system must protect it in terms of both data security and data privacy.

In this post I will stick with the latter example. That is, we will have a penetration testing company that needs a software solution in the form of a web application that stores client information and case findings, while allowing a penetration tester as well as an end client (who hires the penetration tester) to access and use the system. For instance, a penetration tester can log in and view findings data or create new cases, while an end client can view cases and findings data related to his account.

The architecture is based on Model B of the generic concept of the TMF (Technology and Methodology Platform for Networking in Medical Research). The functions are available through a RESTful interface.

In order to ensure the End Client’s privacy, Identification Data (IDAT) and Findings Data (VDAT) will be stored encrypted on separate servers and associated with each other only with the help of a Trusted Third Party (TTP), which we call the Pseudonymization Service (PSNS). This design concept has the benefit that all three servers (IDAT, PSNS and VDAT) have to be compromised in order to trace any found vulnerabilities back to the end client. In addition to IDAT, PSNS and VDAT we will also have a fourth server called APP that will handle application data and the GUI. APP, IDAT, PSNS and VDAT will be referred to as modules from now on.

Detailed Landscape
Our system has a distributed architecture. That is, the GUI resides on a different server (APP) than the case findings (VDAT). Likewise, client information (IDAT) resides on a different server than the GUI and the case findings. This is referred to as anonymization. In addition to storing the IDAT, APP and VDAT data in a distributed environment, we also apply two-level pseudonymization. That is, when an IDAT record is stored, a generated pseudonym is assigned to it. Similarly, when a VDAT record is stored, it is assigned a different generated pseudonym. VDAT and IDAT information can only be linked through a trusted third party, the PSNS, which maps the pseudonyms to each other. The PSNS generates the pseudonyms for IDAT and VDAT via an API call. This design preserves the end client’s privacy, since all three servers (IDAT, VDAT and PSNS) have to be compromised in order to link any vulnerability findings to end clients. On top of anonymization and two-level pseudonymization, we also encrypt any sensitive data (usernames, passwords, findings, client information, etc.).
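As a rough illustration of how the PSNS might generate a linked pair of pseudonyms, consider the following sketch. It assumes random 128-bit values and an in-memory map; the class name, the method names and the encoding are hypothetical and not taken from the prototype.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Sketch only: random pseudonym pairs kept in an in-memory map.
class PseudonymGeneratorSketch {

    private final SecureRandom random = new SecureRandom();

    // PSN1 -> PSN2. Only the PSNS ever holds this mapping.
    private final Map<String, String> psn1ToPsn2 = new HashMap<>();

    // 128 random bits, URL-safe encoded. The value reveals nothing about the record.
    private String randomPsn() {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Creates a linked pair: PSN2 labels the stored record, PSN1 is the
    // reference that another module keeps and later asks the PSNS to resolve.
    String[] createPair() {
        String psn1 = randomPsn();
        String psn2 = randomPsn();
        psn1ToPsn2.put(psn1, psn2);
        return new String[] { psn1, psn2 };
    }

    // Returns the PSN2 counterpart, or null if the PSN1 is unknown.
    String resolveToPsn2(String psn1) {
        return psn1ToPsn2.get(psn1);
    }

    public static void main(String[] args) {
        PseudonymGeneratorSketch psns = new PseudonymGeneratorSketch();
        String[] pair = psns.createPair();
        System.out.println("PSN1=" + pair[0] + " PSN2=" + pair[1]);
    }
}
```

Because the two values are generated independently, neither can be derived from the other; the link exists only as a row inside the PSNS.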

The system user will load a single page which will transparently collect all the information residing in VDAT and IDAT. It will then link them together (e.g. “SQL Injection detected” (VDAT) at client “BMW” (IDAT)) and present it to the user in a single view (HTML). For that to happen, the IDAT, VDAT and PSNS servers have to be contacted to collect and join the VDAT and IDAT records. This steering is done on the client side (web browser). The reason is to avoid having a middleman server on the backend that collects and links the data before sending it to the web browser. Such a middleman server can be compromised and sniffed. In other words, doing the steering on the client side avoids having a single point where IDAT and VDAT information pass through and reside simultaneously in the same location (even for a brief time in heap space), other than the destination (web browser).

The system’s landscape is illustrated below:

[Figure: Landscape]

PSNS
The PSNS is responsible for creating and resolving pseudonyms. You may think of the PSNS as a table that looks as follows:

Site | PSN1 | PSN2
Site B | a | 1
Site B | b | 2
Site B | XA10 | x
Site A | d | 4
.. | .. | ..

When a record points to a record on a different module (e.g. a Case (VDAT) points to a Client Profile (IDAT)), this reference is provided by a pseudonym. As mentioned earlier, both records receive a pseudonym, psn, when created. Let the psn of the Case be ‘x’ and the psn of the Client Profile be ‘1’. To reference the Client Profile in the Case, the Case has a field clientProfilePsnToBeResolved that holds the value ‘a’ (the PSN1 column in the PSNS table). When a user loads the Case from VDAT, the Client Profile must also be loaded. This is done in the browser. The Site attribute indicates that only users belonging to that Site can resolve the given pseudonyms.

This process is illustrated in the picture below.

[Figure: Pseudonymisation]
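The role of the Site attribute can be sketched as a filter on the lookup. The rows are the example rows from the table above; the class name and the way the caller's site is passed in are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch only: a PSNS lookup that honours the Site column.
class SiteRestrictedPsnsSketch {

    static final class Row {
        final String site;
        final String psn1;
        final String psn2;

        Row(String site, String psn1, String psn2) {
            this.site = site;
            this.psn1 = psn1;
            this.psn2 = psn2;
        }
    }

    // The example rows of the PSNS table.
    static final List<Row> TABLE = Arrays.asList(
            new Row("Site B", "a", "1"),
            new Row("Site B", "b", "2"),
            new Row("Site B", "XA10", "x"),
            new Row("Site A", "d", "4"));

    // Resolves PSN1 to PSN2 only if the row belongs to the caller's site.
    static Optional<String> resolve(String callerSite, String psn1) {
        return TABLE.stream()
                .filter(row -> row.psn1.equals(psn1))
                .filter(row -> row.site.equals(callerSite))
                .map(row -> row.psn2)
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(resolve("Site B", "a")); // caller belongs to the row's site
        System.out.println(resolve("Site A", "a")); // caller belongs to another site
    }
}
```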

Our system will have two classes of users that need to be treated differently. These users are:
End client that wants to log into the system and see her cases and status
Pentester that wants to log into the system and view any cases assigned to her, create new cases etc.

The differentiation can be done either by assigning the users different roles or having a discriminator type (e.g. enum) added to the user classes. The following examples illustrate workflows that describe how the same process of loading cases is executed differently for a Pentester and an End Client:

Pentester (user id = ‘penTester1234’)
1. The browser requests a Case record from VDAT (Pseudocode: select cases from VDAT.cases where pentesterUserId = 'penTester1234').
2. Browser receives Case from VDAT, which includes a reference to a Client Profile (clientProfilePsnToBeResolved = ‘a’)
3. Browser goes to PSNS and asks for the PSN2 that belongs to PSN1 = ‘a’
4. Browser receives PSN2 = ‘1’ from PSNS
5. Browser goes to IDAT and requests Client Profile with psn = ‘1’
6. Browser receives Client Profile from IDAT
7. Browser displays Case and Client Profile data
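The Pentester steps above can be sketched as follows. Three in-memory maps stand in for the VDAT, PSNS and IDAT servers, and the method plays the role of the browser. All class, field and method names are hypothetical; in the prototype each lookup is a REST call from the browser.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: three maps stand in for the VDAT, PSNS and IDAT servers.
class PentesterWorkflowSketch {

    static final class Case {
        final String finding;
        final String clientProfilePsnToBeResolved; // PSN1 reference into IDAT

        Case(String finding, String clientProfilePsnToBeResolved) {
            this.finding = finding;
            this.clientProfilePsnToBeResolved = clientProfilePsnToBeResolved;
        }
    }

    static final Map<String, Case> VDAT_CASE_BY_PENTESTER = new HashMap<>();
    static final Map<String, String> PSNS_PSN1_TO_PSN2 = new HashMap<>();
    static final Map<String, String> IDAT_PROFILE_BY_PSN = new HashMap<>();

    static {
        VDAT_CASE_BY_PENTESTER.put("penTester1234", new Case("SQL Injection detected", "a"));
        PSNS_PSN1_TO_PSN2.put("a", "1");
        IDAT_PROFILE_BY_PSN.put("1", "BMW");
    }

    // Plays the role of the browser: only here are case and client joined.
    static String loadCaseView(String pentesterUserId) {
        Case c = VDAT_CASE_BY_PENTESTER.get(pentesterUserId);                // steps 1-2
        String psn2 = PSNS_PSN1_TO_PSN2.get(c.clientProfilePsnToBeResolved); // steps 3-4
        String clientProfile = IDAT_PROFILE_BY_PSN.get(psn2);                // steps 5-6
        return c.finding + " at client " + clientProfile;                    // step 7
    }

    public static void main(String[] args) {
        System.out.println(loadCaseView("penTester1234"));
    }
}
```

Note that none of the three maps alone contains both the finding and the client name; the join exists only in the return value, which corresponds to the view rendered in the browser.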

End Client (user id = ‘bmwManager1’ )
1. The browser requests the EID (End Client ID) for end client user from IDAT (Pseudocode: select eid from IDAT.client where 'bmwManager1' in IDAT.client.endClientUserIds)
2. Browser receives EID from IDAT. Example EID = ‘XA10’
3. Browser goes to PSNS and asks for all PSN2s that belong to PSN1 = ‘XA10’ (that is, request all cases for client with EID = ‘XA10’).
4. Browser receives PSN2s = [‘x’] from PSNS
5. Browser goes to VDAT and requests all Cases with psn in [‘x’]
6. Browser receives Case from VDAT, which includes a reference to a Client Profile (clientProfilePsnToBeResolved = ‘a’)
7. Browser goes to PSNS and asks for the PSN2 that belongs to PSN1 = ‘a’
8. Browser receives PSN2 = ‘1’ from PSNS
9. Browser goes to IDAT and requests Client Profile with psn = ‘1’
10. Browser receives Client Profile from IDAT
11. Browser displays Case and Client Profile data
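What differs for the End Client is the entry point: the chain starts in IDAT with the EID rather than in VDAT. A minimal sketch of steps 1 to 6 follows, again with in-memory maps standing in for the servers and with hypothetical names throughout. Steps 7 to 11 are the same client profile resolution as in the Pentester workflow and are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: in-memory stand-ins for IDAT, PSNS and VDAT.
class EndClientWorkflowSketch {

    static final Map<String, String> IDAT_EID_BY_USER = new HashMap<>();
    static final Map<String, List<String>> PSNS_PSN1_TO_PSN2S = new HashMap<>();
    static final Map<String, String> VDAT_FINDING_BY_CASE_PSN = new HashMap<>();

    static {
        IDAT_EID_BY_USER.put("bmwManager1", "XA10");
        PSNS_PSN1_TO_PSN2S.put("XA10", Arrays.asList("x"));
        VDAT_FINDING_BY_CASE_PSN.put("x", "SQL Injection detected");
    }

    // Plays the role of the browser for steps 1-6 of the End Client workflow.
    static List<String> loadCases(String endClientUserId) {
        String eid = IDAT_EID_BY_USER.get(endClientUserId);              // steps 1-2
        List<String> casePsns =
                PSNS_PSN1_TO_PSN2S.getOrDefault(eid, Collections.emptyList()); // steps 3-4
        List<String> findings = new ArrayList<>();
        for (String psn : casePsns) {                                    // steps 5-6
            findings.add(VDAT_FINDING_BY_CASE_PSN.get(psn));
        }
        return findings;
    }

    public static void main(String[] args) {
        System.out.println(loadCases("bmwManager1"));
    }
}
```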

To better understand how the linkage of VDAT and IDAT data is done, consider the scenarios that follow.

Scenarios
Scenario 1: User Login

Before a user can use the system, she has to log in. The login action is the same for both user classes. The user has to log in to all 4 modules, from which she receives a token that can be used for authentication on each module later (e.g. when she wants to call a REST function on a module).

  1. User opens web browser and loads the start page (index.html)
  2. User is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. User enters credentials and successfully logs into the system.
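The login against all four modules can be sketched as a loop that collects tokens. This sketch assumes one token per module and hides the actual call (a POST to /services/authenticate on each module) behind a hypothetical Authenticator interface; the real token format is not specified here.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.UUID;

// Sketch only: assumes one token per module; all names are hypothetical.
class ModuleLoginSketch {

    enum ModuleName { APP, IDAT, PSNS, VDAT }

    // Stands in for the REST call to /services/authenticate on one module.
    interface Authenticator {
        String authenticate(ModuleName module, String username, String password);
    }

    // Logs in to every module and keeps the token each one returns.
    static Map<ModuleName, String> loginToAll(
            String username, String password, Authenticator authenticator) {
        Map<ModuleName, String> tokens = new EnumMap<>(ModuleName.class);
        for (ModuleName module : ModuleName.values()) {
            tokens.put(module, authenticator.authenticate(module, username, password));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Fake authenticator that hands out a random token per module.
        Authenticator fake = (module, user, pass) -> module + "-" + UUID.randomUUID();
        Map<ModuleName, String> tokens = loginToAll("penTester1234", "secret", fake);
        // Later REST calls to a module would present that module's token.
        System.out.println(tokens.keySet());
    }
}
```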

The scenario is illustrated below.

[Figure: User Login]

Scenario 2: End Client login and loading of Cases

The End Client logs into the system in order to view cases. The simplified workflow looks as follows:

  1. End Client opens web browser and loads the start page (index.html)
  2. End Client is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. End Client enters credentials and successfully logs into the system.
  4. End Client lands on a view that lists all cases that are related to her.

The UML sequence diagram is listed below:

[Figure: End Client Login and display his/her cases]

Scenario 3: Pentester login and load assigned Cases

The Pentester logs into the system and lands on a view that lists all cases (for different end clients) assigned to her.

  1. Pentester opens web browser and loads the start page (index.html)
  2. Pentester is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. Pentester enters credentials and successfully logs into the system.
  4. Pentester lands on view that lists all cases (for different end clients) assigned to her.

The UML sequence diagram is listed below:

[Figure: Pentester Login and display his/her cases]

Scenario 4: Pentester login and create new client data and Case

The Pentester logs into the system and creates a new IDAT record that needs to be associated with a VDAT record.

  1. Pentester opens web browser and loads the start page (index.html)
  2. Pentester is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. Pentester enters credentials and successfully logs into the system.
  4. Pentester creates a new IDAT record (end client data) that is joined with VDAT data
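Step 4 can be sketched as three writes, one per module. This is a simplified illustration under assumptions: UUIDs stand in for the pseudonyms that the PSNS generates, maps stand in for the servers, all names are hypothetical, and the additional PSNS row that links the client's EID to the new case is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch only: maps stand in for the PSNS, IDAT and VDAT servers.
class CreateCaseSketch {

    static final Map<String, String> PSNS = new HashMap<>();   // PSN1 -> PSN2
    static final Map<String, String> IDAT = new HashMap<>();   // profile psn (PSN2) -> client name
    static final Map<String, String[]> VDAT = new HashMap<>(); // case psn -> {finding, PSN1 reference}

    // Stores a new client profile and a case that references it.
    // Returns the psn of the newly stored case.
    static String createCaseWithNewClient(String clientName, String finding) {
        String profilePsn1 = UUID.randomUUID().toString();
        String profilePsn2 = UUID.randomUUID().toString();
        PSNS.put(profilePsn1, profilePsn2); // the PSNS records the pair
        IDAT.put(profilePsn2, clientName);  // IDAT stores the profile under PSN2

        String casePsn = UUID.randomUUID().toString();
        // The case keeps only the PSN1 reference (clientProfilePsnToBeResolved).
        VDAT.put(casePsn, new String[] { finding, profilePsn1 });
        return casePsn;
    }

    public static void main(String[] args) {
        String casePsn = createCaseWithNewClient("BMW", "SQL Injection detected");
        String[] storedCase = VDAT.get(casePsn);
        // Following the reference requires the PSNS in between.
        System.out.println(IDAT.get(PSNS.get(storedCase[1])));
    }
}
```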

The persistence of the end client’s data is illustrated below:

[Figure: Pentester: Create a new case with a new client profile]

Yikes! That’s it for Part 1. In the next part, I will provide information regarding the technology stack as well as a fully functioning web application prototype with source code.

References
Orientierungshilfe: Pseudonymisierung in der medizinischen Forschung