Distributed system for data security and data privacy – Part 1

Introduction
Suppose you are building a platform that integrates sensitive data. Such a platform could be an IT solution, for example a web application for clinical studies, that lets patients and doctors log in to access data such as findings and demographic data, or to enter new information. Another example would be a platform that gives a penetration testing company a way to track penetration test results and system vulnerabilities for its clients, while also letting those clients access the system to view the related findings. In both cases the data integrated in the system is clearly sensitive, and the system must protect it in terms of both data security and data privacy.

In this post I will stick to the latter example. That is, a penetration testing company needs a software solution in the form of a web application that stores client information and case findings, and that both a penetration tester and an end client (who hires the penetration tester) can access and use. For instance, a penetration tester can log in and view findings data or create new cases, while an end client can view the cases and findings data related to his account.

The architecture is based on Model B of the generic concept of the TMF (Technology and Methodology Platform for Networking in Medical Research). The system's functions are exposed through a RESTful interface.

To protect the End Client’s privacy, Identification Data (IDAT) and Findings Data (VDAT) will be stored encrypted on separate servers and can only be linked with the help of a Trusted Third Party (TTP), which we call the Pseudonymization Service (PSNS). The benefit of this design is that all three servers (IDAT, PSNS and VDAT) have to be compromised in order to trace any found vulnerabilities back to the end client. In addition to IDAT, PSNS and VDAT, we will also have a fourth server called APP that handles application data and the GUI. From now on, APP, IDAT, PSNS and VDAT will be referred to as modules.

Detailed Landscape
Our system has a distributed architecture: the GUI resides on a different server (APP) than the case findings (VDAT), and client information (IDAT) resides on yet another server, separate from both the GUI and the case findings. This is referred to as anonymization. In addition to storing APP, IDAT and VDAT data in a distributed environment, we also apply two-level pseudonymization: when an IDAT record is stored, it is assigned a generated pseudonym, and when a VDAT record is stored, it is assigned a different generated pseudonym. VDAT and IDAT records can only be linked through a trusted third party, the PSNS, which maps pseudonyms to each other. The PSNS also generates the pseudonyms for IDAT and VDAT via an API call. This design preserves the end client’s privacy, since all three servers (IDAT, VDAT and PSNS) have to be compromised in order to link any vulnerability findings to end clients. On top of anonymization and two-level pseudonymization, we also encrypt all sensitive data (usernames, passwords, findings, client information, etc.).
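The two-level pseudonymization and the PSNS's role as pseudonym generator can be sketched as follows. This is a minimal, hypothetical Python sketch (Python is used only to keep it short and runnable; the post does not prescribe a language for this part). It assumes the PSNS draws random pseudonyms with the standard `secrets` module and keeps its mapping in memory, whereas the real service would expose this through its REST API and persist it. `Psns`, `new_psn` and `link` are illustrative names, not part of the described system.

```python
import secrets
from collections import defaultdict


class Psns:
    """Hypothetical in-memory stand-in for the Pseudonymization Service."""

    def __init__(self) -> None:
        # (site, psn1) -> list of psn2 values; the only place where links live
        self._table: dict[tuple[str, str], list[str]] = defaultdict(list)

    @staticmethod
    def new_psn() -> str:
        # Random, meaningless pseudonym: reveals nothing about the record it names
        return secrets.token_hex(8)

    def link(self, site: str, target_psn: str) -> str:
        """Create a fresh reference pseudonym (PSN1) that resolves to target_psn (PSN2)."""
        psn1 = self.new_psn()
        self._table[(site, psn1)].append(target_psn)
        return psn1


psns = Psns()
profile_psn = psns.new_psn()  # assigned to the Client Profile stored in IDAT
case_psn = psns.new_psn()     # assigned to the Case stored in VDAT (a different pseudonym)
# The Case stores only this reference, never profile_psn itself
client_profile_psn_to_be_resolved = psns.link("Site B", profile_psn)
```

Because the pseudonyms are random, neither IDAT nor VDAT learns anything from the value it stores; only the PSNS table connects them.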

The system user will load a single page that transparently collects all the relevant information stored in VDAT and IDAT. The page then links the records together (e.g. “SQL Injection detected” (VDAT) at client “BMW” (IDAT)) and presents them to the user in a single view (HTML). For that to happen, the IDAT, VDAT and PSNS servers have to be contacted to collect VDAT and IDAT records and join them. This steering is done on the client side (in the web browser). The reason is to avoid a middleman server on the backend that collects and links the data before sending it to the web browser: such a server could be compromised and its traffic sniffed. In other words, steering on the client side avoids any single point, other than the destination (the web browser), where IDAT and VDAT information pass through and reside simultaneously, even briefly in heap space.

The system’s landscape is illustrated below:

Landscape

PSNS
The PSNS is responsible for creating and resolving pseudonyms. You can think of the PSNS as a table that looks as follows:

Site   | PSN1 | PSN2
Site B | a    | 1
Site B | b    | 2
Site B | XA10 | x
Site A | d    | 4
..     | ..   | ..

When a record references a record on a different module (e.g. a Case (VDAT) references a Client Profile (IDAT)), the reference is expressed through a pseudonym. As mentioned earlier, both records receive a pseudonym (psn) when they are created. Let the psn of the Case be ‘x’ and the psn of the Client Profile be ‘1’. To reference the Client Profile, the Case has a field clientProfilePsnToBeResolved that holds the value ‘a’ (the PSN1 column of the PSNS table), which the PSNS resolves to ‘1’. When a user loads the Case from VDAT, the Client Profile must also be loaded; this is done in the browser. The Site attribute indicates that only users belonging to that Site can resolve the given pseudonyms.

This process is illustrated in the picture below.

Pseudonymisation
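The resolution step, including the Site restriction, might look like this in a minimal in-memory sketch. The rows mirror the PSNS table shown earlier; `PsnsTable` and `resolve` are hypothetical names, and returning an empty list for a foreign Site is an assumed design choice rather than something the post specifies.

```python
class PsnsTable:
    """Hypothetical stand-in for the PSNS table (Site, PSN1 -> PSN2)."""

    def __init__(self, rows: list[tuple[str, str, str]]) -> None:
        self._rows = rows  # list of (site, psn1, psn2) tuples

    def resolve(self, user_site: str, psn1: str) -> list[str]:
        """Return every PSN2 mapped to psn1, but only from rows of the caller's Site."""
        return [psn2 for site, p1, psn2 in self._rows
                if p1 == psn1 and site == user_site]


psns = PsnsTable([
    ("Site B", "a", "1"),
    ("Site B", "b", "2"),
    ("Site B", "XA10", "x"),
    ("Site A", "d", "4"),
])

# Case 'x' stores clientProfilePsnToBeResolved = 'a'
print(psns.resolve("Site B", "a"))  # ['1'] -> load Client Profile with psn '1' from IDAT
print(psns.resolve("Site A", "a"))  # [] -> a Site A user cannot resolve Site B pseudonyms
```

Returning an empty list rather than an error for a foreign Site means a caller cannot distinguish "not allowed" from "does not exist", so the response itself reveals nothing about other Sites' pseudonyms.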

Our system will have two classes of users that need to be treated differently:
End Client, who wants to log into the system and see her cases and their status
Pentester, who wants to log into the system, view the cases assigned to her, create new cases, etc.

The two classes can be distinguished either by assigning the users different roles or by adding a discriminator type (e.g. an enum) to the user classes. The following workflows illustrate how the same process, loading cases, is executed differently for a Pentester and for an End Client:

Pentester (user id = ‘penTester1234’)
1. The browser requests the Case records assigned to the pentester from VDAT (Pseudocode: select * from VDAT.cases where pentesterUserId = 'penTester1234').
2. Browser receives Case from VDAT, which includes a reference to a Client Profile (clientProfilePsnToBeResolved = ‘a’)
3. Browser goes to PSNS and asks for the PSN2 that belongs to PSN1 = ‘a’
4. Browser receives PSN2 = ‘1’ from PSNS
5. Browser goes to IDAT and requests Client Profile with psn = ‘1’
6. Browser receives Client Profile from IDAT
7. Browser displays Case and Client Profile data
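The Pentester workflow can be sketched with in-memory stand-ins for VDAT, PSNS and IDAT. In the described system each lookup is a separate REST call issued by the browser; here the dictionaries, the field names other than `pentesterUserId` and `clientProfilePsnToBeResolved`, and the assumption that the pentester belongs to ‘Site B’ are all hypothetical.

```python
# Hypothetical in-memory stand-ins for the three modules; in the real system each
# lookup would be a separate REST call from the browser to VDAT, PSNS or IDAT.
VDAT_CASES = [
    {"psn": "x", "pentesterUserId": "penTester1234",
     "finding": "SQL Injection detected", "clientProfilePsnToBeResolved": "a"},
]
PSNS_TABLE = {("Site B", "a"): ["1"], ("Site B", "XA10"): ["x"]}
IDAT_PROFILES = {"1": {"psn": "1", "name": "BMW"}}


def load_cases_for_pentester(user_id: str, site: str) -> list[dict]:
    # Steps 1-2: fetch the Cases assigned to this pentester from VDAT
    cases = [c for c in VDAT_CASES if c["pentesterUserId"] == user_id]
    view = []
    for case in cases:
        # Steps 3-4: ask PSNS for the PSN2 behind the Case's reference pseudonym
        psn2s = PSNS_TABLE.get((site, case["clientProfilePsnToBeResolved"]), [])
        # Steps 5-6: fetch the Client Profile from IDAT by its own pseudonym
        profiles = [IDAT_PROFILES[p] for p in psn2s if p in IDAT_PROFILES]
        # Step 7: the join exists only here, i.e. in the browser
        view.append({"finding": case["finding"],
                     "client": profiles[0]["name"] if profiles else None})
    return view


print(load_cases_for_pentester("penTester1234", "Site B"))
# [{'finding': 'SQL Injection detected', 'client': 'BMW'}]
```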

End Client (user id = ‘bmwManager1’)
1. The browser requests the EID (End Client ID) for end client user from IDAT (Pseudocode: select eid from IDAT.client where 'bmwManager1' in IDAT.client.endClientUserIds)
2. Browser receives EID from IDAT. Example EID = ‘XA10’
3. Browser goes to PSNS and asks for all PSN2s that belong to PSN1 = ‘XA10’ (that is, requests all cases for the client with EID = ‘XA10’).
4. Browser receives PSN2s = [‘x’] from PSNS
5. Browser goes to VDAT and requests all Cases with psn in [‘x’]
6. Browser receives Case from VDAT, which includes a reference to a Client Profile (clientProfilePsnToBeResolved = ‘a’)
7. Browser goes to PSNS and asks for the PSN2 that belongs to PSN1 = ‘a’
8. Browser receives PSN2 = ‘1’ from PSNS
9. Browser goes to IDAT and requests Client Profile with psn = ‘1’
10. Browser receives Client Profile from IDAT
11. Browser displays Case and Client Profile data
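The End Client workflow can be sketched the same way. Again the stores are hypothetical in-memory stand-ins for REST calls, the IDAT client record is assumed to carry both `eid` and `endClientUserIds` as in the pseudocode of step 1, and the user's Site is assumed to be ‘Site B’.

```python
# Hypothetical in-memory stand-ins for the modules
VDAT_CASES = {
    "x": {"psn": "x", "finding": "SQL Injection detected",
          "clientProfilePsnToBeResolved": "a"},
}
PSNS_TABLE = {("Site B", "a"): ["1"], ("Site B", "XA10"): ["x"]}
IDAT_CLIENTS = [{"eid": "XA10", "endClientUserIds": ["bmwManager1"]}]
IDAT_PROFILES = {"1": {"psn": "1", "name": "BMW"}}


def load_cases_for_end_client(user_id: str, site: str) -> list[dict]:
    # Steps 1-2: IDAT returns the EID of the client this user belongs to
    eid = next((c["eid"] for c in IDAT_CLIENTS
                if user_id in c["endClientUserIds"]), None)
    if eid is None:
        return []
    # Steps 3-4: PSNS maps the EID to the pseudonyms of all of the client's Cases
    case_psns = PSNS_TABLE.get((site, eid), [])
    # Steps 5-6: VDAT returns the Cases with those pseudonyms
    cases = [VDAT_CASES[p] for p in case_psns if p in VDAT_CASES]
    view = []
    for case in cases:
        # Steps 7-10: resolve the Client Profile reference via PSNS, then IDAT
        profile_psns = PSNS_TABLE.get((site, case["clientProfilePsnToBeResolved"]), [])
        profile = IDAT_PROFILES.get(profile_psns[0]) if profile_psns else None
        # Step 11: join and display in the browser
        view.append({"finding": case["finding"],
                     "client": profile["name"] if profile else None})
    return view


print(load_cases_for_end_client("bmwManager1", "Site B"))
# [{'finding': 'SQL Injection detected', 'client': 'BMW'}]
```

Steps 6 to 11 are the same resolution the Pentester workflow performs; only the entry point differs (the EID obtained from IDAT instead of a direct VDAT query).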

To better understand how VDAT and IDAT data are linked, consider the following scenarios.

Scenarios
Scenario 1: User Login

Before a user can use the system, she has to log in. The login action is the same for both user classes. The user has to log in to all four modules, receiving from each a token that she later uses to authenticate against that module (e.g. when she calls one of its REST functions).

  1. User opens web browser and loads the start page (index.html)
  2. User is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. User enters credentials and successfully logs into the system.
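The login step of Scenario 1 (one token per module) can be sketched as follows. `login_fn` stands in for whatever login call each module offers, which the post does not specify, and the Bearer header format is an assumption; `fake_login` exists only for the demonstration.

```python
from typing import Callable

# The four modules named in this post; each issues its own token.
MODULES = ("APP", "IDAT", "PSNS", "VDAT")

# Hypothetical signature: (module, username, password) -> token.
# In the real system this would be an HTTP call to the module's login endpoint.
LoginFn = Callable[[str, str, str], str]


def login_all(username: str, password: str, login_fn: LoginFn) -> dict[str, str]:
    """Log in to every module and keep one token per module."""
    return {module: login_fn(module, username, password) for module in MODULES}


def auth_header(tokens: dict[str, str], module: str) -> dict[str, str]:
    # Later REST calls to a module carry that module's token (Bearer scheme assumed)
    return {"Authorization": f"Bearer {tokens[module]}"}


def fake_login(module: str, username: str, password: str) -> str:
    # Stand-in for a real login endpoint, for demonstration only
    if password != "secret":
        raise PermissionError(f"login to {module} failed")
    return f"{module.lower()}-token-for-{username}"


tokens = login_all("bmwManager1", "secret", fake_login)
print(auth_header(tokens, "PSNS"))
# {'Authorization': 'Bearer psns-token-for-bmwManager1'}
```

If each module only accepts the tokens it issued itself, a token leaked from one module cannot be replayed against another, which fits the goal that several servers must be compromised before data can be linked.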

The scenario is illustrated below.

User Login

Scenario 2: End Client login and loading of Cases

The End Client logs into the system in order to view cases. The simplified workflow looks as follows:

  1. End Client opens web browser and loads the start page (index.html)
  2. End Client is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. End Client enters credentials and successfully logs into the system.
  4. End Client lands on a view that lists all cases that are related to her.

The UML sequence diagram is shown below:

End Client Login and display his/her cases

Scenario 3: Pentester login and load assigned Cases

The Pentester logs into the system and lands on a view that lists all cases (for different end clients) assigned to her.

  1. Pentester opens web browser and loads the start page (index.html)
  2. Pentester is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. Pentester enters credentials and successfully logs into the system.
  4. Pentester lands on a view that lists all cases (for different end clients) assigned to her.

The UML sequence diagram is shown below:

Pentester Login and display his/her cases

Scenario 4: Pentester login and create new client data and Case

The Pentester logs into the system and creates a new IDAT record that needs to be associated with a VDAT record.

  1. Pentester opens web browser and loads the start page (index.html)
  2. Pentester is presented with a login screen where she has to enter her credentials, in order for her to proceed.
  3. Pentester enters credentials and successfully logs into the system.
  4. Pentester creates a new IDAT record (end client data) that is linked to VDAT data (a new Case)
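One possible order of operations for Scenario 4 is sketched below, assuming the IDAT client record holds the Client Profile, the EID and the end client user ids together. Everything is in memory and hypothetical: in the described system the PSNS generates the pseudonyms via its API, and the actual sequence in the diagram may differ.

```python
import secrets
from collections import defaultdict

# Hypothetical in-memory stand-ins for the modules
IDAT_PROFILES: dict[str, dict] = {}
VDAT_CASES: dict[str, dict] = {}
PSNS_TABLE: dict[tuple[str, str], list[str]] = defaultdict(list)


def new_psn() -> str:
    # Stand-in for the PSNS pseudonym-generation API call
    return secrets.token_hex(8)


def create_case_with_new_profile(site: str, client_name: str,
                                 end_client_user_ids: list[str],
                                 finding: str) -> str:
    # 1. Store the new Client Profile in IDAT under its own pseudonym, with an EID
    profile_psn, eid = new_psn(), new_psn()
    IDAT_PROFILES[profile_psn] = {"psn": profile_psn, "eid": eid,
                                  "name": client_name,
                                  "endClientUserIds": list(end_client_user_ids)}
    # 2. PSNS: a fresh reference pseudonym (PSN1) resolving to the profile (PSN2)
    ref_psn = new_psn()
    PSNS_TABLE[(site, ref_psn)].append(profile_psn)
    # 3. Store the Case in VDAT; it only holds the reference, never profile_psn
    case_psn = new_psn()
    VDAT_CASES[case_psn] = {"psn": case_psn, "finding": finding,
                            "clientProfilePsnToBeResolved": ref_psn}
    # 4. PSNS: map the EID to the Case so the End Client can find it later
    PSNS_TABLE[(site, eid)].append(case_psn)
    return case_psn


case_psn = create_case_with_new_profile("Site B", "BMW", ["bmwManager1"],
                                        "SQL Injection detected")
```

After these four steps, VDAT holds no client data and IDAT holds no findings; the End Client and Pentester workflows above can find the new records again only through the PSNS.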

The persistence of the end client’s data is illustrated below:

Pentester: Create a new case with a new client profile

Yikes! That's it for Part 1. In the next part, I will describe the technology stack and provide a fully functioning web application prototype with source code.

References
Orientierungshilfe: Pseudonymisierung in der medizinischen Forschung (Guidance: Pseudonymization in Medical Research)

 

lucas