Distributed system for data security and data privacy – Part 1
Introduction
Suppose you are building a platform that integrates sensitive data. Such a platform could be an IT solution for clinical studies, such as a web application that lets patients and doctors log in and either access data (e.g. findings or demographic data) or enter new information. Another example would be a platform that gives a penetration testing company a way to track penetration test results and system vulnerabilities for its clients, while also letting those clients access the system to view the related findings. In both cases the data integrated in the system is sensitive, and the system must protect it in terms of both data security and data privacy.
In this post I will stick to the latter example. That is, we have a penetration testing company that needs a software solution in the form of a web application that stores client information and case findings, while allowing both a penetration tester and an end client (who hires the penetration tester) to access and use the system. For instance, a penetration tester can log in and view findings data or create new cases, while an end client can view the cases and findings data related to his account.
The architecture is based on Model B of the generic concept of the TMF (Technology and Methodology Platform for Networking in Medical Research). The functions are available through a RESTful interface.
To protect the End Client’s privacy, Identification Data (IDAT) and Findings Data (VDAT) are stored encrypted on separate servers and can only be linked with the help of a Trusted Third Party (TTP), which we call the Pseudonymization Service (PSNS). The benefit of this design is that all three servers (IDAT, PSNS and VDAT) have to be compromised in order to trace any found vulnerability back to the end client. In addition to IDAT, PSNS and VDAT, there is a fourth server, called APP, that handles application data and the GUI. From now on, APP, IDAT, PSNS and VDAT are referred to as modules.
Detailed Landscape
Our system has a distributed architecture. That is, the GUI resides on a different server (APP) than the case findings (VDAT), and the client information (IDAT) resides on yet another server, separate from both the GUI and the case findings. We refer to this separation as anonymization. In addition to storing the APP, IDAT and VDAT data in a distributed environment, we also apply two-level pseudonymization. That is, when an IDAT record is stored, a generated pseudonym is assigned to it. Similarly, when a VDAT record is stored, it is assigned a different generated pseudonym. VDAT and IDAT records can only be linked through a trusted third party, the PSNS, which maps pseudonyms to each other. The PSNS also generates the pseudonyms for IDAT and VDAT via an API call. This design preserves the end client’s privacy, since all three servers (IDAT, VDAT and PSNS) have to be compromised in order to link any vulnerability finding to an end client. On top of anonymization and two-level pseudonymization, we also encrypt all sensitive data (usernames, passwords, findings, client information, etc.).
The system user loads a single page, which transparently collects all the relevant information residing in VDAT and IDAT. It then links the records together (e.g. “SQL Injection detected” (VDAT) at client “BMW” (IDAT)) and presents them to the user in a single view (HTML). For that to happen, the IDAT, VDAT and PSNS servers have to be contacted to collect and join the VDAT and IDAT records. This steering is done on the client side (in the web browser). The reason is to avoid a middleman server on the backend that collects and links the data before sending it to the web browser, since such a server could be compromised and sniffed. In other words, steering on the client side avoids having any single point, other than the destination (the web browser), where IDAT and VDAT information pass through and reside simultaneously, even briefly (e.g. in heap space).
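A minimal sketch of how the PSNS might issue the two unrelated pseudonyms, assuming pseudonyms are random tokens drawn from Python's `secrets` module (the post does not specify their format). The class `PseudonymGenerator` and its method `new_psn` are hypothetical names.

```python
import secrets


class PseudonymGenerator:
    """Hypothetical PSNS component that issues random, unlinkable pseudonyms."""

    def __init__(self, nbytes: int = 16) -> None:
        self._nbytes = nbytes
        self._issued: set[str] = set()

    def new_psn(self) -> str:
        # Draw from a cryptographically secure source so that a pseudonym
        # reveals nothing about the record it labels; retry on a collision.
        while True:
            psn = secrets.token_hex(self._nbytes)
            if psn not in self._issued:
                self._issued.add(psn)
                return psn


generator = PseudonymGenerator()
# The same client gets one pseudonym on IDAT and an unrelated one on VDAT.
idat_psn = generator.new_psn()
vdat_psn = generator.new_psn()
print(idat_psn != vdat_psn)  # True
```

Because neither value is derived from the client's identity, IDAT and VDAT records cannot be joined without the mapping held by the PSNS.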
The system’s landscape is illustrated below:
[Figure: Landscape]
PSNS
The PSNS is responsible for creating and resolving pseudonyms. You may think of the PSNS as a table that looks as follows:
Site | PSN1 | PSN2 |
---|---|---|
Site B | a | 1 |
Site B | b | 2 |
Site B | XA10 | x |
Site A | d | 4 |
.. | .. | .. |
When a record points to a record on a different module (e.g. a Case (VDAT) points to a Client Profile (IDAT)), this reference is expressed through a pseudonym. As mentioned earlier, both records receive a pseudonym, psn, when they are created. Let the psn of the Case be ‘x’ and the psn of the Client Profile be ‘1’. To reference the Client Profile from the Case, the Case has a field clientProfilePsnToBeResolved that holds the value ‘a’ (the PSN1 column in the PSNS table). When a user loads the Case from VDAT, the Client Profile must also be loaded. This is done in the browser, which asks the PSNS to resolve ‘a’ to ‘1’ and then fetches the Client Profile with psn = ‘1’ from IDAT. The Site attribute indicates that only users belonging to that Site can resolve the given pseudonyms.
This process is illustrated in the picture below.
[Figure: Pseudonymisation]
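The PSNS table above can be sketched as an in-memory list of entries, filled with the example rows. This is an illustration only: the names `PsnsEntry`, `Psns` and `resolve` are hypothetical, and the site check is a simple filter on a `user_site` argument, whereas a real service would presumably derive the caller's site from its authentication token rather than trust a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PsnsEntry:
    site: str   # only users of this site may resolve the entry
    psn1: str   # pseudonym stored in the referencing record
    psn2: str   # pseudonym of the referenced record


class Psns:
    """Hypothetical in-memory stand-in for the PSNS table."""

    def __init__(self, entries: list[PsnsEntry]) -> None:
        self._entries = list(entries)

    def resolve(self, user_site: str, psn1: str) -> list[str]:
        # One PSN1 may map to several PSN2s (e.g. an EID to all of a client's
        # cases), so return every match that belongs to the caller's site.
        return [e.psn2 for e in self._entries
                if e.psn1 == psn1 and e.site == user_site]


psns = Psns([
    PsnsEntry("Site B", "a", "1"),
    PsnsEntry("Site B", "b", "2"),
    PsnsEntry("Site B", "XA10", "x"),
    PsnsEntry("Site A", "d", "4"),
])

print(psns.resolve("Site B", "a"))  # ['1']
print(psns.resolve("Site A", "a"))  # []
```

A user from Site A therefore cannot follow the reference ‘a’ to the Client Profile ‘1’, even if she knows the pseudonym.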
Our system has two classes of users that need to be treated differently:
– End Client, who wants to log into the system and see her cases and their status
– Pentester, who wants to log into the system, view any cases assigned to her, create new cases, etc.
The differentiation can be done either by assigning the users different roles or by adding a discriminator type (e.g. an enum) to the user classes. The following workflows illustrate how the same process of loading cases is executed differently for a Pentester and an End Client:
Pentester (user id = ‘penTester1234’)
1. The browser requests the Case records from VDAT (pseudocode: select cases from VDAT.cases where pentesterUserId = 'penTester1234').
2. Browser receives Case from VDAT, which includes a reference to a Client Profile (clientProfilePsnToBeResolved = ‘a’)
3. Browser goes to PSNS and asks for the PSN2 that belongs to PSN1 = ‘a’
4. Browser receives PSN2 = ‘1’ from PSNS
5. Browser goes to IDAT and requests Client Profile with psn = ‘1’
6. Browser receives Client Profile from IDAT
7. Browser displays Case and Client Profile data
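The Pentester workflow can be sketched as a single browser-side function that contacts the three servers in turn. The dictionaries and lists below are hypothetical in-memory stand-ins for the VDAT, PSNS and IDAT REST endpoints, the Site check is omitted for brevity, and the data reuses the example values from this post.

```python
from dataclasses import dataclass


@dataclass
class Case:
    psn: str
    pentester_user_id: str
    client_profile_psn_to_be_resolved: str
    title: str


@dataclass
class ClientProfile:
    psn: str
    name: str


# Hypothetical stand-ins for the three servers; in the real system each would
# be a separate REST endpoint called from the browser.
VDAT_CASES = [Case("x", "penTester1234", "a", "SQL Injection detected")]
PSNS_TABLE = {"a": ["1"], "XA10": ["x"]}
IDAT_PROFILES = {"1": ClientProfile("1", "BMW")}


def load_pentester_cases(user_id: str) -> list[tuple[Case, ClientProfile]]:
    # Steps 1-2: fetch the pentester's cases from VDAT.
    cases = [c for c in VDAT_CASES if c.pentester_user_id == user_id]
    joined = []
    for case in cases:
        # Steps 3-4: resolve the reference pseudonym (PSN1 -> PSN2) at the PSNS.
        for profile_psn in PSNS_TABLE.get(case.client_profile_psn_to_be_resolved, []):
            # Steps 5-6: fetch the Client Profile from IDAT by its own pseudonym.
            joined.append((case, IDAT_PROFILES[profile_psn]))
    # Step 7: the joined pairs exist only here, i.e. in the browser.
    return joined


for case, profile in load_pentester_cases("penTester1234"):
    print(f"{case.title} at client {profile.name}")
# SQL Injection detected at client BMW
```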
End Client (user id = ‘bmwManager1’)
1. The browser requests the EID (End Client ID) of the end client user from IDAT (pseudocode: select eid from IDAT.client where 'bmwManager1' in IDAT.client.endClientUserIds)
2. Browser receives EID from IDAT. Example EID = ‘XA10’
3. Browser goes to PSNS and asks for all PSN2s that belong to PSN1 = ‘XA10’ (that is, it requests all cases for the client with EID = ‘XA10’).
4. Browser receives PSN2s = [‘x’] from PSNS
5. Browser goes to VDAT and requests all Cases with psn in [‘x’]
6. Browser receives Case from VDAT, which includes a reference to a Client Profile (clientProfilePsnToBeResolved = ‘a’)
7. Browser goes to PSNS and asks for the PSN2 that belongs to PSN1 = ‘a’
8. Browser receives PSN2 = ‘1’ from PSNS
9. Browser goes to IDAT and requests Client Profile with psn = ‘1’
10. Browser receives Client Profile from IDAT
11. Browser displays Case and Client Profile data
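Both workflows can be combined behind the enum discriminator mentioned above. In this sketch, `UserType`, `load_cases` and the in-memory server stand-ins are hypothetical; only the order of the calls follows the two workflows, and the Site check is again omitted. The two paths differ only in how the cases are found; resolving the Client Profile reference is the same for both.

```python
from dataclasses import dataclass
from enum import Enum


class UserType(Enum):
    # Discriminator for the two user classes (roles would work as well).
    PENTESTER = "pentester"
    END_CLIENT = "end_client"


@dataclass
class Case:
    psn: str
    pentester_user_id: str
    client_profile_psn_to_be_resolved: str
    title: str


# Hypothetical in-memory stand-ins for the IDAT, PSNS and VDAT servers.
IDAT_CLIENTS = [{"eid": "XA10", "end_client_user_ids": ["bmwManager1"]}]
IDAT_PROFILES = {"1": {"psn": "1", "name": "BMW"}}
PSNS_TABLE = {"a": ["1"], "XA10": ["x"]}
VDAT_CASES = {"x": Case("x", "penTester1234", "a", "SQL Injection detected")}


def cases_for_end_client(user_id: str) -> list[Case]:
    # Steps 1-2: find the EID(s) of the end client user in IDAT.
    eids = [c["eid"] for c in IDAT_CLIENTS if user_id in c["end_client_user_ids"]]
    # Steps 3-4: the PSNS resolves each EID to the pseudonyms of its cases.
    case_psns = [psn for eid in eids for psn in PSNS_TABLE.get(eid, [])]
    # Steps 5-6: fetch those cases from VDAT.
    return [VDAT_CASES[psn] for psn in case_psns if psn in VDAT_CASES]


def load_cases(user_type: UserType, user_id: str) -> list[tuple[Case, dict]]:
    if user_type is UserType.PENTESTER:
        cases = [c for c in VDAT_CASES.values() if c.pentester_user_id == user_id]
    else:
        cases = cases_for_end_client(user_id)
    joined = []
    for case in cases:
        # Resolve the Client Profile reference at the PSNS, then fetch it from IDAT.
        for profile_psn in PSNS_TABLE.get(case.client_profile_psn_to_be_resolved, []):
            joined.append((case, IDAT_PROFILES[profile_psn]))
    # The joined view is assembled only here, i.e. in the browser.
    return joined


for case, profile in load_cases(UserType.END_CLIENT, "bmwManager1"):
    print(f"{case.title} at client {profile['name']}")
# SQL Injection detected at client BMW
```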
To better understand how the linkage of VDAT and IDAT data is done, consider the scenarios that follow.
Scenarios
Scenario 1: User Login
Before a user can use the system, she has to log in. The login action is the same for both user classes. The user has to log in to all four modules, from each of which she receives a token that she can later use to authenticate against that module (e.g. when she wants to call a REST function on it).
- User opens web browser and loads the start page (index.html)
- User is presented with a login screen where she has to enter her credentials, in order for her to proceed.
- User enters credentials and successfully logs into the system.
The scenario is illustrated below.
[Figure: User Login]
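The login to all four modules can be sketched as follows. This assumes each module issues its own random, opaque token and keeps its own token table; the post does not specify the token format (JWTs would be another option), and `Module`, `login`, `authenticate` and `login_all` are hypothetical names. Plaintext password comparison only keeps the sketch short; a real module would verify a password hash.

```python
import secrets
from typing import Optional


class Module:
    """Hypothetical stand-in for one of APP, IDAT, PSNS or VDAT."""

    def __init__(self, name: str, credentials: dict[str, str]) -> None:
        self.name = name
        self._credentials = credentials
        self._tokens: dict[str, str] = {}

    def login(self, username: str, password: str) -> str:
        # Plaintext comparison for brevity; a real module would check a hash.
        if self._credentials.get(username) != password:
            raise PermissionError(f"login to {self.name} failed")
        token = secrets.token_urlsafe(32)
        self._tokens[token] = username
        return token

    def authenticate(self, token: str) -> Optional[str]:
        # Later REST calls present the token instead of the credentials.
        return self._tokens.get(token)


def login_all(modules: list[Module], username: str, password: str) -> dict[str, str]:
    # The browser logs in to every module separately, keeping one token each.
    return {m.name: m.login(username, password) for m in modules}


users = {"bmwManager1": "s3cret"}
modules = [Module(n, users) for n in ("APP", "IDAT", "PSNS", "VDAT")]
tokens = login_all(modules, "bmwManager1", "s3cret")
print(sorted(tokens))  # ['APP', 'IDAT', 'PSNS', 'VDAT']
```

Because each module only recognizes the tokens it issued itself, a token stolen from one module is not accepted by the others.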
Scenario 2: End Client login and loading of Cases
The End Client logs into the system in order to view her cases. The simplified workflow looks as follows:
- End Client opens a web browser and loads the start page (index.html).
- End Client is presented with a login screen where she has to enter her credentials in order to proceed.
- End Client enters her credentials and successfully logs into the system.
- End Client lands on a view that lists all cases related to her.
The UML sequence diagram is listed below:
[Figure: End Client Login and display his/her cases]
Scenario 3: Pentester login and load assigned Cases
The Pentester logs into the system and lands on a view that lists all cases (for different end clients) assigned to her.
- Pentester opens a web browser and loads the start page (index.html).
- Pentester is presented with a login screen where she has to enter her credentials in order to proceed.
- Pentester enters her credentials and successfully logs into the system.
- Pentester lands on a view that lists all cases (for different end clients) assigned to her.
The UML sequence diagram is listed below:
[Figure: Pentester Login and display his/her cases]
Scenario 4: Pentester login and create new client data and Case
The Pentester logs into the system and creates a new IDAT record that needs to be associated with a VDAT record.
- Pentester opens a web browser and loads the start page (index.html).
- Pentester is presented with a login screen where she has to enter her credentials in order to proceed.
- Pentester enters her credentials and successfully logs into the system.
- Pentester creates a new IDAT record (end client data) that is linked to VDAT data.
The following example illustrates how the end client’s data is persisted:
[Figure: Pentester: Create a new case with a new client profile]
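One possible order of operations for creating a Client Profile and a Case together is sketched below. It is an assumption reconstructed from the PSNS table and the workflows above (the diagram's exact sequence may differ): the PSNS issues both record pseudonyms, creates a reference pseudonym that resolves to the Client Profile, and maps the client's EID to the new Case. `Psns`, `generate`, `link`, `add_mapping` and `create_case_with_new_client` are hypothetical names, and the Site column is omitted.

```python
import secrets
from dataclasses import dataclass, field


def new_psn() -> str:
    # Random hex pseudonym; the real format is not specified in the post.
    return secrets.token_hex(8)


@dataclass
class Psns:
    """Hypothetical in-memory PSNS: maps a PSN1 to one or more PSN2 values."""
    table: dict[str, list[str]] = field(default_factory=dict)

    def generate(self) -> str:
        # Pseudonyms are issued by the PSNS (via an API call in the real system).
        return new_psn()

    def add_mapping(self, psn1: str, psn2: str) -> None:
        self.table.setdefault(psn1, []).append(psn2)

    def link(self, target_psn: str) -> str:
        # Issue a fresh reference pseudonym (PSN1) that resolves to target_psn.
        ref_psn = self.generate()
        self.add_mapping(ref_psn, target_psn)
        return ref_psn

    def resolve(self, psn1: str) -> list[str]:
        return list(self.table.get(psn1, []))


def create_case_with_new_client(psns: Psns, idat: dict, vdat: dict,
                                eid: str, client_name: str, finding: str) -> str:
    # 1. The PSNS issues the Client Profile's pseudonym; the profile goes to IDAT.
    profile_psn = psns.generate()
    idat[profile_psn] = {"eid": eid, "name": client_name}
    # 2. The PSNS issues the Case's pseudonym and a reference pseudonym
    #    that resolves to the Client Profile.
    case_psn = psns.generate()
    ref_psn = psns.link(profile_psn)
    # 3. The EID is mapped to the new Case so the end client can find it later.
    psns.add_mapping(eid, case_psn)
    # 4. The Case goes to VDAT and carries only pseudonyms, no client data.
    vdat[case_psn] = {"finding": finding, "clientProfilePsnToBeResolved": ref_psn}
    return case_psn


psns, idat, vdat = Psns(), {}, {}
case_psn = create_case_with_new_client(psns, idat, vdat, "XA10", "BMW",
                                       "SQL Injection detected")
```

After this call, VDAT holds the finding but no client name and IDAT holds the client name but no finding; only the PSNS mappings connect the two.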
Yikes! That’s it for Part 1. In the next part, I will cover the technology stack and present a fully functioning web application prototype with source code.
References
– Orientierungshilfe: Pseudonymisierung in der medizinischen Forschung (Guidance: Pseudonymization in Medical Research)