
Saturday, April 11, 2020

How the new version - April 10 - of DP-3T works and some reflections on privacy

by Enrico Nardelli

(Italian version here)

The group of researchers who developed the DP-3T protocol for contact tracing in a decentralized way that I recently described has released an update of their documentation.

The protocol is now presented in two versions. I refer to the April 10 version of their White Paper.

Version 1 (defined as "low-cost decentralized") is substantially equivalent to what I described in the previous post, to which I refer for brevity. I list here only the significant changes:
  • the random key SK0 and the SKt generated on each subsequent day are locally stored on the device together with the relevant day, to provide an absolute time reference
  • the coarse time window stored locally for each encountered device now is only the day
  • the authorization code to alert the central server of being infected is given, inactive, to the person at the time of the test. The health authority activates it only when the test result is positive and the patient is notified of the result and the possibility of alerting the server;
  • if the person chooses to alert then the app sends to the server all the pairs [SKt ; day] in the contagious window
  • on page 10 of the white paper they write that the contagious window starts up «to 3 days before the onset of symptoms», but then on page 16 they write that the average width of the contagious window is 5 days
  • the server distributes to all the registered apps all the pairs [SKt ; day] of the devices of infected people for every day of the contagious window
Since the server continues to distribute directly to all the registered apps the keys allowing them to reconstruct the EphIDs, a device can connect all the EphIDs of the same infected person (which would otherwise be unlinkable) and derive the temporal profile of meetings with this person.
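The linking argument above can be made concrete with a minimal sketch. It is not the DP-3T reference code: SHA-256 stands in for H, and the way EphIDs are derived from a day key (`ephids_for_day`, an HMAC over a counter) is a hypothetical placeholder, since this post does not restate the real derivation. The point is only that whoever holds the published pairs [SKt ; day] can regenerate every EphID of the infected person and match them against what their own device heard.

```python
import hashlib
import hmac
import os

def next_day_key(sk):
    """SK_{t+1} = H(SK_t); SHA-256 is an assumed stand-in for H."""
    return hashlib.sha256(sk).digest()

def ephids_for_day(sk, per_day=4):
    """Hypothetical derivation of the day's EphIDs from SK_t (illustration only)."""
    return [hmac.new(sk, bytes([i]), hashlib.sha256).digest()[:16]
            for i in range(per_day)]

def same_person(pair_a, pair_b):
    """Two published pairs on consecutive days belong to one chain if H links them."""
    (sk_a, day_a), (sk_b, day_b) = pair_a, pair_b
    return day_b == day_a + 1 and next_day_key(sk_a) == sk_b

def encounters_per_day(published, heard):
    """published: [(SK_t, day)] of one infected person; heard: [(EphID, day)]."""
    counts = {}
    for sk, day in published:
        ids = set(ephids_for_day(sk))
        n = sum(1 for ephid, d in heard if d == day and ephid in ids)
        if n:
            counts[day] = n
    return counts

def demo():
    sk = os.urandom(32)
    published = []
    for day in (1, 2, 3):
        published.append((sk, day))
        sk = next_day_key(sk)
    # The device heard two different EphIDs on day 1 and one on day 3.
    heard = [
        (ephids_for_day(published[0][0])[0], 1),
        (ephids_for_day(published[0][0])[2], 1),
        (ephids_for_day(published[2][0])[1], 3),
    ]
    return published, encounters_per_day(published, heard)
```

In this toy run the EphIDs heard on day 1 and day 3 look unrelated on the air, yet the published keys tie them to a single person and yield a per-day profile of encounters.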


As I observed in my previous post, this could make it possible to identify people who are met with some regularity. This doesn't require the owner of the device to be a hacker: I'm sure that if this solution were used at mass scale, "somebody" would make available low-cost apps offering this "help" to understand who, among those you encountered, is infected.
We should always consider that human nature is still the same, despite advances in digital technology, and that curiosity and self-protection are powerful drivers of human actions.

Version 2 (defined as "unlinkable decentralized") has the same overall structure as version 1, with significant changes to some elements. I describe only those changes below and refer to the previous post for an understanding of the overall structure:
  • An initial SK0 key is no longer generated upon installation
  • the key SKt+1 of the following day is no longer calculated applying a predefined hash function H on the key of the previous day SKt
  • instead, a fully new SKt key is generated every day, which is a random number of 32 bytes
  • SKt is used to derive, using H with argument SKt, a single EphIDt to be used on that day t during the exchange of contact information with other devices encountered
  • the EphIDt of the devices encountered on a certain day t are stored locally in encrypted form KEt, again using H, with both EphIDt and t as arguments: this avoids recording the EphIDt directly and also prevents an EphID generated on a different day, but accidentally equal to that of day t, from leading to a "false positive"
  • the infected person who decides to send to the central server their pairs [SKt ; t] can choose for which days / periods to send them, thus obtaining a greater privacy control
  • the central server no longer distributes all the pairs [SKt ; t] of the devices of infected people for every day of the contagious window
  • instead, the central server uses the received pairs [SKt ; t] to compute the KEt encrypted values corresponding to infected people's devices
  • all these KEt referring to the same day are inserted in a set called "Cuckoo filter" (CF), which is distributed to all the registered apps
  • CF uses very little space and makes it possible to verify with high efficiency whether one of the KEt that a device has stored in its local contact history belongs to CF or not, that is, whether it corresponds to an infected person or not
  • when a device receives CF it checks, for each of the KEt stored in its local history, whether it is included in CF. If not, the answer is definitive; if yes, this could be a false positive (i.e., the filter answers "yes" but the truth is "no"), the probability of which can however be made as low as required by means of an a priori tuning of a specific filter parameter
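The steps of version 2 listed above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for H, the 16-byte truncation of EphIDt and the 4-byte encoding of t are my choices, and a plain Python set replaces the Cuckoo filter (so, unlike the real filter, it has no false positives and is not space-efficient).

```python
import hashlib
import os

def H(data):
    """Assumed stand-in for the protocol's hash function H."""
    return hashlib.sha256(data).digest()

def new_day_key():
    """A fresh, random 32-byte SK_t, generated independently every day."""
    return os.urandom(32)

def ephid(sk):
    """EphID_t derived from SK_t via H (truncation length is an assumption)."""
    return H(sk)[:16]

def ke(ephid_t, t):
    """KE_t = H(EphID_t, t): what the device stores instead of the raw EphID_t."""
    return H(ephid_t + t.to_bytes(4, "big"))

def server_filter(reported):
    """Server side: turn the reported pairs [SK_t ; t] into the set of KE_t.
    A Python set is used here in place of the Cuckoo filter."""
    return {ke(ephid(sk), t) for sk, t in reported}

def exposed_days(local_history, cf):
    """Device side: days whose stored KE_t appear in the distributed filter."""
    return sorted({t for k, t in local_history if k in cf})

def demo():
    alice = {1: new_day_key(), 2: new_day_key()}   # infected person
    carol = {2: new_day_key()}                     # not infected
    # Bob's device heard Alice on day 1 and Carol on day 2.
    bob_history = [(ke(ephid(alice[1]), 1), 1), (ke(ephid(carol[2]), 2), 2)]
    cf = server_filter([(alice[1], 1), (alice[2], 2)])
    return exposed_days(bob_history, cf)
```

Because t is an argument of the hash, the same EphID value observed on a different day produces a different KEt, which is the "false positive" protection mentioned in the list.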
The solution described in this version 2 decreases the probability of identifying an infected person, but does not eliminate it: if a device X encounters another device Y several times a day, the KEt of Y appears as many times in X's local history of daily encounters (and vice versa).

Also, imagine you discover that you were first in contact with an infected person 3 days ago. If you met the same person on each of the following days, your local history will contain a report of an infected contact for each of those days.

The official app may not disclose any of this to you, but only advise you to contact a health facility for a check. However, one can easily imagine that a market of "ancillary" apps offering to compute this data for you could develop. An "ancillary" app will then tell you how many times on a certain day you met a certain device, and whether on that day you met only one device (of those signalled by the central server as belonging to an infected person) or more than one.

Moreover, while the official app only records the day of the encounter, this "ancillary" app might record each encounter with the exact time of day, and would provide similar information for the subsequent days. It will not be able to tell you whether the meetings on different days refer to the same device(s). However, after the return to normality, when your encounters follow a certain regularity and there are relatively few infected persons around, this information, combined with remembering or reconstructing what you have done, would allow you to link devices to people.
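To show how little such an "ancillary" app would need, here is a hypothetical sketch. It assumes the app keeps its own log of (KEt, day, time of day) entries and already knows which KEt values were flagged as infected; the function name and the log format are invented for illustration.

```python
from collections import Counter, defaultdict

def daily_summary(log, flagged):
    """log: [(ke, day, time_of_day)] kept by the hypothetical ancillary app.
    flagged: set of KE_t values reported as belonging to infected people."""
    counts = defaultdict(Counter)   # day -> how often each flagged KE_t was seen
    times = defaultdict(list)       # day -> times of the flagged encounters
    for ke, day, time_of_day in log:
        if ke in flagged:
            counts[day][ke] += 1
            times[day].append(time_of_day)
    return {
        day: {
            "devices": len(counts[day]),
            "encounters": sum(counts[day].values()),
            "times": sorted(times[day]),
        }
        for day in sorted(counts)
    }

example_log = [
    ("a1", 1, "09:00"), ("a1", 1, "13:05"), ("x", 1, "10:00"),
    ("a2", 2, "09:02"), ("b2", 2, "18:30"),
]
example_flagged = {"a1", "a2", "b2"}
summary = daily_summary(example_log, example_flagged)
```

In the example, day 1 shows one flagged device met twice (at 09:00 and 13:05) and day 2 shows two different flagged devices. Nothing links "a1" to "a2" cryptographically, but a user who remembers being at the office at 09:00 on both days can draw their own conclusions.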

Considering that these contact tracing solutions are meant to be used more when people return to normal life than in the current lockdown situation, the problem is not secondary. With the return to normality, regular daily activities, where we tend to regularly meet people we know, will begin again. And it is precisely in these situations of regular or repeated encounters that the approach of recording contacts shows its weakness with respect to privacy. I repeat my previous comment: curiosity and self-protection are powerful drivers of human actions.

I close with a couple of reflections.

The first is that the documentation provides no references to the scientific literature for its assumptions on the temporal width of the contagious window. Furthermore, the fact that these assumptions are inconsistent with each other is not comforting.

The second is a side remark that gave me a lot to think about. Speaking of the probability that the "Cuckoo filter" makes a mistake, it is specified that the parameters can be tuned so as to «allow extensive use of the system without errors for several years» (bold is mine) ...
