Talk:Differential privacy


Content could be extracted

The content may have been extracted from another source, judging by its format and the single massive initial edit. Utopiah (talk) 19:04, 16 January 2010 (UTC)

@ 89.189.87.253 (talk) 21:07, 5 December 2023 (UTC)

Style Guide

I doubt that the style guide should be part of an encyclopedic article on differential privacy, but it still might be useful to follow the suggested capitalization rules for article writers. If I hear no objections, I'm planning to remove the style guide from the article and add it here. 141.201.13.113 (talk) 12:39, 26 November 2018 (UTC)

"coins of the algorithm"

What does "coins of the algorithm" mean? Is this correct usage? 129.6.223.113 (talk) 22:36, 27 February 2015 (UTC)

It is correct usage within the cryptography community, but I agree that it is super-confusing. I'll be removing it over the next few weeks as I continue my total rewrite of this section. Simsong (talk) 02:30, 18 September 2018 (UTC)

Please finish example

Please finish the example and show how to make the diabetes database differentially private. - Preceding unsigned comment added by 129.6.223.113 (talk) 22:40, 27 February 2015 (UTC)

I'm probably going to replace this with a different example, one that is less loaded and easier to understand. Simsong (talk) 02:31, 18 September 2018 (UTC)

Lay people cannot read the formula

The example should be understandable without the mathematical formula. - Preceding unsigned comment added by 87.78.208.167 (talk) 22:02, 13 August 2015 (UTC)

Agreed. After reading the article I still have no idea what differential privacy is or how it is going to counter the given de-anonymization efforts. 84.245.149.53 (talk) 13:39, 31 August 2016 (UTC)

2001:4898:80E8:C:0:0:0:563 (talk) 18:44, 2 May 2017 (UTC) In particular, I found the equation (1/4)(1-p) + (3/4)p = (1/4) + p/2 to be weird. The (1/4) + p/2 is reasonably obvious just from the description of the coin flips, but the longer expression on the left side isn't obvious at all. Worse, the question people actually want answered is not "given p, what is the result?" but rather "given the result, what is p?"

Perhaps this is a better formulation: Thus, if p is the true proportion of people with A, we can expect an actual result of 1/4 just from getting a tails on the first coin flip and then a tails, and p/2 when we get a heads on the first coin flip. Reversing the equation, given an actual result R, the best estimate of p is (R - 1/4) * 2.

2001:4898:80E8:C:0:0:0:563 (talk) 18:44, 2 May 2017 (UTC)
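
A minimal simulation of the estimate proposed above, offered as a sketch only. It assumes a coin-flip procedure in which a first fair flip decides whether the respondent answers honestly and, otherwise, a second fair flip picks the answer with probability 1/2 each. The names randomized_response and estimate_p, the sample size, and the true proportion 0.3 are hypothetical and chosen for illustration:

```python
import random

def randomized_response(truth, rng):
    """Return the reported answer for one respondent (assumed procedure)."""
    if rng.random() < 0.5:
        # First flip: answer honestly.
        return truth
    # Otherwise a second fair flip decides the answer, ignoring the truth.
    return rng.random() < 0.5

def estimate_p(responses):
    """Invert R = 1/4 + p/2 to recover an estimate of p from the observed rate R."""
    observed_rate = sum(responses) / len(responses)
    return (observed_rate - 0.25) * 2

rng = random.Random(0)          # fixed seed so the run is repeatable
true_p = 0.3                    # assumed true proportion, illustration only
n = 100_000
population = [rng.random() < true_p for _ in range(n)]
responses = [randomized_response(t, rng) for t in population]
p_hat = estimate_p(responses)
print(f"estimated p = {p_hat:.3f}")
```

With a large sample the estimate should land close to the true proportion. Note that the factor of 2 in the inversion also doubles the sampling noise of the observed rate.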

Definition of ε-stability

The definition of ε-stability assumes two datasets, D1 and D2, that differ only on a single element. It is not specified what kind of difference this is: whether that element is present in one set and absent from the other, or whether only its attributes differ. However, the formula for the exact definition explicitly establishes an ordering in which the probability related to D1 is bounded by that related to D2. Why this ordering?

Besides, I agree with some of the statements above that the term "the coins of the algorithm" is not clear. Maybe a link to the Wikipedia page that clarifies this would help. Elferdo (talk) 08:36, 19 August 2015 (UTC)
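
On the ordering question, a small numeric check may help. It assumes the usual reading of the definition, in which the bound must hold for every pair of datasets differing in one element, so it applies with the two datasets swapped as well. The sketch uses the coin-flip mechanism (a true "yes" is reported as "yes" with probability 3/4, a true "no" with probability 1/4); the helper worst_case_epsilon is a hypothetical name:

```python
import math

def worst_case_epsilon(p_yes_a, p_yes_b):
    """Smallest epsilon bounding every output-probability ratio in both orderings.

    p_yes_a and p_yes_b are the probabilities of reporting "yes" under two
    neighboring inputs (here: one respondent's true answer being yes vs. no).
    """
    ratios = []
    # Check both possible outputs: "yes" and "no".
    for out_a, out_b in ((p_yes_a, p_yes_b), (1 - p_yes_a, 1 - p_yes_b)):
        ratios.append(out_a / out_b)   # first input bounded by second
        ratios.append(out_b / out_a)   # second input bounded by first
    return math.log(max(ratios))

eps = worst_case_epsilon(0.75, 0.25)
eps_swapped = worst_case_epsilon(0.25, 0.75)
print(eps, eps_swapped)  # both equal ln(3) for this mechanism
```

Because the check runs over both orderings, swapping the roles of the two inputs gives the same value, which suggests the ordering in the formula is a notational choice rather than an asymmetry.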

Another famous example?

AOL search logs - does this count as another example? - Preceding unsigned comment added by Adsah98 (talk · contribs) 17:02, 7 February 2016 (UTC)

No, it is not a good example. It should not be in this article. It should be in an article on de-identification. The original author of this article was confused between the two concepts. Simsong (talk) 02:32, 18 September 2018 (UTC)

Link to reference not working anymore

The link to reference 21, differential privacy at iOS, does not work anymore. 185.87.72.149 (talk) 14:17, 21 August 2017 (UTC)

PATE algorithm and utility/privacy trade-off

The authors of the PATE algorithm (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/0e08bda44d22e076d15edc45afcb2e1a7a231a84.pdf) claim that their approach can answer an unlimited number of queries by training an ancillary model with a finite number of privacy-preserving queries to an ensemble of models (the ancillary model gets only an obfuscated consensus among the members of the ensemble). Further queries to the ancillary model, according to them, do not imply additional loss of privacy. This seems to me to increase the applicability of DP.
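
For readers unfamiliar with the aggregation step, here is a rough sketch of the obfuscated consensus described above. It assumes the Laplace noisy-max form, in which noise of scale 1/gamma is added to each class's vote count before taking the argmax. The function names, the gamma value, and the vote data are made up for illustration and are not taken from the authors' code:

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_consensus(teacher_votes, num_classes, gamma, rng):
    """Return the class with the highest noisy vote count."""
    counts = Counter(teacher_votes)
    noisy = [counts[c] + laplace_noise(1.0 / gamma, rng)
             for c in range(num_classes)]
    return max(range(num_classes), key=noisy.__getitem__)

rng = random.Random(0)
# Hypothetical ensemble of 100 models voting on one query.
votes = [1] * 90 + [0] * 6 + [2] * 4
label = noisy_consensus(votes, num_classes=3, gamma=10.0, rng=rng)
print(label)
```

Only labels produced this way would be shown to the ancillary model; a smaller gamma means more noise per answered query.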

Do we really want this article to reference every DP algorithm? There are hundreds of them now. Simsong (talk) 02:33, 18 September 2018 (UTC)

Synopsis

The current version of this differential privacy article begins by attributing differential privacy to a patent application by Dwork and McSherry.

The correct attribution is:

Dwork C., McSherry F., Nissim K., Smith A. (2006) Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi S., Rabin T. (eds) Theory of Cryptography. TCC 2006. Lecture Notes in Computer Science, vol 3876. Springer, Berlin, Heidelberg[1]

In terms of the date, the submission deadline for this publication was September 6, 2005.[2] That is over 3 months before the patent was submitted.

After differential privacy took off, this original paper was revised and republished in 2017.[3] In the 2017 version, the history of differential privacy is spelled out: "In the initial version of this paper, differential privacy was called indistinguishability. The name “differential privacy” was suggested by Michael Schroeder, and was first used in Dwork (2006)." (page 13, section 2.2)

The odd history of the name probably explains the confusion, but the Wikipedia article should be corrected. --73.93.186.114 (talk) 05:56, 29 January 2019 (UTC)

I have deleted the synopsis. After removing the inaccurate info, there was nothing left. 73.93.186.114 (talk) 04:51, 30 January 2019 (UTC)


Differential privacy doesn't contradict the "coin toss" example

While it is true that differential privacy is sometimes considered to be designed to protect the identity of the participants in the database, this is more of a design question than a property of differential privacy. The core difference arises from the notion of a neighboring database. One can define neighboring databases as those where the only change is in the users' private information, and not in their identifying information (a reasonable use case being one where the identity of the participants is widely known). Furthermore, a common variation of differential privacy, known as local differential privacy, covers mechanisms that behave exactly like the "coin flip" mechanism, i.e. where users randomize their responses themselves (instead of relying on, for example, a trusted curator). Finally, this paragraph misinterprets the idea that differential privacy requires combining the data into a single output: it implies that things like synthetic databases do not conform to the differential privacy requirements, which is clearly not true, as differentially private techniques for generating synthetic databases are commonly studied and provably working solutions exist for this problem. Vexlerneil (talk) 16:15, 4 April 2019 (UTC)

What algorithms are we talking about? Do you mean database queries? You mean omission of variables is an algorithm? Are you sure they are "algorithms"?

Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. 68.134.243.51 (talk) 14:22, 7 August 2022 (UTC)

Hypman

Hype 102.90.45.231 (talk) 11:18, 21 March 2023 (UTC)

@ 2A01:5EC0:1801:C567:FD00:AA66:596E:C57D (talk) 07:14, 26 March 2023 (UTC)

This article should not contain mathematical proofs; it is too technical

Wikipedia is not a math textbook. This article contains mathematical proofs of things like the Laplace notation, and it is far too technical. It needs dramatic simplification to be useful to the general Wikipedia reader. Simsong (talk) 16:57, 30 September 2023 (UTC)

Randomized response should not be the first example of using DP

DP does really badly in the local model and with randomized response. We should have better examples as the initial examples. Simsong (talk) 16:57, 30 September 2023 (UTC)