== Privacy in statistical databases ==

In a statistical database, it is often desirable to allow query access only to aggregate data, not to individual records. Securing such a database is a difficult problem, since intelligent users can combine aggregate queries to derive information about a single individual. Some common approaches are:
* only allow aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
* rather than returning exact values for sensitive data such as income, return only the partition it belongs to (e.g. 35k–40k)
* return imprecise counts (e.g. rather than reporting that exactly 141 records matched the query, indicate only that 130–150 records matched)
* disallow overly selective WHERE clauses
* audit all users' queries, so that users misusing the system can be investigated
* use intelligent agents to automatically detect inappropriate system use

For many years, research in this area was stalled, and by 1980 it was thought that:
:The conclusion is that statistical databases are almost always subject to compromise. Severe restrictions on allowable query set sizes will render the database useless as a source of statistical information but will not secure the confidential records.<ref>Dorothy E. Denning, Peter J. Denning, and Mayer D. Schwartz, "The Tracker: A Threat to Statistical Database Security", ''ACM Transactions on Database Systems (TODS)'', volume 4, issue 1 (March 1979), pages 76–96, {{doi|10.1145/320064.320069}}.</ref>

But in 2006, [[Cynthia Dwork]] defined the field of [[differential privacy]], building on work that started appearing in 2003. While showing that some semantic security goals, related to the work of [[Tore Dalenius]], were impossible, it identified new techniques for limiting the increased privacy risk that results from including an individual's data in a statistical database. This makes it possible in many cases to provide very accurate statistics from the database while still ensuring high levels of privacy.<ref>{{Cite web |last=Hilton |first=Michael |s2cid-access=free |title=Differential Privacy: A Historical Survey |s2cid=16861132 |url=https://pdfs.semanticscholar.org/4c99/097af05e8de39370dd287c74653b715c8f6a.pdf |archive-url=https://web.archive.org/web/20170301180826/https://pdfs.semanticscholar.org/4c99/097af05e8de39370dd287c74653b715c8f6a.pdf |url-status=dead |archive-date=2017-03-01}}</ref><ref>{{Cite book|title=Theory and Applications of Models of Computation|last=Dwork|first=Cynthia|date=2008-04-25|publisher=Springer Berlin Heidelberg|isbn=9783540792277|editor-last=Agrawal|editor-first=Manindra|series=Lecture Notes in Computer Science|pages=1–19|language=en|chapter=Differential Privacy: A Survey of Results|volume=4978 |doi=10.1007/978-3-540-79228-4_1|editor-last2=Du|editor-first2=Dingzhu|editor-last3=Duan|editor-first3=Zhenhua|editor-last4=Li|editor-first4=Angsheng|chapter-url=https://www.microsoft.com/en-us/research/publication/differential-privacy-a-survey-of-results/ |via=Microsoft }}</ref>
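The "imprecise counts" idea above and the noise addition used in differential privacy can be illustrated with a short sketch. The following Python example is illustrative only: the records, the predicate, and the epsilon value are assumptions for demonstration, not part of any cited work. It answers a COUNT query using the Laplace mechanism; because adding or removing a single record changes a count by at most 1 (sensitivity 1), Laplace noise with scale 1/ε gives ε-differential privacy for that one query.

<syntaxhighlight lang="python">
# Illustrative sketch of a differentially private COUNT query.
# The dataset, predicate, and epsilon below are made-up example values.
import random


def laplace_noise(scale: float) -> float:
    # A Laplace(0, scale) sample is the difference of two exponential samples.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float) -> float:
    """Answer COUNT(*) over `records` matching `predicate` with
    epsilon-differential privacy.  A count has sensitivity 1, so Laplace
    noise with scale 1/epsilon is sufficient for a single query."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)


# Example: a noisy answer to "how many incomes fall in the 35k-40k bucket?"
people = [
    {"name": "A", "income": 36_000},
    {"name": "B", "income": 52_000},
    {"name": "C", "income": 39_500},
]
print(private_count(people, lambda r: 35_000 <= r["income"] < 40_000, epsilon=0.5))
</syntaxhighlight>

In a real deployment, repeated queries accumulate privacy loss, so a system answering many queries must track and limit a total privacy budget rather than treating each query in isolation.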