Data exhaust, also known as digital secondary data, refers to a data set, unique digital components, or trail of data left by any activity on or interaction with a computing platform or device. Common platforms and devices include the World Wide Web, mobile or desktop applications, and physical hardware; they can produce an immense amount of data exhaust in the form of cookies, temporary files, data points, transactions, log files, and more.[1][2] Data exhaust is created intentionally or unintentionally, and is then either stored unprocessed and unedited, turned into primary data, or discarded. It is a typical by-product of machines and online activities.[3] As an example, the stored digital records of a person’s heartbeat from a heartbeat monitoring device are data exhaust until that data is used to discern anything or altered into a useful format. Other common examples include streams coming in from web browsers, commands exchanged with Internet of Things (IoT) devices, and live security camera footage.
Classifications
Intent
Data exhaust is created either intentionally or unintentionally, through different channels, for different reasons, and as the result of various actions.
Unintentional data exhaust is created by a user without their knowledge or deliberate action. It can sometimes be collected by another entity for the purpose of creating primary data.[3] As an example, some Google services track a user's web searches; from the user's perspective, this data is unintentionally created data exhaust.[4]
Intentional data exhaust is deliberately created by the user. It can come in many formats and be created for different reasons. As an example, the digitally stored records of a person’s heartbeat from a heartbeat monitoring device are intentionally created data exhaust that will then be turned into primary data so it can be read by the individual.
Primary data in relation to secondary data
According to Tye Rattenbury, primary data is not a form of data exhaust. Primary data is the useful information produced from data exhaust and can be the "data that relates to the core function of your business".[3] As an example from the advertising industry, primary data may be the report, chart, or infographic that contains aggregated information on user habits and trends.[5] Secondary data is the raw data collected from users and other sources, some of which is data exhaust, and is "everything else that's created along the way".[3] Through the process of manipulation, "data exhaust (or secondary data) can become primary data once a use is found for it."[6] Organizations commonly use it to gain insight into consumer behaviour, tendencies and rationale, and then leverage these data sets to build and maintain products and services, such as targeted advertising.[5][7][8][9]
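As a minimal sketch of how secondary data might become primary data, the Python snippet below aggregates a hypothetical log of page views into a small report. The event fields, the sample values, and the function name `summarize_page_views` are illustrative assumptions and are not taken from any of the cited sources.

```python
from collections import Counter

# Hypothetical raw secondary data: one record per page view (illustrative values).
raw_events = [
    {"user": "u1", "page": "/shoes"},
    {"user": "u2", "page": "/shoes"},
    {"user": "u1", "page": "/hats"},
    {"user": "u3", "page": "/shoes"},
]


def summarize_page_views(events):
    """Aggregate raw events into per-page view counts (a simple 'primary data' report)."""
    return Counter(event["page"] for event in events)


report = summarize_page_views(raw_events)
for page, views in report.most_common():
    print(f"{page}: {views} views")
```

The raw records are the by-product of browsing; the aggregated counts are the kind of summary an organization might actually act on.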
Anonymity
By nature, the unique digital components of data exhaust often do not contain any personally identifiable information. This is not always the case, however, and once the data becomes part of a set or is aggregated in some way, it may reveal information that allows individuals to be readily identified. To help mitigate this, there are laws, guidelines, and councils worldwide aimed at protecting privacy. One such example is the United States Federal Trade Commission's fair information practice principles, an established set of principles for addressing privacy concerns on which modern privacy laws are based.[10]
De-identification, or data anonymization, is the process of removing personally identifiable information from data so that an identity cannot be connected with it. This can range from removing the connection to a single individual, to removing the connection to a group or category, to an even greater detachment; at each stage, the information becomes more general and may become less useful.[11][10]
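The stages of de-identification described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record, its field names, and the choice of which fields count as direct identifiers are hypothetical, and dropping or coarsening fields in this way does not by itself guarantee that individuals cannot be re-identified.

```python
# Hypothetical record; field names and values are illustrative only.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "birth_year": 1987,
    "postal_code": "K1A 0B1",
    "heart_rate": 72,
}

# Assumed set of fields that identify a single individual on their own.
DIRECT_IDENTIFIERS = {"name", "email"}


def remove_direct_identifiers(rec):
    """Stage 1: drop fields that point to a single individual."""
    return {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}


def generalize(rec):
    """Stage 2: coarsen remaining fields so the record describes a group."""
    out = remove_direct_identifiers(rec)
    out["birth_decade"] = (out.pop("birth_year") // 10) * 10
    out["postal_prefix"] = out.pop("postal_code")[:3]
    return out


print(remove_direct_identifiers(record))
print(generalize(record))
```

Each stage keeps less detail than the one before it, which mirrors the trade-off between generality and usefulness noted in the text.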
Notable uses and examples
Targeted marketing
The market for targeted marketing is growing fast; a 2018 forecast estimated that mobile targeted ad spending would reach $26.5 billion in 2019 and $38.7 billion in 2020 in the United States alone.[12] Targeted marketing profiles, which are often made up of contents from a user's digital footprint, are created with or gleaned from data exhaust, and these profiles enable "firms to estimate the value of consumers and target individual consumers with tailored ads".[13] Some researchers suggest a consumer's data is "among the most valuable assets that firms own".[13] Targeted marketing profiles are built on an individual's browsing history, past purchases, and many other indicators.[5] Firms looking to reach a specific audience can then have their ad shown only to individuals whose profiles match what the firm is looking for. As an example, a Canadian firm marketing environmentally friendly baby products can have its ad seen only by recent mothers living in Canada who have shown an interest in being environmentally responsible. This can help the firm generate the highest return on its ad spend and reduce wasted expenditure on individuals with little to no likelihood of purchasing the product.[14]
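The audience selection in the example above can be pictured as a simple filter over profiles. The sketch below is hypothetical: the `Profile` class, the interest labels, and the `matches` function are assumptions made for illustration, and real advertising platforms use far richer signals.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical marketing profile built from data exhaust (illustrative fields)."""
    user_id: str
    country: str
    interests: set = field(default_factory=set)


def matches(profile, country, required_interests):
    """Return True if the profile is in the country and has every required interest."""
    return profile.country == country and required_interests <= profile.interests


profiles = [
    Profile("a", "CA", {"new_parent", "eco_friendly"}),
    Profile("b", "CA", {"eco_friendly"}),
    Profile("c", "US", {"new_parent", "eco_friendly"}),
]

# Only profiles meeting every criterion would be shown the ad.
audience = [p.user_id for p in profiles if matches(p, "CA", {"new_parent", "eco_friendly"})]
print(audience)
```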
Financial
The financial sector is a major source of data exhaust, with examples such as ATM withdrawal records, logs of stock price changes, and the multitude of data sets pertaining to different financial transactions. The output of this industry has been increasing rapidly due to the increased use and complexity of online banking, digital currencies, and the digitization of financial services as a whole.[15] This additional data exhaust carries both potential risks and benefits; the benefits can be realized by translating the data exhaust into meaningful insights.[5] As an example, it can give banks a better understanding of their clients, guidance on providing better services, and a competitive advantage.[16][17] The risks also increase, as much of this data exhaust consists of personal and private records and other sensitive information.
One subtype of financial data exhaust is alternative data, which refers to data used to obtain insight into the investment process.[18][19] One example is data exhaust in the form of millions of satellite images. Machine-learning software can analyze these images and identify changes from one day to the next, which can then be used to make predictions, identify trends, and provide other competitive advantages. Current applications include analyzing "car parks of big-box retailers to estimate sales" and "looking at farmland to estimate corn yields."[19]
Medical
One common type of data exhaust comes from the medical industry, with examples such as heart rate measurements from heart rate recording devices, video from cameras used during surgery, and the data preceding lab results. Often this data exhaust is never captured, is saved only long enough to be analyzed, or is overwritten when the device makes its next routine impression. However, when this data exhaust is saved or converted into primary data, it can be of particular use and interest to certain entities.[5] Common parties include medical professionals using the data to make diagnoses and understand conditions; other notable examples include insurance companies and credit issuers. For example, an insurance company might identify an "area where a high incidence of a certain disease" has been spotted and raise premiums there.[20]
Controversially, the sharing of medical data exhaust, and of medical data as a whole, has been easier since 2011, when the United States Supreme Court ruled that pharmacies, drug makers and others may buy, sell, or share 'prescription records' for marketing purposes.[21]
Common examples and compiled potential storage required
Type of data exhaust | Size (1 day, 24 hours) | Size (7 days) | Size (31 days) | Size (365 days) |
---|---|---|---|---|
Fitbit Data (note 1) | 69.13 kilobytes | 483.92 kilobytes | 2.14 megabytes | 25.23 megabytes |
Security Camera Footage (note 2) | 5.28 gigabytes | 36.96 gigabytes | 163.68 gigabytes | 1.92 terabytes |
Averaged Internet Cookies (note 3) | 123.20 kilobytes | 862.40 kilobytes | 3.81 megabytes | 44.97 megabytes |
Example Log File (note 4) | 1.00 bytes | 7.00 bytes | 31.00 bytes | 365.00 bytes |
These are very general examples, with great liberties taken, and are intended only to give a rough idea of potential file size over time.
- Fitbit Data: comprises energy, distance, steps, active minutes, altitude and heart rate, sampled at 15-second intervals for one user.[22]
- Security Camera Footage: based on 720p footage at an average of 220 megabytes per hour.[23]
- Averaged Internet Cookies: based on an average of 88 sites visited per day, 14 cookies per site and 100 bytes per cookie.[24][25][26]
- Example Log File: based on one 1-byte file per day. Log files can range enormously in size, but many are as small as 1 byte because they contain only text. There is no logical way to determine an average number of log files created per day.
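The projections in the table can be reproduced approximately from the per-day assumptions in the notes. The following is a rough sketch, assuming decimal units (1 kilobyte = 1000 bytes); the names `DAILY_BYTES`, `human_size`, and `projected` are illustrative. The Fitbit row is left out because the notes do not state a per-sample size, and the last digit of a result can differ slightly from the table depending on how values are rounded.

```python
# Per-day sizes in bytes, derived from the assumptions stated in the notes.
DAILY_BYTES = {
    "Security camera footage": 220 * 1_000_000 * 24,  # 220 MB per hour, 24 hours
    "Averaged internet cookies": 88 * 14 * 100,       # sites x cookies x bytes per cookie
    "Example log file": 1,                            # one 1-byte file per day
}

UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes"]


def human_size(num_bytes):
    """Format a byte count using decimal (1000-based) units, two decimal places."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


def projected(daily_bytes, days):
    """Project storage for a number of days, assuming a constant daily volume."""
    return human_size(daily_bytes * days)


for name, daily in DAILY_BYTES.items():
    row = [projected(daily, days) for days in (1, 7, 31, 365)]
    print(name, "|", " | ".join(row))
```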
Controversies and implications
Many privacy issues and related concerns of the Information Age stem from data exhaust.[5][27][28][29][30] Organizations often collect users' data exhaust because it offers valuable insight into users' habits.[5][4] One way that data exhaust is compiled and categorized is in the form of a digital dossier, also known as a digital footprint or digital shadow. The term digital dossier generally refers to the complete collection of information about an individual that is available on the internet. This data set offers information on the “habits and interests of your online life” and “all your online data that includes secure private records as well as your public online identity. In addition to your contributions to the dossier, other people can make contributions to your dossier with pictures and other additions to the subset of data that makes up your public identity online”.[31]
Security issues
One concern with data exhaust is its technical nature and the security issues surrounding it. Data exhaust is transmitted electronically via the Internet and wireless transmissions (such as Bluetooth or Wi-Fi), and both methods of transmission carry privacy and security risks.[32][33] Additional concerns can arise once the data reaches its next destination, including implications involving its sale or exchange.[21] In some cases, the destination may be within the user’s device ecosystem, but the data may also end up with third parties that have a variety of intentions. These can be directly profit-driven motives, such as targeted advertising and the creation of goods and services to be sold,[5] or the data may be collected for other reasons, such as reference or monitoring; two examples used by governments or other regulatory authorities are data mining by "pattern mining" and "subject-based data mining".
Consent issues
One risk associated with data exhaust, particularly in the form of a digital dossier, is the impact on users of third parties using their information without the users' understanding or consent. The data typically collected on an individual can include name, gender, birthdate, recent searches, websites visited, locations visited, location of employment, location of residence, and much more.[34] For much of this information, some form of consent must be acquired, but the degree to which users understand the consent they give varies greatly. A common form of acceptance is the acknowledgement of a website's privacy policy; however, a study of privacy policy and terms of service reading behavior by Jonathan Obar and Anne Oeldorf-Hirsch found that such policies are often “too long and wordy” and can simply be “overwhelming”.[29] The average length of a privacy policy among the 75 most popular websites in the United States was 2500 words in 2008. Research done at Carnegie Mellon University used an assumed "average reading rate of 250 words per minute to find an average reading time of 10 minutes per policy".[35] The results of the study by Obar and Oeldorf-Hirsch showed that 80% of participants (undergraduate students at a United States university) spent less than a minute reading the privacy policy, while 74% skipped reading it altogether, opting for a 'quick join' option.[29] This research is in line with other opinions in the field suggesting that many people do not read privacy policies and simply click “I agree” without fully understanding what they are agreeing to; such policies are therefore "ineffective at informing" users and ineffective at obtaining consent.[36][37][38]
Additionally, some websites do not require their users to create an account or perform any act of acceptance at all, beyond a clause somewhere on the site stating that by continuing to use the website or service, the user automatically accepts the terms of service, whether or not they have read them.[39]
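The reading-time figure quoted from the Carnegie Mellon research follows directly from the two numbers given in this section (2500 words per policy, 250 words per minute). A small sketch of that arithmetic, with the hypothetical helper name `reading_time_minutes`:

```python
def reading_time_minutes(word_count, words_per_minute=250):
    """Estimate reading time as words divided by an assumed reading rate."""
    return word_count / words_per_minute


# Average policy length and reading rate as cited in the text.
print(reading_time_minutes(2500))  # 10.0
```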
Ownership issues
One issue relating to data exhaust is the dilemma of who owns it.[32] Three common parties that attempt to claim ownership are the user who created the data, the entity that collected the data, and third parties that have accessed the data. As an example, with the data exhaust from a heart rate monitor in a hospital, all three of these parties may have a vested interest in the data exhaust. First, the user who created the data is the patient whose heart rate is being measured; their claim to ownership rests on the measured heart rate being theirs. The second party may be the hospital that owns the electrocardiograph or employs the staff taking the measurement. This is where the controversy begins: ownership is a "particularly troublesome" question, and "it has been the object of recurrent, highly-publicized lawsuits and congressional inquiries" according to researchers from the University of California.[32] The third claimant is any third party that has accessed the data, such as entities that have edited, managed, purchased, or otherwise had a hand in its existence. In the heart rate scenario, this may be an insurance company that would like to use the heart rate readings to assess the health of the patient[40] or a cloud storage company that manages and stores the data. It is estimated that there are "thousands of terabytes of electronic human data generated annually in North America and Europe", and its storage, access and "how it is governed needs to be assessed."[32]
Long-term effects and implications
While much of the data exhaust that is generated has no impact and serves no purpose beyond its initial one, some is used by organizations and regulatory institutions to assess an individual in a number of ways. Particularly as data exhaust is converted into primary data and becomes part of an individual's digital dossier, it can begin to have ramifications, specifically concerning inaccuracies and inconsistencies.[41] As an example, when an individual makes a purchase with a credit card and pays it off at a later date, a number of records of the transactions are created and assigned to the individual in some form. Throughout this process, data exhaust is created, and some of it is stored and used to create or update a credit score.[42] Certain conditions can alter the process used to analyze this data exhaust, which can result in inaccuracies as well as variations in the end result. “In one study of 500,000 files, 29% of consumers had credit scores that differed by at least fifty points between credit bureaus. Fifty points can mean tens of thousands of dollars in extra payments over the life of a mortgage”.[28] This is one example of an effect of data exhaust, but there are many others, along with implications concerning the ways that it can be used to connect “the dots of past behaviour to predict future” behaviour.[28] These implications result in the dividing of society into 'targets' and 'waste'; this is known as 'The Scored Society', a term coined by Frank Pasquale in his book "The Black Box Society: The Secret Algorithms That Control Money and Information".[43]
Solutions
In many cases, prevention of data exhaust at the source is impossible due to its nature. Methods for prevention at the source would often involve physical manipulation of the hardware or software generating the data exhaust. Simpler alternatives, however, include preventing long-term storage or preventing sharing.
Prevention of long-term storage
Prevention of long-term storage of data exhaust is one of many methods to counter some of its negative effects. As an example, Fitbit activity trackers generate data exhaust in the form of heart rate measurements, steps taken, and other fitness-related metrics. Most Fitbit devices store detailed minute-by-minute data for five to seven days and store daily totals for up to 30 days.[44] After these periods, the Fitbit begins to overwrite the oldest entries with the latest impressions, thereby erasing the older data exhaust. If the Fitbit tracker is paired with another device, this data may be transferred off the tracker, which may leave a copy somewhere else.
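The overwrite behaviour described above resembles a fixed-size rolling buffer. The sketch below illustrates that general idea in Python using `collections.deque` with `maxlen`; it is not Fitbit's actual implementation, and the class name `RollingStore`, its capacity, and the sample readings are assumptions for illustration.

```python
from collections import deque


class RollingStore:
    """Keeps only the most recent `capacity` readings; older ones are overwritten."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one is appended to a full buffer.
        self._readings = deque(maxlen=capacity)

    def record(self, reading):
        self._readings.append(reading)

    def snapshot(self):
        """Return the readings currently retained, oldest first."""
        return list(self._readings)


store = RollingStore(capacity=3)
for bpm in [70, 72, 75, 71, 69]:
    store.record(bpm)

print(store.snapshot())  # [75, 71, 69]
```

Because the buffer has a fixed capacity, data older than the retention window disappears without any explicit deletion step, unless it was copied elsewhere first.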
Prevention of sharing
Prevention of sharing of data exhaust is one of many methods to counter some of its negative effects. To prevent data exhaust from being shared, the data typically must not leave the host device. Common forms of sharing include wireless transfer and transfer via a physical connection. As an example, the data exhaust from certain Fitbit activity trackers can be transmitted wirelessly via Bluetooth or via USB to a secondary device, often a phone or personal computer.[44] Additionally, some users may choose to share their data even further, for example with the Google Cloud for healthcare platform. This platform allows users to integrate their data "with electronic medical records", "give doctors the ability to remotely monitor a patient’s condition", and perform a variety of other functions.[45][40]
Other common ways of preventing a device from sharing data exhaust, particularly when browsing the web, include using secure connections, firewalls, VPNs, and other internet security and safety methods.[46]
Acceptance
An alternative to preventing others from accessing data exhaust is acceptance. This can be the result of many factors, two of which are acceptance due to resignation and acceptance due to virtuousness. "Resignation occurs when a person believes an undesirable outcome is inevitable and feels powerless to stop it."[47] In this scenario, an individual is typically aware that their data exhaust is being collected and used, and either feels they have no control over it or considers the trade-off between the costs and benefits of this level of privacy insignificant. Additionally, certain prevention methods may result in a decreased quality of experience, product or service.[47] Acceptance due to resignation can also take the form of acceptance in exchange for some benefit. This commonly includes the exchange of personal information, purchasing habits, and web tracking for discounts, personalized shopping experiences, and loyalty points. Notable examples of businesses capitalizing on this acceptance are Air Miles, Ebates, and other loyalty or points programs. However, Joseph Turow and other researchers suggest that "many feel those tradeoffs are unfair" and that the value of the information compared to the discount is tipped in favour of the marketing industry. This misconception is known as 'The Tradeoff Fallacy', which is further explained in the report from the Annenberg School for Communication.[47]
Acceptance due to virtuousness is an individual's acceptance of their data exhaust being collected and used, owing to perceived moral excellence and a clear ethical conscience. This theory is greatly expanded upon by Daniel Solove in "'I've Got Nothing to Hide' and Other Misunderstandings of Privacy", and suggests that an individual may have little or no concern about their personal information being used, tracked, or collected because they have nothing to hide.[27] Solove suggests, however, that the problem extends beyond the individual and that privacy is about more than hiding misdoings. He suggests that this notion leaves other users vulnerable and instigates a culture of acceptance of and compliance with the removal of privacy and other basic rights.[27]
See also
References
- ^ Rouse, Margaret. "What is data exhaust?". WhatIs.com. TechTarget, Inc. Retrieved 26 October 2018.
- ^ Mcgrath, Michael; Ni Scanaill, Cliodhna (2013). "Chapter 7, The Data Economy of Biosensors". Sensor technologies : healthcare, wellness, and environmental applications. New York City, New York, United States: ApressOpen. ISBN 978-1430260134. Retrieved 15 October 2018.
- ^ a b c d Noyes, Katherine. "5 things you need to know about data exhaust". PCWorld. IDG Communications. Retrieved 29 October 2018.
- ^ a b Google. "Privacy Control". Google. Alphabet Inc. Retrieved 15 October 2018.
- ^ a b c d e f g h Riederer, Christopher; Erramilli, Vijay; Chaintreau, Augustin; Krishnamurthy, Balachander; Rodriguez, Pablo (2011). "For sale : your data". Proceedings of the 10th ACM Workshop on Hot Topics in Networks - HotNets '11. New York, New York, USA: ACM Press: 1–6. doi:10.1145/2070562.2070575. ISBN 9781450310598. S2CID 9858581.
- ^ Hyde, Kevin (2017-06-13). "What Is Data Exhaust? Cutting Through the Fumes". Capture Higher Ed. Retrieved 2018-12-11.
- ^ Gondek, Chris. "Secondary Data is Valuable - Use it Right". Commvault. Commvault. Retrieved 15 October 2018.
- ^ Lazer, David; Kennedy, R; King, Gary; Vespignani, A (March 2014). "The Parable of Google Flu: Traps in Big Data Analysis" (PDF). Science. 343 (6176): 1203–1205. doi:10.1126/science.1248506. PMID 24626916. S2CID 206553739. Retrieved 26 October 2018.
- ^ Siemens, George; Baker, Ryan (April 2012). "Learning Analytics and Educational Data Mining: Towards Communication and Collaboration" (PDF). LAK '12 Proceedings of the 2nd International Conference on Learning Analytics and Knowledge: 252-254. doi:10.1145/2330601.2330661. S2CID 207196058. Retrieved 26 October 2018.
- ^ a b Federal Trade Commission. "Privacy Online, Fair Information Practices In The Electronic Marketplace" (PDF). www.ftc.gov. Retrieved 2018-12-17.
- ^ Hoffman, Donna L. (2006). "Information Privacy in the Marketspace: Implications for the Commercial Uses of Anonymity on the Web". The Information Society. 15 (2): 129–139. doi:10.1080/019722499128583. Retrieved 2018-12-10.
- ^ "2018 U.S. Local Advertising Forecast: Mobile and Social". BIA Advisory Services. BIA/Kelsey. Retrieved 2018-12-17 – via https://www.emarketer.com/Chart/US-Mobile-Location-Targeted-Ad-Spending-2017-2022-billions/216275.
- ^ a b Zhao, Xia; Xue, Ling (2012). "Competitive Target Advertising and Consumer Data Sharing". Journal of Management Information Systems. 29 (3): 189–222. doi:10.2753/mis0742-1222290306. S2CID 35878889. Retrieved 2018-12-17.
- ^ Siegel, William. "SAGE Journals: Your gateway to world-class journal research". SAGE Journals. doi:10.1177/004728759002800312. S2CID 154441632. Retrieved 2018-12-17.
- ^ Newman, Daniel. "Top 5 Digital Transformation Trends In Financial Services". Forbes. Retrieved 2018-12-12.
- ^ Rieker, Falk. "Embracing Digital Transformation: The Future of Banking". www.digitalistmag.com. Retrieved 2018-12-04.
- ^ Barnes, Dan (2017-07-02). "The role of data in gaining valuable financial insights". Raconteur. Retrieved 2018-12-04.
- ^ Flanagan, Terry (2016-12-07). "'Early Days' For Alternative Data". Markets Media. Retrieved 2018-12-10.
- ^ a b Z., W. (2016-08-22). "Why investors want alternative data". The Economist. ISSN 0013-0613. Retrieved 2018-12-10.
- ^ King, Leo. "Alarm Over The 'Gold Rush' For Citizens' Big Data". Forbes. Retrieved 2018-12-04.
- ^ a b Savage, David G. (2011-06-24). "Supreme Court sides with pharmaceutical industry in two decisions". Los Angeles Times. ISSN 0458-3035. Retrieved 2018-12-12.
- ^ Catplace. "Fitbit Community". Fitbit Community. Retrieved 18 December 2018.
- ^ "Cisco Collaboration". Cisco Collaboration. Retrieved 18 December 2018.
- ^ "NIELSEN PROVIDES TOPLINE U.S. WEB DATA FOR MARCH 2010". The Nielsen Company. Retrieved 18 December 2018.
- ^ Lubic, Paul. "Tracking Cookies: How many Does Your Computer Have?". Paul's Internet Security Blog. Retrieved 18 December 2018.
- ^ Notenboom, Leo. "Will deleting cookies free up room in my computer's memory?". Ask Leo. Retrieved 18 December 2018.
- ^ a b c Solove, Daniel (2007). "'I've Got Nothing to Hide' and Other Misunderstandings of Privacy". scholarship.law.gwu.edu. Retrieved 2018-12-17.
- ^ a b c Pasquale, Frank (2015-01-31). The Black Box Society. Cambridge, MA and London, England: Harvard University Press. doi:10.4159/harvard.9780674736061. ISBN 9780674736061.
- ^ a b c Obar, Jonathan A.; Oeldorf-Hirsch, Anne (2016). "The Biggest Lie on the Internet: Ignoring the Privacy Policies and Terms of Service Policies of Social Networking Services". SSRN Electronic Journal. doi:10.2139/ssrn.2757465. ISSN 1556-5068.
- ^ Kiss, Jemima (2010-08-20). "Does technology pose a threat to our private life?". The Guardian. ISSN 0261-3077. Retrieved 2018-12-17.
- ^ Cheney, Mark (2012-05-15). "Do You Know What's In Your Digital Dossier?". Big Think. Retrieved 2018-12-12.
- ^ a b c d Meingast, Marci; Roosta, Tanya; Sastry, Shankar (August 2006). "Security and Privacy Issues with Health Care Information Technology" (PDF). 2006 International Conference of the IEEE Engineering in Medicine and Biology Society. 2006. New York, NY, USA: IEEE: 5453–5458. doi:10.1109/IEMBS.2006.260060. ISBN 1-4244-0032-5. ISSN 1557-170X. PMID 17946702. S2CID 1784412. Retrieved 29 October 2018.
- ^ Xiong, Jie; Jamieson, Kyle (2013). "SecureArray". Proceedings of the 19th Annual International Conference on Mobile Computing & Networking - MobiCom '13. New York, New York, USA: ACM Press: 441. doi:10.1145/2500423.2500444. ISBN 9781450319997. S2CID 1509956.
- ^ Haselton, Todd (2017-12-06). "What does Google know about me?". www.cnbc.com. Retrieved 2018-12-11.
- ^ McDonald, Aleecia; Cranor, Lorrie Faith (2008). "The Cost of Reading Privacy Policies" (PDF). lorrie.cranor.org. Retrieved 2018-12-11.
- ^ Schaub, Florian. "Nobody reads privacy policies – here's how to fix that". The Conversation. Retrieved 2018-12-11.
- ^ Milne, George R.; Culnan, Mary J. (2004-01-01). "Strategies for reducing online privacy risks: Why consumers read (or don't read) online privacy notices". Journal of Interactive Marketing. 18 (3): 15–29. doi:10.1002/dir.20009. ISSN 1094-9968. S2CID 167497536.
- ^ Fernback, Jan; Papacharissi, Zizi (2007). "Online privacy as legal safeguard: the relationship among consumer, online portal, and privacy policies". New Media & Society. 9 (5): 715–734. doi:10.1177/1461444807080336. ISSN 1461-4448. S2CID 6197762.
- ^ Temming, Maria (2018-04-27). "Website privacy policies don't say much about how they share your data". Science News. Retrieved 2018-12-12.
- ^ a b "Life Insurance and Fitbit Data". Healthline. 2018-10-01. Retrieved 2018-12-12.
- ^ "How Your Credit Score Impacts Your Financial Future | FINRA.org". www.finra.org. Retrieved 2018-12-13.
- ^ "What is a Credit Score & How is it Calculated in Canada? | My Money Coach". www.mymoneycoach.ca. Retrieved 2018-12-13.
- ^ "The Black Box Society — Frank Pasquale | Harvard University Press". www.hup.harvard.edu. Retrieved 2018-12-13.
- ^ a b "How do Fitbit devices sync their data?". Fitbit Help. Retrieved 2018-12-12.
- ^ Becker, Kraig (2018-04-30). "Fitbit to Use Google Cloud to Share Medical Data with Doctors". Digital Trends. Retrieved 2018-12-12.
- ^ Price, Emily. "What You Need to Know About Golf's New 2019 Rules". Lifehacker. Retrieved 2018-12-17.
- ^ a b c Turow, Joseph; Hennessy, Michael (2015). "The Tradeoff Fallacy: How Marketers are Misrepresenting American Consumers and Opening Them Up to Exploitation". SSRN Electronic Journal. doi:10.2139/ssrn.2820060. ISSN 1556-5068.