Question: The AOL search-query database released on the Web (Section 2.1.2) included the search query “How to kill your wife” and other related queries by the same person. Give arguments for and
against allowing law enforcement agents to search the query databases of search engine companies
periodically to detect plans for murders, terrorist attacks, or other serious crimes so that they can
try to prevent them.
Section 2.1.2 is attached below.
2.1.2 New Technology, New Risks
Computers, the Internet, and a whole array of digital devices—with their astounding
increases in speed, storage space, and connectivity—make the collection, searching,
analysis, storage, access, and distribution of huge amounts of information and images
much easier, cheaper, and faster than ever before. These are great benefits. But when the
information is about us, the same capabilities threaten our privacy.
Today there are thousands (probably millions) of databases, both government and
private, containing personal information about us. In the past, there was simply no
record of some of this information, such as our specific purchases of groceries and books.
Government documents like divorce and bankruptcy records have long been in public
records, but accessing such information took a lot of time and effort. When we browsed in
a library or store, no one knew what we read or looked at. It was not easy to link together
our financial, work, and family records. Now, large companies that operate video, email,
social network, and search services can combine information from a member’s use of
all of them to obtain a detailed picture of the person’s interests, opinions, relationships,
habits, and activities. Even if we do not log in as members, software tracks our activity
on the Web. In the past, conversations disappeared when people finished speaking, and
only the sender and the recipient normally read personal communications. Now, when we
communicate by texting, email, social networks, and so on, there is a record of our words
that others can copy, forward, distribute widely, and read years later. Miniaturization
of processors and sensors put tiny cameras in cellphones that millions of people carry
everywhere. Cameras in some 3-D television sets warn children if they are sitting too
close. What else might such cameras record, and who might see it? The wireless appliances
we carry contain GPS and other location devices. They enable others to determine our
location and track our movements. Patients refill prescriptions and check the results of
medical tests on the Web. They correspond with doctors by email. We store our photos
2.1 Privacy Risks and Principles 51
and videos, do our taxes, and create and store documents and financial spreadsheets in a
cloud of remote servers instead of on our own computer. Power and water providers might
soon have metering and analysis systems sophisticated enough to deduce what appliances
we are using, when we shower (and for how long), and when we sleep. Law enforcement
agencies have very sophisticated tools for eavesdropping, surveillance, and collecting and
analyzing data about people’s activities, tools that can help reduce crime and increase
security—or threaten privacy and liberty.
Combining powerful new tools and applications can have astonishing results. It is
possible to snap a photo of someone on the street, match the photo to one on a social
network, and use a trove of publicly accessible information to guess, with high probability
of accuracy, the person’s name, birth date, and most of his or her Social Security number.
This does not require a supercomputer; it is done with a smartphone app. We see such
systems in television shows and movies, but to most people they seem exaggerated or way
off in the future.
All these gadgets, services, and activities have benefits, of course, but they expose us
to new risks. The implications for privacy are profound.
Patient medical information is confidential. It should not be discussed
in a public place.
—A sign, aimed at doctors and staff, in an elevator in a medical office
building, a reminder to prevent low-tech privacy leaks.
Example: Search query data
After a person enters a phrase into a search engine, views some results, then goes on to
another task, he or she expects that the phrase is gone—gone like a conversation with a
friend or a few words spoken to a clerk in a store. After all, with millions of people doing
searches each day for work, school, or personal uses, how could the search company store
it all? And who would want all that trivial information anyway? That is what most people
thought about search queries until two incidents demonstrated that it is indeed stored, it
can be released, and it matters.
Search engines collect many terabytes of data daily. A terabyte is a trillion bytes.
It would have been absurdly expensive to store that much data in the recent past, but
no longer. Why do search engine companies store search queries? It is tempting to say
“because they can.” But there are many uses for the data. Suppose, for example, you search
for “Milky Way.” Whether you get lots of astronomy pages or information about the
candy bar or a local restaurant can depend on your search history and other information
about you. Search engine companies want to know how many pages of search results
users actually look at, how many they click on, how they refine their search queries, and
what spelling errors they commonly make. The companies analyze the data to improve
52 Chapter 2 Privacy
search services, to target advertising better, and to develop new services. The database of
past queries also provides realistic input for testing and evaluating modifications in the
algorithms search engines use to select and rank results. Search query data are valuable to
many companies besides search engine companies. By analyzing search queries, companies
draw conclusions about what kinds of products and features people are looking for. They
modify their products to meet consumer preferences.
But who else gets to see this mass of data? And why should we care?
If your own Web searches have been on innocuous topics, and you do not care who
sees your queries, consider a few topics people might search for and think about why
they might want to keep them private: health and psychological problems, bankruptcy,
uncontrolled gambling, right-wing conspiracies, left-wing conspiracies, alcoholism, anti-abortion information, pro-abortion information, erotica, illegal drugs. What are some
possible consequences for a person doing extensive research on the Web for a suspense
novel about terrorists who plan to blow up chemical factories?
In 2006, the federal government presented Google with a subpoena[1] for two months
of user search queries and all the Web addresses† that Google indexes.‡ Google protested,
bringing the issue to public attention. Although the subpoena did not ask for names of
users, the idea of the government gaining access to the details of people’s searches horrified
privacy advocates and many people who use search engines. Google and privacy advocates
opposed the precedent of government access to large masses of such data. A court reduced
the scope of the subpoena, removing user queries.4
[1] A subpoena is a court order for someone to give testimony or provide documents or other information for an
investigation or a trial.
† We use the term Web address informally for identifiers, or addresses, or URLs of pages or documents on the Web
(the string of characters one types in a Web browser).
‡ It wanted the data to respond to court challenges to the Child Online Protection Act (COPA), a law intended to
protect children from online material “harmful to minors.” (We discuss COPA in Section 3.2.2.)
A few months later, release of a huge database of search queries at AOL showed that
privacy violations occur even when the company does not associate the queries with people’s names. Against company policy, an employee put the data on a website for search
technology researchers. This data included more than 20 million search queries by more
than 650,000 people from a three-month period. The data identified people by coded
ID numbers, not by name. However, it was not difficult to deduce the identity of some
people, especially those who searched on their own name or address. A process called re-identification identified others. Re-identification means identifying the individual from
a set of anonymous data. Journalists and acquaintances identified people in small communities who searched on numerous specific topics, such as the cars they own, the sports
teams they follow, their health problems, and their hobbies. Once identified, a person is
linked to all his or her other searches. AOL quickly removed the data, but journalists,
researchers, and others had already copied it. Some made the whole data set available on
the Web again.5[1]
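The re-identification process described above can be sketched in a few lines. This is a minimal illustration only, not the method anyone actually used on the AOL data: the log rows, the coded IDs, the name "Jane Roe", and the helper names `group_by_user` and `reidentify` are all invented for the example. It assumes an outsider already knows a couple of facts about a person (here, a name and a street) and looks for a coded ID whose queries contain them.

```python
from collections import defaultdict

# Hypothetical log rows: (coded_user_id, query). Every value is invented.
QUERY_LOG = [
    (1001, "jane roe 12 elm street springfield"),
    (1001, "knee pain after running"),
    (1001, "used honda civic prices"),
    (2002, "milky way candy bar calories"),
    (2002, "best telescope for beginners"),
]


def group_by_user(log):
    """Collect every query issued under the same coded ID."""
    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)
    return dict(profiles)


def reidentify(profiles, known_people):
    """Map coded IDs to names when a user's combined queries contain every clue.

    known_people maps a name to a list of terms that an acquaintance or
    journalist might already know about that person.
    """
    matches = {}
    for user_id, queries in profiles.items():
        text = " ".join(queries).lower()
        for name, clues in known_people.items():
            if all(clue.lower() in text for clue in clues):
                matches[user_id] = name
    return matches


profiles = group_by_user(QUERY_LOG)
known = {"Jane Roe": ["jane roe", "elm street"]}
matches = reidentify(profiles, known)

# Once one ID is matched, every other query under that ID is exposed too.
for user_id, name in matches.items():
    print(name, "->", profiles[user_id])
```

The point of the sketch is the last loop: the coded ID hides the name, but it also ties all of a person's queries together, so a single identifying search links the health and shopping queries to the same individual.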
Example: Smartphones
With so many clever, useful, and free smartphone apps available, who thinks twice about
downloading them? Researchers and journalists took a close look at smartphone software
and apps and found some surprises.
Some Android phones and iPhones send location data (essentially the location of
nearby cell towers) to Google and Apple, respectively. Companies use the data to build
location-based services that can be quite valuable for the public and for the companies.
(Industry researchers estimate the market for location services to be in the billions of
dollars.) The location data is supposed to be anonymous, but researchers found, in some
cases, that it included the phone ID.
Roughly half the apps in one test sent the phone’s ID number or location to other
companies (in addition to the one that provided the app). Some sent age and gender information to advertising companies. The apps sent the data without the user’s knowledge
or consent. Various apps copy the user’s contact list to remote servers. Android phones
and iPhones allow apps to copy photos (and, for example, post them on the Internet) if
the user permits the app to do certain other things that have nothing to do with photos.
(Google said this capability dated from when photos were on removable memory cards
and thus less vulnerable.6 This is a reminder that designers must regularly review and
update security design decisions.)
A major bank announced that its free mobile banking app inadvertently stored
account numbers and security access codes in a hidden file on the user’s phone. A phone
maker found a flaw in its phones that allowed apps to access email addresses and texting
data without the owner’s permission. Some iPhones stored months of data, in a hidden
file, about where the phone had been and when, even if the user had turned off location
services. Data in such files are vulnerable to loss, hacking, and misuse. If you do not know
the phone stores the information, you do not know to erase it. Given the complexity of
smartphone software, it is possible that the companies honestly did not intend the phones
to do these things.†
[1] Members of AOL sued the company for releasing their search queries, claiming the release violated roughly 10
federal and state laws.
† The various companies provided software updates for these problems.
1. Files on hundreds of thousands of students, applicants, faculty, and/or alumni from the
University of California, Harvard, Georgia Tech, Kent State, and several other universities,
some with Social Security numbers and birth dates (stolen by hackers).
2. Names, birth dates, and possibly credit card numbers of 77 million people who play video
games online using Sony’s PlayStation (stolen by hackers). Another 24 million accounts
were exposed when hackers broke into Sony Online Entertainment’s PC-game service.
3. Records of roughly 40 million customers of TJX discount clothing stores (T.J. Maxx,
Marshalls, and others), including credit and debit card numbers and some
driver’s license numbers (stolen by hackers). (More about the TJX incident: Section 5.2.5)
4. Bank of America disks with account information (lost or stolen in transit).
5. Credit histories and other personal data for 163,000 people (purchased from a huge
database company by a fraud ring posing as legitimate businesses).
6. Patient names, Social Security numbers, addresses, dates of birth, and medical billing
information for perhaps 400,000 patients at a hospital (on a laptop stolen from a hospital
employee’s car).
7. More than 1000 Commerce Department laptops, some with personal data from Census
questionnaires. (Thieves stole some from the cars of temporary Census employees; others,
employees simply kept.)
8. Confidential contact information for more than one million job seekers (stolen from
Monster.com by hackers using servers in Ukraine).
Figure 2.1 Lost or stolen personal information.7
Why does it matter? Our contact lists and photos are ours; we should have control of
them. Thieves can use our account information to rob us. Apps use features on phones
that indicate the phone’s location, the light level, movement of the phone, the presence
of other phones nearby, and so on. Knowing where we have been over a period of time
(combined with other information from a phone) can tell a lot about our activities and
interests, as well as with whom we associate (and whether the lights were on). As we
mentioned in Section 1.2.1, it can also indicate where we are likely to be at a particular
time in the future.
Some of the problems we described here will have been addressed by the time you
read this; the point is that we are likely to see similar (but similarly unexpected) privacy
risks and breaches in each new kind of gadget or capability.
Stolen and lost data
Criminals steal personal data by hacking into computer systems, by stealing computers
and disks, by buying or requesting records under false pretenses, and by bribing employees
of companies that store the data. (Hacking: Section 5.2) Shady information brokers sell data
(including cellphone records, credit reports, credit card statements,
medical and work records, and location of relatives, as well as information about financial
and investment accounts) that they obtain illegally or by questionable means. Criminals,
lawyers, private investigators, spouses, ex-spouses, and law enforcement agents are among
the buyers. A private investigator could have obtained some of this information in the
past, but not nearly so easily, cheaply, and quickly.
Another risk is accidental (sometimes quite careless) loss. Businesses, government
agencies, and other institutions lose computers, disks, memory cards, and laptops containing sensitive personal data (such as Social Security numbers and credit card numbers)
on thousands or millions of people, exposing people to potential misuse of their information and lingering uncertainty. They inadvertently allow sensitive files to be public
on the Web. Researchers found medical information, Social Security numbers, and other
sensitive personal or confidential information about thousands of people in files on the
Web that simply had the wrong access status.
The websites of some businesses, organizations, and government agencies that make
account information available on the Web do not sufficiently authenticate the person
accessing the information, allowing imposters access. (More about authentication
techniques: Section 5.3.2) Data thieves often get sensitive information by telephone by
pretending to be the person whose records they seek. They provide some personal information
about their target to make their request seem legitimate. That is one
reason why it is important to be cautious even with data that is not particularly sensitive
by itself.
Figure 2.1 shows a small sample of incidents of stolen or lost personal information
(the Privacy Rights Clearinghouse lists thousands of such incidents on its website). In
many incidents, the goal of thieves is to collect data for use in identity theft and fraud,
crimes we discuss in detail in Chapter 5.
A summary of risks
The examples we described illustrate numerous points about personal data. We summarize
here:
• Anything we do in cyberspace is recorded, at least briefly, and linked to our
computer or phone, and possibly our name.
• With the huge amount of storage space available, companies, organizations, and
governments save huge amounts of data that no one would have imagined saving
in the recent past.
• People often are not aware of the collection of information about them and their
activities.
• Software is extremely complex. Sometimes businesses, organizations, and website
managers do not even know what the software they use collects and stores.8
• Leaks happen. The existence of the data presents a risk.
• A collection of many small items of information can give a fairly detailed picture
of a person’s life.
• Direct association with a person’s name is not essential for compromising privacy.
Re-identification has become much easier due to the quantity of personal information stored and the power of data search and analysis tools.
• If information is on a public website, people other than those for whom it was
intended will find it. It is available to everyone.
• Once information goes on the Internet or into a database, it seems to last forever.
People (and automated software) quickly make and distribute copies. It is almost
impossible to remove released information from circulation.
• It is extremely likely that data collected for one purpose (such as making a phone
call or responding to a search query) will find other uses (such as business planning,
tracking, marketing, or criminal investigations).
• The government sometimes requests or demands sensitive personal data held by
businesses and organizations.
• We often cannot directly protect information about ourselves. We depend on the
businesses and organizations that manage it to protect it from thieves, accidental
collection, leaks, and government prying.
Explanation / Answer
Technology helps people protect their privacy in their day-to-day lives. On the other hand, it has also made privacy harder to maintain. In the current age of smartphones and tablets, various applications can invade your privacy without your even being aware of it, and there are many examples of technology being used this way.
Privacy matters most when people access sensitive information such as documents, account statements, and other personal records. However, various organisations, businesses, and individuals feel that there is not enough privacy.
The use of surveillance by government agencies to keep a check on crime also carries a risk of invading privacy and restricting freedom of expression.
Benefits
The benefits of surveillance technologies are:
1) Improved ability to detect and prevent crime. By gathering information, law enforcement can contain crime. However, studies have shown that these benefits vary from one scenario to another.
2) Surveillance can reveal a suspect’s communications and activities: the people and parties with whom the person has been communicating, the frequency, duration, and content of those communications, and the websites accessed.
3) By tracking and analysing large volumes of data, investigators can spot indicators of criminal activity and place the people involved under closer surveillance.