jrtom: (safe cat)
http://www.netflixprize.com/

I so don't have time for this right now, but it sounds like a really interesting problem. I'm not sure I'm eligible to participate anyway (my PhD advisor is apparently one of the judges, which I think explains a conversation I had with him a couple of weeks ago involving a lot of intentional vagueness on his part), but I bet I could do a pretty decent job if I had the time to sink into it. (Especially considering the data that they have but are not using...)

argh.
jrtom: (Default)
From [livejournal.com profile] fdmts: A Face Is Exposed for AOL Searcher No. 4417749

In essence, AOL recently released 20 million anonymized search queries. However, it turns out that it's not that hard to figure out who someone is based on what they're searching for, as the article details.
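To make that concrete, here's a toy sketch (with made-up log entries modeled on the article's example, where searcher No. 4417749 turned out to be Thelma Arnold of Lilburn, Ga.) of why swapping usernames for opaque numbers doesn't anonymize a search log — the queries themselves carry the identifying information:

```python
# Hypothetical "anonymized" search log: user names replaced with
# opaque numeric IDs, but the query text left intact.
log = [
    (4417749, "landscapers in lilburn, ga"),
    (4417749, "thelma arnold"),  # a vanity search for the user's own name
    (4417749, "homes sold in shadow lake subdivision gwinnett county georgia"),
    (1234567, "best pizza cambridge ma"),
]

def queries_for(log, user_id):
    """Gather every query issued under one pseudonymous ID.

    Distinctive queries (personal names, small towns, specific
    neighborhoods) can then be cross-referenced against public
    records to narrow the candidate pool to one person.
    """
    return [q for uid, q in log if uid == user_id]

# All of searcher 4417749's queries, linked together by the shared ID:
print(queries_for(log, 4417749))
```

The weakness isn't any single query; it's that the shared pseudonym links them all, and the intersection of "named Thelma Arnold" and "interested in Lilburn, Ga." is close to a single person.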

I deal with this sort of issue on a continuing basis as part of my profession (among other things, I do research on learning models for social network analysis). In some cases, the data is inarguably public: no one really minds if I analyze the network defined by the "writes-a-paper-with" relation. But in other cases, it's been drilled into the heads of researchers--supposedly--that anonymization is required in order to release data, and often in order to get it in the first place.

The problem, of course, is that anonymization clearly wasn't sufficient in this case.

It's a tricky problem; we can't do research if we don't have data to work with, and there are valuable things that can be learned from such data that _don't_ involve violating people's privacy. I guess the question is: if it's necessary to collect such data in the first place, and to study it, is there anything in addition to anonymization that can be done to prevent this sort of 'reverse-engineering' of someone's identity? (Obviously AOL shouldn't have released the data publicly in the first place…but the point is that by current standards they probably thought that it wouldn't do any harm because it was anonymized.) Aggregating it isn't the answer, because then you lose much of the information that made the data valuable in the first place.

*ponder*
jrtom: (Default)
Amygdala: Blue in the Face

This is mostly a placeholder in case I come back to this later, but this blogger suggests that the reason why Bush & co. didn't get the warrants was that they were doing large-scale pattern analysis on the communications of tens of thousands of people (or more) . . . thus making acquiring warrants impractical at best.

This kind of analysis is precisely what I do in my research. I have no doubt whatsoever that I could get a job with the CIA or NSA to simply continue doing what I've been doing. Let me be clear: I don't think that there's anything ethically wrong with the research qua research; the evil, if any, is in how it is used.

But it still itches me.
