jrtom

Recently, my friend naudiz posted a very well-written philosophical rant on labeling and symbolism. I liked it a lot, but had a few things I wanted to say in response. Several paragraphs in, I realized that I wanted to be able to find my response later, so I moved it here and will put a link to it in her comments for that post.

(I've observed that the above post is locked, so you may not be able to read it. Sorry for the inconvenience if so. The material below does not directly depend on it, however.)

I've done my own ranting on the subject of labels and stereotypes and all that. (Generally not at such length, but then naudiz is perhaps one of the few that tests out as occasionally more verbose than I. ;> )

There is a discipline within artificial intelligence called "machine learning"; I work in this area. Basically, machine learning folks work on creating systems that can solve problems like classification of objects, planning, scheduling, and so forth. It's a bit different from some other flavors of AI in the sense that we're not really trying to duplicate the solution strategies that humans use; it's an approach that owes more to fields like mathematics, probability, and game theory than it does to (say) cognitive science or neurobiology.

This is my own long-winded way of introducing the notion that classifying things--that is, giving them labels, so that we can say "this one is an X, and this one is a Y"--is in fact a really useful thing to be able to do well: it allows us to make decisions, like whether to cross the street now or not. (This could be cast as a classification problem either by considering the scene--are there cars passing in front of me or not?--or by identifying and interpreting the traffic symbols.) Generally speaking, humans are lousy at estimating odds (such as "what is the probability that I'll get hit by a car if I cross the street now?"), but they're really good at these kinds of classifications.

Where you can run into trouble in classifying (assigning labels to) things:
(1) your "training data" (the data that you used to learn how to classify things) is incorrect: someone handed you bad data. Long live the educational system.
(2) your training data is not reflective of the general population (too small, badly distributed, or obsolete)
(3) your training data doesn't have enough (or the right) features
(4) going backwards: trying to impute features from labels
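
To make (2) a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the observations, the names NIGHT_SAMPLE and RUSH_HOUR, and the very naive "learner", which just remembers the most common label for each feature value it has seen and falls back to the overall most common label otherwise. It is not meant as a real machine learning method, only as a picture of what happens when the training data doesn't look like the world you end up in.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (cars_passing, correct_label).
NIGHT_SAMPLE = [(False, "cross")] * 20                      # collected only at 3 a.m.
RUSH_HOUR = [(True, "wait")] * 8 + [(False, "cross")] * 2   # what the street is like at 5 p.m.

def train(examples):
    """Remember the majority label per feature value; fall back to the overall majority."""
    votes = defaultdict(Counter)
    overall = Counter()
    for feature, label in examples:
        votes[feature][label] += 1
        overall[label] += 1
    fallback = overall.most_common(1)[0][0]
    table = {f: counts.most_common(1)[0][0] for f, counts in votes.items()}
    return lambda feature: table.get(feature, fallback)

def accuracy(classifier, examples):
    """Fraction of examples the classifier labels correctly."""
    hits = sum(classifier(feature) == label for feature, label in examples)
    return hits / len(examples)

classifier = train(NIGHT_SAMPLE)
print(accuracy(classifier, NIGHT_SAMPLE))  # 1.0
print(accuracy(classifier, RUSH_HOUR))     # 0.2
```

The learner looks perfect on the data it was trained on, because it never saw a car. At rush hour it cheerfully says "cross" every time.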

(This whole thing is further complicated by the fact that people often don't mean the same things by a given label, or have the same associations with it. But we'll ignore that for the moment.)

(3) and (4) sound innocuous, but they're where the trouble shows up, in my opinion; time and experience often correct (1) and (2). If your training data has irrelevant or insufficient features, then you end up assigning labels based on crap. ("Oh, I can cross the street because it's Tuesday, and I crossed the street last Tuesday without incident.") (4) is often just as silly: the cases that make up a given class are often really different from one another, and trying to figure out what someone's like from a single label can lead to useless speculation if you're honest with yourself, and wild inaccuracies if you're not.
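
The Tuesday example can be sketched in a few lines of Python. The data set and the one-feature majority-vote "learner" below are invented for illustration (they are assumptions of mine, not a real technique anyone should use); the point is only to show how the choice of feature changes the label you get, and how little a label tells you when you try to work backwards from it.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (day, cars_passing, label).
TRAINING = [
    ("Tue", False, "cross"),
    ("Tue", False, "cross"),
    ("Wed", True, "wait"),
    ("Thu", True, "wait"),
    ("Fri", False, "cross"),
]

def train(examples, feature_index):
    """Learn the majority label for each value of a single chosen feature."""
    votes = defaultdict(Counter)
    for example in examples:
        votes[example[feature_index]][example[-1]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}

by_day = train(TRAINING, 0)    # (3) an irrelevant feature
by_cars = train(TRAINING, 1)   # a feature that actually matters

# It's Tuesday, and a car is bearing down on you.
print(by_day["Tue"])   # cross
print(by_cars[True])   # wait

# (4) going backwards: the label "cross" doesn't pin down what day it was.
days_labeled_cross = sorted({day for day, _, label in TRAINING if label == "cross"})
print(days_labeled_cross)  # ['Fri', 'Tue']
```

Same data, same learner; the only difference is which feature it was allowed to look at.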

So, to summarize: assigning labels is generally useful, and often necessary. The problems arise when you don't know enough to do it right, or when you try to reason about behavior or characteristics based on labels alone.