specifically, these two successive postings:
accident of berth
a couple of points regarding my esteemed partner's post
I don't work for Google. I do know how PageRank (the algorithm described in the original paper) works, but even when Google first emerged that wasn't the only element in their overall ranking scheme--just the most well-known. For example, in order to decide what ranking to give a page, you have to consider not just how "popular" a page is (which is, roughly, what PageRank measures) but how well the page matches the query. (Some of the earliest search engine algorithms depended almost entirely on the latter datum.)
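For the curious, the core of PageRank is simple enough to sketch. Here's a toy version in Python; the three-page link graph is made up for illustration, and the damping factor of 0.85 is the value suggested in the original paper (real implementations care a great deal about scale, which this ignores):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal influence everywhere
    for _ in range(iterations):
        # every page keeps a small baseline share of influence...
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # ...and passes the rest along its outgoing links, split evenly
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # a page with no outgoing links spreads its influence evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# toy graph: "c" is linked to by both "a" and "b", so it ends up ranked highest
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

Note that nothing in this computation looks at the query at all; that's exactly why it has to be combined with a query-match score, as described above.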
I can't say this for sure--I know a fair bit about network algorithms, not so much about how the software that the 'Net depends on works--but I'm reasonably certain that Google doesn't take traffic into account. Google, like other search engines, employs web crawlers that wander over the web and gather information about web pages. I don't know of any way that such a crawler could interrogate the server to ask it how many hits it's received recently. (If someone knows of such a way, by all means let me know.) So I believe that quinn's point (1) is correct.
Point (2): Yes, and anchor text that matches the search text, and so forth; I understand that Google generally tries to take the structure of the page into account. However, pure repetition of a particular term has only a limited effect.
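One standard way to get that limited effect--and I should stress this is a textbook information-retrieval trick, not Google's actual formula--is to damp term frequency logarithmically, so the hundredth repetition of a term buys almost nothing:

```python
import math

def tf_weight(count):
    """Sublinear term-frequency weighting (the classic 1 + log(tf) scheme)."""
    return 1.0 + math.log(count) if count > 0 else 0.0

# Repeating a term 100 times scores nowhere near 100x a single mention:
single = tf_weight(1)    # 1.0
spammy = tf_weight(100)  # about 5.6
```

Any scheme with this shape makes keyword-stuffing a game of rapidly diminishing returns.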
It's worth noting that reputation works, to a degree, in the off-line world in the same way that Robin bemoans it working courtesy of Google. The opinions of famous people--or the opinions of people who are associated, in the public eye, with those famous people--have an impact that's clearly all out of proportion with their level of experience or expertise in the opinion's domain. (Two words: celebrity endorsements.)
Academic paper Google rankings: speaking as a researcher in artificial intelligence and information retrieval (although, I admit, not one who specializes in computational linguistics), I don't think an efficient, effective automated approach to deriving a measure of the quality of a work--other than seeing how often people talk about it or refer to it--is likely to be available soon.
(Side notes:
- It used to be easy to find the seminal PageRank paper I referenced above; now a number of other references show up much sooner than the paper itself does when you do the obvious Google searches.
- I find it amusingly ironic that while I almost certainly have an electronic copy of that paper on one of my hard drives, having that paper doesn't give me a way to find a public link to it.)
So, yes, where you put that paper will affect how many people find it when they go looking for it; this is why it's important to researchers to get published in the good journals and conferences. (Where "good", of course, means that the influential researchers get published there. The related ranking algorithm known as HITS (or hubs-and-authorities) may be of interest here.)
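Since I mentioned HITS: the idea is that a good hub points at good authorities, and a good authority is pointed at by good hubs, and you iterate until that circular definition settles down. A minimal sketch, over a made-up citation graph:

```python
def hits(links, iterations=50):
    """links: dict mapping each node to the list of nodes it points to."""
    nodes = set(links) | {q for outs in links.values() for q in outs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority score: sum of hub scores of the pages linking to you
        auth = {n: sum(hub[p] for p, outs in links.items() if n in outs)
                for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {n: v / norm for n, v in auth.items()}
        # hub score: sum of authority scores of the pages you link to
        hub = {n: sum(auth[q] for q in links.get(n, [])) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# toy graph: a survey citing two papers is a strong hub; a paper cited by
# both the survey and a review is the strongest authority
links = {"survey": ["paper1", "paper2"], "review": ["paper1"],
         "paper1": [], "paper2": []}
hub, auth = hits(links)
```

The hub/authority split maps rather naturally onto survey articles versus seminal results, which is part of why HITS is interesting for citation networks.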
The one refinement that I know of that might be available soon would be to consider how a paper is referenced, i.e., positively or negatively (as a start). But--and this is important--it's not at all clear how that evaluation should be used to influence PageRank. Some of the most important and influential papers, I suspect, are those that are referenced precisely because of the way in which they are wrong, because they offer a useful jumping-off point.
To get at the "well, Cory doesn't write about Jeremiah, so why does his opinion of my cup of wrath matter?" problem: some recent research has focused on relative ranking schemes. PageRank assumes that, initially, the influence of any web page is equal. Some kinds of relative rankers assume that some specified 'reference' pages are the appropriate starting points, and go from there. (I can get more specific about how this works if anyone cares.) This is not the same as personalized web search, which generally changes how the search is specified and/or the results are interpreted. This provides a way to finesse the metadata problem: first find a few pages that you think are good starting points (i.e., they do a good job of describing what you want) and then see how things are ranked relative to those pages.
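One simple way a reference-page scheme can work--and this is a sketch of the general idea, not any particular published system--is to change where the PageRank "reset" influence goes: instead of every page getting an equal baseline, only the chosen reference pages do, so everything else is ranked by how reachable it is from them. The graph and the reference set here are invented for illustration:

```python
def relative_pagerank(links, reference, damping=0.85, iterations=50):
    """Like plain PageRank, but the reset mass goes only to `reference` pages."""
    pages = list(links)
    n = len(pages)
    base = {p: (1.0 - damping) / len(reference) if p in reference else 0.0
            for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = dict(base)
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # dangling pages send their influence back to the references
                for r in reference:
                    new[r] += damping * rank[p] / len(reference)
        rank = new
    return rank

# toy graph: "d" is unreachable from the reference page "a", so its
# influence relative to "a" decays to nothing
links = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
ranks = relative_pagerank(links, reference={"a"})
```

Compare with plain PageRank, where "d" would keep its baseline share of influence no matter how irrelevant it is to the pages you actually care about.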