27 July 2007

jrtom: (Default)
Pardon me while I geek out for a moment here.

Over the years, there's been a lot of sound and fury expended in certain quarters regarding the fact that the Web doesn't have a built-in way of identifying links _to_ a given page. Sure, if you have access to all the pages (by virtue of having crawled them) you can construct a graph data structure that will encode this information...but it's not built-in, and it requires a lot of time and effort by comparison with what you have to do to get all the links leading _out_ from a page.
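To make the asymmetry concrete, here is a minimal sketch of the "construct a graph" step, assuming you already have a crawl that maps each page to the pages it links out to. The `crawl` dictionary and its URLs are made up for illustration; the point is only that inbound links fall out of a full pass over everyone's outbound links, and not from any single page.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Given page -> list of pages it links to, return page -> set of pages linking to it."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return dict(inlinks)


# Hypothetical crawl result: each page's outbound links.
crawl = {
    "a.example/post": ["b.example/post", "c.example/post"],
    "b.example/post": ["c.example/post"],
    "c.example/post": [],
}

backlinks = invert_links(crawl)
for page, sources in sorted(backlinks.items()):
    print(page, "<-", sorted(sources))
```

Getting the outbound links for one page means reading one page; getting its inbound links means reading all of them.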

This is significant to those who are interested in things like "the wisdom of crowds", i.e., gathering opinions on a page based on what the people who link to it are saying about it.

Anyway, services like Technorati address this by, in essence, crawling blog pages, figuring out what links to what, and _adding "trackback" links to blogs that use their service_.
This lets the blog owner (and everyone else) see who's linking to them, and thus permits them to see what's being said.

For the purposes of link analysis, though, this is problematic, because it means that your blog is now actually linking back to their comment on their page. By the semantics that search engines use, that means you think there's something worth looking at on that page, so their ranking goes up marginally (depending on how highly ranked your own page is). This really shouldn't happen, because in fact you've made no such judgement. (There's also the closely related problem that an actual human being may be more likely to visit this other website as a result of this link, and to associate that website with your own. This isn't really susceptible to a technical fix, though.)
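As a rough illustration of the ranking effect, here is a toy PageRank-style calculation. This is a simplified textbook-style iteration, not what any real search engine runs, and the three-page graph (`my_blog`, `friend`, `commenter`) is invented for the example. It compares the commenter's score before and after an automatic trackback link is added from my blog to their page.

```python
def pagerank(outlinks, damping=0.85, iterations=50):
    """Toy PageRank: outlinks maps every page to the list of pages it links to."""
    pages = list(outlinks)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in outlinks.items():
            if targets:
                # A page splits its (damped) score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: the commenter links to my blog, but I never linked to them.
without_trackback = {
    "my_blog": ["friend"],
    "friend": ["my_blog"],
    "commenter": ["my_blog"],
}
# Same graph after a trackback link from my blog to the commenter is added for me.
with_trackback = dict(without_trackback, my_blog=["friend", "commenter"])

before = pagerank(without_trackback)
after = pagerank(with_trackback)
print("commenter before:", round(before["commenter"], 3))
print("commenter after: ", round(after["commenter"], 3))
```

In this toy model the commenter's score rises purely because of a link I never chose to make, and the size of the bump scales with my own page's score.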

I can't possibly be the first person to have noticed this problem, but I do wonder how search engines deal with it. I could imagine adding semantics to the link (embedding 'trackback' as a tag on the link itself), but that only works if everyone recognizes it, so that would be a matter for the W3C.
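Here is a sketch of what honoring such a tag might look like on the crawler side. It assumes, purely hypothetically, that trackback links are marked with `rel="trackback"`; that value is my invention for this example, not an existing convention that I know of. The parser itself is Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class EndorsementLinkParser(HTMLParser):
    """Separates ordinary links from links tagged with the hypothetical rel="trackback"."""

    def __init__(self):
        super().__init__()
        self.endorsed = []  # links the author presumably chose to make
        self.ignored = []   # links tagged as automatic trackbacks

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list; a missing rel is treated as empty.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "trackback" in rel_tokens:
            self.ignored.append(href)
        else:
            self.endorsed.append(href)


# Made-up page fragment: one editorial link, one tagged trackback link.
page = """
<p>See <a href="http://example.com/essay">this essay</a>.</p>
<ul>
  <li><a rel="trackback" href="http://example.org/reply">a reply</a></li>
</ul>
"""

parser = EndorsementLinkParser()
parser.feed(page)
print("count toward ranking:", parser.endorsed)
print("skip for ranking:    ", parser.ignored)
```

A link-analysis pass would then build its graph from `endorsed` only, which is exactly the part that depends on every crawler agreeing on the tag.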

(Edit: The blog posting that prompted this is here. As of the time of its posting, one of the trackback links (http://centennial-man.blogspot.com/2007/07/duck-will-go-crazy-httpwww.html) is a complete waste of space, and may in fact simply be there so that the blog owner's site gets more hits. However, I note with interest that if you view the source of the Google blog page, the stuff in the "links to this post" section doesn't actually appear, i.e., what appears to be a single page is actually a combination of at least two pieces. That seems like a reasonable solution to the link analysis problem, but a somewhat inelegant one.)

*ponder*
