As you know, I think that the world of blogging, news feeds, and similar dynamic web phenomena is very significant. The dynamic web is about who is talking about what now; it's about what the world is thinking about. I would like to see the following service provided.
Index the web of feeds, but weight each word in the index by both the number of sites on which it appears and how recently it appeared. A word that appears on a large number of sites on a particular day would rate high for that day, with its rating dropping off on subsequent days if it did not continue to appear on a large number of sites.
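One way such a weighting might work is sketched below. It is only an illustration under stated assumptions: posts arrive as (site, day, text) tuples, words are split on whitespace, and the drop-off is an exponential decay with a made-up half-life of two days. The names `daily_site_counts`, `recency_score`, and `half_life_days` are hypothetical, and any other decay curve would fit the idea just as well.

```python
from collections import defaultdict
from datetime import date


def daily_site_counts(posts):
    """Map each day to {word: number of distinct sites using the word that day}.

    `posts` is an iterable of (site, day, text) tuples (an assumed input shape).
    """
    sites = defaultdict(lambda: defaultdict(set))
    for site, day, text in posts:
        # Count a site once per word per day, however often it repeats the word.
        for word in set(text.lower().split()):
            sites[day][word].add(site)
    return {
        day: {word: len(site_set) for word, site_set in words.items()}
        for day, words in sites.items()
    }


def recency_score(counts, word, today, half_life_days=2.0):
    """Sum the word's daily site counts, halving the weight every half-life."""
    score = 0.0
    for day, words in counts.items():
        age = (today - day).days
        if age < 0:
            continue  # ignore days after `today`
        score += words.get(word, 0) * 0.5 ** (age / half_life_days)
    return score
```

With this scoring, a word used by three sites today scores 3.0 today and, if nobody mentions it again, 1.5 two days later.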
Presumably one would still get a Zipf's law distribution of word frequencies. But more interesting than the absolute distribution is how it changes from day to day. For example, I imagine that the word "marriage" has appeared much more frequently than usual during the past couple of weeks, and I suspect that its frequency has dropped off since the Senate ended consideration of the Federal Marriage Amendment (FMA).
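To make the day-to-day comparison concrete, here is a rough sketch that ranks words by how much their site count today exceeds a baseline count (say, a typical day). The function name `biggest_movers`, the add-one smoothing, and the sample numbers are all assumptions for illustration, not real data.

```python
def biggest_movers(today_counts, baseline_counts, top_n=5, smoothing=1.0):
    """Rank words by the ratio of today's site count to a baseline count.

    Both arguments map word -> number of sites. `smoothing` is added to each
    count so that words missing from one side do not cause division by zero.
    """
    words = set(today_counts) | set(baseline_counts)
    ratios = {
        word: (today_counts.get(word, 0) + smoothing)
        / (baseline_counts.get(word, 0) + smoothing)
        for word in words
    }
    # Largest ratio first; ties are broken alphabetically for stable output.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Invented counts: "the" is always common, "marriage" spikes today.
today = {"the": 100, "marriage": 40}
usual = {"the": 98, "marriage": 4}
movers = biggest_movers(today, usual)
```

A ratio like this discounts the words that are common every day, which is the point: the Zipf head ("the", "of") barely moves, while a word in the news jumps to the top.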
This could be the basis of a very popular web site: "The world's attention today" -- in some ways similar to Slate's "Today's Papers." Given the right data and some intelligent analysis, this would be a useful service for anyone who wanted to know what people were paying attention to. I'll bet that such a web site would get lots of traffic and would become quite a valuable cyberspace destination. It could be the start of an important business.
For a bit of background on how I got here, see my blog entries: here and here.