So Who Are the Top 100 Blogs? Not Who I’d Have Thought

This Carnegie Mellon Computer Science study won the prize for “Best Student Paper.”

The title of the paper is “Cost-Effective Outbreak Detection in Networks.”

Here is their question:

Blog rankings

Rankings are based on the following question: Which blogs should one read to be most up to date, i.e., to quickly know about important stories that propagate over the blogosphere? [emphasis added]

Budget=100 blogs: If we can read 100 blogs, which should I read to be most up to date? Unit cost (each blog costs 1 unit), optimizing the information captured (we want to be the first to know about something with many people blogging about the story after us)

Budget=5000 posts: If we can read the total of 5000 posts, which blogs should one read? Cost of reading a blog is the number of posts it has, we optimize the information captured

Multicriterion solution: We want to read both a small number of blogs and a small number of posts. These results are from the experiment on figure 4(a) from the paper. We find the right budget where value of objective function is 40%. Cost of a blog is a combination of a number of posts (NP) a blog has plus a constant (UC).
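For readers who like to see this sort of thing spelled out, here is a minimal sketch of how the three budgets above could be played with. It is not the students' algorithm, just a plain greedy pick of "most new stories per unit of cost," and it simplifies "being first to know" down to "covering the story at all." The blog names, story sets, post counts, and the value of the constant are all made up for illustration.

```python
from typing import Dict, List, Set

def greedy_select(coverage: Dict[str, Set[str]],
                  cost: Dict[str, float],
                  budget: float) -> List[str]:
    """Repeatedly pick the affordable blog with the most new stories per unit cost."""
    chosen: List[str] = []
    covered: Set[str] = set()
    spent = 0.0
    remaining = set(coverage)
    while remaining:
        best, best_ratio = None, 0.0
        for blog in sorted(remaining):  # sorted so ties break the same way every run
            if spent + cost[blog] > budget:
                continue  # cannot afford this blog any more
            gain = len(coverage[blog] - covered)  # stories we do not already have
            ratio = gain / cost[blog]
            if ratio > best_ratio:
                best, best_ratio = blog, ratio
        if best is None:  # nothing affordable adds anything new
            break
        chosen.append(best)
        covered |= coverage[best]
        spent += cost[best]
        remaining.remove(best)
    return chosen

# Hypothetical toy data: which stories each blog picks up, and how much it posts.
coverage = {
    "blog_a": {"s1", "s2", "s3"},
    "blog_b": {"s3", "s4"},
    "blog_c": {"s5"},
    "blog_d": {"s1", "s2"},
}
posts = {"blog_a": 30, "blog_b": 4, "blog_c": 1, "blog_d": 5}
UC = 3  # assumed value for the constant in the combined cost

# Budget counted in blogs: every blog costs 1 unit.
unit_pick = greedy_select(coverage, {b: 1 for b in coverage}, budget=2)

# Budget counted in posts: a blog costs as many units as it has posts.
post_pick = greedy_select(coverage, dict(posts), budget=10)

# Combined cost: number of posts (NP) plus a constant (UC).
combo_pick = greedy_select(coverage, {b: posts[b] + UC for b in coverage}, budget=15)

print(unit_pick, post_pick, combo_pick)
```

Notice how the prolific blog_a wins when every blog costs the same but drops out as soon as you have to pay per post. That seems to be the whole point of ranking by cost rather than by size.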

Here is their real-life comparison:

The spread of information in the blogosphere: First blog writes a post and then other blogs refer to it. The behavior (information) spreads (cascades) through the network of blogs.

Water distribution networks

[The] same techniques and algorithms as used for blogs also apply to detecting disease outbreaks in water distribution networks. Consider a city water distribution network, delivering water to households via pipes and junctions. Intrusions can cause contaminants to spread over the network, and we want to select a few locations (pipe junctions) to install sensors, in order to detect these contaminations as quickly as possible.

The sensor placements obtained by our algorithm are provably near optimal, providing a constant fraction of the optimal solution. Our approach scales, achieving speedups and savings in storage of several orders of magnitude.

This same link also provides their algorithm and some illustrations, plus links to more detailed information. Don’t know that I care for being compared to contaminated water, however. Couldn’t they have done something with, say, ice cream?

This is the .pdf of their paper, with illustrations of how the cascades work.

But what is surprising is the list they came up with. #1 is no surprise at all: Instapundit, of course. But after that, it’s up for grabs.

Here’s some data regarding the parameters of their table:

Top 100 blogs for unit cost case and PA objective function

  • PA score : score for the solution of length k
  • NP : number of posts of a blog in 2006
  • IL : number of inlinks that a blog got from other blogs inside the dataset in 2006
  • OLO : number of outlinks to other blogs in the dataset
  • OLA : number of all outlinks (also counting links to other resources on the web)

The table is below the fold. You’re going to be surprised at some of the blogs that made the list, and some that are noticeably absent.
– – – – – – – –

k  PA score  Blog  NP  IL  OLO  OLA 
1  0.1283  instapundit.com  4593  4636  1890  5255
2  0.1822  donsurber.blogspot.com  1534  1206  679  3495
3  0.2224  sciencepolitics.blogspot.com  924  576  888  2701
4  0.2592  watcherofweasels.com  261  941  1733  3630
5  0.2923  michellemalkin.com  1839  12642  1179  6323
6  0.3152  blogometer.nationaljournal.com  189  2313  3669  9272
7  0.3353  themodulator.org  475  717  1844  4944
8  0.3508  bloggersblog.com  895  247  1244  10201
9  0.3654  boingboing.net  5776  6337  1024  6183
10  0.3778  atrios.blogspot.com  4682  3205  795  3102 
11  0.3885  lawhawk.blogspot.com  1862  463  1668  6597 
12  0.3984  gothamist.com  6223  3324  1891  17172 
13  0.4078  mparent7777.livejournal.com  25925  199  4027  47933 
14  0.4163  wheelgun.blogspot.com  1174  128  262  939 
15  0.4245  gevkaffeegal.typepad.com/the_alliance  302  428  333  2481 
16  0.4318  anglican.tk  66  66  1377  3482 
17  0.4384  micropersuasion.com  1503  2880  506  5666 
18  0.4444  pajamasmedia.com  5007  141  2920  26881 
19  0.4500  blogher.org  3302  412  1587  14222 
20  0.4556  mypetjawa.mu.nu  1108  1733  757  3609 
21  0.4611  reddit.com  2618  1940  201  1117 
22  0.4661  soccerdad.baltiblogs.com  814  451  1137  4307 
23  0.4711  thenoseonyourface.com/the_nose_on_your_face  400  394  349  1645 
24  0.4759  ahistoricality.blogspot.com  441  87  293  805 
25  0.4803  theanchoressonline.com  989  430  1597  6358 
26  0.4848  americablog.blogspot.com  5786  3351  331  3950 
27  0.4890  sfist.com  3068  1461  1891  13203 
28  0.4931  tbogg.blogspot.com  1412  864  5567  19396 
29  0.4971  horsepigcow.com  516  498  203  1220 
30  0.5009  whyhomeschool.blogspot.com  513  211  205  1030 
31  0.5046  daoureport.salon.com  2012  5255  177  768 
32  0.5083  sisu.typepad.com/sisu  331  304  293  1968 
33  0.5119  metafilter.com  5866  1277  607  13374 
34  0.5151  megite.com  535  33  378  2422 
35  0.5183  laist.com  2651  1259  1389  7680 
36  0.5214  captainsquartersblog.com/mt  2623  6495  517  6187 
37  0.5243  shakespearessister.blogspot.com  4580  2116  1386  5839 
38  0.5271  blog.guykawasaki.com  218  1470  24  311 
39  0.5299  tryinotocomeundone.blogstream.com  76  183  343  973 
40  0.5326  bluestarchronicles.blogspot.com  180  144  283  1082 
41  0.5352  googleblog.blogspot.com  294  2815  84 
42  0.5377  theglitteringeye.com  924  377  1088  3927 
43  0.5402  asterisco.paradigma.pt  2419  145  521  14280 
44  0.5425  readwriteweb.com  543  1236  275  1937 
45  0.5448  digbysblog.blogspot.com  1784  3553  574  3153 
46  0.5470  conservativecat.com  682  284  916  3551 
47  0.5491  phillyist.com  1633  800  1797  6328 
48  0.5511  socialcustomer.com  279  119  122  889 
49  0.5530  business2.blogs.com/business2blog  635  343  132  1801 
50  0.5549  gatewaypundit.blogspot.com  2677  3172  1146  6829 
51  0.5567  crooksandliars.com  2426  2578  1275  6147 
52  0.5584  rightwingnews.com  1975  1700  891  8478 
53  0.5600  10000birds.com  160  72  46  217 
54  0.5617  radar.oreilly.com  647  1219  160  2699 
55  0.5632  cowboyblob.blogspot.com  1208  173  145  379 
56  0.5648  business-opportunities.biz  1419  450  224  4773 
57  0.5663  dcist.com  2873  1995  1346  8049 
58  0.5678  headrush.typepad.com/creating_passionate_users  159  1149  45  313 
59  0.5693  legitgov.org  2810  10835  473  562 
60  0.5707  whataboutclients.com  518  80  220  1252 
61  0.5722  roughtype.com  365  1074  101  455 
62  0.5736  tuaw.com  3656  368  34518 
63  0.5750  aude91.canalblog.com  375  81  67  208 
64  0.5764  thelondonfog.blogspot.com  953  117  192  861 
65  0.5777  bostonist.com  1080  944  1402  5001 
66  0.5791  seattlest.com  2562  1326  1367  8063 
67  0.5805  austinist.com  3113  1086  1199  7531 
68  0.5818  indianwriting.blogspot.com  419  49  48  451 
69  0.5831  powerlineblog.com  2081  2362  179  1487 
70  0.5844  firedoglake.blogspot.com  655  1163  232  1496 
71  0.5857  elisson1.blogspot.com  736  257  200  737 
72  0.5869  rhymeswithright.mu.nu  1325  329  1050  5583 
73  0.5882  ragnell.blogspot.com  403  170  121  689 
74  0.5894  pulverblog.pulver.com  934  445  313  5653 
75  0.5906  mry.blogs.com/les_instants_emery  558  49  91  1347 
76  0.5918  gapingvoid.com  1156  905  235  1752 
77  0.5929  catymology.blogspot.com  114  56  41  169 
78  0.5941  hughhewitt.com  1330  1234  500  2468 
79  0.5953  lifehacker.com  4436  2420  927  16658 
80  0.5964  jordoncooper.com  619  264  229  2189 
81  0.5976  econbrowser.com  263  349  210  1647 
82  0.5987  socialitelife.com  4455  1677  1400  10616 
83  0.5998  gatesofvienna.blogspot.com  894  1090  404  1892 
84  0.6009  nevillehobson.com  578  384  4142 
85  0.6019  waxy.org/links  836  2093  97  289 
86  0.6030  aliferestarted.blogspot.com  77  52  95  387 
87  0.6040  volokh.com  2400  1150  489  2047 
88  0.6051  library.coloradocollege.edu/steve  154  33  85  459 
89  0.6061  drsanity.blogspot.com  963  1419  807  2269 
90  0.6071  mudvillegazette.com  770  1351  579  2902 
91  0.6081  saysuncle.com  1992  552  4025 
92  0.6091  privacydigest.com  1819  683  543  14208 
93  0.6100  londonist.com  2624  844  868  6308 
94  0.6110  shanghaiist.com  1359  1656  1292  5442 
95  0.6120  markshea.blogspot.com  3109  551  413  1750 
96  0.6129  singleservecoffee.com  442  325  237  885 
97  0.6139  jeremy.zawodny.com/blog  279  617  84  550 
98  0.6148  scienceblogs.com  4261  1614  3168  15324 
99  0.6157  basicthinking.de/blog  2084  410  432  15046 
100  0.6166  scobleizer.wordpress.com  1144  757  406  2487 
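Since the PA score is defined as the score for the whole solution of length k, the column reads as a running total, not a per-blog grade. A small sketch, using only the first five PA scores from the table, backs out what each blog adds on top of the ones ranked above it. The variable names are mine, and reading the differences as "what that blog adds" is my interpretation rather than anything the authors state.

```python
# First five rows of the table: (blog, cumulative PA score for the top-k solution).
top_five = [
    ("instapundit.com", 0.1283),
    ("donsurber.blogspot.com", 0.1822),
    ("sciencepolitics.blogspot.com", 0.2224),
    ("watcherofweasels.com", 0.2592),
    ("michellemalkin.com", 0.2923),
]

# What each blog adds on top of everything ranked above it.
gains = []
previous = 0.0
for blog, score in top_five:
    gains.append((blog, round(score - previous, 4)))
    previous = score

for blog, gain in gains:
    print(f"{blog}: +{gain:.4f}")

# Each step adds less than the one before it, at least over these five rows.
shrinking = all(a[1] >= b[1] for a, b in zip(gains, gains[1:]))
print("diminishing returns:", shrinking)
```

So Instapundit alone accounts for about 0.13, and by the fifth blog each addition is worth roughly a quarter of that.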

A commenter, Zman Biur, at Soccer Dad said:

“if there’s a best day to read blogs to maximize the information you’re getting, it’s Friday.”

Who has time to read blogs on Friday? Must be an anti-Semitic algorithm!

“if you only have time to read 100 blogs”

Who on earth has time to read 100 blogs?

Why, bloggers have the time, Mr. Zman Biur.

And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.

What I did notice, however, was that the study said the best time to read blogs is on Friday. It’s been my experience that our traffic drops off then. First, lots of people skip work on Friday. Second, we must have more Jewish readers getting ready for Shabbat than I realized.

Cool!

NOTE: I recognize that being on this list does not mean we’re actually in the top 100 in virtual reality. What these students were establishing was the most efficient way to use your blog-reading time. That’s what this list signifies.

12 thoughts on “So Who Are the Top 100 Blogs? Not Who I’d Have Thought”

  1. Gratified to see The Anchoress made it.

    The Daily Kos isn’t there, and neither is the Drudge Report…

    Blue Star Chronicles??? Wow! Way to go Beth!

  2. The smart students should have checked the addresses before publishing their report. #3 and #88 are not there anymore.

    However, I’m still at the same old address, writing the same old boring stuff.

    I come to the Gate because it’s good writing and you all have lots of stuff to disseminate from parts of the world most think don’t matter.

  3. For whatever the list is worth, I am pleased that LGF and HotAir didn’t make it. They are heading full tilt down the Danrather Holier-than-thou path.

    And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.

    I no longer maintain my blog because the usefulness/danger ratio is too low in my profession, and because I’m so much less talented than so many others.

    That said, this will be the last of my infrequent comments. I wasn’t aware that non-bloggers’ comments were an annoyance.

  4. Mr. Pasha–

    Reading over what you excerpted–

    And commenters also, who like to hang around and share their thoughts but don’t want to deal with the upkeep of a blog. It’s kind of like letting your neighbors’ kids in to play occasionally because they like your neat “stuff”, but you can send them home when you feel like it.

    I can see how it sounds. Infelicitous to say the least. It was *supposed to be* funny. In no way did I mean it to be an exclusion of anyone.

    I guess a better way to put it would be that the neighbor’s children come in to play because they like your stuff and then they leave when they get bored.

    We only started blogging because our comments at Belmont Club were too long and I thought we were hogging the thread sometimes.

    I checked and your profile is no longer available so there is no way to apologize or explain. Since you won’t be back, there’s simply no way to let you know it came out wrong.

    However, if you took my comment that way, others obviously might do the same (as Peter Drucker said, “communication is the act of the recipient” so it doesn’t matter what I *meant*. What matters is what you took from my comment).

    I sure don’t want this faux pas to spread…

  5. Indigo red (interesting nic…I’m trying to imagine it)–

    Are you kidding? Go thru all one hundred links??? Heck, we’re way overdue to houseclean our own blogroll. Anyway, who knows how long it took them to put that table together…

    Right now, we do it on a catch-as-catch-can basis, i.e., if I click on someone and the link brings me to an advertising page, then that’s a clue the blog is no longer registered, so we delete it.

    Hey, now that I think of it…did *you* go through all one hundred clicks?

    Hmmm… would you like to go through our blogroll? I’ll be your friend…

    No, seriously, I could send you a gift certificate from Amazon. We have their credit card, which I use for all our expenses. When you get enough points, they send out gift certificates. Now that the future Baron is staying here until grad school and eating us out of house and home I get more certificates than I used to…

    (before any of y’all scold me about my irresponsible parenting…yes, he *does* pay room and board, in the form of one week’s paycheck a month).

    So anyway, are you game for this job?

  6. Conservative Swede and aethling2 —

    The fact is that both those blogs are way bigger than we are.

    The students weren’t looking at size, they were looking at nodes and examined the blogosphere as a cascade, which is a clever way of doing it. This method was premised on the “cost-efficiency” of having x time to read x blogs.

    IOW, more bang for your buck.

  7. The students didn’t make a mistake. These were rankings for 2006.

    My guess about LGF and OTB (another big one that didn’t make the cut) is that there was too much overlap between those and others mentioned. I don’t know why that would be, as both seem to be agenda setters.

    Still I wonder, if you changed one or two blogs in the list how much would it affect the others? Would dropping two blogs from the list mean that you’d then have to drop, say, another 3 and then replace those with 5 different blogs? (I would assume that such a dynamic would occur, but can’t prove it.) Or would you be able just to remove two and replace those two without any further loss of efficiency?

  8. Soccer Dad–

    You were too modest in this comment: you failed to mention that *you* made the list twice.

    The first time in the Watcher of Weasels Council, and the second time for your own blog.

    That is pretty cool.

  9. This is because LGF and Drudge are both accumulators, whereas other blogs on this list generate a lot of stuff.
    Not that LGF doesn’t generate stuff, just that they tend to be more accumulators.
    Almost everything I read here, I haven’t seen elsewhere, or it’s in some obscure Danish newspaper, etc.

  10. Well gosh, I’m not there! *gasp* oh well…
    LOL oh, it’s just for 2006 and I hadn’t started yet? yeah… that must be it…

    ok, I know hardly anyone reads my blog anyway… boohoo… Maybe if I get the job in Iraq and actually have enough time to write things down…

  11. Who cares about LGF? It is an echo chamber over there anyway. And not that good a blog. I like a reading room with, you know, real content. Not some place going, “Nyah! Nyah! I’m smarter than all the rest of you plebes!” like LGF seems to be stuck on doing.
