Interdisciplinary chaos

April 18th, 2005

That load of skeptical, arrogant, jump-to-conclusions, narrow-minded bastards...

I'm mad now... News of a research project I've been involved in broke on Slashdot today: $3.3 million from the MacArthur Foundation to research kids and technology. Everyone seems to misunderstand it:

Let's see: $3.3M / 3 professors / 3 year study = $366K/year/each.

I wish I could make that kind of money to "discover" that kids use cell phones and e-mail to find out when to meet their "crew" at the mall.

[snip]

My point is that we should probably take a look at fixing what's actually broken before trying to play around with gee-whiz high-tech crap that does little/nothing to improve the basics of education.

http://www.omninerd.com/news/news.php?nid=110#820

The following distills the perspectives of our principal investigators Peter Lyman, Mizuko Ito, and Michael Carter, as well as the slew of students, professors, and other advisors involved in this incredible project. Of course, this is entirely my opinion, so don't complain to them if you don't agree or if I get it wrong.

I'm not going to respond to the insinuation that these amazing, friendly, funny, selfless, incredibly intelligent professors, one of whom is my Master's final project advisor, are going to do anything but the research they've been funded to do. You see, we ARE trying to find out what's broken. One short version of our research is that educators are severely disconnected from the people they're trying to educate: they make poor choices about which technologies to use and don't take into account the technology habits of the very people they're trying to help, among other gaps.

Saying they "use cell phones and e-mail" or that kids do childish things only shows that you're assuming you know what they're doing -- that you're holding them to your own ideal of childhood. They may think their activities are perfectly normal, so maybe we're the ones who should change our habits. The reality is that we can't be certain of anything because nobody has done the research.

What we aren't going to do: We aren't going to suggest they buy 10,000 AMD-based, wireless networked PCs running Ubuntu Linux. We're not out to create things and throw them into the wild to see if kids will use them, wasting the money and momentum of this project. We're not here to line our pockets with dead man foundation money, as the uninformed soul above suggests.

The angry reaction to this story is probably due to the failure of the two other disciplines that focus on kids the most -- education and marketing. Those fields treat children as brainwash-able vessels of information absorption, reflecting adults' perspective about what the children should be and how they should attain their future status. For instance, this perspective has advocated bringing technology into classrooms and forcing kids to learn how to use it. Does this make their lives better? Improve their education? Resemble what the kids do with that technology outside the classroom or in the future?

The failure is not understanding the problem from the perspective of the people it impacts the most -- the children. This problem is the very reason ethnographic and other qualitative research methods exist. Someone out there has to speak for these kids but not in some disconnected, adult voice. We have to speak for the children in their own voice, in their own words, reflecting their concerns, habits, and preferences as best we can. This is more than understanding that "kids use cell phones and e-mail." The real understanding will come when we know how and why kids have appropriated these new technologies for their own uses in and out of school and what uses they've created, and when we can use that knowledge to advance the use of technology in education.

Why is there such animosity when disciplines collide? Why do technologists hate social research? I think about my final project research -- Berkeley freshmen and communication technology use -- and how disconnected developers, educators, and administration are from the needs and habits of those students. I'm especially concerned because my ideas for future research are highly cross-disciplinary. Will my research be rejected outright simply because I'm trying to bring a new perspective into a field with different acceptable methods? That still won't answer the question which inspired this writing: Why is the Slashdot readership so uninformed when problems don't directly involve computers?

So for all you people who think this is money wasted, that you can think of better uses for it, that technology should be taken out of schools, or any other reflex reactions, you obviously don't know a thing about what the goals of this research are. We're incredibly excited about this research and hope it will address your concerns as well as ours. At best, we'll transform education and our understanding of how kids appropriate technologies in new and unexpected ways. At the very least, it will keep a dozen or so graduate students happily employed for three years, doing research and furthering their own education and research goals, which, for a graduate student, is a damn good goal too.

Even though I'm leaving this research after I graduate in May, I know it will continue in the good hands of some of the best and brightest people I've ever known. And if you think that our PIs are going to kick back, relax, and bask in their newfound wealth, then I can understand exactly why they received the grant and you didn't.

To all would-be comment spammers

April 8th, 2005

Go somewhere else!

I got my first comment spam last week! I'm so excited! It means my web site has now passed from just some random pages to one that people actually care about enough to use to their own advantage. You call it annoying, but I call it progress.

So to all you comment spammers out there, I highly recommend you don't try it. I offer the following:

  • All links in comments use rel="nofollow" and all comment pages plus my comment RSS feed are in my robots.txt file to prevent search engines from counting the links in page ranking.
  • All HTML tags are removed from comments. Too bad.
  • I have filters and I'm not afraid to use them.
  • The only people you're pissing off are the few readers I have. In other words, you're not reaching a big audience.
  • Comments like "Your site colors suck!" don't mean shit to me, and the other readers of this site know that already.
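For anyone curious how the first two defenses work, here is a minimal Python sketch. It is not this site's actual code (the blog software isn't named here); it assumes Python's standard `html.parser` module for tag stripping and a hypothetical robots.txt in which `/comments/` and `/comments.rss` stand in for wherever the comment pages and comment feed really live.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class TagStripper(HTMLParser):
    """Keeps only the text content of a comment, discarding every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(comment):
    stripper = TagStripper()
    stripper.feed(comment)
    stripper.close()  # flush any buffered text
    return "".join(stripper.parts)


# Hypothetical robots.txt: the comment pages and comment feed paths are assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments/
Disallow: /comments.rss
"""


def crawler_may_fetch(url):
    """Would a well-behaved crawler fetch this URL under ROBOTS_TXT?"""
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    return rp.can_fetch("*", url)


print(strip_tags('Nice post! <a href="http://spam.example/">cheap pills</a>'))
print(crawler_may_fetch("http://example.com/comments/42"))
```

The stripper keeps the text inside any tag, including `<script>`, so its output still needs HTML-escaping (for example with `html.escape`) before it goes back on a page. And robots.txt only keeps out crawlers that choose to honor it, which the big search engines do.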

Don't bother to spam here. Seriously. It's not worth the time -- yours or mine.

Also for the record, I've tried very hard to make this site more compliant with accepted practices for blind and low vision readers. Hopefully this hasn't created any readability or navigation issues as a result. I know the photo pages still need some work, but otherwise the site should be much better than it was before.

Shooting yourself in the foot

March 30th, 2005

Money can buy you new feet... or at least a bitchin' wheelchair.

As you regular readers can probably tell, I'm a little behind on my writing. In fact, I just counted and I have 71 articles and ideas in the backlog. I'm not very proud of this fact. Because oral arguments were heard today in the Grokster case, I felt finishing up this article would be very timely.

Using the courts to solve your problems is like playing with loaded guns -- someone is going to get a bullet by the time you're done. In the legal revolver dance, everyone could be riddled with bullets in the end.

Worst of all, you could shoot yourself in the foot. You go to court with a smoking gun -- an argument that you're certain is unstoppable -- and end up with a ruling against you that will inhibit your ability to fight that battle in the future. Just ask Eric Eldred, Lawrence Lessig, and the crew who litigated Eldred v. Ashcroft. If you don't remember, this is the case where Eldred wanted to publish books that would have been out of copyright if not for the Sonny Bono Copyright Term Extension Act. (I can only imagine the mess in the pants of those media companies' execs when Sonny Bono got elected to Congress.) Lessig et al. argued that extending the duration of copyrights was creating an effective monopoly, something that the Constitution doesn't allow, thereby impacting free speech and other stuff.

In a 7-2 smackdown, the Supreme Court ruled that copyright legislation is solely the domain of Congress as enumerated in its Constitutional powers -- Article I, Section 8, Clause 8. The courts must also defer to Congress since it's in a better position to judge what the proper duration of copyrights ought to be (because Congress has better judgement than the Supreme Court?). 7-2 is not good. That means at least three Justices have to die or step down or get a lobotomy to have even a tiny chance of the case being decided differently.

Brewster Kahle, not content to have his ideals shot down by a Supreme Court decision, decided to challenge the lengthened duration of copyrights again. This case, Kahle v. Ashcroft, focused on the shift from conditional copyrights (registering works to receive copyright protection) to unconditional copyrights (where everything is protected by copyright upon creation). The result is that many works are now orphaned -- still under copyright but without copyright holders available to ask for permission to use them.

Kahle and the Internet Archive got shot down. Citing Eldred as part of its reasoning, the district court dismissed the case outright. They are appealing the decision, but don't hold your breath for a different answer.

If there's any silver lining on this cloud, the U.S. Copyright Office had an epiphany about orphaned copyright works in the meantime. In Canada, a system is in place for individuals to petition for use of a copyrighted work when the copyright holder can't be found. Our U.S. Copyright Office has asked for comments about what they should do. This does not mean they will do anything. This does not mean they will do the "right" thing, whatever "right" means. This does mean they recognize the problem and are hoping someone has a solution that will be amenable to all parties. I think Brewster should be celebrating this move by our government and drop the lawsuit to focus all energy on the Copyright Office, but he probably won't.

As I've said before, going to court is a last resort. It's an expensive, lengthy process where you're gambling on a decision in your favor. The best advice is don't go to court in the first place. The next best advice is don't put yourself into a situation where going to court is even an option. Resolve your differences elsewhere -- like Congress. For the same amount of money it would take to go to court in the first place, you can hire a lobbyist to go to D.C. and get a law passed saying the same thing.

Both sides think they have a smoking gun, but they're blinded by their idealism. As the Eldred and Kahle cases show, the penalty for following this idealism blindly is a ruling against you. You don't see straight and you shoot yourself in the foot. The next time you go to the revolver dance, you're on crutches versus an opponent out for the kill.

But that's not exactly why I worry about the Grokster case. Grokster thinks it's right because its peer-to-peer software is just a technology -- not something forcing people to infringe copyright by sharing music and movies. The media industries see Grokster and other P2P software developers as thieves profiting off the copyrighted works they own. They're talking about two different worlds entirely, and it's up to the Supreme Court to figure out who's right.

The difference between Grokster and their opponents is that the RIAA and cohorts have already put the money and resources into Congress, they have no present intention of accepting or working with P2P technologies and companies, and the courts are simply their current venue of choice. Regardless of the decision in this case, you can expect to see a new DMCA-ish bill raised in Congress shortly after -- maybe the INDUCE Act, maybe something else. Those media companies don't innovate; they just attack anyone who beats them to the money -- player piano rolls, radio, cassette tapes, VCRs, seat belts, airbags... oops. Wrong industry.

I have a prediction for the Supreme Court ruling in this case which I won't share now. I will say that the outcome of this case will not make a big difference in the end. MGM has the resources to pursue other options -- legislation, new P2P system development, siccing zombie Sonny Bono on infringers -- and Grokster doesn't. I guess that means even if Grokster wins, they'll lose.

If you lose in court, you can hope that the Copyright Office will decide to hear your argument instead. Or if you lose, you can go to Congress and ask them to see the problem your way. So don't give up hope if you shoot yourself in the foot. The miracle of modern government will get you back up on fresh legs in no time -- well, the government and a whole lot of money. And an army of lawyers. And lobbyists too. Don't forget those lobbyists...

Apple feels the pressure

January 11th, 2005

iMovies anyone?

I recently learned about Podzilla -- a Linux distro for the iPod. iPod already has some recording support, but here's the kicker -- you can pay the $50 for the iPod recording hardware and get 16-bit recordings at 8kHz (telephone-ish quality) or you can install Podzilla for free and record whatever you want at up to 96kHz (DVD audio quality).

Why is recording crippled on the iPod? Well, part of the answer is that the market for voice recording hardware is pretty small. Dedicated voice recorders are incredibly expensive, and Apple is not at all in that market. They're out to sell their iPods and get people to use iTunes. Furthermore, voice recording doesn't need amazing quality, so the low audio rate is sufficient for most.

However, this is not a market issue. If Apple really wanted to make an iPod capable of high-quality recording, they could have. Their competitors, such as Creative, Archos, and iRiver, have MP3 players with full-quality MP3 and WAV recording (44.1kHz, CD quality). My Creative Jukebox 3 even has digital recording inputs. In fact, if Apple were concerned about being competitive, they would have included high-quality recording in the iPods from the very beginning.
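The gap between those recording settings is easy to put in numbers. Uncompressed PCM audio costs sample rate × bits per sample × channels bits per second, and a sample rate can only capture frequencies up to half its value (the Nyquist limit), which is why 8kHz sounds like a telephone. A small Python sketch; the channel counts and the 16-bit depth for the 96kHz case are my assumptions, since the exact settings of each mode aren't spelled out above.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of uncompressed PCM (WAV-style) audio."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_limit_hz(sample_rate_hz):
    """Highest audio frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# Assumed settings: mono for the voice recorder, 16-bit stereo for the rest.
SETTINGS = {
    "iPod voice recorder": (8_000, 16, 1),
    "CD audio": (44_100, 16, 2),
    "96kHz recording": (96_000, 16, 2),
}

for name, (rate, bits, channels) in SETTINGS.items():
    bps = pcm_bitrate(rate, bits, channels)
    mb_per_minute = bps * 60 / 8 / 1_000_000  # bits -> bytes -> megabytes
    print(f"{name}: {bps / 1000:.0f} kbit/s, "
          f"{mb_per_minute:.2f} MB per minute, "
          f"frequencies up to about {nyquist_limit_hz(rate) / 1000:.1f} kHz")
```

The voice setting tops out around 4 kHz, plenty for speech but not for music, while the higher rates cost far more storage per minute, something a multi-gigabyte hard drive can easily absorb.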

Which leads me to an interesting conclusion: Apple was pressured by the RIAA and big music distributors to disallow high-quality recording on iPods. If you're the RIAA, you're worried that someone can take a CD, plug its output right into an iPod, then play the CD and end up with MP3s on the iPod (or AACs or whatever their damned format is). Worse yet, someone can bring an iPod to a concert with some microphones and make great-quality bootlegs (because bands haven't learned that if they could make live concert CDs on the spot, they could rake in the money).

I think Apple was either told by the RIAA "no recording on the iPod" or intentionally crippled it to avoid the wrath of the music industry in the first place. Think about what would have happened if Apple had allowed high-quality recording... "Sorry big music distributors, but we released our tiny music player with the ability to record the same music of yours that we sell online. We know people will record music themselves and avoid paying you the royalties you so desperately want. But we don't really need the iTunes store to sell iPods because we're Apple and people will buy our hardware since it looks cool, is vastly overpriced, and has fewer features than our competitors'." Or something like that.

All of this makes me wonder about the iPod photo, or whatever that new device is. Most of Apple's competition has already moved on to portable movie players, some of which even record video, so why is Apple just moving to photos and not movies? My prediction is that Apple will have a movie-playing iPod just as soon as they create iMovies (or iFilms or something like that, like iTunes but for movies... call it QuickTime?) and sort out the issues with the MPAA and big movie distributors.

To put this another way, if Apple doesn't get into movies, then it's because they couldn't line up the studios. I think most people are comfortable making MP3s from their CDs now, but Apple would have to overcome a huge hurdle to get DVD ripping software on everyone's computer. Why would you buy a song from iTunes if you already own the CD? Likewise, why would you want to buy videos online when you already own the DVDs? The MPAA has harshly attacked anyone releasing DVD ripping software, so Apple would have to get their blessing before the iPod + video or iMovies comes out.

But that's just my opinion. Back in reality, I don't know of any other portable music player manufacturers that also have a vested interest in selling online music except for Sony, but then Sony has been very quiet about their efforts, and they own the music they're selling anyway. Since Steve Jobs is already in bed with the movie studios (Pixar) and since iTunes has been more successful than anyone could have predicted, I'm sure he'll be able to convince the other big movie studios to fall in line with the portable revolution. The iTunes store is as important to iPod's success as the iPod itself -- both for the RIAA's blessing and the iPod's overall success -- so a movie version of iTunes would be equally essential for Apple to break into the handheld movie player market.

I'm certain this isn't the only instance of the RIAA and others putting pressure on big software companies to bow to their whims. Microsoft wholeheartedly jumped onto the DRM bandwagon, much to the delight of the media industry, with recent versions of their Windows Media formats. I just wonder if MS did that before or after their conversations with those companies. I wouldn't characterize these as "alliances" as much as "necessities for doing business." If MS hadn't bowed to the pressure of the RIAA and others, then someone else would have. Therefore, I say MS and Apple both made the right decision for their business, much to the detriment of all the people stuck with their hardware and software. Pressure like this goes beyond software and hardware, but I'll deal with those aspects some other time.

I'll give you one other similar example. There's a Palm device -- the Treo 650 -- that was made specifically for Sprint and is pretty much a combo cell phone and PDA. The 650 came crippled in that you couldn't use its Bluetooth to connect to the Internet. Why would Sprint let you do that anyway? Someone hacked it and enabled that feature; supposedly it was always there but just hidden. That same person also figured out that you can use an SD WiFi adapter with some driver hacks -- a WiFi adapter that Palm wouldn't release drivers for. Sprint certainly pressured Palm just like the RIAA pressured Apple; both Sprint and Palm want to milk the cell phone market for everything they can.

Back to Apple... Based on what Apple did before, I bet their iPod photo already has video support. It's just crippled. And so if you're patient enough, wait for a version of Podzilla to uncripple the video playback support built into their new iPod. It's either that, or wait until a sanctioned, overpriced, under-featured iPod video finally comes out. Or just pick up one of the non-Apple portable media players that already support video playback and avoid the whole iPod thing in the first place.

Blogging by the numbers: The Big Number

January 3rd, 2005

One number to rule them all

How many blogs are there? This is one of those questions of incredible interest without any reliable answer. A few companies have been nice enough to provide information that gives us some estimate of how large the blogging world is. Unfortunately, most people don't take the time to decide how reasonable (or not) those numbers are.

A side note... My policy with this website is not to include external links unless those sites are seriously something you should check out. In other words, all these other sites I refer to but don't link to do exist; I just hope you don't stumble upon them because they're useless crap and I don't want to increase their Google rank inadvertently. Also, for those links I do have, there are probably other places I should have citations for data but I don't include them because I don't add multiple links to the same page. Just look around and the data is linked to somewhere else.

Some great data comes from the Pew Internet and American Life Project. One poll with great blogging results was taken between March and May 2003 with about 1,500 people, making for pretty accurate data. Quick statistics review... The poll had a 3% margin of error at a 95% confidence level. That means the polling method produces figures within 3 percentage points of the real numbers 95% of the time; it does not guarantee that any given result is within +/-3 points of its actual value. Furthermore, the poll was conducted by phone using random phone numbers, meaning the distribution of people who answered the question probably was close to that of the U.S. overall. (Don't worry about bias from people without phones; they probably don't have Internet access either.)
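Where does a figure like 3% come from? For a simple random sample, the usual normal approximation gives a margin of z × sqrt(p(1 − p)/n), with z = 1.96 for 95% confidence and p = 0.5 as the worst case. A quick Python check using the roughly 1,500 respondents mentioned above; this is the textbook formula, not necessarily the exact calculation Pew used.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a
    proportion p estimated from a simple random sample of size n.
    z = 1.96 corresponds to 95% confidence; p = 0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1500)
print(f"worst-case margin for n=1500: +/-{moe * 100:.1f} points")
# worst-case margin for n=1500: +/-2.5 points
```

That lands close to Pew's published 3%; pollsters commonly report a somewhat larger figure to allow for weighting. At the same sample size, a small proportion like the 2% of bloggers has an even narrower interval.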

Given this fairly accurate poll, Pew reported that about 2% of American Internet users said they have a blog or web diary and about 11% of 'net users read blogs and web diaries (the data is summarized in their report Content Creation Online). Some people ran with these numbers and found amazing statistics of their own. For example, did you know that 110 million people worldwide read blogs? Well, I didn't and neither did those 110 million people. I'll explain...

Let's say 11% of American Internet users really do read web logs or diaries. And other polls suggest about 200 million Americans (about 2/3 of us) have Internet access from home, work, 'net cafes, whatever. Do the math and BAM -- 22 million Americans read blogs. But then some people take this way too far. Somewhere around 800 million to a billion people worldwide have Internet access, so they multiply 11% by 1 billion 'net users and conclude that 110 million people read blogs. Do the same math with the 2% who blog and you get about 20 million bloggers.
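Here is that chain of arithmetic in Python, using the rounded figures quoted above. The first result applies a U.S. rate to U.S. users; the last two quietly assume the rest of the world reads and writes blogs at American rates, which is exactly the step that breaks.

```python
# Inputs quoted above (rounded, as in the text).
US_NET_USERS = 200_000_000        # "about 200 million Americans" online
WORLD_NET_USERS = 1_000_000_000   # upper end of "800 million to a billion"
READ_PCT = 11                     # Pew: % of U.S. 'net users who read blogs
WRITE_PCT = 2                     # Pew: % of U.S. 'net users with a blog


def apply_share(population, pct):
    """Integer arithmetic so the results come out exact."""
    return population * pct // 100


us_readers = apply_share(US_NET_USERS, READ_PCT)           # 22,000,000: U.S. rate, U.S. users
world_readers = apply_share(WORLD_NET_USERS, READ_PCT)     # 110,000,000: U.S. rate misapplied
world_bloggers = apply_share(WORLD_NET_USERS, WRITE_PCT)   # 20,000,000: same mistake

print(f"U.S. blog readers:        {us_readers:,}")
print(f"'Worldwide' blog readers: {world_readers:,}")
print(f"'Worldwide' bloggers:     {world_bloggers:,}")
```

Nothing in the code is wrong as arithmetic; the problem is entirely in which rate gets multiplied by which population.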

110 million blog readers, 20 million blogs, right? Nope, nothing is that simple. Look at LiveJournal's statistics. By their numbers, more than 35% of LJ'ers are under 18 years old (graph below). Pew missed a significant part of the blogging population by polling only 18-and-ups, and everyone citing that Pew number as absolute truth missed what I think is the most important blogging group there is.

LiveJournal users shown by age. Taken from http://www.livejournal.com/stats/stats.txt. Nov. 12, 2004. The peak of this graph is at 18 years old, and about 36% of users are under 18.

Pew missed the mark by a bit. So how much did they miss it by? I'll make a few guesses. That 35% figure for under-18 LJ'ers is probably an underestimate. Half of the people using LJ didn't bother answering the age question in their profile, and I believe most of those non-responses are under 18. Furthermore, there are liars. You can see a little peak around 104 years old; those people put down their age as being born in 1900. And of course, if you're going to have a publicly articulated persona, you want to make yourself look as cool as possible, and older is cooler. I'm surprised there aren't more kids who are 69 years old, but then again, how many of them can do the math and figure out they should have been born in 1935 for that?

Remember that the Pew study was only about Americans and their Internet use, not the world. If 11% of American 'net users are doing it, you can be certain that number will not hold for the entire world. While Americans make up about 1/4 of the Internet population, they're probably a majority of the blogging population. LiveJournal records use by country, and a little under 80% of its blog writers are from the U.S. So even if 25% of 'net users are American, we dominate the blog world. According to the NITLE Blog Census (which I use with caution, as I'll explain later), about 80% of the blogs they've crawled are written in English, but only about 1/3 of all web sites are in English and 2/3 of English web sites are in the U.S. Anyone who simply extrapolates global 'net use to blogging behavior will be off big time.

Graphs of each are below, and the key to the rankings is at the end of this page. Though it would be comparing apples and oranges, the NITLE distribution is almost exactly the same as the LiveJournal data (above). Someday it might be fun to delve deeper into the language/country blog differences, but this is as far as I'll take it for now.

The top ten countries of total Internet users worldwide (Global Reach) versus LiveJournal users as a percent of the total. (rankings are at the end)

The top ten ranked languages from Internet language data versus NITLE BlogCensus language data as a percent of the total. (rankings are at the end)

One other significant issue is that of definitions. The Pew poll asked if people had read, written, or contributed to a web log, blog, or web diary. I don't want to digress on such a philosophical problem, but people have different understandings of what a blog or web diary is. Did people say yes about writing a web diary when it was just their family's web page? Does Slashdot count or not? Who knows. Problems like this always happen, so we'll have to trust that people know a blog when they see one.

I had a chat with Mary Hodder of Technorati back in October about the big number. (FYI, Technorati deals with all things blog and that's about it.) She said Technorati estimated the number of blogs to be about 12 million, and that they have over 4 million blogs indexed. BBC News recently had an article where they cited Technorati as saying there are 4.5 million or so blogs in existence. Funny, the BBC number is a lot like the number Mary cited for Technorati's crawled blogs. I guess you can't even trust reputable news sources for accurate blogging information.

It gets worse. Other "authorities" for blogging size are cited too often without reflecting on how they got those numbers. NITLE's Blog Census currently has around 2 million indexed pages. I hope nobody is using their numbers yet. They've only gone through less than 5% of the over 5 million LiveJournal blogs (their <5% is less than the number of journals active in the last week). And like I said before, some parts of the sample, such as under-18 bloggers, are only evident in certain domains. Even if NITLE is using a 95% confidence level, it's meaningless if they aren't sampling from the entire population.

Then there are the "other" polls... I question the methodologies of most of these. A few were open online polls, so you have no idea how representative the results are of the entire population (like the poll I mentioned in my last post). Others look only at LiveJournal or similar blog hosting sites without taking into account people who don't use a blog service. Even with these complaints, at least those pollsters had enough sense to mention these facts along with their results.

What does Technorati do that the rest don't? Their numbers are based on a few things. First, they have web crawlers made specifically for blogs that use the links in those blogs to find other blogs and add the new ones to their search. They let people submit their blogs to the engine if they're not already there. They also get "pings" whenever a new blog is created on certain blog hosting sites or when new blog software is installed on an individual's site. With all this information, their data and estimates are probably the most accurate, if any are to be trusted.
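Why do submissions and pings matter so much? A link-following crawler can only find blogs that something it already knows links to. Here's a toy breadth-first crawl in Python over a hypothetical, in-memory link graph; it illustrates the general technique, not Technorati's actual crawler.

```python
from collections import deque


def crawl(links, seeds):
    """Breadth-first discovery: start from seed blogs, follow outbound links."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        blog = queue.popleft()
        for target in links.get(blog, ()):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found


# Hypothetical link graph: each blog maps to the blogs it links to.
LINKS = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["alice"],   # links out, but nobody links to dave
    "erin": [],          # no links in or out
}

print(sorted(crawl(LINKS, ["alice"])))           # ['alice', 'bob', 'carol']
print(sorted(crawl(LINKS, ["alice", "dave"])))   # ['alice', 'bob', 'carol', 'dave']
```

Starting from links alone, "dave" and "erin" are invisible. Adding "dave" as a seed (the equivalent of a submission or ping) brings him in, but "erin" stays uncounted, which is the fringe-blog problem in miniature.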

So how many blogs are there? I have no idea. If I had to wager, I would put my faith in Technorati's numbers since I trust their methodology the most and since they have much to lose if they've got it wrong. Technorati also said (in that same BBC article) that the number of blogs is doubling every 5 1/2 months -- 10,000 or so a day, a trend maintained for the last 18 months. That 12 million I cited earlier is probably closer to 15 million blogs now.
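The compounding behind that last guess is simple exponential growth: count × 2^(months / 5.5). A quick Python check, assuming Technorati's doubling time held steady and that two to three months have passed since the October figure (both assumptions, not measurements):

```python
DOUBLING_MONTHS = 5.5  # Technorati's reported doubling time


def projected_blogs(start_count, months_elapsed, doubling_months=DOUBLING_MONTHS):
    """Exponential growth: the count doubles every `doubling_months` months."""
    return start_count * 2 ** (months_elapsed / doubling_months)


# Assumption: roughly two to three months between the October estimate and now.
for months in (2, 3):
    print(f"after {months} months: "
          f"{projected_blogs(12_000_000, months) / 1e6:.1f} million blogs")
```

Either way the result lands somewhere around 15 to 17.5 million, so 15 million is, if anything, on the conservative side.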

I'll ignore other counting issues like private blogs (as in not publicly accessible), abandoned blogs, and fringe blogs (without incoming links so they can't be crawled) except to say these make defining what is and is not a blog even more difficult. This stuff is pretty hard to do even with all the data that's already out there.

Finally, Pew just today came out with some new poll results. 27% of American 'net users now read blogs -- an increase of 58% from February (from political blogs?). Also, Pew says 7% of American 'net users now have blogs, which is in line with Technorati's claim that blogs are doubling every 5 1/2 months (comparing Pew's Mar-May 2003 poll to the new data). That memo is pretty brief though; I'll wait until there are some methodology details or a full report of their survey data before believing them. Unfortunately, this will not stop people from committing all the same errors I described above and reporting that now 70 million blogs exist and 270 million people read them worldwide. Just wait for it...

That's enough for now. From here, I'll get into more detail about blog readers and writers.


Top 10 rankings for the Global Reach country data.

  1. United States
  2. China
  3. Japan
  4. Germany
  5. United Kingdom
  6. Korea
  7. Italy
  8. France
  9. Canada
  10. Brazil

Top 10 rankings for the LiveJournal country data.

  1. United States
  2. Canada
  3. United Kingdom
  4. Russian Federation
  5. Australia
  6. Germany
  7. Philippines
  8. Singapore
  9. Netherlands
  10. Japan

Top 10 rankings for the Global Reach language data.

  1. English
  2. Chinese
  3. Spanish
  4. Japanese
  5. German
  6. French
  7. Korean
  8. Italian
  9. Portuguese
  10. Malay

Top 10 rankings for the NITLE language data.

  1. English
  2. French
  3. Portuguese
  4. Farsi
  5. Polish
  6. German
  7. Spanish
  8. Italian
  9. Dutch
  10. Chinese (big5)

Blogging by the numbers: An introduction

December 22nd, 2004

Damn you statistics!!!

I did some research this semester about blogging. The inspiration for this was my economics of information class. As a requirement for that class, we had a final paper and presentation about any economic/information topic of our choosing, so I decided to study the economics of blogging.

At this point, I want to make sure we all understand what economics is. The best definition I heard for economics is that it's the study of the distribution of scarce goods. Most people confuse economics with money. Certainly money has a lot to do with economics, but economics is much more than that. Economics has a lot to say about utility and usefulness, distribution, production, and more.

In other words, I sought to find information that wasn't simply about money and the business of blogging. Someone out there coined the word "blogonomics" as a bastardization of the words "blogging" and "economics." Thankfully that word hasn't been adopted; the person who came up with it (or at least was credited with it) was pandering for donations. This just shows you how little most people really understand about economics.

So I divided my efforts into a few parts. First, I needed to learn more about bloggers. Who is blogging? How many blogs are there? How fast is this growing? This alone is worthy of far more research than the two months I had to produce my final paper. I'll start discussing that in a bit, but suffice it to say the quality of blogging statistics is miserable at best.

Next, I wanted to get a little deeper into the nature of the blogging realm. Why are people blogging? How are companies using blogging as part of their strategies? Who reads which blogs, how is traffic distributed, and why? Most people just appeal to the Zipf curve and avoid the deeper implications of why these traffic patterns emerge and what they mean for the blogging world.

I also did some of my own research into the function of the blogging realm. I wanted to look at linking patterns between blogs and the rest of the Internet. Do blogs exist in their own little realm or do they anchor themselves among the rest of the Internet or what? By looking at link structures, maybe I could get some sense of what blogs really do.
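To make the idea concrete, here's a minimal sketch of the kind of link tally I had in mind, written in Python. It assumes you've already crawled outbound links into (source URL, target URL) pairs and have some list of hosts you consider blogs. The function name, the three buckets, and the example host names are all my own hypothetical choices, not anything from an existing tool.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_links(links, blog_hosts):
    """Tally outbound links from blogs by where they point.

    links: iterable of (source_url, target_url) pairs (assumed already crawled).
    blog_hosts: set of host names treated as blogs (a hypothetical list).
    Returns a Counter keyed by 'blog->self', 'blog->blog', and 'blog->web'.
    """
    counts = Counter()
    for source, target in links:
        src_host = urlparse(source).hostname
        dst_host = urlparse(target).hostname
        if src_host not in blog_hosts:
            continue  # only links that start on a blog are of interest here
        if dst_host == src_host:
            counts["blog->self"] += 1   # internal link, e.g. to the archive
        elif dst_host in blog_hosts:
            counts["blog->blog"] += 1   # link to another blog
        else:
            counts["blog->web"] += 1    # link out to the rest of the Internet
    return counts


if __name__ == "__main__":
    blogs = {"alice.example-blog.com", "bob.example-blog.com"}
    sample = [
        ("http://alice.example-blog.com/p1", "http://bob.example-blog.com/p7"),
        ("http://alice.example-blog.com/p1", "http://news.example.org/story"),
        ("http://bob.example-blog.com/p7", "http://bob.example-blog.com/archive"),
        ("http://news.example.org/story", "http://alice.example-blog.com/p1"),
    ]
    print(classify_links(sample, blogs))
```

A high blog->blog share would suggest blogs mostly talk among themselves; a high blog->web share would suggest they anchor themselves to the rest of the Internet. Deciding which hosts count as blogs is the hard part, and this sketch simply takes that list as given.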

At the end, I was left with more questions than I started with. This was the most disappointing aspect of my work; you would like to hope that weeks of work would turn out some revelation but instead I was wondering where I could find other people to help me out. No matter though, I'll offer you the same questions I asked myself hoping that maybe someone out there will get a clue and do this much needed work.

Somewhat related to the research I did for my economics of information class, I worked with a friend on a project for another class using this blog research as some of the basis. Before we began our research, I was showing him some polls I found on the Internet with details about bloggers. In particular, there was one poll that had... um... interesting results.

This poll used a methodology that made it completely useless. It was done by a search engine site, and most of the fewer than 1,000 responses came from people who had registered with that site. In other words, this was a self-selected population. When you deal with polling and statistics, the most important thing is to ensure that your sample accurately reflects whatever it is you're studying. When respondents decide for themselves whether to take the poll, you will never know whether or how the results are biased. My guess is that only the most interested bloggers will take the poll, biasing it toward more participation and more frequent posting, when the typical blogger more likely posts less often.
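A quick simulation shows how self-selection alone can inflate a number like that. Everything here is an assumption made up for illustration (how often bloggers post, and how the chance of answering grows with activity); it doesn't model that particular poll, it just shows the direction of the bias.

```python
import random


def simulate_self_selected_poll(n_bloggers=100_000, mean_posts_per_week=0.7, seed=1):
    """Return (true_share, poll_share) of bloggers who post at least weekly.

    Assumptions, all hypothetical: posts per week are exponentially
    distributed, and the chance that a blogger answers the poll rises
    linearly with how often they post.
    """
    rng = random.Random(seed)
    rates = [rng.expovariate(1 / mean_posts_per_week) for _ in range(n_bloggers)]

    # Ground truth: share of the whole population posting at least once a week.
    true_share = sum(rate >= 1.0 for rate in rates) / n_bloggers

    # Self-selection: busier bloggers are more likely to bother answering.
    respondents = [rate for rate in rates
                   if rng.random() < min(1.0, 0.02 * (1 + 5 * rate))]
    poll_share = sum(rate >= 1.0 for rate in respondents) / len(respondents)
    return true_share, poll_share


if __name__ == "__main__":
    true_share, poll_share = simulate_self_selected_poll()
    print(f"true weekly posters: {true_share:.0%}, poll says: {poll_share:.0%}")
```

With these made-up numbers, roughly a quarter of the simulated bloggers post weekly, but about half of the poll respondents do. Nobody lied on the poll; the sample did all the damage.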

This was lost on most readers however. Commenters loved it and said it was great. I can only feel bad for the people who use it to prove anything about the blogging world. There is NO WAY that 95% of bloggers post at least once a week. That's why I won't offer you a link to the poll. It's crap. The only use it might have is for the site that did the poll, to get insight into the type of people who use the site. Given that number above, you should have no trouble identifying it, then closing your browser as soon as you encounter it.

Unfortunately, this poll is typical of blogging statistics. They're loaded with hidden biases, skewed samples, and gaping holes that most people don't care to look for before reporting them to the masses. So let me get this caveat out of the way. Quite possibly the numbers and information I'm going to provide in the coming weeks will be slightly off or outright wrong. This should not detract from the points I will make along the way. If there's one thing you should take away from these writings, it's to think harder about the numbers before accepting them as truth.

There are lots of great quotes about statistics I could use as a conclusion here, but I won't. In fact, it's quite possible that I'm here to mislead you with numbers and prove to you that I'm right and everyone else is wrong. But it's also quite possible that I'm onto something, and if so, then I promise you I'll be the most surprised one in the end. Next time, we'll get into the numbers.

Targeting Toolmakers

November 12th, 2004

Marvel Comics jumps the shark.

Marvel Comics just announced they're suing Cryptic Studios, Inc. and NCsoft Corporation, the makers of the game City of Heroes. For those of you out of the loop, City of Heroes is an online role playing game that allows you to play as superheroes complete with superpowers and costumes befitting their super-ness. You can then take your newly created superhero along with thousands of other players and kick thug and villain butt in Paragon City for about $15 a month.

So why is Marvel suing them? City of Heroes comes with a well designed character creation system that lets you tweak nearly every aspect of your superhero -- size, colors, uniforms, you name it and you can change it. Of course, this means that you can conceivably make a character that looks like an existing comic book hero, say The Hulk, give him powers just like The Hulk, and call him The Hulk, then let him loose on the virtual streets of Paragon City.

City of Heroes is a pastiche of every superhero thing the makers thought they could put into a video game. If there's anything Marvel can rightfully be unhappy about, it would be people using their character names (for which they do have valid trademarks) in game. They've invested lots of time and money in Wolverine, and if a CoH character tried to play Wolverine like the comic book Wolverine, I would hope other players would kick his butt for not using the superhero creation kit to make something original, and absolutely give him the smackdown if he deviated even the slightest from character. However, I wouldn't have an issue with a CoH character Wolverfellow who was a lot like Wolverine, whom everyone knew was based on Wolverine, but whom everyone also knew wasn't Wolverine but merely an homage to the character. For you CoH players, how many other players have you seen that look a lot like existing Marvel or other comic book characters? In general, how many comic book superheroes have you seen that have similar powers or seem almost exactly alike?

With respect to designing characters like ones that already exist, I say there's a limited set of superpowers (ice, fire, energy, telekinesis, etc.), so characters will repeat or at least seem a lot alike after the 10,001st one is created. Super strength and flying are common among superpowers, but that doesn't mean that every superstrong flying character is modeled after Superman. And yes, DC Comics did sue Fawcett Comics in the 1940s over the character Captain Marvel because his powers were too much like Superman's. I'm certain that since then Marvel and DC have made other characters that were very similar but decided a lawsuit wasn't worth the effort.

(An aside: Marvel and DC already claim they own the trademark "super heroes," but I don't know if that will hold up much longer. A quick search of the City of Heroes website turned up many references to the term "super heroes" but only on the user forums and in review quotes -- none made by the game producers. You gotta wonder if they're trying to avoid the term altogether so they won't get sued.)

I guess this means that Pixar is Marvel's next target. The powers of the characters in The Incredibles resemble those of the Fantastic Four (no spoilers -- Elastigirl = Mr. Fantastic, their daughter Violet = Invisible Woman, and Mr. Incredible = The Thing), so Marvel should sue the shit out of them, right? Those characters and the movie's other references, like those to the X-Men movie and characters (well, I don't want to ruin the rest, but there are many), are all homages, not bait for infringement lawsuits.

But my greatest worry is that this is the tip of an iceberg of intellectual property lawsuits. Should Izzy Stradlin be able to sue Fender Guitars for making instruments that other people can use to learn and imitate his riffs? Maybe the makers of City of Heroes can sue Microsoft for making tools that let other people make games like City of Heroes. Or anyone can sue makers of CAD software because you can design nearly anything with those tools.

They're blaming the toolmakers, not themselves. We can't have a system where we always point the finger at the toolmakers when the blame lies with the tool users. This doesn't absolve the toolmakers entirely -- they still have to act responsibly and reasonably when issues like this come up. So for everyone from Google to gun manufacturers to peer-to-peer and video game makers, we need to distinguish between people who make tools, people who use tools, and the tools themselves. Analyze the situation and blame the dumbasses appropriately.

Marvel is the dumbass in this affair, not the toolmaker or tool users. Marvel was dying until their movies resurrected them (check how their stock rebounded after the X-Men movie was released in 2000). Get a clue, Marvel. Be happy with the royalties you're getting from the movies and leave it at that. If anything, you can get some accounts in City of Heroes to steal new character ideas from existing characters in the game. And then the game users can sue you for improperly appropriating their creations. We all know you haven't created any good characters since the new X-Men in 1975, and you alienated all your good artists in the '90s, who reacted by ditching your lame-ass company. Stupid, stupid, stupid.

Just for that, I'm not going to pay to see the next Marvel character based movie. Then again, I didn't pay to see Daredevil, Punisher, and Hulk either, but they all sucked ass, so maybe Marvel's problems run deeper than just a video game.

Distributed Desktop Searching

November 10th, 2004

Google's mistake means everyone wins! And a few more people lose.

One of the biggest problems plaguing the business world is the unwillingness of people to contribute to group knowledge systems. Plenty of products, like Lotus Notes, are built to share information between coworkers, and they only work if users add their documents to the system; otherwise nobody will ever know about them. But users are lazy, so they either never add those documents, put them in the wrong place, or skip steps in the process, leaving the documents inaccessible or unusable. These systems often cost lots of money, don't work as well as they claim, and require non-intuitive methods for getting their so-called benefits.

And then Google came out with their desktop search tool. Before you could say "hack this," someone found a way to make those searches remotely. Now as irresponsible as I think Google is for throwing technology like that to the world, there could be an upside.

Most interesting documents never leave people's PCs, and most groupware solutions have crappy search interfaces. So here's my idea -- distributed desktop searching. Install Google's search tool on everyone's PC, then install the remote search tool. Build an application that takes a search string, sends the query to every PC running the remote Google search tool, then assembles the results on a nice page. With the results, you can go to that person for the document or, even easier, click a resulting link and get the document yourself. This could put all those crappy groupware companies out of business and actually get you the files you need from your coworkers' PCs.
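Here's a minimal sketch of the fan-out-and-merge step in Python. I'm assuming each PC can be wrapped in a function that takes a query and returns scored hits; I'm not describing how the actual remote-query hack for Google's tool works, so the per-machine searchers below are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out_search(query, searchers):
    """Send one query to every PC in parallel and merge the hits.

    searchers maps a machine name to a callable(query) -> list of
    (score, path) tuples. In practice each callable would wrap a network
    request to that PC's desktop search; here they're hypothetical.
    Returns (score, machine, path) tuples, best score first.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {pool.submit(search, query): machine
                   for machine, search in searchers.items()}
        for future in as_completed(futures):
            machine = futures[future]
            try:
                hits = future.result()
            except Exception:
                continue  # an unreachable PC shouldn't sink the whole search
            merged.extend((score, machine, path) for score, path in hits)
    merged.sort(key=lambda hit: hit[0], reverse=True)  # best matches first
    return merged


def make_fake_pc(documents):
    """Hypothetical stand-in for one PC: scores a file by how often the
    query string appears in its text."""
    def search(query):
        return [(text.count(query), path)
                for path, text in documents.items() if query in text]
    return search


if __name__ == "__main__":
    office = {
        "alice-pc": make_fake_pc({"C:/budget/q3.xls": "budget budget forecast"}),
        "bob-pc": make_fake_pc({"C:/notes.txt": "ask Alice about the budget"}),
    }
    for score, machine, path in fan_out_search("budget", office):
        print(score, machine, path)
```

Querying the machines in parallel keeps one slow PC from holding up the others, and skipping per-machine errors means a powered-off laptop just drops out of the results. A real version would also need timeouts and, above all, access control.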

Since Google's tool, and presumably Yahoo's and Microsoft's, will all be free, what will this mean for companies who depend on disorganization and the inability to find information for their business? I'm talking about groupware companies, consultants, IT managers, SIMS Masters graduates...

It's just an idea... the first of many I'll put here. Feel free to ignore it.

Google Bashing

October 28th, 2004

"Google hacking" gets a new connotation

I tried to install Google's new desktop search tool, but the installer didn't work. It said I didn't have enough hard drive space to install it, despite the fact that I had more than enough hard drive space to install it. After submitting the bug report, I got a response a week or so later that essentially said too bad.

Their tool captures everything you do on your computer, including emails sent and received, browser history, and all textual information on your hard drive (except WordPerfect documents apparently). It can take that information and let you run searches on your PC just like searching on Google's web site, then combine those results with a search of the Internet, reporting your query to Google as well.

Google is going to be deluged with individual search habit information to a degree that they've never seen before. They (probably) already know how people search the Internet, but now they'll know how people search their own computers. And the resulting information and popularity of the tool will put Google years ahead of any of their closest search engine competitors.

I don't want a search engine on my computer, regardless of whether Google gets my search information, so I guess I'm happy that the tool didn't work. But faster than you can say "script kiddie," there are hacks for providing remote access to your computer's Google desktop searches. One of the sites that described the trick warned that you shouldn't use it for malicious purposes. Like that's going to keep the hackers from using this.

Let me explain my fear. Google releases a tool that lets you search (almost) every document on your computer, including, say, your Excel spreadsheet that contains password lists, your cached browser page that has your social security number on it, or the email that you got with your username and password for a shopping website. Just Google your machine for "password" or "username" or "SSN" or "credit card number" or "billing address" and see what comes up.
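Why is that so easy? Desktop search tools are built around an index of the words in your files, so a query for "password" is a quick lookup rather than a slow crawl of your disk. A toy inverted index in Python shows the general idea; this is the textbook technique, not Google's implementation, and the file names and contents are made up.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map every lowercased word to the set of files that contain it.
    documents: dict of file path -> extracted text."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the files containing every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))  # copy so the index isn't mutated
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


if __name__ == "__main__":
    my_pc = {
        "C:/finance/accounts.xls": "bank login password hunter2",
        "C:/mail/0042.eml": "Your username and password for ShopSite",
        "C:/recipes/pancakes.txt": "flour eggs milk syrup",
    }
    index = build_index(my_pc)
    print(sorted(search(index, "password")))
```

Once the index exists, whoever can send it queries, whether that's you or someone using the remote-search hack, gets the same instant answers.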

And now there's an exploit that lets other people remotely query your machine using Google's tool. People worry about getting a virus that turns their computer into a spam-spewing zombie. Now you can worry that you'll get a virus that lets someone search away on your PC for any information about you. I can't wait for the first viruses that install Google's new tool after infecting your machine. Just think of the rise in identity theft, stolen credit card numbers, cases of blackmail, and so on, scaling in proportion to the rise of desktop search tools.

(Note: I'm calling this an exploit even if Google doesn't (actually, I don't know what they call it). If this were Microsoft, that's what it would be called. As I see it, Google's good name is the only thing keeping this off the radar.)

I think this could be the first of a series of similar tools that threaten privacy, security, and more. Well, maybe it's not the first either. Gmail and other web-based email tools have a great exploit too -- using search engines to answer the "security question," like "What's your mother's maiden name?" or "What's your dog's name?", when some of that information is easily searchable on the net. I know I'm not the first person to suggest that exploit, but what you should realize is that while the migration of search to the desktop gives you better access to your information, it also gives others better access to your information, your search habits, and, if used for bad purposes, your private information. Compared to Google's desktop tool, RFID is just a UPC code.

Google scares me. Not because they're evil, but because they're throwing tools onto the 'net without any regard for, well, without regard for anything as far as I can tell. The word "irresponsible" comes to mind. They're like kids playing chemistry with the chemicals under the kitchen sink. Maybe there's value in using the Internet as a research or marketing setting on a mass scale. But "beta testing" with anybody who wants to play with their tools means the public, bad guys included, can find the bad parts of their technology before Google can fix them.

Now everyone is speculating on where Google is going next. Rumors include a Google branded browser or instant messenger. Google doesn't want a browser. There's enough competition in that market without Google; their toolbar is as involved as they want to get with the browsers. What Google does want is to be your portal to all the information on the Internet, your computer, and everything. They have two extraordinarily valuable assets besides their name -- search technology and storage capacity. These assets stick out in all of their tools -- the search engine, Orkut, Gmail, Froogle, image searching, etc.

If they are creating a "browser," it's not in any traditional sense of the word. I hate fortune telling, but I have a vision of something with IM and chat (based on Jabber, remembering all your conversations and making them searchable), community and social networking services (Orkut, but using community information tied to their search engine info), email (Gmail), location-based services (my sleeper prediction for their next avenue, eventually tied into community and general searches), and brute-force searching power (including desktop and Internet searches) all built into a single (web?) application like Gmail. IBM had a prototype of parts of this in their Remail tool. Unlike IBM, if there's anyone who can pull this off, it's Google. And if Google can't pull that off all at once, just watch the next few applications they release and you'll see where they're headed. Yahoo will be kicking themselves in the pants if (more accurately, when) Google gets there first.

But if Google is intent on throwing new applications at the world without some due diligence on their part, they're only deluding themselves. And so I want to repeat my earlier comment. Google, the Internet is not your beta testing environment. You deserved more flak than you got for the privacy concerns in Gmail when you released it, and I can only hope that your future technologies are put under even more scrutiny. Your glory days will not last forever, so you had better start thinking of new markets to wind your way into that aren't based on your search or storage technologies. And Google, start thinking about social responsibility before you unleash these beasts into the wild.

Finally, when you get around to it, could you please fix that bug in the desktop tool installer? I've got some friends that I want to send it to so I can keep an eye on them...

The Law of Diminishing Interest

October 14th, 2004

Sometimes, one opinion is enough for everybody.

Time for the much anticipated corollary of the Law of Diminishing Opinions.

The Law of Diminishing Interest

After opinions/statements/stories have proliferated about a topic, it will eventually be beaten to death such that no one will care about it any more.

Obviously this doesn't apply to everything. Specifically, a few issues are so polarized, so passion-provoking, that no amount of time will let them slide: abortion, hard-core Republicans vs. hard-core Democrats, are Bert and Ernie gay, stem cell research, is "Shiny Happy People" the worst song ever written. And personal blogs are an exception I'll get to later.

So let's take John Kerry and the Swift Boat mess. It was hardly a month ago that every news program dedicated at least one story to the newest revelation of this story. Everyone had an opinion on it, the opinions proliferated, and the story broke under its own weight. We all got so sick of hearing "turned chicken and ran from the fight this" and "shot some gook in the back that" and now we would rather live the rest of our lives without knowing this ever happened.

Music provides a better example. Why does it seem like there are only ten bands that play on MTV or pop radio stations all the time? Probably because there are only ten bands playing all the time. Popularity in music has limits; only so many bands can be popular at once, so once a new one-hit wonder comes along, an old one has to go away. We're no longer interested in that old song. Our interests have moved on to the next big thing.

Let's move to blogging. First, you can already keep up with only a few blogs at once. You're reading 30 or 50 RSS feeds, and that already takes up hours a day, but at least it's more manageable than checking 30 to 50 web sites a day. You can hardly keep up with those posts, so you skim most and only read the couple that matter most. Adding another blog is right out unless there's an old one you can remove to make way for the new one.

Furthermore, you don't want to read two blogs that cover the same thing, offering the same opinion. Just like with the news, you can sate all your information needs with one, maybe two sources that you trust. Any further sources just rehash what you already knew. That's why blogs have to differentiate themselves with witty opinions or pointless pining or random digressions. If you're the same as everyone else, why should other people read what you write? Personal blogs are obviously an exception to this since they are as unique as their authors, but people with a penchant for daily posting can quickly become overwhelming...

And even within blog posts, bloggers' laziness is evident. The ultimate props you can give are a link to someone else's blog and maybe a quote from it. This results in a single story propagating throughout the blogging realm with lightning speed and dulling repetitiveness. I can't count how many times I've now read the story about Bush's so-called wireless earpiece strapped to his back during the first debate. Actually, it wasn't an earpiece. The battery pack that keeps Robot Bush animated slipped out a bit, and the operators couldn't come on stage during the debate to slip it back in place.

The blogging echo chamber is the worst manifestation of the Law of Diminishing Interest. Here we have a medium that prides itself on interconnectivity and information proliferation. The result is repetition, unoriginal commentary, and shameless self-promotion. While the Law of Diminishing Opinions tells us that fewer and fewer new thoughts will proliferate the longer a topic languishes, the Law of Diminishing Interest tells us that we will care less and less about those opinions as time goes by. That's not to say those tail opinions can't be interesting, just that they're lost in the noise as a result of bad timing. This reinforces the value of breaking news stories and being quick with responses to current events, both of which the blogging world is very good at. My point is that as opinions start flowing out about an event, we dilute the value of any one of them because there are so many opinions written and we can't spread our attention that thin.

Surprisingly, these problems are partly solved by our limited attention span -- the fact that we can only keep track of so many (or few) sources of information at once. With so many information sources available, we ratchet our own information filters very high so that we don't become overwhelmed by keeping up with two hundred web sites a day. The end result is a Zipf distribution of traffic (a power law: plot each site's traffic against its popularity rank on log-log axes and you get a line with a slope of -1) for the Internet as a whole and (as I hope to experiment with soon) for blogs. Just like Bush's tax cuts, the top 0.01% of sites get nearly one-third of all traffic. (Porn is different. A small number of sites (say, 1, 3, 5) is enough for most Internet search needs, but we need many more pages of porn to fill our, um, needs. Geoff Nunberg made this observation in one of my classes.)
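For a sense of how lopsided an exponent-1 Zipf curve is, here's a small Python calculation. The number of sites is a hypothetical I picked, and the result depends on it heavily, so treat this as an illustration of the shape rather than a measurement of the web.

```python
def zipf_top_share(n_sites, top_fraction, exponent=1.0):
    """Fraction of all traffic captured by the top `top_fraction` of sites,
    assuming the site at popularity rank r gets traffic proportional to
    1 / r**exponent (exponent 1.0 is the classic Zipf curve, slope -1 on a
    log-log plot)."""
    if n_sites < 1 or not 0 < top_fraction <= 1:
        raise ValueError("need at least one site and 0 < top_fraction <= 1")
    weights = [1.0 / rank ** exponent for rank in range(1, n_sites + 1)]
    top_k = max(1, round(n_sites * top_fraction))  # how many sites count as "top"
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        share = zipf_top_share(n, 0.0001)
        print(f"{n:>9,} sites: top 0.01% gets {share:.1%} of traffic")
```

With a hypothetical one million sites, the top 100 (0.01%) take about 36% of the traffic, in the neighborhood of that one-third figure; with 100,000 sites, the top 0.01% is just 10 sites and their share falls to roughly a quarter.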

But this distribution of blogging traffic means you put your blinders on. We like news sources that reflect our own ethics and political views, so we congregate at those sites. So those few sites you do read are ones that reinforce your world view rather than expand it as the Internet idealists would prefer.

And thus I offer you a challenge. Start reading something that completely appals you. Democrats -- try the National Review. Republicans -- how about The Nation? Undecided or Independents, read The Week or The Economist depending on how short or long your attention span is, respectively. LaRouche or Nader people, I assign you to read The Constitution. I can happily admit that there is no greater educational experience than understanding your enemies. Not only will it reinforce your beliefs but hopefully it will also make you realize why you believe the things you do.

Simply put, the Internet has too much information to be useful unless you narrow your eyes a bit. This could be the ten pages you read most, the five search results you check after you've entered your query, or the several dozen pages you go to for your porn needs. Blogs are even more guilty of this than most, primarily due to their explosive growth (more on blog growth in future posts). Certainly we need better tools and methods for filtering what's out there to a usable level. But for now, I think we would all be best served by having better porn search tools. Regardless of how interests diminish for most of the web, I think I can safely say that interest in pornography is something you can count on far into the future.