Words, Links, Likes and Shares: The Evolution of Relevance

While the dustup over Bing's possible appropriation of Google's long-tail search results is presently occupying the attention of the world of search, I thought I'd take a step back and offer a longer-term historical perspective on an aspect of search that fascinates me: namely, the evolution of search algorithms to incorporate ever greater amounts of human-generated input into their calculation of relevancy.

Last September, Facebook began including heavily “Liked” items in their search results, and Bing followed suit in December. While this news itself is now a few months old, it got me thinking about how the methods used to determine relevance have changed since the era of web search began. The inclusion of “likes” as a measure of relevancy represents another chapter in the evolution of the various techniques that have been employed to determine relevance ranking in search results.

The arc of relevancy’s story can be traced along one dimension by observing the amount of human input that is incorporated into the algorithm that determines the relevance ranking of search results.

Early search engines relied primarily on the words in each page (and some fancy math) to determine a page’s relevance to a query. In this case, there is one human (the author of that particular web page) “involved” in determining the page’s relevance to a search.

When we launched the Excite.com web search engine in October 1995, we had an index that contained a whopping 1.5 million web pages, a number that seemed staggering at the time, though the number of pages Google now indexes is at least five orders of magnitude larger.

Excite’s method for determining which search results were relevant was based entirely upon the words in each web page. We used some fairly sophisticated mathematics to determine how to stack rank each document’s relevancy to a particular search. This method worked fairly well for a time, when searching just a few tens of millions of pages, but as the size of our index grew, the quality of our search results began to suffer. The other first-gen search engines like Lycos, Infoseek and AltaVista suffered similar problems. Too much chaff, not enough wheat.
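For the curious, here's roughly what that era's word-based "fancy math" looked like. This is a toy sketch of the classic TF-IDF weighting scheme from the information-retrieval textbooks of the day, not Excite's actual (and considerably more sophisticated) algorithm:

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Rank documents against a query with TF-IDF weighting.

    A toy illustration of 1990s word-based relevance ranking,
    not Excite's actual algorithm.
    """
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        for term in set(tokens):
            df[term] += 1

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # Terms that are rare across the corpus get a higher weight.
                idf = math.log(n_docs / df[term])
                score += tf[term] * idf
        scores.append(score)

    # Highest score first: the "stack ranking" of the results.
    return sorted(enumerate(scores), key=lambda x: -x[1])

docs = ["the quick brown fox", "the lazy dog", "quick quick fox"]
print(tf_idf_scores("quick fox", docs))  # doc 2 ranks first
```

The weakness is apparent: the only signal is the words on the page, so anyone who stuffs a page with the right terms climbs the rankings.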

Enter Google. Google's key insight was that the words in a web page weren't sufficient for determining the relevance of search results. Google's PageRank algorithm tracked the links in each web page and recognized that each of those links represented a vote for another web page, and that measuring these votes could determine the relevance of search results dramatically better than cranking out complex calculations based on the words in the document alone.

Simply put, Google allowed the author of any web page to “like” any other web page simply by linking to it. So instead of a single page’s author being the sole human involved in determining relevancy, all of a sudden everyone authoring web pages got to vote. Overlaying a human filter on top of the basic inverted-index search algorithm created a sea-change in delivering relevant information to the users seeking it. And this insight (coupled with the adoption of pay-per-click advertising) turned Google into the juggernaut it became.
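To make the voting idea concrete, here's a toy power-iteration sketch of PageRank as described in the original Brin and Page paper. Google's production system is, of course, vastly more elaborate:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over a dict of page -> outbound links.

    A sketch of the published idea, not Google's production system.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outbound in links.items():
            if not outbound:
                continue  # dangling page: its rank simply leaks in this toy version
            share = damping * rank[page] / len(outbound)
            for target in outbound:
                # Each link is a "vote" that passes along a share of rank.
                new_rank[target] += share
        rank = new_rank
    return rank

# Example: page "a" links to "b" and "c"; both link back to "a".
print(pagerank({"a": ["b", "c"], "b": ["a"], "c": ["a"]}))
```

Note how a page's vote counts for more if the page itself is highly ranked: the human filter is recursive.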

While Google’s algorithm expanded the universe of humans contributing to the relevancy calculation dramatically from a single author of a single web page to all the authors of web pages, it hadn’t fully democratized the web. Only content publishers (who had the technical resources and know-how) had the means to vote. The 90%+ of users online who were not creating content still had no say in relevancy.

Fast forward several years to the meteoric rise of Facebook. Arguably, Facebook's rise is largely attributable to the launch of the newsfeed feature as well as the Facebook API, which opened the floodgates for third-party developers and brought a rich ecosystem of applications and new functionality to Facebook. After reaching well over half a billion users, Facebook unleashed a powerful new feature that may ultimately challenge Google in its ability to deliver relevant data to users: the "Like" button.

With over two million sites having installed the Like button as of September 2010, billions of items on and off Facebook have been Liked. In the early Google era, only those people with the ability to author a web page (a relatively small club in the late ‘90s) had the ability to “like” other pages (by linking to them).

Facebook’s Like button today enfranchises over half a billion people to vote for pages simply by clicking. This reduces the voting/liking barrier rather dramatically and brings the wisdom of the crowd to bear on an unprecedented scale. And beyond simple volume, it enables the “right” people to vote. Having your friends’ votes count juices relevancy to a whole new level.
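Neither Facebook nor Bing has published how Likes actually factor into ranking, but you can imagine a blended score along these lines. The formula and weights below are purely hypothetical, just a sketch of the intuition that friends' votes should count more than the crowd's:

```python
import math

def social_score(base_relevance, total_likes, friend_likes,
                 w_crowd=0.3, w_friends=1.0):
    """Blend textual relevance with social signals.

    Hypothetical formula and weights; no search engine has published
    how Likes enter its ranking. The log damps raw popularity, and
    likes from the searcher's own friends count far more per click.
    """
    return (base_relevance
            + w_crowd * math.log1p(total_likes)
            + w_friends * math.log1p(friend_likes))
```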

A related behavior to clicking the Like button is content sharing, which is prevalent on both Facebook and Twitter. Social network "content referral" traffic in the form of URLs in shares and tweets in users' newsfeeds now exceeds Google search as a traffic source for many major sites. Newsfeeds are now on equal footing with SERPs in terms of their importance as a traffic source.

Not only are destination sites seeing link shares become a first-class source of traffic, but users themselves are clearly spending much more time in their newsfeeds on Facebook and Twitter than they do in the search box and search-results pages. Social networks' sharing and liking gestures have resulted in an unexpected emergent property — users' newsfeeds have become highly personalized content filters that are in some sense crowdsourced, but are perhaps more accurately described as "clansourced" or "cohortsourced", since the crowd doing the sourcing for each user is hand-picked.

Beyond liking and sharing in the spectrum of human involvement is perhaps a move to a more labor-intensive gesture: curation. Human-curated search results (aided, of course, by algorithms) are the premise behind Blekko, a new search engine focused on enhancing search results through curation. Making a dent in Google's search hegemony is a tall order indeed, but my guess is that if anyone succeeds, it will be through a fundamentally new approach to search, and likely one that is more people-centric. And Google certainly faces a challenge as content farms and the like fill up the index with spam that is hard to root out algorithmically. For a cogent description of this problem, just ask Paul Kedrosky about dishwashers and the ouroboros.

One thing seems clear: the web's ability to deliver relevant content to users relies on ever-more-sophisticated algorithms that not only leverage raw computational power but also increasingly weave in feedback from a growing share of the humans participating in the creation and consumption of digital media online.

Do More Faster

My partner Brad Feld and TechStars CEO David Cohen just wrote a book called Do More Faster, which will be released in a week or so, and is presently available for pre-order on Amazon. In keeping with the title, they have put together a compelling book in record time, leveraging a network of contributing authors, including yours truly.

My chapter is entitled "Use Your Head, then Trust Your Gut", and in it I reflect on the fact that founders of technology companies in particular have a huge amount of data at their disposal: real-time sales information, user behavior analytics and a torrent of advice coming at them from board members, investors, advisors and countless other humans who often have strong opinions on how a start-up should be run.

One of the great balancing acts an entrepreneur must perform is synthesizing all of these inputs (many of which are conflicting) and then charting a decisive course of action. When done well, this involves a blend of art and science, and of qualitative and quantitative thinking.

In addition to my small contribution to this book, Brad and David assembled dozens of chapters from mentors, company founders and others involved in TechStars into seven themes: Idea and Vision, People, Execution, Product, Fundraising, Legal and Structure, and Work and Life Balance.

This book is a must-read for anyone involved in the creation of early-stage technology startups, so head over to Amazon and order a copy now.

Disk is the New Tape

I came across this gem at Data Center Knowledge, mentioning Twitter's plans to move into their own data center, having outgrown the managed hosting services they use at NTT America. The article includes a great slide deck by Twitter's John Adams entitled Scaling Twitter, presented at this week's Chirp 2010 conference (which, sadly, I was unable to attend). There's a ton of great stuff in here detailing some of the techniques, tools and technologies (including current darlings like Kestrel and Cassandra) that Twitter has used to scale their service in the face of 752% growth in 2008 followed by even more explosive growth in 2009, a feat somewhat akin to upgrading a jet engine in flight.

But my favorite slide in the presentation is the one entitled "Disk is the new Tape", which refers to the heavy I/O challenges that social graph applications face. Disk is just way too slow for most Web 2.0 applications, which means apps need lots of RAM and must focus on techniques that minimize disk access at all costs in order to provide reasonable (sub-500ms) response times.
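The standard way to keep disks out of the request path is to front them with RAM. Here's a minimal sketch of the cache-aside pattern, a simplified stand-in for the memcached-style caching that Twitter is known to rely on heavily (the backing_store here is any object with a get(key) method, e.g. a disk-backed database client):

```python
import time

class ReadThroughCache:
    """Keep hot data in RAM so the request path rarely touches disk.

    A minimal sketch of the cache-aside pattern. Real systems use
    memcached or similar, with eviction, size bounds and distribution
    across many machines; none of that is modeled here.
    """
    def __init__(self, backing_store, ttl_seconds=60):
        self.store = backing_store      # slow: hits disk
        self.ttl = ttl_seconds
        self.cache = {}                 # fast: lives in RAM

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None:
            value, expires = entry
            if time.time() < expires:
                return value            # served from RAM, no disk I/O
        value = self.store.get(key)     # cache miss: pay the disk cost once
        self.cache[key] = (value, time.time() + self.ttl)
        return value
```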

Clearly, smart software like Kestrel and Cassandra, built from the ground up to run in highly distributed environments, has enabled the building of apps at internet scale, but it also suggests that server (and data center) architectures must evolve over time too — moving hard drives out of the critical path (perhaps transitioning to SSDs for non-volatile storage as costs fall?) and thereby relegating hard disks to offline archival storage, a fate met by tape drives years ago.

An Audio Engineering Arthur C. Clarke Moment

Arthur C. Clarke famously wrote that "any sufficiently advanced technology is indistinguishable from magic." Well, today I read about a new audio editing tool that performs magic: Melodyne Editor.

As readers of this blog know, I'm a guitarist and hobbyist recording engineer. I built out a ProTools HD recording studio in my converted garage in Portola Valley, CA back in the day, and my band Soul Patch recorded our two albums there, which we released on our label, Toothless Monkey Music. Numerous other albums were recorded there by my band-mate Nick Peters, who now runs his own label and studio (Bodydeep Music) out in Redwood City, CA.

Please pardon what follows – it is a bit of audio geekery, but anyone who is even superficially familiar with the capabilities of modern digital recording systems will likely be slack-jawed in disbelief when I explain what Melodyne Editor can do. (I have to give a tip of the hat to Thomas Dolby — I’ve been a long-time reader of his blog (and fan of his music), and it was his blog post that made me aware of this amazing tool.)

Anyway, all of this background is just to say that I know my way around digital audio and signal processing plugins. I’ve been a long-time fan and user of AutoTune (quite useful for cleaning up “almost right” vocal takes), which, amazingly, can put out-of-tune vocals back in tune. AutoTune is an example of pure technology magic, though some lament the effect it has had on musicianship and vocal performance.

Then, a couple years ago, I began experimenting with a new pitch processing audio editor called Melodyne. Not only did Melodyne offer the ability to correct out-of-tune instruments or vocals, but it broke the audio waveform down into discrete notes that could be slid around in pitch and time using a graphic editor. This took a step beyond AutoTune – not only could you tweak the pitch of a performance, you could actually move the notes around with your mouse in pitch and time. You could literally alter the melody and rhythm of a vocal or instrumental performance by dragging your mouse around. Pretty amazing, right?

Of course, as amazing as these pitch (and time) audio processing tools are, they have a big constraint: they only work with monophonic material. You need a track with a single singer on it, or an instrument that plays only one note at a time; this leaves out most parts performed by pianos, guitars, vocal choirs, a horn section or an entire symphony. Basically, if the audio in which you want to fix pitch problems contains chords (more than one note played simultaneously), you are out of luck.
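To get a feel for why the monophonic case is tractable, here's a naive textbook sketch of pitch detection via autocorrelation, which finds the single dominant period of a waveform. To be clear, this is nothing like Melodyne's proprietary technology, just an illustration of why one note at a time is easy, and why a chord, which superimposes several periodic components at once, defeats the simple approach:

```python
import numpy as np

def detect_pitch(signal, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the pitch of a *monophonic* signal via autocorrelation.

    A naive textbook method (nothing like Melodyne's approach): it finds
    the single dominant period of the waveform, which is exactly why it
    falls apart when a chord contains several periods at once.
    """
    signal = signal - np.mean(signal)
    corr = np.correlate(signal, signal, mode="full")
    corr = corr[len(corr) // 2:]          # keep non-negative lags

    lag_min = int(sample_rate / fmax)     # shortest plausible period
    lag_max = int(sample_rate / fmin)     # longest plausible period
    best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / best_lag         # period -> frequency in Hz

# Example: a 220 Hz sine wave (an A3) sampled at 44.1 kHz.
sr = 44100
t = np.arange(4096) / sr
print(detect_pitch(np.sin(2 * np.pi * 220 * t), sr))  # ~220 Hz
```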

The newest version of Melodyne does something most in the audio world have considered impossible: it allows the editing of polyphonic material. You can literally reach inside a guitar track and retune an individual note within a chord. Or find an out-of-tune singer in a group of backup singers and fix just that singer’s out-of-tune note.

This, my friends, is magic. And one step closer to what I've long considered my ultimate fantasy audio engineering tool: software that could take a mono or stereo mixdown of a song and break it out into a multitrack representation of each individual instrument. This would allow anyone to take a favorite song, break it into its component parts and build a remix. There are many reasons why this is probably far more difficult than pitch-shifting individual notes in harmonic material, but if anyone could pull this off, my bet is on the wizards at Celemony. Wow.

Note: check out this video on Celemony's website (sorry, no embed code) to see what some serious pros (like Herbie Hancock) think about Melodyne. If you watch long enough, you'll hear Living Colour guitarist Vernon Reid refer to Arthur C. Clarke's famous aphorism as well, which I didn't discover until after I wrote this post!

Droidmaker: George Lucas and the Digital Revolution

A few weeks ago, during my family's Spring Break vacation, I had the pleasure of reading a great history of George Lucas and the massive impact he and the extended Lucasfilm family had on the technology behind filmmaking and, ultimately, on the broader technology ecosystem. This book is exhaustively researched and is almost textbook-like in its presentation of annotated photos and topic-specific sidebars. Unlike a textbook, however, it is a real page-turner. I devoured Droidmaker in a few days sitting poolside in Hawaii, Mai Tais firmly in hand.

What was most enjoyable about this book is that it is not simply a George Lucas biography – while Lucas is (obviously) the main figure in the book, author Michael Rubin (full disclosure: he's an old buddy of mine) does an excellent job placing Lucas and his mentor-colleague Francis Ford Coppola in the context of filmmaking history. Rubin deftly illuminates how their collaboration and competition served to move filmmaking techniques and technology forward, way up in Northern California, far removed from the ossified and technophobic power center of Hollywood. Rubin worked at Lucasfilm early in his career and enjoyed personal access to many of the seminal figures in this book, including Pixar founders Ed Catmull and Alvy Ray Smith as well as George Lucas himself.

There's plenty of nerd-fodder in here too. Rubin is comfortable discussing the technological intricacies behind video vs. film, frame buffers, computer-generated animation, stop-motion photography, 3D rendering and more. I happily geeked out on the discussions of frame rates of film vs. video and the 3:2 pulldown techniques used to transfer film to video. The masterstroke in this vein is the discussion of how the guys in the computer division (which was later spun out as Pixar when Steve Jobs bought the team and technology from Lucas) simultaneously solved the thorny problems of eliminating image jaggies and creating motion blur in computer-generated graphics. In a single moment of insight (after working for years on both problems), they realized that randomized sampling in both time and space while rendering each frame of a computer animation was the key to making the final product look realistic. Or, more specifically, film-like.
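For the technically inclined, my lay understanding of that insight boils down to something like the sketch below: rather than sampling each pixel on a regular grid at a single frozen instant, jitter the samples randomly in both space and time and average the results. The scene_color_at function is a hypothetical stand-in for whatever the renderer evaluates at a given point and instant:

```python
import random

def render_pixel(scene_color_at, px, py, shutter_open, shutter_close, n=16):
    """Average n samples jittered in both space and time for one pixel.

    A toy sketch of the stochastic-sampling insight Rubin describes:
    spatial jitter trades jaggies for visually benign noise
    (anti-aliasing), and temporal jitter across the shutter interval
    produces motion blur, both from the same mechanism.
    """
    total = 0.0
    for _ in range(n):
        x = px + random.random()                         # jitter within the pixel
        y = py + random.random()
        t = random.uniform(shutter_open, shutter_close)  # jitter within the shutter
        total += scene_color_at(x, y, t)
    return total / n
```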

The discussions of the technology are informative and approachable even for the non-technical reader (though I admit to having some exposure to the intricacies of digital media editing as a life-long musician and occasional digital audio engineer). These tech discussions reminded me of another great tech history book I read and reviewed recently: Racing the Beam, a history of the Atari 2600 game console.

Droidmaker is a great read and really made me appreciate the immense contribution Lucas made not only to filmmaking but also to digital media technology in general: his personal investment (to the tune of tens of millions of dollars) in fundamental R&D in computer animation and digital audio and video editing from the late '70s well through the '80s not only led to seminal companies birthed at Lucasfilm, like Pixar and THX, but also deeply influenced the broader path of technology evolution that led to the emergence of independent companies like Avid (digital video editing) and Digidesign (digital audio recording and editing with the ProTools platform).

So go pick up a copy already!

Pogoplug in the News

The fine folks at Cloud Engines, makers of my favorite consumer electronics gadget, the pogoplug, have been very busy in 2010. They launched at retail here in the US and Canada and quickly followed by announcing availability of the pogoplug in the UK and Europe. It has been fun to see French and German start showing up in the pogoplug twitter stream.

They’ve been receiving a flurry of great product reviews, including a 9/10 rating from The Inquirer in the UK and a five-star rating and an Editor’s Choice Award from Cnet-France.

Back here in the US, the pogoplug was just reviewed by Katherine Boehret in the WSJ's Mossberg Solution column. She also did a video review of the device, which you can watch below.

I’m also super-excited for a bunch of new features that will roll out for the pogoplug over the next couple months. Stay tuned…

Apostrophes and Plurals Don’t Mix

Warning: grammar rant ahead…

FOR THE LOVE OF PETE, PEOPLE, NEVER EVER USE AN APOSTROPHE WHEN PLURALIZING A WORD!

Sorry, I had to get that off my chest. I don’t know what is so confusing about this, but I encounter this mistake many times a day. Because I had an excellent English teacher in high school who was a big influence on me (thank you, Mrs. Noland), I am known among my friends and colleagues as a bit of a grammar nazi. In fact, I am a proud member of the Facebook group I Judge You When You Use Poor Grammar. I am comfortable with this.

If you are writing anything for public consumption, using bad grammar and misspelling words makes you look, at worst, unintelligent, and, at best, careless.

While I can overlook many grammatical errors that result from misunderstanding the subtler nuances of the English language, this particular rule is so easy that I can't understand where the confusion comes from. Apostrophes are for contractions and possessives. Never for plurals.

I understand that keeping it's vs. its straight can be tricky, since its is a possessive that takes no apostrophe, but this still has nothing to do with pluralization.

So get it straight, people. Please.

Repeat after me: I will never use an apostrophe when pluralizing a word.

Ahh, I feel much better.

I should also mention that I Judge You When You Use Poor Grammar has been turned into a very amusing book, which my friend Amy was nice enough to give to me a few days ago – she knows me well. I highly recommend the hard copy version.

Topspin and the Future of Music Marketing

I've had the pleasure of working with the fine folks at Topspin Media since I joined the board of the company when Foundry Group invested in Topspin's Series B in 2008, and I've been fortunate to know Topspin's co-founders, Peter Gotcher and Shamal Ranasinghe, since the late '90s.

Topspin was founded on the premise that any artist's success in the digital age will hinge on the artist's ability to engage directly with fans and build a meaningful and authentic artistic and commercial relationship with them. Topspin provides sophisticated artist-focused and data-driven tools to enable artists and their management to run their businesses online.

Now that Topspin has been working with hundreds of artists and has a couple years of real-world experience with their platform in production, they’ve built up enough data to start to share some of their findings about managing, measuring and marketing with data. Shamal gave an excellent presentation at the Midem Conference in Cannes last week, and the deck is packed full of Topspin’s learnings about best practices for running direct-to-fan campaigns.

Here's the presentation, which is well worth a read for anyone interested in the latest thinking on music marketing in the digital age. For a more in-depth discussion of these slides, check out Shamal's post on the Topspin blog.

Long Hiatus / Random News

It has been a while since I’ve written a blog post. I think some of it is twitter-induced. Instead of a blog post, I simply tweet a URL and feel that I’ve done my part. Ahh, the lazyweb. Actually, there is now an official description of this phenomenon: the Gresham/Morgan Internet Law. My friend Howard Morgan pointed out on his blog that cheap tweeting drives out dear blogging. Guilty as charged.

Rather than simply blog about my insufficient blogging, there are several things in my world (more specifically in the Foundry Group portfolio) today that merit a mention:

First, EmSense (one of the standard bearers in our HCI theme) announced today that they've raised a $9m round, led by Technology Partners. EmSense has made a ton of progress this year establishing themselves as a serious player in the neuromarketing space, and I'm excited to have Technology Partners' Roger Quy join the board. He's probably one of the only VCs out there with a PhD in neuroscience, so his endorsement of EmSense is particularly meaningful.

Second, Topspin Media announced that registration is now open for Berkleemusic.com's course "Online Music Marketing with Topspin." Berklee is one of the premier names in music schools, and this course represents a first step in Topspin expanding the reach of their software beyond the private beta they've been running over the past year.

Here’s a quick video preview describing more about the course:

Third, Oblong was featured last week in a Bloomberg TV series called Bloomberg Innovators. Oblong's founders and technology are featured prominently in the show, as are my partner Jason Mendelson (and his Galaga machine) and I. If you haven't seen Oblong's tech in action, now's your chance. Bloomberg doesn't allow embeds of the video, so you'll have to follow this link.

And, last but not least, I'd be remiss if I didn't put in a plug for the Defrag Conference, happening next week in Denver on November 11-12. This is the third year of the Defrag Conference, and it gets better every year. Come join folks like my Foundry Group partners, Defrag founder Eric Norlin, Andy Kessler and Paul Kedrosky as we geek out in the Mile High City.

Bay Area Food Log

My family just got back last week from spending a month in San Francisco. While we've lived (quite happily) in Boulder over the past three years, we spent 17 years in the Bay Area and like to get back there on a regular basis for an extended stay to reconnect with old friends and with the great cuisine the Bay Area has to offer.

Any of you who follow me on Twitter or Facebook probably saw me post status updates as we did our food tour, but I didn’t always remember to do it at each meal. So I looked back at my calendar (and my news feeds) to try to reconstruct a (mostly) comprehensive list of where we went out to eat during our month in the Bay Area. While we tried a couple new places (La Ciccia and Range), our destinations were more oriented towards old favorites, honed over many years of living in Northern California. Here goes:

7/18 – Yank Sing, San Francisco (lunch)

7/18 – Kokkari, San Francisco

7/19 – Pizzeria Picco, Larkspur (lunch)

7/19 – Taylor’s Refresher, San Francisco

7/20 – Sushi Ran, Sausalito

7/21 – Golden Flower, San Francisco (lunch)

7/21 – Slanted Door, San Francisco

7/30 – Tres Agaves, San Francisco (lunch)

7/30 – Mijita, San Francisco (dinner)

7/31 – La Ciccia, San Francisco

8/01 – The Village Pub, Woodside

8/02 – Tacubaya, Berkeley (lunch)

8/02 – Little Star Pizza, San Francisco

8/03 – 21st Amendment, San Francisco (lunch)

8/04 – Quadrus Cafe, Menlo Park

8/04 – Spruce, San Francisco

8/05 – Sancho’s Taqueria, Redwood City (lunch)

8/06 – Stern Dining Hall, Stanford University (lunch)

8/06 – Straits Cafe, Palo Alto

8/07 – Tres Agaves, San Francisco (lunch)

8/08 – Ame, San Francisco

8/12 – Yoshi’s SF, San Francisco

8/13 – Gialina Pizzeria, San Francisco

8/14 – Fish, Sausalito (lunch)

8/14 – Isa, San Francisco

8/15 – Yank Sing, San Francisco (lunch)

8/15 – Range, San Francisco

We also made numerous trips (both dine-in and takeout) to Pizzeria Delfina (Pacific Heights location), Bittersweet Cafe and La Boulange (Fillmore & Union St. locations), but I can't recall the precise days we visited those fine establishments. The careful reader no doubt noticed my pizzeria and taqueria fixation. What can I say, they are two of my favorite food groups.

We also hit the world's best farmers market (the San Francisco Ferry Plaza Farmers Market) numerous times during our stay. Late July and early August are prime season for heirloom tomatoes, peaches and nectarines. And the king of all purveyors of stone fruit is, of course, Frog Hollow Farm.

The stand-out dinners for me during the month were La Ciccia, Ame and Range. The restaurant I'm most disappointed we missed was A16, which we really enjoy but somehow never made it to this trip.

I’m always looking for suggestions of new places to try when I’m in SF (which is often). Please mention your favorites in the comments!