1. Digital Archives Do Not Fit In Libraries

    Via The NYTimes (and other places):

    The [Library of Congress] will archive the collected works of Twitter, the blogging service, whose users currently send a daily flood of 55 million messages, all that contain 140 or fewer characters.
    Mr. Raymond said that the archive would be available only for scholarly and research purposes.

    It’s interesting that the Library of Congress is so focused on saving this data and seems to consider accessing it later a secondary concern. We see this approach to archiving all the time at Perpetually. Unlike physical assets (think books), digital assets (think Twitter messages) are really not very difficult to store. The hard part is building an efficient system to retrieve them. Twitter’s billions of short messages make this point even clearer.

  2. Twitter Homepage Redesign

    Twitter’s homepage redesign is all about new visitors.  For the first time, the homepage combines a stock-ticker-style marquee of trending topics, a list of interesting people you might want to follow, and a “New to Twitter?” section.

    New visitors can now participate in the Twitter ecosystem before they sign up, and understanding Twitter jargon is no longer a prerequisite. Notice how the new site shows actual tweets for the first time. Before, it may have been easy to see what was most popular “right now”, but that view was totally disconnected from how people really use the service.

    Twitter has always been in the habit of learning from its users: hashtags began with our very own Chris Messina, and now they are core to the service. These changes recognize that existing users never see the homepage, and that new users need help getting started.

    Twitter has finally decided to educate people. The real challenge for them is to communicate the value of tweeting clearly, and their homepage is certainly moving in that direction.

  3. Innovating Cron: Announcing Norc

    Last week at Python NYC we open sourced Norc, a task management system that replaces Unix cron.  At Perpetually we let anyone archive any web site on any schedule.  One of the big challenges we faced early on was to create a flexible, traceable and scalable scheduling system to handle this problem.  While cron is great, it’s not geared toward solving this problem: Tasks are tied to a single computer, and they’re managed independently for each host and user from the command line.  In addition, with cron we’d have to build all sorts of infrastructure to handle error reporting, logging, etc.  So there was a clear need for something better.

    After looking at systems like AutoSys, Amazon’s SQS and RabbitMQ, we didn’t see anyone who had solved this problem in a simple, open way.  AutoSys is a closed, proprietary system, and SQS and RabbitMQ are queueing systems without support for scheduling.

    So I started developing Norc back in April, and for the last few months it’s been the backbone for all our scheduled archiving tasks.  As we built out our infrastructure, we started to face the same system administration tasks as everyone else: system and database backups, daily reports, that kind of thing.  While cron is typically used for this kind of work, Norc’s self-documenting, fault-tolerant and error-reporting features made it a more compelling solution.
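    To make the contrast with cron concrete, here is a minimal, hypothetical sketch of the idea: one central list of tasks, each run's outcome recorded in a shared history, and a failing task logged without stopping the others. The names `Scheduler`, `ScheduledTask` and `RunRecord` are illustrative only. This is not Norc's actual API or data model, and it keeps everything in memory for brevity.

```python
import datetime as dt
import logging
import traceback
from dataclasses import dataclass
from typing import Callable, List, Optional

log = logging.getLogger("scheduler-sketch")


@dataclass
class RunRecord:
    """One row of run history: which task ran, when, and how it ended."""
    task_name: str
    started: dt.datetime
    status: str                  # "SUCCESS" or "FAILURE"
    error: Optional[str] = None  # formatted traceback when status is "FAILURE"


@dataclass
class ScheduledTask:
    name: str
    interval: dt.timedelta
    func: Callable[[], object]
    next_run: dt.datetime


class Scheduler:
    """Hypothetical central scheduler: every task and every run outcome lives
    in one place, instead of in separate per-host, per-user crontabs."""

    def __init__(self) -> None:
        self.tasks: List[ScheduledTask] = []
        self.history: List[RunRecord] = []

    def add(self, name: str, interval: dt.timedelta,
            func: Callable[[], object], first_run: dt.datetime) -> None:
        self.tasks.append(ScheduledTask(name, interval, func, first_run))

    def run_due(self, now: dt.datetime) -> None:
        """Run every task whose next_run has arrived, recording each outcome."""
        for task in self.tasks:
            if task.next_run > now:
                continue
            try:
                task.func()
                self.history.append(RunRecord(task.name, now, "SUCCESS"))
            except Exception:
                # A failing task is logged and recorded; it doesn't stop the others.
                err = traceback.format_exc()
                log.error("task %s failed:\n%s", task.name, err)
                self.history.append(RunRecord(task.name, now, "FAILURE", err))
            # Advance to the first slot after `now`, skipping any missed slots
            # instead of running a burst of catch-up jobs.
            while task.next_run <= now:
                task.next_run += task.interval

    def failures(self) -> List[RunRecord]:
        return [r for r in self.history if r.status == "FAILURE"]
```

    In a real deployment the loop would be driven by a daemon calling `run_due(datetime.now())` periodically, and the history would live in a shared database so that every host reports to the same place. Both of those are assumptions about how such a system could be built, not a description of Norc.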

    It’s critical technology for us, and we think a lot of people could get a lot of value out of it.  There’s also a lot more that could be done to extend Norc, such as building an awesome web-based or mobile front-end.  That’s why we’re open sourcing it: We hope it proves useful for others, and we hope it develops into a more mature product.

    So check out the source code on GitHub:

    http://github.com/darrellsilver/norc

    And let us know what you think!

    Also, we’re hiring developers, so please contact us if you want to work on something awesome in New York!

    Update: There’s some great discussion of Norc happening at Hacker News.

  4. fascinated:

    fascinated:

    Songkick has a cool app/company timeline in their new space. This is great, all small teams should do this to remember just how much they actually accomplish. Easy to forget.

  5. History in a Snapshot

    It’s been interesting to see what our first users value in Perpetually.  When we started, our focus was entirely on capturing the web at a moment in time and reflecting it back for the user exactly as it was, HTML, CSS and all.

    But in the last couple of weeks a bunch of folks asked to see screenshots of their archives, in addition to full-text search and browsing.  Since we were already taking these screenshots for visual browsing, this was an easy feature to add.

    We’ve added a link to a beautiful full-page screenshot in the header of each record and in search results.

    We also want to guarantee that URLs in Perpetually are permanent and can therefore be reliably referenced forever.  For example, if you’re looking at the White House site from September 8, 2009, the URL is:

    http://perpetually.com/200909080000/http://www.whitehouse.gov/

    And to see the screenshot of that same record, just add ‘screenshot’ as a prefix:

    http://perpetually.com/screenshot/200909080000/http://www.whitehouse.gov/
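    The scheme is simple enough to build and parse mechanically. The sketch below assumes the 12-digit stamp is `YYYYMMDDhhmm`, which is inferred from the single example above (`200909080000` = September 8, 2009, 00:00). The helper names `archive_url` and `parse_archive_url` are hypothetical, not part of any published Perpetually API.

```python
from datetime import datetime
from typing import Tuple

# Assumption: the stamp format is YYYYMMDDhhmm, inferred from one example URL.
BASE = "http://perpetually.com"
STAMP_FORMAT = "%Y%m%d%H%M"


def archive_url(captured: datetime, target: str, screenshot: bool = False) -> str:
    """Build the permanent URL for one archived capture of `target`."""
    prefix = BASE + "/screenshot" if screenshot else BASE
    return f"{prefix}/{captured.strftime(STAMP_FORMAT)}/{target}"


def parse_archive_url(url: str) -> Tuple[datetime, str, bool]:
    """Split a permanent URL back into (capture time, target URL, is_screenshot)."""
    if not url.startswith(BASE + "/"):
        raise ValueError(f"not an archive URL: {url}")
    rest = url[len(BASE) + 1:]
    screenshot = rest.startswith("screenshot/")
    if screenshot:
        rest = rest[len("screenshot/"):]
    # Split only on the first "/" so the target URL keeps its own slashes.
    stamp, target = rest.split("/", 1)
    return datetime.strptime(stamp, STAMP_FORMAT), target, screenshot
```

    Because the capture time and the original URL are both embedded in the path, a link like this needs no lookup table to stay valid, which is what makes the permanence guarantee cheap to keep.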

  6. 5-8% of the Web Disappears Every Year

    When I started looking into online data decay in January, I was surprised to learn that there was no reliable analysis of the subject, so I researched it myself. These findings seem an appropriate subject for our first (non-TechCrunch50) blog post on The Perpetual Web.

    Estimating the amount of information that disappears due to data decay is difficult because there’s no single definition of the problem.  This is also what makes it an often undervalued issue.  For example, blog posts move down the page as they are replaced by new content, but they are generally still archived for the future.  Archiving of news, however, has taken a big step backward on the internet because a story’s prominence (is there a photo, is it breaking news?) changes constantly, and these changes are almost always lost.  This kind of information was naturally saved when both the story and its context were delivered on a printed page once a day, but on the internet it’s often lost forever.

    To achieve a simple, conservative estimate of data decay, I limited this research to the crudest manifestation of the problem: content that simply disappears.  I looked at 12,000 bookmarks on Delicious across a random set of users and wrote some software to check whether each bookmarked URL was still active (meaning it returned a 200-level HTTP status code).
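    The checking software itself isn't published here, so the following is only a minimal sketch of that kind of check using Python's standard library. The choices it makes are assumptions: it sends GET rather than HEAD (some servers reject HEAD), lets `urllib` follow redirects so only the final status counts, and treats a URL that yields no HTTP response at all (DNS failure, refused connection, timeout) as broken. The function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Iterable, Optional


def is_active(status: Optional[int]) -> bool:
    """A bookmark counts as live only if its URL answered with a 2xx status."""
    return status is not None and 200 <= status < 300


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for `url` (urllib follows redirects),
    or None when no HTTP response arrives at all."""
    try:
        req = urllib.request.Request(
            url, headers={"User-Agent": "decay-check-sketch/0.1"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 4xx/5xx responses are raised as HTTPError
    except (OSError, http.client.HTTPException, ValueError):
        return None    # DNS failure, refused connection, timeout, malformed URL


def broken_fraction(statuses: Iterable[Optional[int]]) -> float:
    """Share of checked bookmarks that are no longer active."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(not is_active(s) for s in statuses) / len(statuses)
```

    Keeping the network call (`fetch_status`) separate from the classification (`is_active`, `broken_fraction`) makes it easy to re-score a saved list of status codes without hitting 12,000 sites again.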

    The Delicious.com data set is a particularly high-value one because it relies upon Delicious’ filters and users to keep out spam and SEO junk. It naturally includes only bookmarks that someone cared enough about to share and reference in the future.



    The green line shows broken links as a percentage of the total over time. One of the reasons we publish decay as a range (5-8%) instead of as a fixed number is that the rate of decay changes over time (see the black trend line).  Unlike us mere humans, the longer content survives, the longer it is likely to last.  Basically, if a bookmark lasts one year, it’s more likely to make it to two, and so on.
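    To get a feel for what the two ends of the 5-8% range mean over time, here is a deliberately simplified model that assumes a constant share of the remaining links disappears each year. As the paragraph above notes, the real rate changes with age, so this is an illustration of the compounding, not a fit to the Delicious data. The function names are hypothetical.

```python
def surviving_fraction(annual_rate: float, years: float) -> float:
    """Fraction of links still live after `years`, if a constant share
    `annual_rate` of the remaining links disappears each year."""
    return (1.0 - annual_rate) ** years


def implied_annual_rate(broken_fraction: float, years: float) -> float:
    """Invert the model: the constant yearly rate that would leave
    `broken_fraction` of a cohort broken after `years`."""
    return 1.0 - (1.0 - broken_fraction) ** (1.0 / years)


for rate in (0.05, 0.08):
    print(f"{rate:.0%} per year -> {surviving_fraction(rate, 10):.1%} left after 10 years")
```

    Running it prints `5% per year -> 59.9% left after 10 years` and `8% per year -> 43.4% left after 10 years`. Because survivors in the real data become more durable with age, a flat-rate model like this would tend to overstate long-term losses.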

    This test is far from perfect, but once we gather more data from Perpetually.com we’ll be able to repeat this analysis less conservatively. The “gotchas” I’ve identified are:

    • We don’t know the date when content was first published, so we substitute the date on which it was first bookmarked. This skews the results toward a shorter lifespan.
    • We’re only taking into account data decay of the type “live” to “lost”. Thus, any site that changes the design or content of the page you bookmarked is still considered active. This is probably the biggest gap between our data and a perfect dataset.
    • The web is young and changing fast. Decay that results in broken bookmarks is decreasing, while changes to live pages are increasing. Ten years ago, pages disappeared quite quickly compared with the relatively stable technology powering the web today.
    • Delicious is young, and its data is not evenly spread over time. The number of links in our sample of 12,000 reflects the growth in Delicious adoption over the past few years, so for bookmarks from three years ago the sample size can be quite small. This too will improve over time.

  7. Launch Notes From TechCrunch50 2009

    Just over a week ago we launched as a finalist in TechCrunch50.  Have a look at our demo video.  TechCrunch50 is a PR machine, designed from start to finish to launch new companies, show off sponsors and further establish TechCrunch as a market leader.  After a month of work, rehearsals and the event itself, we learned some lessons that will help our business immensely and may prove useful for anyone trying to decide whether TechCrunch50 is right for them.

    To apply or not to apply:

    Your product isn’t too young.

    The product we had for the application and interview was entirely different from the one we launched.  The TechCrunch folks who choose the companies are looking at the potential of your business and the uniqueness of your idea.  If they believe you can deliver, it doesn’t matter that you haven’t yet done so.

    The application process is chaotic.

    Don’t let it get to you.  A lot of people felt they were left hanging during the application process.  The event is a huge undertaking, and the TechCrunch staff bust their asses for months, interviewing hundreds of companies and reading a thousand applications, all to make sure that it’s a success.  You often couldn’t tell which time zone people were in because emails came at all hours of the day and night.  The dedication and throughput of the team are incredible, but the sheer scale of the event means that mistakes were made and deadlines were missed.  Take a deep breath, realize you’re all on the same team, and be flexible.

    Is TechCrunch50 a good fit?

    It seems to me that SXSW is as good a venue as TechCrunch50 (maybe better) for launching anything that requires grassroots adoption.  Compare the success of Twitter and Foursquare (both launched at SXSW) with the plethora of social-media startups vying for a decreasing amount of attention at TechCrunch50.

    The nature of TechCrunch50 is that you’re telling the audience what to think and hoping that they respond to it with a beta signup, venture money, press coverage, or something else.  But social networks need evangelists, early adopters and a lively community.

    Paul Graham recommended that any startup with “a chicken and egg problem” spend a majority of their demo discussing how they’ll solve that problem.  I think that if you have to talk about how you’ll solve this problem you’re already facing a huge challenge.  Instead, you should be solving this problem with private beta users, and using TechCrunch50, if at all, to show how you’ve already managed this hurdle.

    The audience has high expectations of your company because the event is such an industry leader.  This was great for us because we got great feedback and started to build a reputation.  But some people were immediately against several startups on stage because they felt a sense of competition with the founders.  The point is that this exclusive club can be a double-edged sword.

    Update: A couple of days after writing this, @ev summed up my point in way fewer characters: “I don’t think Twitter would have done well at TC50 or Demo. (Likely response: WTF?) Wonder if Google would have. (Search? Yawn.)”

    Make the deadlines work for you.

    One of the reasons we applied for TechCrunch50 was that it demanded from our product exactly what we needed to develop anyway.  The deadlines for TechCrunch50 (end of June for the application, beginning of September for the first demo, mid-September for the event itself) were a great way to set goals for our product development.  Several companies rearranged their development schedule around TechCrunch50 instead of focussing on what their company needed.  When TechCrunch50 finishes you’ll still have a company to build.

    Application & demo:

    Make a good impression in your application video.

    The video that accompanies your initial application is what will get you to the next round of consideration.  I was told this more than once by the TechCrunch staff. It’s just that simple.  Get help from someone who knows how to integrate video with your product and give them time to execute it well.

    Be wary of pandering.

    Some people may disagree, but I refused to make our demo super flashy.  I’m a strong believer in TechCrunch50 as a launch event, but if you sell one type of company on stage and deliver another one to your customers you’ll be doing damage to your business in the long-term.  Also, please don’t pander to Arrington when you’re on stage unless it’s really funny.  I can’t tell you the number of jokes that fell flat because someone was talking to one guy in an audience of a few thousand.  AnyClip pulled it off with a search for Darth Vader that turned up Arrington.  That was hilarious.

    The most convincing demos were those that had a solid success story on hand.  It was powerful for us to use The Wall Street Journal design team in our demo, but nothing compared to CitySourced, which brought their first customer on stage to answer questions from the judges.  It was almost the only time in the two days of demos where the judges sounded definitively less experienced than the demoing company.

    Listen talk listen talk:

    You’re in. Don’t be nervous.

    The conference floor is filled with the people whose writing you read every day, whom you see speaking at conferences, and who report the news throughout the tech industry.  We got a whole bunch of high-profile business cards, but the most interesting conversations came from those who could really talk with us in depth about product direction, expectations and key markets.  Most of the real innovation in technology happens because of people who are on the front lines of business needs each day and are still turned on by a new idea.  These are the folks with whom I’m most looking forward to building a long-term business.

    Talk to more people about what you’re launching, and listen to what they say.

    We were saved many times by the advice we got from our advisers, friends and anyone else who’d listen.  The negative feedback we got on stage was directed exactly at the area of our business that had received the least feedback.

    Jason and his team want you to succeed.

    Jason Calacanis is the best pitch coach I’ve ever met.  Jason talks a mile a minute, and often he tells you four things before he’s thought about two of them.  But listen to him because his intuition has enormous value.  His team truly cares about putting on the best event, and the TC50 and Demo Pit companies are central to achieving this.  Regardless of your feelings about the chaotic application process, they are on your side.

    Talk to the TechCrunch staff at TechCrunch50 and listen to what they say.

    TechCrunch employs approximately 40,000 people across 300 countries on 2 planets.  Or so it would seem.  Everyone is extremely invested in the success of TechCrunch50, and by extension, your success.

    Talk to the volunteers and listen to what they say.

    They’re all from around SF, they’re all interested in startups, and none of them paid any money to be there.  They can be very exciting, and they’re looking to you for inspiration, advice, feedback, or maybe vice versa.

    Talk to everyone always at every table over every drink and don’t be afraid to talk to everyone always.  And listen to what they say.

    It’s an Alec Baldwin reference:  Always be talking.  If you’re talking with the rest of your team, and it’s not because your demo is FUBAR, you’re wasting your time.

    Mistakes we made:

    If we had it to do over again, I think we’d launch into a private beta instead of publicly launching our first product five minutes after getting off stage.  While we got our first customers, we were also tied up with support issues from the first day.  If we had launched in private beta, we’d have lost some early revenue, but we’d still have a huge list of email addresses and Twitter followers with whom we could roll out the product on a more controlled schedule.

    I’m totally envious of AnyClip’s demo.  They are masters of the stagecraft of the demo as well as the Q&A after.  I wish I had had stronger, better-prepared answers to the judges’ questions, but, alas, my inexperience on stage was on full display.

    The morning after:

    The work really starts on Wednesday.  I had just met the biggest deadline of my life, but there was no vacation afterwards.  I was up at 7am the next morning writing to customers, fixing bugs and responding to email.  Be prepared: Regardless of what you achieve on launch day, the real work starts after the headlines and the Twitter feed level off.

    The preparation and the event itself put our business at least six months ahead of where we’d otherwise be.  A later post will discuss where we’ll be going next…