I use Twitter more and more as an open bookmark / note repository. I was looking for a way to back up my timeline and browse my tweets locally. The web service http://tweetbook.in does something like this: it downloads all the tweets of a user timeline and generates a PDF file.
I decided to write my own backup engine, both to keep my tweets in raw format (not only the 140-character content, author and date) and to be able to format my tweets the way I want. The code is available in the backup-twitter project on GitHub.
The Twitter API is already wrapped by the python-twitter module (on GitHub and Google Code). This module is available in the Python Package Index (PyPI) and can be easily installed:
sudo pip install python-twitter
Based on the python-twitter examples and this blog post, I managed to get all my tweets. The strategy to download a Twitter timeline is described on the Twitter API website. SQLite was the storage solution of choice (Pierre Lindenbaum wrote a similar backup script with JavaScript and SQLite). I found a good tutorial about Python and SQLite here.
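The download-and-store idea can be sketched as follows. This is a minimal illustration and not the actual code of the backup-twitter project: `fetch_page` is a hypothetical callable standing in for the API call, and the `tweets` table layout is an assumption. The paging walks backwards through the timeline by asking each time for tweets older than the oldest one already received.

```python
import json
import sqlite3


def fetch_timeline(fetch_page):
    """Collect a whole timeline by walking backwards with max_id.

    fetch_page is a hypothetical callable: fetch_page(max_id) returns a list
    of tweets (dicts with an "id" key) whose id is <= max_id, or the most
    recent tweets when max_id is None. An empty list means we are done.
    """
    tweets = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets


def store_tweets(conn, tweets):
    """Save each tweet as raw JSON keyed by its id (assumed table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(id INTEGER PRIMARY KEY, raw TEXT NOT NULL)"
    )
    # INSERT OR IGNORE makes the backup safe to re-run on the same tweets.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, raw) VALUES (?, ?)",
        [(t["id"], json.dumps(t)) for t in tweets],
    )
    conn.commit()
```

With python-twitter, `fetch_page` would presumably be a thin wrapper around `GetUserTimeline`, assumed here to accept `max_id` and `count` arguments; check the signature of the version you have installed.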
A raw tweet actually contains plenty of metadata. See for instance an original tweet I posted:
{"created_at": "Tue Dec 18 15:18:54 +0000 2012", "favorited": false, "id": 281055949958049792, "retweeted": false, "source": "web", "text": "High-Resolution Mandelbrot in Obfuscated Python http://t.co/YE7Dr6sU | Impressive! cc @hsantuz @jbarnoud via @secti0n9", "truncated": false, "user": {"created_at": "Tue Mar 09 21:53:14 +0000 2010", "description": "Structural bioinformatics. Python. Science 2.0. Teaching. Humor. Science in France/French. Assistant prof at Univ Paris Diderot (France) on sabbatical in Congo.", "favourites_count": 131, "followers_count": 179, "friends_count": 132, "id": 121557269, "lang": "fr", "listed_count": 16, "location": "Pointe-Noire, Congo", "name": "Pierre Poulain", "profile_background_color": "5EAED6", "profile_background_tile": true, "profile_image_url": "http://a0.twimg.com/profile_images/1693624182/20100124-110315_normal.jpg", "profile_link_color": "009999", "profile_sidebar_fill_color": "http://a0.twimg.com/profile_background_images/264836396/dullhunk__A_molecular_model_of_the_bacterial_cytoplasm__Flickr__CC-BY.jpg", "profile_text_color": "333333", "protected": false, "screen_name": "pierrepo", "statuses_count": 2180, "time_zone": "Paris", "url": "http://cupnet.net", "utc_offset": 3600}}
and a retweet that contains metadata of the original author and of the one who retweeted:
{"created_at": "Wed Dec 19 20:36:27 +0000 2012", "favorited": false, "id": 281498250584932352, "retweet_count": 15, "retweeted": false, "retweeted_status": {"created_at": "Wed Dec 19 17:07:48 +0000 2012", "favorited": false, "id": 281445740000198656, "retweet_count": 15, "retweeted": false, "source": "web", "text": "Pleased to share our new web design for all PLOS journals http://t.co/bfFGZv0j http://t.co/KllOIdvi", "truncated": false, "user": {"created_at": "Tue Jan 29 07:27:17 +0000 2008", "description": "PLOS accelerates progress in science and medicine by leading a transformation in research communication. Tweets by Victoria Costello, Blogs & Social Media Mngr", "favourites_count": 16, "followers_count": 24093, "friends_count": 3090, "id": 12819112, "lang": "en", "listed_count": 1750, "location": "USA and UK", "name": "PLOS", "profile_background_color": "999999", "profile_background_tile": true, "profile_image_url": "http://a0.twimg.com/profile_images/2424363764/2etoq0zjwxicokm1woge_normal.jpeg", "profile_link_color": "0033FF", "profile_sidebar_fill_color": "http://a0.twimg.com/profile_background_images/634286014/xhu1mspbr37f70ybe8nn.jpeg", "profile_text_color": "333333", "protected": false, "screen_name": "PLOS", "statuses_count": 2416, "time_zone": "Pacific Time (US & Canada)", "url": "http://www.plos.org", "utc_offset": -28800}}, "source": "web", "text": "RT @PLOS: Pleased to share our new web design for all PLOS journals http://t.co/bfFGZv0j http://t.co/KllOIdvi", "truncated": false, "user": {"created_at": "Tue Mar 09 21:53:14 +0000 2010", "description": "Structural bioinformatics. Python. Science 2.0. Teaching. Humor. Science in France/French. Assistant prof at Univ Paris Diderot (France) on sabbatical in Congo.", "favourites_count": 131, "followers_count": 179, "friends_count": 132, "id": 121557269, "lang": "fr", "listed_count": 16, "location": "Pointe-Noire, Congo", "name": "Pierre Poulain", "profile_background_color": "5EAED6", "profile_background_tile": true, "profile_image_url": "http://a0.twimg.com/profile_images/1693624182/20100124-110315_normal.jpg", "profile_link_color": "009999", "profile_sidebar_fill_color": "http://a0.twimg.com/profile_background_images/264836396/dullhunk__A_molecular_model_of_the_bacterial_cytoplasm__Flickr__CC-BY.jpg", "profile_text_color": "333333", "protected": false, "screen_name": "pierrepo", "statuses_count": 2180, "time_zone": "Paris", "url": "http://cupnet.net", "utc_offset": 3600}}
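To make the difference between the two records concrete, here is a small sketch that reads a raw tweet and reports who wrote it and who retweeted it. The function name `describe` is made up for this illustration; the keys it reads (`retweeted_status`, `user`, `screen_name`, `text`) are the ones visible in the two examples above.

```python
import json


def describe(raw):
    """Return (author, retweeted_by, text) from a raw tweet JSON string."""
    tweet = json.loads(raw)
    original = tweet.get("retweeted_status")
    if original is None:
        # Plain tweet: the poster is the author and nobody retweeted it.
        return tweet["user"]["screen_name"], None, tweet["text"]
    # Retweet: the nested status carries the original author and text.
    return (
        original["user"]["screen_name"],
        tweet["user"]["screen_name"],
        original["text"],
    )
```

Reading the text from `retweeted_status` also skips the `RT @user:` prefix that appears in the outer `text` field of a retweet.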
To date, my timeline contains about 2100 tweets and the SQLite database file weighs 5 MB. All tweets are then assembled into a single HTML page. The profile pictures of users are optionally displayed. Tweet timestamps, usernames, hashtags and URLs are clickable.
I have also found that some tweets that are incomplete in my database (due to the 140-character limit) are fully displayed on Twitter. Any idea why, and how I can get the full tweet?
Twitter just released the ability to download your Twitter archive. I haven't tested this feature yet, but it seems really nice. However, I am not sure it will be possible to download someone else's timeline.
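Making usernames, hashtags and URLs clickable can be done with a single regular-expression pass over the tweet text. The sketch below is one possible way to do it and not necessarily how backup-twitter does it: the function name `linkify`, the pattern, and the link targets (profile pages under https://twitter.com/ and the search page for hashtags) are assumptions chosen for this example.

```python
import html
import re

# A single pattern, so that a "#" or "@" inside a URL is not linked twice.
TOKEN = re.compile(r'(https?://[^\s<>"]+)|@(\w+)|#(\w+)')


def linkify(text):
    """Escape tweet text and wrap URLs, @usernames and #hashtags in links."""
    parts = []
    last = 0
    for match in TOKEN.finditer(text):
        # Escape the plain text between the previous token and this one.
        parts.append(html.escape(text[last:match.start()]))
        url, user, tag = match.groups()
        if url:
            safe = html.escape(url)
            parts.append('<a href="{0}">{0}</a>'.format(safe))
        elif user:
            parts.append(
                '<a href="https://twitter.com/{0}">@{0}</a>'.format(user)
            )
        else:
            parts.append(
                '<a href="https://twitter.com/search?q=%23{0}">#{0}</a>'.format(tag)
            )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

The pattern is deliberately simple: punctuation glued to the end of a URL ends up inside the link, and an e-mail address would be read as a username, so real tweets may need a stricter expression.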