Reading the Skype Chat Log

I don’t use Skype that often, but it seems that sometimes you just have to. If you would like to read your Skype chat log without starting up the Skype software, now you can. I hacked together a Python script that reads the chat log from the Skype user data folder and prints it to the console, so you can rediscover long-lost treasures:

Sonium - 2006-10-24 18:25:30 : someone speak python here?
lucky - 2006-10-24 18:25:53 : HHHHHSSSSSHSSS
lucky - 2006-10-24 18:26:08 : SSSSS
Sonium - 2006-10-24 18:26:16 : the programming language

The script was written using the storage format analysis by Dmytry Lavrov. It works with at least Skype version 2.1.0.81 on Linux and with a not-so-recent Windows version (I don’t know exactly which one, sorry). The only command-line parameter is the path to the directory containing the chatmsg*.dbb files, from which the script extracts the relevant information. The main benefits of this script are

  • that it decodes the obscure date/time format in the database (?) files, and
  • uses this information to sort the log entries before printing them out, so you can follow conversations
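To illustrate these two points, here is a minimal sketch of how the timestamp bytes can be unpacked and used for sorting. It follows the same bit layout as the decodeTime function in the script below: five bytes, each contributing 7 bits (4 for the last one), least significant group first. The helper names decode_time and encode_time and the sample entries are made up for this illustration. Setting the high bit on the first four bytes in encode_time is an assumption based on the script masking that bit off. The sketch targets Python 3, where indexing a bytes object yields integers.

```python
from datetime import datetime, timezone

def decode_time(raw):
    """Unpack a Unix timestamp from five bytes holding 7 bits each."""
    ts = 0
    for i, byte in enumerate(raw[:5]):
        mask = 0x0f if i == 4 else 0x7f  # last byte only supplies bits 28..31
        ts |= (byte & mask) << (7 * i)
    return ts

def encode_time(ts):
    """Inverse of decode_time; hypothetical helper used only to build sample data."""
    out = bytearray(((ts >> (7 * i)) & 0x7f) | 0x80 for i in range(4))
    out.append((ts >> 28) & 0x0f)
    return bytes(out)

# Made-up entries, deliberately out of order (timestamps in seconds since the epoch).
entries = [
    (encode_time(1161714368), "second"),
    (encode_time(1161714330), "first"),
]

# Decode first, then sort on the decoded timestamp.
decoded = sorted((decode_time(raw), text) for raw, text in entries)
for ts, text in decoded:
    print(datetime.fromtimestamp(ts, tz=timezone.utc), text)
```

Sorting on the decoded integer rather than on the raw bytes matters because the least significant bits come first in the file, so a byte-wise comparison would not give chronological order.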

Otherwise, you can simply look at these files in your favorite hex editor, as the messages themselves are stored as plain text. You can download the script or copy and paste it from below.
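As a rough stand-in for the hex editor, the following sketch pulls runs of printable ASCII characters out of a binary blob, much like the Unix strings tool does. The function name printable_runs, the minimum run length of 4 and the sample bytes are assumptions made for this illustration; real .dbb files may also contain text that is not plain ASCII, which this sketch would skip. It is written for Python 3.

```python
import re

def printable_runs(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    return [run.decode('ascii') for run in pattern.findall(data)]

# Made-up bytes: binary noise around two NUL-terminated strings.
sample = (b'\x01\x02\xe8\x03Sonium\x00\x07'
          b'\xfc\x03someone speak python here?\x00\x9c')

for run in printable_runs(sample):
    print(run)
```

This shows the text but neither the order nor the time of the messages, which is where the script comes in.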

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import sys
from datetime import datetime

if len(sys.argv) != 2:
  sys.exit('usage: ' + sys.argv[0] + ' <directory containing chatmsg*.dbb>')

workDir = sys.argv[1]

# the chat messages are spread over chatmsg<size>.dbb files
sizes = [256, 512, 1024, 2048, 4096]

def readString(data, offset):
  # return the raw bytes from offset up to the next NUL byte
  end = data.find(b'\x00', offset)
  if end == -1:
    end = len(data)
  return data[offset:end]

class Message:
  def __init__(self, time, sender, message):
    self.time = time
    self.sender = sender
    self.message = message

  def __lt__(self, other):
    # lets list.sort() order messages chronologically
    return self.time < other.time

  def __str__(self):
    return self.sender + " - " + str(self.time) + " : " + self.message

def decodeTime(data, tsOffset):
  # the timestamp follows the two-byte marker: each byte carries 7 bits
  # (high bit masked off), least significant group first
  ts = ((data[tsOffset+2] & ~0x80) << 0) | \
    ((data[tsOffset+3] & ~0x80) << 7) | \
    ((data[tsOffset+4] & ~0x80) << 14) | \
    ((data[tsOffset+5] & ~0x80) << 21) | \
    ((data[tsOffset+6] & 0x0f) << 28)

  return datetime.fromtimestamp(ts)

allMessages = []

for size in sizes:
  try:
    with open(os.path.join(workDir, 'chatmsg' + str(size) + '.dbb'), 'rb') as f:
      data = f.read()
  except IOError:
    # not every size has a file; skip the missing ones
    continue

  pos = 0
  while True:
    # each record starts with the chat member list
    membersOffset = data.find(b'\xe0\x03\x23', pos)
    if membersOffset == -1: break
    # ignore member names (until ";")

    tsOffset = data.find(b'\xe5\x03', membersOffset)
    if tsOffset == -1: break
    time = decodeTime(data, tsOffset)

    senderOffset = data.find(b'\xe8\x03', membersOffset + 4)
    if senderOffset == -1: break
    sender = readString(data, senderOffset + 2)

    msgOffset = data.find(b'\xfc\x03', senderOffset + 2 + len(sender))
    if msgOffset == -1: break
    msg = readString(data, msgOffset + 2)

    # assume UTF-8 text; undecodable bytes are replaced instead of failing
    allMessages.append(Message(time,
                               sender.decode('utf-8', 'replace'),
                               msg.decode('utf-8', 'replace')))

    pos = senderOffset

allMessages.sort()

for msg in allMessages:
  print(msg)
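
To make the record layout the script relies on easier to see, here is a small sketch that builds one synthetic record and parses it back using the same marker bytes. The layout shown is only what the script above assumes, not a full description of the file format. build_record, read_string and parse_record are hypothetical helpers, all field contents are made up, and the code targets Python 3.

```python
def build_record(members, ts_bytes, sender, text):
    """Assemble a synthetic record from the markers the script searches for."""
    return (b'\xe0\x03\x23' + members + b'\x00' +   # member list marker
            b'\xe5\x03' + ts_bytes +                # timestamp marker
            b'\xe8\x03' + sender + b'\x00' +        # sender marker
            b'\xfc\x03' + text + b'\x00')           # message marker

def read_string(data, offset):
    """Return the bytes from offset up to the next NUL byte (or end of data)."""
    end = data.find(b'\x00', offset)
    return data[offset:end if end != -1 else len(data)]

def parse_record(data, pos=0):
    """Return (sender, message) of the first record at or after pos, else None."""
    start = data.find(b'\xe0\x03\x23', pos)
    if start == -1:
        return None
    sender_at = data.find(b'\xe8\x03', start + 4)
    if sender_at == -1:
        return None
    sender = read_string(data, sender_at + 2)
    msg_at = data.find(b'\xfc\x03', sender_at + 2 + len(sender))
    if msg_at == -1:
        return None
    message = read_string(data, msg_at + 2)
    return sender.decode('utf-8', 'replace'), message.decode('utf-8', 'replace')

# Made-up field contents; the five timestamp bytes are arbitrary sample data.
record = build_record(b'sonium/$lucky;0123', b'\x9a\xb5\xf9\xa9\x04',
                      b'Sonium', b'the programming language')
print(parse_record(record))
```

Because the fields are located by searching for short byte markers, a marker sequence that happens to occur inside another field could be misread; the sketch shares that limitation with the script.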
