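from math import sqrt
from collections import defaultdict


def ngrams(s, n):
    """Return all contiguous n-grams of the sequence s.

    >>> ngrams('abcde', 2)
    ['ab', 'bc', 'cd', 'de']
    """
    return [''.join(s[i:i+n]) for i in range(len(s)-n+1)]


def freq(seq):
    """Count how often each element occurs in seq."""
    fq = defaultdict(int)
    for elem in seq:
        fq[elem] += 1
    return fq


def scalar(vec):
    """Euclidean norm of a frequency vector (a dict of counts)."""
    return sqrt(sum(k * k for k in vec.values()))


def sim(v, w):
    """Cosine similarity of two frequency vectors."""
    total = 0
    for elem in v:
        if elem in w:
            total += v[elem] * w[elem]
    norm = scalar(v) * scalar(w)
    # An empty vector has zero norm; return 0.0 instead of dividing by zero.
    return float(total) / norm if norm else 0.0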