.lex files are extracts from a Wikipedia dump. Format (tab-delimited):