Conversation with l.raymus@gmail.com/gmail.7C66642B

(03:32:42 PM) Lauri-Ann Raymus: gonna make some coffee
(03:32:48 PM) pathall@gmail.com/Gaim: o dag me too
(03:32:49 PM) pathall@gmail.com/Gaim: RACE IS ON
(03:36:10 PM) Lauri-Ann Raymus: back
(03:36:13 PM) Lauri-Ann Raymus: coffee brewing
(03:36:22 PM) Lauri-Ann Raymus: we have this stupid basket filter
(03:36:33 PM) Lauri-Ann Raymus: we need a coffee maker with a cone filter
(03:36:38 PM) Lauri-Ann Raymus: the basket always gets all wonky
(03:36:49 PM) Lauri-Ann Raymus: my old coffee maker was a cone :/
(03:38:03 PM) pathall@gmail.com/Gaim: i took some pictures
(03:38:04 PM) pathall@gmail.com/Gaim: heh
(03:38:07 PM) pathall@gmail.com/Gaim: water boiling
(03:38:12 PM) pathall@gmail.com/Gaim: i just got 3 shirts from threadless
(03:38:16 PM) pathall@gmail.com/Gaim: i ordered mediums
(03:38:23 PM) Lauri-Ann Raymus: what's threadless?
(03:38:30 PM) pathall@gmail.com/Gaim: because i told myself i have to get in better shape so i am not embarrassed to wear them
(03:38:30 PM) pathall@gmail.com/Gaim: heh
(03:38:39 PM) Lauri-Ann Raymus: I'm wearing xl and 2xl :/
(03:38:45 PM) Lauri-Ann Raymus: I'd love to get back to L
(03:38:52 PM) Lauri-Ann Raymus: good for you! :)
(03:39:00 PM) Lauri-Ann Raymus: so what do you think
(03:39:07 PM) Lauri-Ann Raymus: should I try and avoid dates on my resume?
(03:39:11 PM) Lauri-Ann Raymus: so people don't know how old I am?
(03:39:31 PM) pathall@gmail.com/Gaim: speaking of dates
(03:39:39 PM) pathall@gmail.com/Gaim: i totally flirted with this girl in front of her boyfriend
(03:39:40 PM) pathall@gmail.com/Gaim: haha
(03:39:54 PM) pathall@gmail.com/Gaim: she built this amazing little pyramid of sugar packets
(03:40:01 PM) pathall@gmail.com/Gaim: and i saw her from around the corner
(03:40:03 PM) pathall@gmail.com/Gaim: waaay cute
(03:40:09 PM) pathall@gmail.com/Gaim: a lot like adrianne, actually
(03:40:18 PM) pathall@gmail.com/Gaim: and i said
(03:40:23 PM) pathall@gmail.com/Gaim: "whoa, that is ridiculously cool"
(03:40:30 PM) pathall@gmail.com/Gaim: and she sort of started giggling
(03:40:30 PM) Lauri-Ann Raymus: man
(03:40:34 PM) pathall@gmail.com/Gaim: and then i realized she had a boyfriend
(03:40:37 PM) pathall@gmail.com/Gaim: sitting there next to her
(03:40:40 PM) Lauri-Ann Raymus: you like practically took her on the table :P
(03:40:41 PM) pathall@gmail.com/Gaim: or at least a friend
(03:40:43 PM) pathall@gmail.com/Gaim: i dunno, maybe not
(03:40:50 PM) Lauri-Ann Raymus: that was perfectly acceptable flirtation
(03:40:56 PM) pathall@gmail.com/Gaim: then i was like "no, i'm serious, you should so put that on your resume"
(03:41:00 PM) pathall@gmail.com/Gaim: and she started laughing
(03:41:00 PM) Lauri-Ann Raymus: heh
(03:41:02 PM) Lauri-Ann Raymus: cute
(03:41:04 PM) pathall@gmail.com/Gaim: heh
(03:41:08 PM) Lauri-Ann Raymus: see you can do cute
(03:41:09 PM) pathall@gmail.com/Gaim: and then the guy started laughing
(03:41:09 PM) Lauri-Ann Raymus: women like cute
(03:41:13 PM) pathall@gmail.com/Gaim: SO I WALKED AWAY
(03:41:14 PM) pathall@gmail.com/Gaim: haha
(03:41:20 PM) pathall@gmail.com/Gaim: yeah, cute i can do
(03:41:30 PM) pathall@gmail.com/Gaim: it's the part where they find out that i'm psychotic that is problematic
(03:42:23 PM) Lauri-Ann Raymus: you're not psychotic
(03:54:43 PM) pathall@gmail.com/Gaim: on good days i'm not :P
(03:54:57 PM) Lauri-Ann Raymus: uhuh
(03:55:02 PM) Lauri-Ann Raymus: bleh
(03:55:06 PM) Lauri-Ann Raymus: so sick of doing this resume thing
(03:55:08 PM) pathall@gmail.com/Gaim: hey can i tell you about a dorky project i'm doing?
(03:55:12 PM) Lauri-Ann Raymus: go ahead
(03:55:16 PM) pathall@gmail.com/Gaim: (it will be a distraction from resumes, hehe)
(03:55:26 PM) pathall@gmail.com/Gaim: ok go here:
(03:55:26 PM) pathall@gmail.com/Gaim: http://ruphus.com/stash/katakana.html
(03:55:29 PM) pathall@gmail.com/Gaim: and type "fuji"
(03:55:35 PM) pathall@gmail.com/Gaim: you should see some japanese
(03:55:57 PM) Lauri-Ann Raymus: heh neat
(03:55:59 PM) Lauri-Ann Raymus: very neat
(03:56:20 PM) Lauri-Ann Raymus: I wanna see some thai!
(03:56:22 PM) pathall@gmail.com/Gaim: yeah and for certain languages very useful
(03:56:29 PM) pathall@gmail.com/Gaim: thai huh
(03:56:35 PM) pathall@gmail.com/Gaim: okay, i will work on thai for you
(03:56:48 PM) pathall@gmail.com/Gaim: but let me explain what the project is
(03:57:00 PM) pathall@gmail.com/Gaim: so that process is called transliteration of course
(03:57:18 PM) pathall@gmail.com/Gaim: and in the case of that particular tool, the transliteration is from latin script to katakana
(03:57:32 PM) pathall@gmail.com/Gaim: so it boils down to writing some rules more or less like this:
(03:58:00 PM) Lauri-Ann Raymus: ok
(03:58:20 PM) pathall@gmail.com/Gaim: pyu » ピュ
pyo » ピョ
(03:58:21 PM) pathall@gmail.com/Gaim: etc
(03:58:57 PM) pathall@gmail.com/Gaim: fu » フ
(03:58:58 PM) pathall@gmail.com/Gaim: you know
(03:59:14 PM) Lauri-Ann Raymus: hmm
(03:59:14 PM) pathall@gmail.com/Gaim: there are some complications but basically it works with a series of substitutions
(03:59:25 PM) Lauri-Ann Raymus: that sounds really complex
(03:59:39 PM) pathall@gmail.com/Gaim: well, in the case of katakana there are a fair number of rules
(03:59:40 PM) pathall@gmail.com/Gaim: http://ruphus.com/stash/katakana-translit.js
(03:59:44 PM) pathall@gmail.com/Gaim: that's the code that's doing it, actually
(03:59:54 PM) pathall@gmail.com/Gaim: it's mostly just a list of rules
(04:00:09 PM) pathall@gmail.com/Gaim: but writing the rules can be very fiddly and difficult and time consuming
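The "series of substitutions" idea above can be sketched roughly as follows. This is only an illustration, not the code in katakana-translit.js: the function name `transliterate`, the `RULES` table, and the greedy longest-match strategy are all assumptions. The `pyu`, `pyo`, and `fu` rules are the ones quoted in the chat; the `ji` rule is added here only so that "fuji" converts completely.

```python
# Hypothetical rule table. "pyu", "pyo" and "fu" come from the chat;
# "ji" is an added assumption so the "fuji" example converts fully.
RULES = {
    "pyu": "ピュ",
    "pyo": "ピョ",
    "fu": "フ",
    "ji": "ジ",
}


def transliterate(text, rules=RULES):
    """Apply the longest matching rule at each position.

    Characters that no rule covers are copied through unchanged.
    """
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "pyu" wins over any shorter rule.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("fuji"))
```

Trying longer rules first is one simple way to handle the "complications" mentioned above, since a three-letter rule like `pyu` has to be checked before any shorter rule that could match a prefix of it.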
(04:00:47 PM) pathall@gmail.com/Gaim: but a lot of times, setting up a computer to input in a non-latin script can be hard
(04:01:07 PM) pathall@gmail.com/Gaim: and in the case of someone in an internet cafe, just about impossible, because they can't futz with fonts, keyboard layouts, blah blah
(04:01:16 PM) pathall@gmail.com/Gaim: so this thing can be useful in such circumstances
(04:01:30 PM) pathall@gmail.com/Gaim: and also for amundo, it could be useful for translators
(04:01:45 PM) pathall@gmail.com/Gaim: so i was thinking, well, it would be great if we could have systems like this for ALL writing systems
(04:01:50 PM) pathall@gmail.com/Gaim: which seems like a tall order
(04:01:58 PM) pathall@gmail.com/Gaim: but i think i have started to crack that nut
(04:03:04 PM) pathall@gmail.com/Gaim: i wrote this program that takes bilingual dictionaries as input
(04:03:06 PM) pathall@gmail.com/Gaim: like this one:
(04:03:09 PM) pathall@gmail.com/Gaim: (greek/english)
(04:03:18 PM) pathall@gmail.com/Gaim: http://ruphus.com/svn/translit/corpora/en2el.txt
(04:03:26 PM) pathall@gmail.com/Gaim: and learns letter correspondences, like these:
(04:03:42 PM) Lauri-Ann Raymus: whoah
(04:03:45 PM) pathall@gmail.com/Gaim: http://ruphus.com/svn/translit/schemes/scheme-en2el.txt
(04:04:05 PM) Lauri-Ann Raymus: wow neat
(04:04:11 PM) pathall@gmail.com/Gaim: the beauty of it is that it's totally language-independent
(04:04:14 PM) Lauri-Ann Raymus: does seem like a tall order but immensely useful
(04:04:17 PM) Lauri-Ann Raymus: yeah exactly
(04:04:52 PM) pathall@gmail.com/Gaim: so if i have a russian - finnish dictionary (and i do, extracted from wikipedia), it can learn rules for finns to type russian: http://ruphus.com/svn/translit/schemes/scheme-ru2fi.txt
(04:05:52 PM) pathall@gmail.com/Gaim: it's not quite done yet
(04:06:00 PM) pathall@gmail.com/Gaim: but i think it will work
(04:06:40 PM) pathall@gmail.com/Gaim: want to know how it works? :D
(04:06:46 PM) Lauri-Ann Raymus: sure!
(04:06:58 PM) pathall@gmail.com/Gaim: ok so
(04:07:34 PM) pathall@gmail.com/Gaim: do you have a font for this page? http://ka.wikipedia.org/wiki/%E1%83%9E%E1%83%94%E1%83%9E%E1%83%A1%E1%83%98
(04:07:42 PM) pathall@gmail.com/Gaim: or do you see question marks
(04:09:07 PM) Lauri-Ann Raymus: nope font
(04:10:51 PM) pathall@gmail.com/Gaim: how about here: http://bg.wikipedia.org/wiki/%D0%9F%D0%B5%D0%BF%D1%81%D0%B8
(04:11:00 PM) pathall@gmail.com/Gaim: ack wait
(04:11:02 PM) pathall@gmail.com/Gaim: don't click that
(04:11:04 PM) pathall@gmail.com/Gaim: did you click that?
(04:11:05 PM) pathall@gmail.com/Gaim: don't click that
(04:11:06 PM) pathall@gmail.com/Gaim: haha
(04:11:09 PM) pathall@gmail.com/Gaim: oh i bet you clicked that.
(04:12:54 PM) Lauri-Ann Raymus: yeah I can see it
(04:13:02 PM) pathall@gmail.com/Gaim: ok
(04:13:14 PM) pathall@gmail.com/Gaim: so, here's a little game:
(04:13:40 PM) pathall@gmail.com/Gaim: страница
момента
Кока Кола
нашия
(04:13:46 PM) pathall@gmail.com/Gaim: which one of those is bulgarian for "coke"
(04:14:48 PM) Lauri-Ann Raymus: koka kona?
(04:14:54 PM) pathall@gmail.com/Gaim: right
(04:14:55 PM) pathall@gmail.com/Gaim: now
(04:15:02 PM) pathall@gmail.com/Gaim: how do you tell that to your computer?
(04:15:06 PM) pathall@gmail.com/Gaim: like, why do you guess that
(04:15:24 PM) Lauri-Ann Raymus: it looks like coca cola - same number of characters
(04:15:37 PM) Lauri-Ann Raymus: and similar starting sounds?
(04:17:09 PM) pathall@gmail.com/Gaim: same number of characters! that's key
(04:17:13 PM) pathall@gmail.com/Gaim: and not just starting sounds
(04:17:20 PM) pathall@gmail.com/Gaim: here's a slightly harder puzzle:
(04:18:19 PM) pathall@gmail.com/Gaim:
Плевен
Бургас
Пловдив
Русе
София
(04:18:23 PM) pathall@gmail.com/Gaim: ok now
(04:18:29 PM) pathall@gmail.com/Gaim: those are some bulgarian cities
(04:19:17 PM) pathall@gmail.com/Gaim: namely (but not in order): Rousse, Sofia, Burgas, Plovdiv, and Pleven
(04:19:24 PM) pathall@gmail.com/Gaim: which one is Pleven?
(04:19:35 PM) pathall@gmail.com/Gaim:
1. Плевен
2. Бургас
3. Пловдив
4. Русе
5. София
(04:19:49 PM) Lauri-Ann Raymus: 1?
(04:19:56 PM) pathall@gmail.com/Gaim: yep. why?
(04:20:15 PM) pathall@gmail.com/Gaim: remember, computers are dumb
(04:20:29 PM) Lauri-Ann Raymus: 1 and 3 have the same character initially and so do plovdiv and pleven
(04:20:49 PM) Lauri-Ann Raymus: and 1 has the correct characters :P
(04:20:52 PM) Lauri-Ann Raymus: er number of
(04:21:22 PM) pathall@gmail.com/Gaim: ah ok so you can read a little cyrillic
(04:21:29 PM) pathall@gmail.com/Gaim: then i must make it more challenging, one sec :P
(04:28:49 PM) pathall@gmail.com/Gaim: http://ruphus.com/stash/georgian.png
(04:28:54 PM) pathall@gmail.com/Gaim: okay, now no cheating is possible
(04:28:55 PM) pathall@gmail.com/Gaim: hehe
(04:29:50 PM) pathall@gmail.com/Gaim: it's doable, right?
(04:29:53 PM) pathall@gmail.com/Gaim: a little puzzle...
(04:30:04 PM) Lauri-Ann Raymus: looking
(04:30:07 PM) pathall@gmail.com/Gaim: i mean you don't have to go through & try to solve it but you can see how it would be doable
(04:30:36 PM) Lauri-Ann Raymus: harder
(04:30:52 PM) Lauri-Ann Raymus: word lengths do not necessarily match
(04:31:00 PM) pathall@gmail.com/Gaim: you were halfway there with the observation about the lengths
(04:31:08 PM) pathall@gmail.com/Gaim: so for instance, if you start with Gori and Java
(04:31:15 PM) pathall@gmail.com/Gaim: each has just 2 possibilities right?
(04:31:18 PM) Lauri-Ann Raymus: right
(04:31:20 PM) pathall@gmail.com/Gaim: so, how do you distinguish them?
(04:31:25 PM) Lauri-Ann Raymus: two as
(04:31:28 PM) Lauri-Ann Raymus: in java
(04:31:29 PM) pathall@gmail.com/Gaim: yeah, exactly
(04:31:37 PM) pathall@gmail.com/Gaim: therefore 2 is probably java, right?
(04:32:05 PM) pathall@gmail.com/Gaim: from that, you can determine 7 pairs of letters
(04:32:14 PM) pathall@gmail.com/Gaim: g,o,r,i,j,a,v
(04:33:28 PM) Lauri-Ann Raymus: I love logic puzzles :) think that's why I liked programming
(04:35:17 PM) pathall@gmail.com/Gaim: yah
(04:35:20 PM) pathall@gmail.com/Gaim: it is fun isn't it
(04:35:30 PM) pathall@gmail.com/Gaim: and i just randomly picked those from a list of georgian cities on wikipedia
(04:35:36 PM) pathall@gmail.com/Gaim: but the thing is
(04:35:54 PM) pathall@gmail.com/Gaim: that's still work, right? i mean, if i wanted to be able to do that for a bazillion writing systems, i would have to find "good" inputs
(04:35:56 PM) pathall@gmail.com/Gaim: like these
(04:36:02 PM) pathall@gmail.com/Gaim: so here's what i came up with
(04:36:17 PM) pathall@gmail.com/Gaim: start with a list of (say) english/georgian words
(04:36:29 PM) Lauri-Ann Raymus: dictionary
(04:36:32 PM) pathall@gmail.com/Gaim: which i just happen to have:
(04:36:32 PM) pathall@gmail.com/Gaim: http://ruphus.com/svn/translit/corpora/en2ka.txt
(04:36:38 PM) pathall@gmail.com/Gaim: yeah, a lexicon, to be picky
(04:36:46 PM) pathall@gmail.com/Gaim: oops, that's greek/georgian
(04:37:28 PM) Lauri-Ann Raymus: grabbing my coffee
(04:40:21 PM) pathall@gmail.com/Gaim: ok here it is http://ruphus.com/svn/translit/corpora/en2ka.txt
(04:40:38 PM) Lauri-Ann Raymus: heh ang lee
(04:40:48 PM) pathall@gmail.com/Gaim: you don't have a georgian font i guess so the right side is prolly ?s
(04:41:01 PM) pathall@gmail.com/Gaim: yeah, those are actually the titles of articles on the english and georgian wikipedias
(04:41:06 PM) pathall@gmail.com/Gaim: that's how i got the lexicons
(04:41:37 PM) pathall@gmail.com/Gaim: so here's the question: how do we determine which of these pairs of words have the kind of one-to-one letter correspondences we saw in those city names?
(04:42:11 PM) Lauri-Ann Raymus: no I can see them
(04:42:15 PM) Lauri-Ann Raymus: earlier I said I had a font
(04:42:20 PM) pathall@gmail.com/Gaim: orly? nice
(04:42:24 PM) Lauri-Ann Raymus: nope, font meant no question marks, font
(04:42:33 PM) pathall@gmail.com/Gaim: i see
(04:42:34 PM) pathall@gmail.com/Gaim: cool
(04:43:36 PM) pathall@gmail.com/Gaim: so there are two things, you've already named them:
(04:43:41 PM) pathall@gmail.com/Gaim: 1) same word length
(04:43:47 PM) pathall@gmail.com/Gaim: 2) finding repeated letters
(04:43:59 PM) pathall@gmail.com/Gaim: (like we did when comparing "Java" and "Gori")
(04:44:07 PM) Lauri-Ann Raymus: right
(04:44:13 PM) pathall@gmail.com/Gaim: so, here's the trick:
(04:44:19 PM) pathall@gmail.com/Gaim: watch this:
(04:44:38 PM) pathall@gmail.com/Gaim: Gori 1234
Java 1232
(04:45:20 PM) Lauri-Ann Raymus: neat counter
(04:45:55 PM) pathall@gmail.com/Gaim:
გორი 1234
ჯავა 1232
(04:46:24 PM) pathall@gmail.com/Gaim: so, you go through the whole lexicon, and look for pairs that have the same pattern
(04:46:26 PM) Lauri-Ann Raymus: a=b b=c
(04:46:38 PM) Lauri-Ann Raymus: so a=c
(04:46:49 PM) Lauri-Ann Raymus: er that didn't make sense
(04:46:59 PM) Lauri-Ann Raymus: translate them both to numbers
(04:47:24 PM) Lauri-Ann Raymus: where b is a number
(04:47:27 PM) Lauri-Ann Raymus: it did make sense
(04:48:23 PM) pathall@gmail.com/Gaim: http://ruphus.com/svn/translit/corpora/matches-en2ka.txt
(04:48:38 PM) pathall@gmail.com/Gaim: that's the intermediate output
(04:48:45 PM) pathall@gmail.com/Gaim: those are the pairs of words from the lexicon that have the same pattern
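The pattern trick (Gori 1234, Java 1232) and the filtering step can be sketched like this. It is a minimal illustration under assumptions, not the actual program: the names `pattern` and `matching_pairs` are hypothetical, and the tiny lexicon is made up from words that appear in the chat plus one extra pair ("Georgia") added as a non-match.

```python
def pattern(word):
    """Number each distinct character by order of first appearance.

    "Gori" -> (1, 2, 3, 4) and "Java" -> (1, 2, 3, 2).
    """
    seen = {}
    # setdefault returns the stored number for a repeated character,
    # or stores and returns the next unused number for a new one.
    return tuple(seen.setdefault(ch, len(seen) + 1) for ch in word.lower())


def matching_pairs(lexicon):
    """Keep only the pairs whose two words share the same pattern.

    Equal patterns imply equal length, so no separate length check is needed.
    """
    return [(src, tgt) for src, tgt in lexicon if pattern(src) == pattern(tgt)]


# Tiny illustrative lexicon (an assumption, not the real en2ka.txt).
lexicon = [
    ("Gori", "გორი"),
    ("Java", "ჯავა"),
    ("Georgia", "საქართველო"),   # different lengths, so it is rejected
    ("Juan Gris", "გრი, ხუან"),  # same pattern by coincidence
]

for src, tgt in matching_pairs(lexicon):
    print(src, tgt, pattern(src))
```

Note that the Juan Gris pair passes the filter even though the name order is reversed: both strings are nine characters long with no repeats, so their patterns are identical. Pairs with no repeated characters carry the least evidence, which is one way a false positive like this can slip through.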
(04:49:23 PM) pathall@gmail.com/Gaim: there are some false positives in there
(04:49:28 PM) pathall@gmail.com/Gaim: for instance, this is clearly an error:
(04:49:34 PM) pathall@gmail.com/Gaim:
Juan Gris
012345678
გრი, ხუან
(04:49:54 PM) pathall@gmail.com/Gaim: but by and large the pairs are good
(04:50:00 PM) pathall@gmail.com/Gaim: anyway, that's the basic idea
(04:50:01 PM) Lauri-Ann Raymus: he saw them
(04:50:04 PM) Lauri-Ann Raymus: and saw that they were good
(04:50:08 PM) Lauri-Ann Raymus: it
(04:50:12 PM) Lauri-Ann Raymus: it's a great idea
(04:50:14 PM) pathall@gmail.com/Gaim: once you have good pairs like that, you can just count up the letter correspondences
(04:51:38 PM) pathall@gmail.com/Gaim: and you get this: http://ruphus.com/svn/translit/schemes/scheme-en2ka.txt
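Counting up the letter correspondences from the good pairs might look something like the sketch below. The function names `count_correspondences` and `best_scheme` are hypothetical, and picking the single most frequent target letter per source letter is an assumption about how a scheme could be derived; the real scheme files may be built differently.

```python
from collections import Counter, defaultdict


def count_correspondences(pairs):
    """Tally how often each source letter lines up with each target letter.

    Assumes the pairs were already filtered to have the same pattern,
    so the two words can be aligned position by position.
    """
    counts = defaultdict(Counter)
    for source, target in pairs:
        for s, t in zip(source.lower(), target):
            counts[s][t] += 1
    return counts


def best_scheme(counts):
    """Pick the most frequent target letter for each source letter."""
    return {s: tally.most_common(1)[0][0] for s, tally in counts.items()}


# The two city names worked through in the chat.
pairs = [("Gori", "გორი"), ("Java", "ჯავა")]
scheme = best_scheme(count_correspondences(pairs))

for latin, georgian in sorted(scheme.items()):
    print(latin, "»", georgian)
```

With just these two pairs the sketch recovers the seven letters named earlier (g, o, r, i, j, a, v). Over a whole lexicon, an occasional bad pair should be outvoted by the good ones, as long as the pairs are mostly right.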
(04:52:38 PM) Lauri-Ann Raymus: very nice :)