Conversation with nathan.c.jones@mac.com


20:33
Conversation with natedog on Fri 03 Nov 2006 08:13:59 PM EST:
(20:13:59) natedog: sorry.
(20:14:02) natedog: back..
(20:14:22) natedog: yeahman, good shit.
(20:14:34) natedog: i'm still wundrin bout that generalization.
(20:15:26) natedog: like, for "log_2" you use the rooted tree with 2 roots coming out of each vertex.
(20:15:43) natedog: and for log_3 you'd use one with 3 roots.
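The regular-tree picture of log_2 and log_3 can be sketched in a few lines of Python. With k branches out of every node, depth d has k**d leaves, so the depth needed to reach N leaves is ceil(log_k N). The function name `levels_needed` is hypothetical, chosen only for illustration; it counts levels with integers so there is no floating-point rounding.

```python
def levels_needed(n_leaves: int, branching: int) -> int:
    """Smallest depth d such that a tree where every node has `branching`
    children has at least n_leaves leaves, i.e. branching**d >= n_leaves.
    This equals ceil(log_branching(n_leaves))."""
    if n_leaves < 1 or branching < 2:
        raise ValueError("need n_leaves >= 1 and branching >= 2")
    depth, leaves = 0, 1
    while leaves < n_leaves:
        leaves *= branching  # one more level multiplies the leaf count
        depth += 1
    return depth


print(levels_needed(8, 2))   # 3, since 2**3 == 8
print(levels_needed(9, 3))   # 2, since 3**2 == 9
print(levels_needed(10, 2))  # 4, since 2**3 < 10 <= 2**4
```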
(20:15:49) ME: yeah i should draw one for three
(20:15:52) natedog: so what do you get if you use a tree that's not regular.
(20:15:58) ME: it doesn't really make the point with it
(20:16:00) ME: not regular how?
(20:16:15) natedog: like, a tree where some of them have 3 coming out and some of them 2?
(20:16:17) natedog: see what i mean?
(20:16:30) ME: oh right
(20:16:35) ME: one thing i've always wondered
(20:16:46) natedog: yeah?
(20:16:52) ME: is what do you call a structure where the number of roots changes as a function
(20:16:58) ME: like
(20:17:04) ME: root divides in two
(20:17:08) ME: those divide in three
(20:17:12) ME: those divide in four
(20:17:20) natedog: ?
(20:17:25) natedog: k,
(20:17:27) ME: i'll draw it
(20:17:27) natedog: i get it.
(20:17:30) natedog: i got it.
(20:17:47) natedog: dude, it's called "factorial"
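A quick check of the "it's called factorial" answer, as a minimal sketch: if the root splits in 2, those nodes split in 3, the next level in 4, and so on, the number of leaves at depth d is 2 * 3 * ... * (d + 1) = (d + 1)!. The helper `depth_for` is a hypothetical discrete "inverse factorial": the smallest depth whose level has at least N leaves.

```python
import math


def leaves_at_depth(depth: int) -> int:
    """Leaves of the tree whose root splits in 2, whose children split in 3,
    whose grandchildren split in 4, and so on: 2 * 3 * ... * (depth + 1)."""
    leaves = 1
    for branching in range(2, depth + 2):
        leaves *= branching
    return leaves


def depth_for(n_leaves: int) -> int:
    """Smallest depth whose level has at least n_leaves leaves."""
    depth = 0
    while leaves_at_depth(depth) < n_leaves:
        depth += 1
    return depth


# The leaf counts really are factorials: (depth + 1)!
print(all(leaves_at_depth(d) == math.factorial(d + 1) for d in range(10)))  # True
print(depth_for(24))  # 3, since 4! == 24
print(depth_for(25))  # 4, since 4! < 25 <= 5!
```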
(20:21:30) natedog: oh, or maybe you're asking: what's the *inverse* of factorial?
(20:21:45) ME: oh yeah huh
(20:21:48) ME: it is the factorial
(20:22:01) natedog: or the inverse, depending on what you're asking.
(20:22:04) ME: http://ruphus.com/stash/tehweird.png
(20:22:13) natedog: like, the "log" of it is the "inverse" of factorial.
(20:22:25) ME: yeah
(20:22:27) ME: what's the name of that?
(20:22:31) natedog: dunno.
(20:22:34) natedog: thinking...
(20:23:12) ME: http://www.halfbakery.com/idea/Inverse_20Factorial_20_27_3f_27
(20:23:12) natedog: huh.... do you know about the "Gamma function"?
(20:23:22) ME: hmm not off the top of my head
(20:23:25) ME: i think i've seen it somewhere
(20:23:40) natedog: it interpolates the factorial function.
(20:23:57) natedog: in other words, in the same way that "2^x" extends to any real number,
(20:24:20) natedog: there's a function called Gamma, which specializes to the factorial function on the positive integers.
(20:24:48) ME: there are a bunch of links to stuff about the gamma function in that link
(20:24:55) natedog: yeahman. it's daylight.
(20:25:00) ME: interesting
(20:25:05) natedog: it's used by analytic number theorists all the time.
(20:25:06) ME: what's that mean?
(20:25:20) natedog: it's the "infinity factor" for the riemann zeta function.
(20:25:23) natedog: oh, daylight?
(20:25:30) natedog: it means that it's good stufffff.
(20:25:41) natedog: zat what you're asking?
(20:27:03) ME: yeah
(20:27:07) natedog: k
(20:27:20) ME: http://mathworld.wolfram.com/MinkowskisQuestionMarkFunction.html
(20:27:23) natedog: i meant as in the "daylight, bruthuh" sense of the word.
(20:27:40) ME: that's different tho huh
(20:27:42) ME: hrh
(20:27:43) ME: heh
(20:28:13) natedog: whoa, this is *supercool*.
(20:28:20) natedog: i've never heard of this. sounds interesting.
(20:29:09) ME: you've seen mathworld before haven't you?
(20:29:16) ME: long story behind that site
(20:29:44) natedog: yeah, i've seen mathworld before.
(20:29:53) natedog: can find answers there actually...
(20:30:12) ME:
I guess all this depends on whether a factorial of a non-natural number makes sense. For example, what would 2.5! mean? If you can make the leap to accepting factorials of non-natural numbers as a valid concept, then it would follow that there should be inverse factorials for numbers that are not factorials of natural numbers. Since 2!=2 and 3!=6, you'd think that 5? would be somewhere between 2 and 3, probably closer to 3. So, if you had a way to calculate 2.5!, you could do successive approximation to work out a value for 5? by starting with 2.5! and narrowing it down from there. No idea how you would go about calculating 2.5!, though.
(20:30:34) natedog: who's this a quote of?
(20:30:42) ME: a comment on that halfbakery thread
(20:30:46) ME: http://www.halfbakery.com/idea/Inverse_20Factorial_20_27_3f_27
(20:30:54) natedog: thing is, that dude should be told about the Gamma function.
(20:30:55) ME: Steve DeGroof, Oct 28 2004
(20:31:09) ME: heh, immediately following comment:
(20:31:14) ME:        [Steve] The gamma function (it's the Greek gamma, but I'll use G) is the generalized factorial. You can input any number, integer or not.

x! = G(x+1)

See link.


ldischler, Oct 28 200
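Putting the two halfbakery comments together: with x! = Gamma(x + 1), the "successive approximation" idea becomes a bisection search for the x with Gamma(x + 1) = y. This is a sketch under stated assumptions, not anyone's published method: the name `inverse_factorial` is hypothetical, it only handles y >= 1 and looks for x >= 1 (where Gamma(x + 1) is increasing, so the answer is unique), and it compares logarithms via Python's `math.lgamma` so large y does not overflow.

```python
import math


def inverse_factorial(y: float, tol: float = 1e-12) -> float:
    """Find x >= 1 with Gamma(x + 1) == y, i.e. a real-valued 'x such that x! = y'.

    Bisection on log Gamma; Gamma(x + 1) is increasing for x >= 1."""
    if y < 1:
        raise ValueError("this sketch only handles y >= 1")
    target = math.log(y)
    lo, hi = 1.0, 2.0
    # Grow the bracket until log Gamma(hi + 1) reaches the target.
    while math.lgamma(hi + 1) < target:
        lo, hi = hi, hi * 2
    # Narrow it down, keeping lgamma(lo + 1) < target <= lgamma(hi + 1).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.lgamma(mid + 1) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = inverse_factorial(5)    # the comment's "5?"
print(2 < x < 3)            # True: between 2 and 3, as the comment guessed
print(math.gamma(x + 1))    # approximately 5
print(inverse_factorial(6))  # approximately 3, since 3! == 6
```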

(20:31:21) natedog: yeah.
(20:31:36) ME: it's interesting to me
(20:31:47) ME: that drawing the log and the factorial in this way makes one see an analogy
(20:32:37) natedog: thing is, it's making me wonder about the inverse of factorial.
(20:32:43) natedog: it's a really good question...
(20:32:59) natedog: what if you did,
(20:33:06) natedog: root splits in two,
(20:33:18) natedog: then one of those splits in 3 and one splits in 4

natedog: and then one of the resulting guys splits in 5, 6 and so on...
natedog: get me?
ME: hah, right
ME: yeah that's interesting too
ME: where the function goes across *all* nodes
ME: it seems you could define a function in many ways
ME: you could define it by "level"
ME: or you could define it by "node"
natedog: interesting...
natedog: i'm buzzed.
natedog: and getting buzzder.
natedog: i'm "celebrating": i submitted my second article today!
natedog: ...which makes me *done* with submitting articles out of my thesis.
ME: whoohoo!
ME: http://ruphus.com/stash/moreweird.png
ME: is that what you meant?
natedog: now, all i need to do is sit around and wait for the rejections to come trickling in...
natedog: exactly.
natedog: that's it.
ME: hey did i tell you
ME: my little dictionary generation thing seems to work
ME: shows potential, anyway
natedog: how do you mean?
natedog: (no, you didn't)
ME: well, so it works like this
ME: you put in 2 texts
ME: that are translations
natedog: yeah?
ME: (parallel texts or "bitexts", they're called)
natedog: yeah.
ME: and you get out results like this:
ME:
input: english/french debian faq

query: installation

result:

» 0.550603      package des
» 0.553158      package un
» 0.556622      package pour
» 0.564851      package les
» 0.569422      package dans
» 0.612173      package ou
» 0.651722      package de
» 0.666187      package paquets
» 0.760069      package le
» 0.761824      package paquet


ME: elo?
natedog: yeah.
natedog: huh... i don't get it yet.
natedog: i don't read this language.
ME: well
ME: what that output is
ME: is just
ME: like
natedog: (dude, i thought of a great quote, "the words coming out of your mouth don't speak my language") heh.
ME: i build this data structure that measures the similarity of the distributions of words in a bitext
ME: haha
ME: good sheeot
natedog: yeahman. anyhow, go on.
ME: so
ME: accepting for the moment that i have such a measure
20:43
ME: then, that list i just gave you are the best results (they range from 0 to 1) for the word "package"
ME: so
natedog: ok. what does the number in front mean?
ME: the vector of english package has a similarity of 0.761824 to the vector of the french word paquet
natedog: like, how many times it occurs in basically the same place?
ME: here's how it works:
ME: let's make some definitions
ME: we have: a text
natedog: k
ME: a wordlist
ME: the wordlist is the sequence of all the words in the text, in order
natedog: k
ME: now, each word has a "distribution"
ME: which are all the index points at which it occurs
ME: so if the text is "Pease porridge hot, pease porridge cold..."
ME: then the distribution of the word "porridge" begins 1, 4 ...
ME: (starting at 0... what can i say i'm a programmer heh)
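The wordlist and distribution definitions above can be sketched directly. Tokenization is an assumption here (lowercasing and splitting on runs of word characters with `re.findall`), so that "Pease" and "pease" count as the same word; the actual tool may tokenize differently.

```python
import re


def wordlist(text: str) -> list[str]:
    """All words in the text, in order (assumed: lowercased, punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def distribution(words: list[str], word: str) -> list[int]:
    """All 0-based index points at which `word` occurs in the wordlist."""
    return [i for i, w in enumerate(words) if w == word]


words = wordlist("Pease porridge hot, pease porridge cold")
print(distribution(words, "porridge"))  # [1, 4]
print(distribution(words, "pease"))     # [0, 3]
```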
ME: okay?
natedog: k
ME: so
natedog: i see, so you measure something like the root-mean-square of the difference of these
natedog: and then minimize that?
ME: so each word in the wordlist has a distribution
ME: well, here's what i do
ME: so
ME: we need to be able to compare the distributions of words from the source text to words in the target text.
ME: but
ME: the two texts are different lengths
ME: so what i did was the simplest thing i could think of
ME: instead of using the absolute value of the index of the word in the wordlist
ME: i represented it as a percentage
natedog: i see...
ME: which results in a distribution ranging over 0 to 100
natedog: how far off are the text lengths?
ME: but it can (and this is key) have repeats
ME: oh, depends on the language pair
ME: but for en/pt they were pretty close
ME: french seems about 20% longer on average
natedog: what's pt?
natedog: portuguese?
ME: yeah
natedog: interesting...
ME: so
ME: then you end up with a distribution that is a list of integers between 0 and 100, with repeats
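The percentage normalization can be sketched as below. The exact rounding is not stated in the chat; this version assumes integer floor division, which maps positions onto 0 through 99 (100 cells). Positions that sit close together in a long text land on the same percentage, which is where the repeats come from.

```python
def normalized_distribution(positions: list[int], text_length: int) -> list[int]:
    """Express each 0-based word position as an integer percentage of the way
    through the text (assumed: floor division, giving values 0 through 99)."""
    return [100 * i // text_length for i in positions]


# "porridge" at positions 1 and 4 of a 6-word text
print(normalized_distribution([1, 4], 6))              # [16, 66]
# three occurrences near the start of a 300-word text collapse onto 0
print(normalized_distribution([0, 1, 2, 150], 300))    # [0, 0, 0, 50]
```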
natedog: that's really interesting.
ME: and then
ME: you just measure the vector similarity on this normalized distribution of a particular word (the "query") with all the vectors in the other text
natedog: wait. why with repeats? there shouldn't be repeats, should there?
ME: and rank them
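The measure-and-rank step can be sketched like this. The chat only says "vector similarity" with scores between 0 and 1; cosine similarity is assumed here because, on non-negative count vectors, it stays in that range. The function names and the toy 4-cell vectors are invented purely for illustration and are not taken from the Debian FAQ results.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; for non-negative count vectors it lies between 0 and 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], candidates: dict[str, list[float]], top: int = 10):
    """Score every target-language word against the query vector, best match first."""
    scored = [(cosine(query_vec, vec), word) for word, vec in candidates.items()]
    scored.sort(reverse=True)
    return scored[:top]


# Toy vectors, invented for illustration only.
query = [2, 0, 1, 0]
candidates = {"paquet": [2, 0, 1, 0], "le": [1, 1, 1, 1], "chat": [0, 3, 0, 0]}
for score, word in rank(query, candidates):
    print(f"{score:.3f}  {word}")
# 1.000  paquet
# 0.671  le
# 0.000  chat
```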
ME: oh whoops
ME: i forgot a step
ME: because
ME: what you're doing is
ME: you actually turn these things into a frequency
ME: like
ME: if the distribution is
ME: [1, 1, 1, 3, 10, 50, 50 ... ]
ME: then you want a distribution that looks like:
natedog: wait: how can a distribution look like [1,1,...]?
natedog: i thought these were positions.
ME: because of the normalizing
20:53
ME: essentially it's chopping up the text into k parts
ME: 100 parts, in fact
ME: so, if the word occurs 3 times in the first part, its raw distribution looks like [1, 1, 1, ... ]
ME: doesn't make sense yet huh
natedog: i see, for the sake of data compression. right?
ME: no because i *want* the repeats
ME: because i want vectors of the same length across both languages
ME: because like
ME: it's the "skyline" of the frequencies of occurrences of the word in these 100-cell distributions that are comparable as a vector
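The "skyline" step, as a minimal sketch: chop the text into 100 equal parts and count how often the word occurs in each part. Every word in either text then gets a vector of length 100, so source and target words become directly comparable even though the texts differ in length. The name `skyline` and the floor-division cell assignment are assumptions for illustration.

```python
from collections import Counter


def skyline(positions: list[int], text_length: int, cells: int = 100) -> list[int]:
    """Count occurrences of a word in each of `cells` equal parts of the text.
    The result always has length `cells`, whatever the text length."""
    counts = Counter(cells * i // text_length for i in positions)
    return [counts[c] for c in range(cells)]


# three occurrences in the first part of a 300-word text, one halfway through
vec = skyline([0, 1, 2, 150], text_length=300)
print(len(vec))          # 100
print(vec[0], vec[50])   # 3 1
print(sum(vec))          # 4, one count per occurrence
```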
ME: argh, i'm not really being clear
natedog: so wait: can you explain to me in simple terms what the distribution
natedog: [1,1,1,3,10,50,50.
ME: actually
natedog: ..] means?
ME: let's do this
ME: i'll get a short text and we'll look at it
natedog: k
ME: let's pick a song
natedog: okay. can i pick
natedog: ?
ME: sure
ME: as long as it has a translation
ME: any language pair you like
natedog: oh, let's do something by gainsbourg.
natedog: that french dude.
natedog: "annie loves anice" or something like that.
natedog: ...
ME: heh, ok good call
natedog: heh, you know this one?
ME: serge is good stuff
ME: hmm i wonder if we can find a translation
ME: voila
natedog: yeahman.
natedog: jussa sec: bafoom break.
ME: k
natedog: backskie.
ME: http://eggparm.com/gainsbourg/monproprerolecontents.html voila
ME: pick one :-D
21:03
natedog: k, i pick the first one.
ME: actually hmm
natedog: where's the english?
ME: we should prolly use a couple catted together
natedog: k, you pick then...
ME: ok
ME: gimme a couple minutes
natedog: k