Lexical Translation with Application to Image Search on the Web
Oren Etzioni, Kobi Reiter, Stephen Soderland, and Marcus Sammer
Turing Center
Dept. of Computer Science and Engineering
University of Washington, Seattle, WA 98195 USA
{etzioni, dbgalur, soderlan, sammer} @cs.washington.edu
Abstract
We introduce a novel approach to the task of lexical translation. We utilize the translation graph, a massive
lexical resource where each node denotes a word in some language and each edge denotes a word sense shared
by a pair of words. Our current graph contains 1,267,460 nodes and 2,315,783 edges. The graph is automatically
constructed from machine-readable dictionaries and Wiktionaries. Paths through the graph suggest word transla-
tions absent from any of the input dictionaries. We define a probabilistic inference procedure that enables us to
quantify our confidence in a translation derived from the graph, and thus trade precision against recall.
We demonstrate the graph’s utility by employing it in the PANIMAGES cross-lingual image search engine.
Google retrieves images based on the words in their “vicinity”, which limits the ability of a searcher to retrieve
them. Although images are universal, an English searcher will fail to find images tagged in Chinese, and con-
versely. PANIMAGES addresses this problem by translating and disambiguating queries, using the translation
graph, before sending them to Google. Our experiments show that, for queries in “minor” languages, PANIM-
AGES increases the number of correct images in the first 15 pages of results by 75%.
1
Introduction
edges denote the same word sense and use this probabil-
ity, coupled with the structure of the graph, to compute the
Lexical translation is the task of translating individual
probability that a path denotes a correct translation.
words or phrases, either on their own (e.g., search-engine
Before we consider lexical translation in more detail, we
queries or meta-data tags) or as part of a knowledge-based
need to ask: is lexical translation of any practical utility?
Machine Translation (MT) system. In contrast with statisti-
While it does not solve the full machine-translation prob-
cal MT, lexical translation does not require aligned corpora
lem, lexical translation is valuable for a number of practical
as input. Because large aligned corpora are non-existent
tasks including the translation of search queries, meta-tags,
for many language pairs, and are very expensive to gener-
and individual words or phrases. For example, Google and
ate, lexical translation is possible for a much broader set of
other companies have fielded WordTranslator tools that al-
languages than statistical MT.
low the reader of a Web page to view the translation of par-
While lexical translation has a long history (cf. (Helm-
ticular word, which is helpful if you are, say, a Japanese
reich et al., 1993; Copestake et al., 1994; Hull and Grefen-
speaker reading an English text and you come across an
stette, 1996)), interest in it peaked in the 1990’s. Yet, as this
unfamiliar word.
paper shows, the proliferation of Machine-Readable Dictio-
In the case of image search, the utility of lexical transla-
naries (MRDs) and the rapid growth of multi-lingual Wik-
tion is even more readily apparent. Google retrieves images
tionaries offers the opportunity to scale lexical translation
based on the words in their “vicinity”, which limits the abil-
to an unprecedented number of languages. Moreover, the
ity of a searcher to retrieve them. Although images are uni-
increasing international adoption of the Web yields oppor-
versal, an English searcher will fail to find images tagged
tunities for new applications of lexical translation systems.
in Chinese, and a Dutch searcher will fail to find images
This paper presents a novel approach to lexical trans-
tagged in English. To address this problem, we built the
lation based on the translation graph.
A node in the
PANIMAGES cross-lingual image search engine.1 PANIM-
graph represents a word in a particular language, and
AGES enables searchers to translate and disambiguate their
an edge denotes a word sense shared between words
queries before sending them to Google. PANIMAGES uti-
in a pair of languages.
Our TRANSGRAPH system
lizes the translation graph; thus it also enables us to evalu-
automatically constructs a graph from a collection of
ate the quality of translations inferred from the graph in the
independently-authored, machine-readable bilingual dic-
context of a practical application.
tionaries and multi-lingual Wiktionaries as described in
The key contributions of the paper are as follows:
Section 2. Figure 1 shows an example translation graph.
When all the edges along a path in the translation graph
• We introduce the translation graph, a unique,
share the same word sense, then the path denotes a correct
automatically-constructed lexical resource, which cur-
translation between its end points. When word senses come
rently consists of over 1.2 million words with over 2.3
from distinct dictionaries, however, we are uncertain about
million edges indicating possible translations.
whether the senses are the same or not. Thus, we define an
inference procedure that computes the probability that two
1cs.washington.edu/research/panimages

• We formalize the problem of lexical translation as
probabilistic inference over the translation graph and
quantify the gain of inference over merely looking up
translations in the source dictionaries.
• We identify a set of challenges in searching the
Web for images, and introduce PANIMAGES, a cross-
lingual image search application that is deployed on
the Web to address these challenges.
• We report on experiments that show how PANIMAGES
substantially increases image precision and recall for
queries in “minor” languages, thereby demonstrating
the utility of PANIMAGES and the translation graph.
The remainder of the paper is organized as follows. Sec-
tion 2 introduces the translation graph. Section 3 describes
PANIMAGES, our cross-lingual image search application.
Section 4 reports statistics on the translation graph and eval-
Figure 1: A fragment of a translation graph for two senses
uates the utility of the graph by reporting on the precision
of the English word ‘spring’. Edges with the label ‘1’ or
and recall of the PANIMAGES application. Section 5 dis-
‘3’ are for spring in the sense of a season; edges labeled
cusses related work, followed by conclusions and future
‘2’ or ‘4’ are for the flexible coil sense. This graph shows
work in Section 6.
translation entries from an English dictionary merged with
translation entries from a French dictionary.
2
The Translation Graph
This paper introduces a novel lexical resource, which we
2.1
Building the Translation Graph
call the translation graph. This section describes how the
TRANSGRAPH builds the translation graph from online
TRANSGRAPH system constructs a graph from multiple
dictionaries and Wiktionaries of two kinds: bilingual dic-
dictionaries, and uses paths in the graph to infer lexical
tionaries that translate words from one language to an-
translations.
other, and multilingual dictionaries that translate words in a
Each node n in the graph is an ordered pair (w, l) where
source language to multiple target languages. Some dictio-
w is a word in a language l. An edge in the graph between
naries provide separate translations for each distinct word
(w1, l1) and (w2, l2) represents the belief that w2 is a trans-
sense, which is particularly helpful for our purposes, but
lation into l2 of a particular sense of the word w1. The edge
others do not.
is labeled by an integer denoting an ID for that word sense.
As TRANSGRAPH adds to the graph from each new en-
Paths through the graph represent correct translations so
try in a dictionary, it assigns a new, unique word sense ID
long as all the edges on the path share a single word sense,
for each word sense in that entry. Thus, edges for trans-
and thus enable TRANSGRAPH to identify translations that
lations of the season ‘spring’ from the English dictionary
are absent from any of its source dictionaries.
have one word sense ID, edges for translations of the flex-
Figure 1 shows a portion of a translation graph for two
ible coil ‘spring’ have a different word sense ID, and so
senses of ‘spring’ in English.
The graph also shows two
forth. When the translation in the entry is not word-sense
corresponding French words ‘printemps’ (spring season)
distinguished, TRANSGRAPH makes the conservative as-
and ‘ressort’ (flexible spring).
sumption that each translation is in a distinct word sense.
TRANSGRAPH builds the translation graph incremen-
Section 2.2 explains how we recover from word sense in-
tally on the basis of entries from multiple, independent dic-
flation caused by this assumption and from integrating mul-
tionaries, as described in detail in Section 2.1. As edges are
tiple dictionaries.
added on the basis of entries from a new dictionary, some
We implement the translation graph as a relational
of the new word sense IDs are redundant because they are
database. Each row in the Translation table represents an
equivalent to word senses already in the graph from another
edge in the graph, while each row in the Word sense equiv-
dictionary. For example, TRANSGRAPH assigns one word
alence table represents the probability, prob(si = sj), that
sense ID to the seasonal sense of ‘spring’ from an English
two word sense IDs si and sj are equivalent.
dictionary, a new word sense ID to the French dictionary
entry for ‘printemps’, and so forth (see labels ‘1’ and ‘3’ in
2.2
Word-Sense Equivalence
Figure 1). We refer to this phenomenon as sense inflation.
As pointed out earlier, accumulating entries from multiple
Sense inflation would severely limit the utility of the
dictionaries results in sense inflation. Below, we explain
translation graph, so we have developed a mechanism for
how TRANSGRAPH addresses this problem by computing
identifying duplicate word senses automatically. TRANS-
word-sense equivalence probabilities of the form prob(si =
GRAPH computes the probability prob(si = sj) that a pair
sj).
of distinct IDs si and sj refer to the same word sense (see
Figures 2 and 3 give a schematic illustration of how
Section 2.1 for the details). Thus, TRANSGRAPH deter-
TRANSGRAPH accumulates entries from multiple dictio-
mines that word sense ID ‘3’ on edges from ‘printemps’
naries.
Figure 2 shows graph edges from an entry for
has a high probability of being equivalent to ID ‘1’.
the word E from an English dictionary that gives transla-
The following section discusses building the graph, fo-
tions into French, German, Hungarian, Polish, and Span-
cusing on the algorithm for merging word senses originat-
ish. TRANSGRAPH assigns the word sense ID 1 for these
ing from different dictionaries.
edges. This figure also shows edges from an entry for word

If |nodes(si) ∩ nodes(sj )| ≥ k, then:
prob(si = sj ) =
|nodes(si) ∩ nodes(sj)| |nodes(si) ∩ nodes(sj)|
max(
,
)
|nodes(si)|
|nodes(sj)|
(1)
where nodes(s) is the set of nodes that have edges labeled
by word sense ID s, and k is a sense intersection threshold.
As an example of computing the probability of word
sense equivalence, our translation graph has 56 translations
for the season sense of ‘spring’ from an English dictionary,
Figure 2: Schematic diagram of edges from an entry for the
and 12 translations for ‘printemps’ from a French dictio-
word E from an English dictionary and edges from an entry
nary. 8 of these translations overlap, giving a probability of
for the word R from a Russian dictionary.
8 = 0.67 that the two senses are equivalent.
12
R from a Russian dictionary, which in this case has transla-
2.3
Computing Translation Probabilities
tions into German, Hungarian, Latvian, and Polish. These
Given the translation graph coupled with the word sense
edges are assigned word sense ID 2.
equivalence probabilities, TRANSGRAPH can compute the
Figure 3 shows the situation after both sets of edges have
probability that a particular word is a translation of another
been added to the translation graph. There are 6 nodes with
word in a given word sense. First, we show how to compute
edges labeled with word sense ID 1, { E, F, G, H, P, S }; 5
the probability of a single translation path. Then, we show
nodes with edges labeled 2, { G, H, L, P, R }; and an inter-
how we combine evidence across multiple paths.
section of these sets comprising 3 nodes, {G, H, P}. The
Consider a single path P that connects node n1 to nk,
three nodes in the intersection have two incident edges with
where ni is the word wi in language li and the ith edge has
distinct sense IDs 1 and 2. The proportion of intersecting
word sense si. Let pathP rob(n1, nk, s, P ) be the probabil-
nodes provides evidence that these IDs refer to the same
ity that (w1, l1) is a correct translation of (wk, lk) in word
word sense.
sense s, given a path P connecting these nodes.
The simple case is where the path is of length 1. If s
is the same sense ID as s1, then the probability is simply
1.0; otherwise it is the probability that the two senses are
equivalent:
pathP rob(n1, n2, s, P ) = prob(s = s1)
(2)
Where the path P has more than one edge, the path prob-
ability is reduced by prob(si = si+1) whenever the word
sense ID changes along the path. We make the simplifying
assumption that sense-equivalence probabilities are mutu-
ally independent. Formally, this gives the term
prob(s
i=1...|P |−1
i = si+1).
If the desired sense s is not found on the path, we also
need to factor in the probability that s is equivalent to at
least one sense si on the path, which we approximate by
the maximum of prob(s = si) over all si. Formally, this
Figure 3: After the entries from Figure 2 have both been
gives the term
added to the graph, the set of nodes with word sense ID
1 overlaps with the set of nodes for word sense ID 2. The
maxi=1...|P |(prob(s = si)),
proportion of overlapping nodes gives evidence that the two
which is equal to 1.0 if s is found on path P.
word senses may be equivalent.
Putting these two terms together, we have the following
formula for simple paths of length greater than one (i.e.,
|P | > 1):
TRANSGRAPH determines the probability that two word
pathP rob(n
sense IDs s
1, nk , s, P ) =
i and sj are equivalent as follows:
max (prob(s = s
prob(s
• A word sense is equivalent to itself: prob(s = s) = 1.
i)) ×
i = si+1)
(3)
i=1...|P |
i=1...|P |−1
• If si and sj are alternate word senses from the same
entry in a sense-distinguished dictionary, then they are
Note that we disallow paths that contain non-consecutive
assumed to be distinct: prob(s
repetition of sense IDs (e.g.1, 2, 1).
i = sj ) = 0.
There are typically multiple paths from one node to an-
• If word senses si and sj have at least k intersecting
other in the translation graph. The simplest way to compute
nodes, then set the probability by equation 1 below.
prob(n1, nk, s) is to take the maximum probability of any
• In all other cases, the probability is undefined.
path between n1 and nk.
TRANSGRAPH estimates the probability that si and sj
are equivalent word senses by the following equation.
prob(n1, nk, s) =
max (pathP rob(n1, nk, s, P ))
(4)
P ∈paths

We also experimented with another method that gives
even English WordNet is not a strong source of evidence
higher probability if there are multiple, distinct paths be-
for non-synonymy. Of the cases where nodes(si) includes
tween words. We define two paths from n1 to nk to be
two English translations that are not WordNet synonyms,
distinct if there is a distinct sequence of unique word sense
they were actually synonymous about half the time. Our
IDs on each path.
preliminary experiments indicate that even crude estima-
We use the standard Noisy-Or model to combine evi-
tion of prob(si) can improve the precision of translation
dence. The basic intuition is that translation is correct un-
graph traversal. The results shown in Section 4 include a
less every one of the translation paths fails to maintain the
early attempt to estimate prob(si).
desired sense s. We multiply the probability of failure for
each path. We then subtract that probability from one to get
2.5
Bilingual Dictionaries
the probability of correct translation. The probability that
The method for computing word-sense equivalence dis-
n1 is a correct translation of nk in word sense s is:
cussed in 2.2 relies on having multiple translations for each
word sense. Unfortunately, we do not always have this lux-
prob(n1, nk, s) = 1 −
(1 − pathP rob(n1, nk, s, P ))
ury. In response, we have identified cliques in the graph as
P ∈distinctP
an additional structure that helps to combat sense inflation.
(5)
Consider, for example, the simple clique shown in Fig-
where distinctP is the set of distinct paths from n1 to nk.
ure 4. The figure shows a 3-node clique where each of
We found that our current implementation of the Noisy-
the edges was derived from a distinct dictionary, and hence
Or model tends to give inflated probability estimates, so
has a distinct word sense ID. The edge from (spring, En-
we use the maximum path probability in experiments re-
glish) to (printemps, French) is labeled ‘1’ and comes from
ported here. Defining distinct paths as those with distinct
an entry for the season of spring from the English Wik-
sense IDs in not sufficient to ensure that paths are based
tionary. The edge ‘2’ from (xuˆan, Vietnamese) to (spring,
on independent evidence. We are exploring better methods
English) is from a Vietnamese-English dictionary that does
for determining independent paths, and more sophisticated
not specify which sense of spring is intended. The edge
probability models to combine evidence.
‘3’ from (xuˆan, Vietnamese) to (printemps, French) is from
a Vietnamese-French dictionary, again without any indica-
tion of word sense.
2.4
Confidence in Dictionary Entries
It has long been known that this kind of triangulation
Our methods for computing translation probabilities have,
gives a high probability that all three words share a com-
thus far, made a strong assumption. We assume that each
mon word sense (Gollins and Sanderson, 2001). We em-
word sense ID comes from a sense-distinguished dictionary
pirically estimated the probability that all three word sense
entry. This means that nodes(si), the set of nodes with
IDs of a 3-node clique are equivalent to be approximately
edges to sense si, are mutual translations of each other in
0.80 in our current translation graph. The TRANSGRAPH
the same sense.
compiler finds all cliques in the graph of size 3 where two
We found that many of the errors in computing
word senses are from bilingual dictionaries. It then adds an
pathP rob(n1, nk, s, P ) are from cases where this assump-
entry to the word sense equivalence table with probability
tion is violated by some word sense ID along the path. If all
0.80 for each pair of sense IDs in the clique. We plan to in-
words in the set nodes(si) do not share the same sense, any
vestigate longer cliques and evidence from other elements
path that passes through sense si may result in translation
of graph structure.
errors.
These “impure” word sense IDs may arise either from
errors in a dictionary or from errors parsing the dictionary.
As an example, the French Wiktionary has an entry for the
word ‘boule’ with English translations as ‘ball’, ‘boule’,
‘bowl’, ‘chunk’, ‘clod’, and ‘lump’. These are all good
translations of ‘boule’, but clearly not all in the same sense.
An example of a parsing error is the truncation of trans-
lation phrases in some dictionary entries, causing bizarre
Figure 4: TRANSGRAPH infers that word senses are equiv-
translations.
alent with high probability when nodes form a 3-node
To compensate for these impure sense IDs, we have be-
clique in the graph. In this example the Vietnamese word
gun experimenting with methods to compute prob(si), the
‘xuˆan’ is translated to English ‘spring’ and French ‘print-
probability that all words in nodes(si) share a common
emps’ in the season sense of spring.
word sense. This adds the term prob(s1) to Equations 2 and
3, and adjusts Equation 3 to include prob(si+1) for each
new sense si+1 along the path.
3
Image Search with PanImages
The a priori probability for prob(si) is set according to
a global confidence in the dictionary. If the dictionary has
We now turn to a discussion of the application of the trans-
a high ratio of word senses per entry, the assumption is that
lation graph to cross-lingual image search.
the dictionary entries distinguish word senses, and the de-
The Web has emerged as a rich source of images that
fault prob(si) is set to 1.0.
serve a wide range of purposes from children adorning their
The existence of multiple, possibly non-synonymous
homework with pictures to anthropologists studying cul-
translations into the same language lowers our confidence
tural nuances. Most people find images on the Web by
that a dictionary entry is pure. While it is possible to find
querying an image search engine such as Google’s.
evidence that two words are synonyms, determining that
Google collects images as part of its crawl of the Web
they are non-synonymous is more difficult. We found that
and tags them with the words that appear in their vicinity

on the crawled HTML documents and links. It is not sur-
prising that most of the tags are in “major” languages such
as English. So while images are universal, most of them
can be found through Google only if you can query in the
“right” language.
More broadly, monolingual image search engines face
the following challenges:
• Limited Resource Languages - The lower the Web
presence of a language, the fewer hits a speaker of that
language gets from a query. A query for ‘grenivka’
(Slovenian for ‘grapefruit’) produces only 24 results,
of which only 9 are images of grapefruits. Yet translat-
ing the query into English produces tens of thousands
of images with high precision.
• Cross-Cultural Images - Results of an image search
Figure 5: Architecture: The PANIMAGES compiler creates
may vary considerably depending on the language of
a translation graph from multiple dictionaries. The query
the query term. Translating the query ‘baby’ or ‘food’
processor takes a user query and presents a set of trans-
into Chinese, Arabic, or Zulu allows an interesting cul-
lations. The user selects the desired translation(s), which
tural comparison.
PANIMAGES sends to Google Image Search.
• Cross-Lingual Masking - A word in one language is
often a homonym for an unrelated word in another lan-
of nodes (wj, lj) where wj is a translation into lj for a
guage. Relevant results can be swamped by results
particular sense of wi. For each word sense, PANIMAGES
for the unrelated word. The Hungarian word for tooth
follows paths of length up to k in which the probability that
happens to be ‘fog’; the only way to get images of
the word sense has not changed according to Equation 4 is
teeth rather than misty weather is to query with a trans-
above a threshold τ . In our experiments we set k to 3 and τ
lation that doesn’t suffer from cross-lingual masking.
to 0.2.
In the example in Figure 1 for the English word ‘spring’,
• Word Sense Ambiguity - Searching for an image that
translations in sense 1 include nodes reachable from sense 1
corresponds to a minor sense of a word is problem-
and nodes reachable from (printemps, French) along edges
atic. Most results for the query ‘spring’ are images of
for sense 3. Beginning from ‘spring’ with sense 3 and con-
flowers and trees in bloom. If a user wants images of
tinuing on paths for sense 1 or 3 produces an identical set
flexible coils or of bubbling fountains, the most effec-
of translations that TRANSGRAPH later merges with trans-
tive queries are translations of this sense of ‘spring’
lations for sense 1.
into languages where that word is not ambiguous.
Presenting Translations to the User:
PANIMAGES, a cross-lingual image-search application
deployed on the Web, enables a monolingual user to se-
PANIMAGES presents these sets of translations and allows
lect from any of 50 input languages, automatically looks
the user to select one or more translation to be sent to
up word-sense specific translations into more than 100 lan-
Google Images. As a practical consideration, PANIMAGES
guages, and lets the user control which translations are sent
defaults to selecting translations in a language with high
to an image search engine. At compile time, P
Web presence: an English translation for all source lan-
ANIMAGES
merges information from multiple Wiktionaries and open-
guages but English, and a French translation for English
source dictionaries into a translation graph as described in
queries. The user may add or remove any of the translation-
Section 2. At run time, P
language pairs to the query before clicking on Show Im-
ANIMAGES accepts a query from
a user, presents the user with possible translations found in
ages. Another option is to click on a single translation to
the translation graph, then sends the translations selected
immediately send that translation as a query to Google’s
by the user to Google’s image search as described below.
image search.
This section describes the implementation and inter-
Handling Word Senses:
face of our PANIMAGES system for cross-lingual im-
PANIMAGES lists each distinct word sense along with a
age search, accessible at www.cs.washington.edu/
gloss if available and the number of translations for this
research/panimages. Figure 5 shows the system ar-
word sense. The user can click on a word sense to see the
chitecture.
list of translations for that sense. PANIMAGES presents the
word sense with the largest number of translations first, and
3.1
Interface Design
selects this as the default word sense.
The PANIMAGES graphical user interface allows a user to
enter a search query in any of n source languages (n = 50
4
Experimental Results
currently). PANIMAGES presents translations of the query
term, presenting multiple sets of translations if the graph
This section presents statistics on our current, automatically
has multiple senses of the term. The user selects one or
constructed translation graph; reports on an evaluation of
more translations, and P
translation inferences over the graph; and reports on recall
ANIMAGES sends this as a query to
Google Images.
and precision results from a sample of image search queries
over this translation graph.
Finding Translations:
PANIMAGES looks up the node (w
4.1
Graph Statistics
i, li) in the translation
graph that corresponds to the query word and language,
The translation graph is composed of 1,267,460 words in
then follows edges in the graph to create one or more sets
more than 100 languages. 3 of the languages have over

100,000 words and 58 of the languages have at least 1,000
The language pair English-Russian is an interesting test
words. The words were extracted from 3 multilingual dic-
of the translation graph, because we had neither a bilingual
tionaries (English and French Wiktionaries, and an Es-
nor a multilingual Russian dictionary. The only edges to
peranto dictionary) and 14 bilingual dictionaries, giving a
Russian words came from one of the multilingual dictio-
total of 2,315,783 direct translations or edges in the graph.
naries. The baseline accuracy of the source dictionaries is
Further translations can be found from graph paths with
91%. Adding inferences from Equation 1 gave an 8% in-
length greater than one edge.
crease in translated words with a drop in precision to 82%.
Building a translation graph from a combination of these
There was a greater gain from combining all translation
dictionaries provides more translations than any of these
paths in the graph – 33% more translated words than the
dictionaries alone. The English Wiktionary had translations
baseline with negligible further drop in precision.
for 19,500 words – after adding the other dictionaries, the
graph has translations for over 255,000 English words and
phrases, the bulk of them from bilingual dictionaries. Sim-
ilarly, coverage of French went from 12,700 words in the
French Wiktionary to 32,800 in the graph.
4.2
Evaluating Inferred Translations
We evaluated the precision and recall gain from inference
using Equations 1 through 4 as follows. We took a ran-
dom set of 1,000 English words and found Hebrew or Rus-
sian translations using the translation graph. We also took
a random set of 1,000 Turkish words and found Russian
translations.2 The set of random words was not weighted
by word frequency, thus they contained many relatively ob-
scure words (e.g., abashment, abjectly, Acrididae, ‘add up’)
Figure 7: Graph traversal increased the number of transla-
for which no translation was found in the target language.
tions from English to Hebrew by 80%, again with a modest
The baseline is the number of words in the source lan-
drop in precision.
guage that can be translated using only direct edges in the
graph. We then added inferred translations that can be made
Like Russian, there are no bilingual dictionaries for He-
from a single application of the word sense equivalence
brew and no Hebrew multilingual dictionary. Inference
equation (Equation 1) with k set to 2 at a probability thresh-
based on Equation 1 boosts translated words by 43% and
old of 0.2. Finally, we found all inferred translations using
using all translation paths gives a gain of 80% over the base-
Equations 1 - 4 and using graph paths from all 17 source
line. The baseline precision drops from 93% to 79%.
dictionaries with path length up to 3 word sense IDs at a
probability threshold of 0.2.
Figures 6 through 8 compare the number of words trans-
lated and the proportion of correct translations. The total
height of each bar represents the number of source lan-
guage words that have at least one translation. We measure
precision as the number of correct translation pairs divided
by the number of translation pairs that the system outputs.
Note that precision is computed over all translations for a
given word, some of which may be correct and others may
be erroneous.
Figure 8: Translation from Turkish to Russian benefited
from interaction between several bilingual dictionaries, re-
sulting in 3.15 times as many translated words as the base-
line.
Translations from Turkish to Russian showed a large gain
from inferences based on bilingual dictionaries. While di-
rect edges came only from the three multilingual dictio-
naries, there were also three bilingual dictionaries between
Turkish and English, German, or Kurdish. In turn, these
dictionaries interacted with other bilingual dictionaries for
Figure 6: A comparison of direct vs. inferred translations
English, German, and Kurdish. Inference from all paths
from English to Russian. Inference from graph traversal
resulted in a three-fold increase in translated words, while
boosted the number of translated words by 33% with a
maintaining high precision (80%).
modest drop in precision.
In summary, we see that inference over the translation
graph yields a tradeoff between translation coverage and
2We chose these language pairs because we were able to find
precision. We can control the tradeoff using the proba-
bilingual speakers who would evaluate the results for us.
bility threshold—lowering the threshold increases coverage

but reduces precision. In the Web image retrieval context,
More significant is the number of correct results found
where precision is already far-from-perfect, the tradeoff
in the first 15 pages (containing 270 images). Here the
seems like a good one, particularly for the numerous “mi-
PANIMAGES translation resulted in a 75% gain over the un-
nor” languages where few images are returned in response
translated query, from an average of 49.6 correct results to
to many queries.3 Finally, we anticipate that as we add dic-
an average of 86.8. Average precision also rose 27% from
tionaries to TRANSGRAPH, and as Wiktionaries grow in
0.25 to 0.32. The main cause of low precision for the mi-
size, both coverage and precision will increase in tandem.
nor language queries was cross-lingual masking. The query
term was a homonym of a completely unrelated word in a
4.3
Image Retrieval Performance
major language.
We also evaluated coverage and precision of PANIMAGES
image search for non-English queries, comparing the re-
5
Related Work
sults of sending the non-English query directly to Google
Image search with the results of sending the default PAN-
There was considerable research in the 1990’s on meth-
IMAGES translation instead. We chose a limited test set of
ods to acquire translation lexicons for knowledge-based
languages and words to limit the manual tagging effort nec-
MT (Neff and McCord, 1990; Helmreich et al., 1993;
essary for the experiment.
Copestake et al., 1994). Many of these systems used MRDs
To generate our test set of words, we selected 10 arbi-
to assist manual creation of lexicons, or used automated ac-
trary concepts that are associated with distinctive images, 6
quisition with post editing. Despite the shift in emphasis
nouns (ant, clown, fig, lake, sky, train), 2 verbs (eat, run),
towards statistical MT, research on knowledge-based MT
and two adjectives (happy, tired). Next, we selected 32
has continued, with its need for lexicon acquisition.
languages with a limited Web presence ranging from Dan-
Translation lexicons are also a vital resource for cross-
ish and Dutch to Telugu and Lithuanian.4 Now, for each
lingual information retrieval (CLIR), a subfield prompted
concept, we chose 1/4 of the languages at random, and
in part by the TREC conferences (Harman, 1996) and a se-
recorded the word for the concept in the language. These
ries of SIGIR CLIR workshops (Gey et al., 2006). Sur-
80 words became our set of non-English queries. We then
veys of CLIR research may be found in (Oard, 1997) and
compared

precision and recall of Google’s image search


(Kishida, 2005). Much of the CLIR research, in contrast to
for these 80 words “as is” with the precision and recall of
PANIMAGES, has focused on a small number of language
Google’s image search for these words translated by PAN-
pairs, much of it building systems that must be adapted to
IMAGES into English.
one language pair at a time.

While early CLIR systems typically relied on bilin-
0.6
gual dictionaries (Hull and Grefenstette, 1996), corpus-
based methods or hybrid methods soon outstripped purely
0.5
dictionary-based systems (Yang et al., 1998).
Methods
0.4
n
that derive word-translations from parallel text include
io
is
0.3
(Gale and Church, 1991; Fung, 1995; Melamed, 1997;
c
r
e

Franz et al., 2001). There are also hybrid systems (Balles-
P 0.2
teros and Croft, 1998; Sharoff et al., 2006) that use corpus-
0.1
based techniques to disambiguate translations provided by
bilingual dictionaries.
0
The main drawback of using bilingual dictionaries, in
0
20
40
60
80
100
Correct results
past work, has been word-sense ambiguity. A single term
in the source language is typically translated into multiple
Untranslated query
terms in the target language, mixing different word senses.
PanImages translation
Combining information from multiple bilingual dictionar-


ies only exacerbated this problem: translating from lan-
Figure 9: Image search results for random words in 32 lan-

guage l
guages with a limited Web presence. The P
1 into l2 and then translating each of the possible l2
ANIMAGES
translations into a third language l
translation into English increases correct results by 75%
3, quickly leads to an ex-
plosion of translations. But this is exactly the problem that
from an average of 49.6 correct results on the first 270 im-
the translation graph inference mechanism solves. This use
ages (the rightmost white diamond in the graph) to 86.8 for
of inference to reduce ambiguity is a key capability of PAN-
PANIMAGES (the rightmost black triangle in the graph). In
IMAGES that makes it more powerful than any collection of
addition, PANIMAGES boosts precision by approximately
dictionaries and Wiktionaries.
27% throughout the graph.
The ability of TRANSGRAPH to automatically determine
distinct senses of a word is similar to the work on “seman-
Translating the queries from minor languages into a ma-
tic mirrors” (Dyvik, 2004). Dyvik clusters translations be-
jor language gives a large boost in recall. The average num-
tween Norwegian and English to find alternate senses of
ber of results as estimated by Google was 33,000 for minor
words in either language.
language queries and 1,856,000 for the queries translated
On the Web, commercial search engines such as Google,
by PANIMAGES, a 57-fold increase. For 10% of the minor
French Yahoo (http://fr.yahoo.com) and German
language queries Google failed to return any images.
Yahoo (http://de.yahoo.com), offer query transla-


3An experiment in Section 4.3 shows that 10 % of the “minor’
tion capability for only a handful of languages. For ex-
language queries in our experiments returned no results whatso-
ample, French and German Yahoo automatically translate
ever!
query terms into any of several major languages using Sys-
4We selected languages that had words for these concepts
tran (http://www.systran.com) and translate the re-
present in our translation graph.
sulting Web pages.

In contrast, PANIMAGES translates between a large num-
References
ber of languages, and infers word-sense preserving trans-
(Ballesteros and Croft, 1998) Lisa
Ballesteros
and
lations that are not found in any single dictionary. PAN-
W. Bruce Croft.
Resolving ambiguity for cross-
IMAGES’s translation graph is also a platform for plugging
language retrieval. In ACM SIGIR, 1998.
in more and more dictionaries, increasingly comprehensive
Wiktionaries, and corpus-based translations, all of which
(Copestake et al., 1994) A.
Copestake,
T.
Briscoe,
will lead directly to improved cross-lingual image search
P. Vossen, A. Ageno, I. Castellon, F. Ribas, G. Rigau,
over time.
H. Rodriquez, and A. Samiotou. Acquisition of lexical
translation relations from MRDs. Machine Translation,
3(3–4):183–219, 1994.
6
Conclusions and Future Work
(Dyvik, 2004) H. Dyvik. Translation as semantic mirrors:
The recent proliferation of bilingual MRDs and multi-
from parallel corpus to WordNet. Language and Com-
lingual Wiktionaries is a valuable resource for acquisition
puters, 49(1):311–326, 2004.
of translation lexicons. Yet, difficulties arise in making use
(Franz et al., 2001) M. Franz, S. McCarly, and W. Zhu.
of these lexical resources. Most bilingual dictionaries do
English-Chinese information retrieval at IBM. In TREC
not distinguish between word senses, giving instead a list
2001, 2001.
of translations of all senses of a source word. Many MRDs
(Fung, 1995) P. Fung. A pattern matching method for find-
have only spotty coverage.
ing noun and proper noun translations from noisy paral-
To address these issues, we introduced the translation
lel corpora. In ACL-1995, 1995.
graph, which combines multiple independently-authored
(Gale and Church, 1991) W. Gale and K.W. Church.
A
MRDs and Wiktionaries. Our TRANSGRAPH system has
Program for Aligning Sentences in Bilingual Corpora.
built a graph with over 1.2 million words in more than 100
In ACL-1991, 1991.
languages and over 2.3 million edges that represent trans-
lations between words. TRANSGRAPH provides a proba-
(Gey et al., 2006) F.C. Gey, N. Kando, C-Y. Lin, and
bilistic inferencing mechanism over the graph that can infer
C. Peters. New directions in multilingual information
translation pairs that are not found in any of the source dic-
access: Introduction to the workshop at SIGIR 2006. In
tionaries. Our experiments found that using the graph to in-
Workshop on New Directions in Multilingual Informa-
fer translations not found in any single dictionary can more
tion Access at SIGIR 2006, 2006.
than triple the number of translated words, while maintain-
(Gollins and Sanderson, 2001) T. Gollins and M. Sander-
ing high precision (0.80).
son. Improving cross language retrieval with triangu-
This paper also introduces PANIMAGES, a fully-
lated translation. In SIGIR, 2001.
implemented cross-lingual image search system for the
(Harman, 1996) D. Harman. Overview of the Fourth Text
Web based on the translation graph.
Our experiments
Retrieval Conference (TREC-4). In TREC-4, 1996.
show that, for queries in languages with a limited Web
footprint, PANIMAGES increases the total number of re-
(Helmreich et al., 1993) S. Helmreich, L. Guthrie, and
sults 57-fold (from 33,000 to 1,856,000). PANIMAGES in-
Y. Wilks. The use of machine readable dictionaries in the
creases the number of correct images by 75% on the first 15
Pangloss project. In AAAI Spring Symposium on Build-
pages (containing 270 images), while increasing precision
ing Lexicons for Machine Translation, 1993.
by 27%.
(Hull and Grefenstette, 1996) D.A. Hull and G. Grefen-
Our future work includes expanding the coverage of the
stette. Querying across languages: a dictionary-based
translation graph by increasing the number of source dic-
approach to multilingual information retrieval. In ACM
tionaries, and re-parsing Wiktionaries that have grown in
SIGIR 1996, pages 49–57, 1996.
coverage. We have recently added 10 more Wiktionaries
(Kishida, 2005) K. Kishida.
Technical issues of cross-
and are in the process of adding 20 more bilingual dictio-
language information retrieval: a review. Information
naries to the translation graph.
Processing and Management, 41:433–455, 2005.
We are also exploring ways to improve translation pre-
(Melamed, 1997) I.D. Melamed. A Word-to-Word Model
cision with better estimates of prob(si) and prob(si = sj)
of Translational Equivalence. In ACL-1997 and EACL-
and exploring probabilistic models of how to combine ev-
1997, pages 490–497, 1997.
idence from multiple graph paths. Informal evaluation on
our latest translation graph shows higher precision than the
(Neff and McCord, 1990) M. Neff and M. McCord. Ac-
results presented here.
quiring lexical data from machine-readable dictionary
In future work, we plan to apply the translation graph
resources for machine translation. In 3rd Intl Confer-
to tasks other than image search, including the translation
ence on Theoretical and Methodological Issues in Ma-
of tags in social tagging systems such as del.icio.us and in
chine Translation of Natural Language, 1990.
on-line games such as von Ahn’s “ESP game”
(Oard, 1997) D. Oard.
Cross-language text retrieval re-
search in the USA. In 3rd DELOS Workshop, 1997.
Acknowledgments
(Sharoff et al., 2006) S. Sharoff, B. Babych, and T. Hart-
ley. Using comparable corpora to solve problems diffi-
This research was supported by a gift from the Utilika
cult for human translators. In ACL/HLT, 2006.
Foundation to the University of Washington’s Turing Cen-
(Yang et al., 1998) Yiming Yang, Jaime G. Carbonell,
ter. We thank Ethan Phelps-Goodman, Doug Downey, and
Ralf D. Brown, and Robert E. Frederking. Translingual
Jonathan Pool for helpful comments and thank Jonathan
information retrieval: Learning from bilingual corpora.
Pool and Julia Schwarz for help with evaluating translation
Artificial Intelligence, 103(1-2):323–345, 1998.
accuracy.