GALE Program: Arabic to English Translation Guidelines
Version 2.7
September 7, 2010
Linguistic Data Consortium
http://www.ldc.upenn.edu/Projects/GALE
NOTE: If for any reason translators are uncomfortable working with any particular
document included in their assignment, please contact LDC at
translations@ldc.upenn.edu to request a replacement.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
2
1 Introduction .............................................................................................................. 3
2 Translation Teams ................................................................................................... 3
3 File Formats ............................................................................................................. 4
3.1 Source file......................................................................................................... 4
3.2 Completed translation file ................................................................................. 5
3.3 File Naming Conventions.................................................................................. 5
4 Delivery of Completed Translations ......................................................................... 6
4.1 Email Correspondence about Translation......................................................... 6
5 Specific Rules for Translation .................................................................................. 6
5.1 General Principles............................................................................................. 6
5.2 Proper Names................................................................................................... 7
5.3 Numbers ........................................................................................................... 8
5.4 Capitalization .................................................................................................... 9
5.5 Punctuation ....................................................................................................... 9
5.6 Idioms ............................................................................................................. 10
5.7 English or other language content .................................................................. 10
5.8 Factual Errors in Source Text ......................................................................... 11
5.9 Typographical errors ....................................................................................... 11
5.10 Difficult to Translate Source Text .................................................................... 11
5.11 Headlines and Titles ....................................................................................... 12
5.12 Special Issues for Translation of Speech Sources.......................................... 12
5.12.1 Disfluent Speech...................................................................................... 12
5.12.2 Speaker Noise ......................................................................................... 13
5.12.3 Filled Pauses ........................................................................................... 13
5.12.4 Translation of ๎€ุฃ, ุฃู… ,๎€†๎€‡ุฅ ................................................................................ 13
5.12.5 Repetition and Restarts ........................................................................... 14
5.12.6 Partial Words ........................................................................................... 14
5.12.7 Semi-intelligible and Unintelligible Speech .............................................. 15
5.12.8 Program Names....................................................................................... 15
5.12.9 Transcription Mark-ups ............................................................................ 16
5.13 Special Issues for Translation of Weblogs and Newsgroups .......................... 16
5.13.1 Emoticons (Emotion Icons) ...................................................................... 16
5.13.2 URLs........................................................................................................ 16
5.13.3 Punctuation.............................................................................................. 16
6 Quality Control at LDC ........................................................................................... 16
7 Guidelines .............................................................................................................. 19
Appendix: Transcription and translation mark-ups ........................................................ 20
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
3
1 Introduction
These guidelines provide an overview of requirements for translation of Arabic text into
English to support the GALE Program evaluation of machine translation technology.
Genres covered by these guidelines include the following:
โ€ข
Broadcast News (BN) transcripts consisting of "talking head"-style news
broadcasts from radio and/or television networks.
โ€ข
Broadcast Conversation (BC) transcripts consisting of talk shows plus roundtable
discussions and other interactive-style broadcasts from radio and/or television
networks.
โ€ข
Newswire (NW) consisting of newswire feeds.
โ€ข
Web Newsgroups (NG) consisting of posts to electronic bulletin boards, Usenet
newsgroups, discussion groups and similar forums
โ€ข
Weblogs (WL) consisting of posts to informal web-based journals of varying
topical content
This document describes the format of the source text and its translation, and
addresses specific issues when translating text from different genres.
2 Translation Teams
Initial translation is performed by translation services employing professional translators.
Services must assign files to be translated to a team consisting of at least two
members: an Arabic dominant bilingual and an English dominant bilingual. One team
member does the initial translation while the other one proofreads the translation. Itโ€™s up
to the translation agencies to decide who does the initial translation and who does the
proofreading. The team makeup may not change during translation of a particular data
set.
A translation service may have multiple teams working simultaneously. Proofreaders
may be shared among teams, unless LDC provides instructions to the contrary for a
particular project.
Translation teams may use an automatic machine translation system and/or a
translation memory system to assist them during translation.
Translation teams must be documented as follows:
โ€ข Translator and proofreader profiles consisting of name or pseudonym, native
language, second languages, age and years of translation experience. When
multiple translation teams are used, also indicate team membership for each
person.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
4
โ€ข Work assignment information consisting of the team number or the name of the
translator and proofreader for each file in the data set.
โ€ข The name and version number of any translation system or translation memory
used.
โ€ข A description of any additional quality control procedures or other relevant
parameters or factors that affect the translation.
This documentation should be submitted to LDC along with the completed translations.
3 File Formats
The source text for translation comes in many different data formats, and may include
such metadata as speaker labels, timestamps, section and turn boundaries or other
information. LDC converts all source text into a standard translation format before
sending data out to translators in order to 1) make the source files easy for translators
to read; 2) to avoid translatorโ€™s tampering with the metadata; and 3) to aid automatic
processing of the data after the translation is returned to LDC.
3.1 Source file
Each source file delivered from LDC is divided into segments that roughly correspond to
sentences or sentence-like units. Each Arabic segment in the source file consists of 3
components.
โ€ข <ar=##> is a unique identifier for the Arabic segment;
โ€ข [Speaker id] is a speaker ID label that is added for segments taken from
speech sources like broadcast news and talk shows. Not all segments and
files have corresponding speaker IDs. Data drawn from non-speech genres
(like newswire and web text) will not have speaker IDs.
โ€ข Arabic text to be translated.
Each corresponding English segment in the source file consists of one component:
โ€ข <en=##> is a unique identifier indicating where the English translation should
be added for a given Arabic segment.
The ## for a given English segment is identical to the corresponding Arabic segment to
be translated.
A sample source file appears below:
<ar=9> [speaker1]
ุจ ุซุฏ๎€Œ๎€๎€Žุง ุง๎€ู‡ ๎€’๎€“๎€” ๎€•๎€–๎€— ู†๎€™๎€š๎€–๎€›๎€œ ๎€๎€žุข ๎€ ๎€ก๎€›๎€‡ ๎€•๎€ข๎€™๎€” ๎€ุฃ ูˆ๎€ค๎€ฅ๎€— ุฑ๎€™๎€งุขุฏ-ู†๎€Œ๎€ก๎€ฉ๎€Ž ๎€ุฃ ุŸ
<en=9>
<ar=10> [Amr_Moussa]
ูŠ๎€ค๎€‡๎€ค๎€๎€Žุง ๎€ฌ๎€ž๎€ญุฑ ๎€ฎ๎€ž๎€ฏ๎€Žุง ๎€•๎€›๎€ก๎€‡ ๎€ ๎€ฅ๎€ขุฑ ู†๎€Œ๎€ž๎€ฐ ุชุฑ๎€ฎ๎€ฒุฃูˆ ๎€ณ๎€š๎€–๎€— ๎€Œ๎€ดุฃ ๎€ุฃ.
<en=10>
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
5
3.2 Completed translation file
The completed translation should be formatted exactly the same as the source file; the
only difference between the source file and the completed translation file is that an
English translation will be added after each English segment identifier.
Translators should type the English translation after each โ€œ<en=##>โ€ tag without altering
any other part of the file. Altering anything in the Arabic segment, or adding
unnecessary line breaks, carriage returns or other stray marks in the files makes it
difficult for LDC to automatically post-process the translation files. Similarly, in cases
where a single Arabic sentence is translated into multiple English sentences, no blank
lines should be inserted between the English sentences.
Speaker IDs in square brackets, e.g. [speaker1], are provided to facilitate clear
understanding of conversational speech. They should not be translated or copied over
into the translation. Occasionally speaker names do not appear inside square brackets
but instead appear at the start of the source text. In such cases, the names should be
translated as usual, using the standard guidelines for translating proper names (see
Section 5.2).
To summarize: the content inside square brackets should not be translated or copied
over into the translation. Other content should be translated
English translations should be rendered in plain ASCII text using UTF-8 encoding.
A sample of a completed translation in the correct format follows:
<ar=10> [Amr_Moussa] .
ูŠ๎€ค๎€‡๎€ค๎€๎€Žุง ๎€ฌ๎€ž๎€ญุฑ ๎€ฎ๎€ž๎€ฏ๎€Žุง ๎€•๎€›๎€ก๎€‡ ๎€ ๎€ฅ๎€ขุฑ ู†๎€Œ๎€ž๎€ฐ ุชุฑ๎€ฎ๎€ฒุฃูˆ ๎€ณ๎€š๎€–๎€— ๎€Œ๎€ดุฃ ๎€ุฃ
<en=10> Uh, I have commented and issued an official statement about the death of Mr. Rafik
Hariri.
<ar=11> [Amr_Moussa]
๎€ถุงูˆ ู†๎€ท๎€๎€Žุง ๎€ธ๎€‡๎€Œ๎€น ๎€ ๎€ญ ๎€Œ๎€ž๎€บ๎€ป๎€ผ ๎€Œ๎€ดุฃ ๎€’๎€ฝ๎€ค๎€Žุง ุง๎€ู‡ ู†ุง๎€ฎ๎€š๎€ญ ๎€•๎€–๎€— ๎€๎€ข ๎€พ๎€ž๎€ฟ๎€›๎€Žุง๎€ ๎€–๎€Žุง ๎€Œ๎€๎€ฝูˆ ๎€ค๎€ฉ๎€ง๎€›๎€‡ ู†๎€Œุข
๎€Œ๎€ž๎€ฐ๎€ค๎€—๎€Œ๎€ž๎€ด๎€Œ๎€ก๎€ฉ๎€Ž ๎๎€š๎€ญ ๎‚๎€ž๎€Žูˆ .
<en=11> I am personally very sad and sorry about the loss of this great man, who was regarded
as a prominent Arab personality and not only a Lebanese one.
3.3 File Naming Conventions
It is very important that completed translation files use the exact same name as the
initial file provided by LDC. Please do not add anything to the file name (like ENG or
the agency name) or change anything in the file name, including the file extension.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
6
4 Delivery of Completed Translations
Completed translation files should be submitted electronically, as zipped email
attachments, via FTP or by web upload as specified by LDC. Paper transmission is not
acceptable. All completed translation files must be in plain text, not in some proprietary
software format (like Microsoft Word).
Translations must be delivered on time according to the schedule negotiated with LDC
at the start of the project. Late submissions may result in payment penalties. If it looks
like a translation delivery may need to be later than the agreed-upon submission date,
please contact LDC as soon as possible to discuss the situation. In some cases, a
partial delivery may be required before the final delivery date.
4.1 Email Correspondence about Translation
When sending correspondence about a translation project, please include the following
information in the subject header:
โ€ข Agency Name
โ€ข Translation Package Name and Part Number (where applicable)
โ€ข A descriptive phrase like "Translation Delivery", "Invoice", "Question" etc.
For instance, the following subject headers are helpful:
ABCAgency GALETrain01.Arabic Part 1 - Translation Delivery
AgencyX GALEPhase2Eval.Arabic Part 2 โ€“ Invoice
FGH MT07.Arabic โ€“ Schedule Question
Using a descriptive header helps LDC direct your message or delivery to the right place,
ensuring timely response.
5 Specific Rules for Translation
5.1 General Principles
The goal of the GALE Translation process is to take Arabic source text drawn from
many different genres, both spoken and written, and translates it into fluent English
while preserving all of the meaning present in the original Arabic text. Translation
agencies will use their own best practices to produce high quality translations. While we
trust that each agency has its own mechanism of quality control, we provide the
following specific guidelines so that all translations are guided by some common
principles.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
7
โ€ข The English translation must be faithful to the original Arabic text in terms of
both meaning and style. If the Arabic source text is a news story, the
translation should also be journalistic. If the Arabic source text is transcript of
a talk show, the translation should be conversational. The translation should
mirror the original meaning as much as possible while preserving
grammaticality, fluency, and naturalness.
โ€ข Try to maintain the same speaking style or register as the source. For
example, if the source is polite, the translation should maintain the same level
of politeness. If the source is rude, excited or angry, the translation should
convey the same tone.
โ€ข In the case of speech sources like broadcast news and talk shows, the source
text is an unedited transcription of spoken conversations. In some cases this
means the transcript is hard to read, and may make more sense if you read it
aloud. You will see that the source text sometimes reflects the kinds of
"mistakesโ€ people make when they're speaking aloud, like hesitation sounds
(um, uh), restarted sentences and partial words. Your translations should
retain the flavor of this โ€œspontaneous speechโ€ style, which will be quite
different from what you produce when you translate prose.
โ€ข The translation should contain the exact meaning conveyed in the source
text, and should neither add nor delete information. For instance, if the
original text uses Bush to refer to the current US President, the translation
should not be rendered as President Bush, George W. Bush, etc. No
bracketed words, phrases or other annotation should be added to the
translation as an explanation or aid to understanding.
โ€ข The translation should also respect the cultural assumptions of the original.
For example, if the Arabic text uses the phrase Comrade Jalal Talabani, the
translation should not be rendered as Mr. Jalal Talabani โ€“ instead, it should
keep the term used in the original source.
โ€ข All files should be spell checked and reviewed for typographical or formatting
errors before submission.
5.2 Proper Names
Proper names should be translated using conventional practices.
Whenever an Arabic proper name has an existing conventional translation in English,
that translation should be used. For example, Gamal Abdel Nasser ๎€ค๎€ฒ๎€Œ๎€ก๎€Žุง ๎€ฎ๎€ฉ๎€— ู„๎€Œ๎€ฅ๎€ฝ( ), the
late former president of Egypt, should be translated as Gamal Abdel Nasser, not Jamal
Abdel Nasser as Modern Arabic would have suggested.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
8
The order of last name, first name presentation for the name in the source file should be
preserved. For instance, if the source file reads Osama Bin Laden, this should not be
changed to Bin Laden Osama in the translation.
Recall that Speaker IDs that appear inside square brackets are provided as an aid to
comprehension and should not be translated or copied over into the translation.
In some cases there may be mistakes in the name used within the speaker ID. If this
happens, translators must be careful to use the proper name within the translation file.
They should not repeat the mistake shown in the transcript. For instance:
๎…๎€ฐ ู‹ ๎€Œ๎€ฉ๎‡๎€ค๎€” ๎€ ๎€š๎ˆู๎€— ๎€ฎ๎€ฝ๎€Œ๎€” ุฑ๎€™๎€ดุฃ ุฑ๎€™๎€งุขุฏ.
<ar=34> [host]
<en=34> Welcome Dr. Anwar Majid Ishqi
.
๎…๎€ฐ ู‹ ๎Šู‡ุฃ .
<ar=35> [Anwar_Majed_Ishqi
]
Non-Arabic proper names should be translated as they would be translated into English
directly from the original language. In case of an original English name appearing in the
Arabic text, the normal English form should be used.
Lacking preexisting knowledge of how to translate a proper name, the translator should
consult a standard resource or do a quick web search. This is especially important for
names that re-occur, for instance names that are part of a news story or names of
political leaders. For names of "regular people" that occur only once and are highly
unlikely to be found on the web (for instance, the names of the author of a newsgroup
post), translators should use their best judgment about how to translate the name,
following standard conventions. Lacking information to the contrary, translators should
simply proceed as if the name was an Arabic name.
For specific proper names such as names of agencies, programs, conferences, books,
films, and other media, translators should follow the generally accepted or most
commonly used form. If no common form exists or if there are multiple forms in
frequent use, translators should provide the translation that follows linguistic rules
instead of a translation that is word-for-word but incorrect or awkward.
Names should be translated consistently within and across files.
5.3 Numbers
As a general rule of thumb, numbers in the translation should appear either spelled out
in full, or written as ASCII numbers, according to how they appear in the source text.
However, there are some general conventions for writing numbers in American English
that should be followed.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
9
In American English, commas are generally used for numbers with more than three
digits unless they are the names of years.
ู„๎€™๎€–๎€‡ุฃ ๎€ธ๎€›๎€ฏ๎€œ ุŒุท๎€Œ๎€ฉ๎€ผ ๎€ธ๎€ž๎€ด๎€Œ๎€ฅ๎ ๎€ุฅ ุŒู‰๎€ค๎€ฉ๎๎€Žุง ๎€ธ๎€ž๎€ฐ๎€ค๎€›๎€Žุง ุฉุฑ๎€™๎€“๎€Žุง ุฑ๎€Œ๎€›๎ˆ๎€ฐ ๎…๎‘๎€ฏ๎€ฅ๎€ง๎€‡ ู„ุง๎€ท๎€‡ ๎’ ๎€’๎€ฝ๎€ค๎€Žุง.
The man is still holding on to the slogan of the Great Arab Revolution,
uh the Eighth of February, the Ninth of September.
๎€ธ๎€ก๎€ข ๎€ธ๎€ž๎€ฅ๎€›๎€ฐุฑูˆ ๎€๎€Ž๎€ถ ๎€’๎€บ๎€œ ๎“๎๎€ฅ๎€‡ ๎€ธ๎€–๎€‡๎€™๎” ุฉ๎€ค๎€ง๎•๎€Ž ุŒ๎€ุฃ ๎–๎€‡ุฑ๎€Œ๎€ง๎€Žุง ๎—๎€ก๎€บ๎€ด ๎€พ๎€Ž ๎“๎€๎€ด.
We have not made history, uh, for a long period of time that goes back
1,400 years.
Use a combination of numerals and words for very large numbers.
๎€พ๎€Ž๎€Œ๎€›๎€Žุง ุก๎€Œ๎€๎€ดุฃ ๎—๎€ž๎€ฅ๎€ฝ ๎“๎€” ุŒุฑ๎€Œ๎•๎€ฒุฃ ๎€ธ๎๎Š๎ ๎€ฎ๎‡ุงูˆ ๎“๎€‡๎€ค๎•๎€ฒ ๎€ธ๎€›๎€ฏ๎€œ ๎€ค๎™๎š ๎€ธ๎€Žูˆุฏ ๎€’๎›ุงุฏ ๎“๎€” ๎€๎€œ๎€Œ๎€๎€Žุง ๎€พ๎šุฑ ๎€•๎€–๎€— ๎€ุฅ ๎€ณ๎€‡๎€™๎€บ๎€ง๎€Žุง ๎€พ๎๎€ด๎€Œ๎๎€”๎œ๎€ฐ
๎“๎€‡๎€ค๎•๎€ฒ ๎€ธ๎€›๎€ฏ๎€œ ๎€ฎ๎‡ุงูˆ ๎“๎€‡๎€ค๎•๎€ฒ ๎€ธ๎€›๎€ฏ๎€œ ๎€ธ๎€›๎€ฐุฑุฃ ๎€ธ๎€›๎€ฉ๎€ข ๎€ธ๎€›๎€ฏ๎€œ ๎“๎€‡๎€ค๎•๎€ฒ .
You can vote uh from inside the country of Qatar at phone number
9001000, and from all parts of the world at 009749001900.
๎€ค๎€›๎€ข ๎€ธ๎๎€Œ๎€ฅ๎๎Š๎ูˆ ู†๎€™๎€ž๎€–๎€” ๎€ณ๎€ž๎€ฉ๎€Žุง ุง๎€ู‡ ุฑ๎’ูˆุฏ ๎€๎€Žุฃ ู†๎€™๎€›๎€ฐุฑุฃูˆ ๎€ธ๎€ฏ๎€ฅ๎›ูˆ.
This house costs 1,345,000 dollars.
5.4 Capitalization
Translators should follow standard written English rules for capitalization unless there is
strong evidence in the source text that suggests a different treatment. Proper names
should be capitalized, including personal names and names of organizations and geo-
political entities. The first word of each sentence should also be capitalized.
5.5 Punctuation
Written standards for punctuation vary across languages. As a general rule of thumb,
punctuation in the translation should match the flavor of the punctuation in the source
data, while following Standard English punctuation conventions. Punctuation in the
source text primarily serves to enhance readability, so translators should not spend too
much time worrying about the exact placement of commas and internal punctuation in
the English translation.
Different genres will vary widely in their use of punctuation, and the translation of each
genre should respect the flavor of the source text when it comes to punctuation.
Newswire files typically use standard written punctuation conventions, and these should
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
10
be preserved in the translation. Broadcast transcripts typically have reasonably
standard punctuation, which should be preserved in the translation. When broadcast
transcripts are missing punctuation, it is due to a transcription error, so translators
should add punctuation in the translation following Standard English punctuation
conventions. Web text (newsgroups and weblogs) is the most challenging genre in
terms of punctuation. For web genres, translators should not standardize punctuation
in the translation when it is missing from the source text. Non-standard punctuation
(dashes, ellipses, missing end-of-sentence punctuation and the like) should also be
preserved in the translation for web text.
Often in transcripts of conversational Arabic, speakers tend to change the subject and
restart a sentence in the middle of an unfinished one. When this occurs in the source
text, the translation should mimic the source textโ€™s punctuation in interpreting the type of
pause, restart, or change of subject occurring in speech (see Section 5.12).
5.6 Idioms
Idioms, colloquial expressions and the like are particularly difficult to translate. If a
similar expression exists in English, you should use it. When there is no direct
translation in English, you should preserve the meaning of the Arabic expression but
render it in fluent English rather than providing a literal word-for-word translation.
ุจ๎€Œ๎ž๎€Ž ุจ๎€Œ๎€ฝ ุดุฅ
Thereโ€™s no comparison.
Literal but incorrect Translation: what did he brings to bring
๎€ธ๎‡ุงุฑ ๎…๎€ฉ๎€›๎€œ
No problem. (A courteous way to say)
Literal but incorrect Translation: To get tired for you is a comfort
5.7 English or other language content
Occasionally English or another language may appear in the source text. This happens
often in newsgroups when internet users post messages in English. It also happens in
broadcast news or broadcast conversation when a speaker speaks in English.
English sentences in source text should be copied over to the English translation
exactly as they appear in the source text. Do not make any changes or corrections to
the English, even if the English contains grammatical or other errors.
He have a good handwriting.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
11
5.8 Factual Errors in Source Text
Factual errors in the source text should be translated as is. They should not be
corrected. This also applies to grammatical errors or other speaker "mistakes" in the
source text.
๎“๎€ž๎€œ๎€™๎€ฐ ๎€ ๎๎€‡๎€ค๎€”๎€ถุง ๎‚๎€ž๎๎€ค๎€Žุง ู…๎€™๎€ž๎€Žุง ๎€™๎๎€ข๎€™๎€” ุฑุงุฒ .
American President Putin visited Moscow today.
๎€ธ๎€ž๎€ด๎€Œ๎€ฅ๎ูˆ ๎“๎€ž๎•๎€Žุฃ ู…๎€Œ๎€— ๎€ ๎€ญ ๎€ธ๎€ž๎€ฉ๎€ฅ๎€Žูˆ๎€ถุง ุจ๎€Œ๎€›๎€Ž๎€ถุง ู„๎€™๎€ข ๎€๎€ž๎ก๎€ง๎€ฏ๎€ง๎€ข.
Seoul will host 2008 Olympics.
5.9 Typographical errors
Translators will occasionally notice obvious typographical errors or obvious incorrect
use of homophones in the source text. In such cases, translators should translate the
intended meaning but should add the flag = before the translated word to indicate that it
is a correction of a typo. For example,
๎€๎€‚๎€ƒ๎€„๎€…๎€†ุง ๎€‰๎€Šุง ๎€‹๎€‚๎€Œ๎€Šุง ๎€๎€๎€Ž๎€๎€๎€‘ ๎€‰๎€’๎€Šุง ๎€“๎€๎€”๎€๎€Šุง
the =aid that China provides to Africa
Be careful to distinguish obvious typographical errors, which should be corrected, from
factual errors in the source text, which should not be corrected. If it is not clear whether
the item is a typographical error or a factual mistake, translators should not correct the
item.
5.10 Difficult to Translate Source Text
In rare cases the source text may be so difficult to understand that translation is very
difficult. In such cases, translators should make their best guess about the appropriate
translation, but should surround the translated text with [ ] to indicate that this is a
guess based on confusing source text, for instance:
๎€†๎”๎€™๎€š๎€ข ๎€’๎€ฉ๎š ุฉุฑูˆ๎€ค๎ก๎€Žุง ๎€ฎ๎๎€Œ๎€š๎€Žุง ุฒ๎€Œ๎€ญ ๎€Œ๎€ฅุข ุชุง๎€™๎€ฒ๎€ถุง ๎“๎€” ๎€ธ๎๎€Œ๎€ฅ๎€Ž๎€Œ๎€ฐ ๎€ธ๎๎€Œ๎€ฅ๎€ฐ ุฒ๎€™๎•๎€‡ ู๎€ฎ๎‡ุงูˆ ู๎ฃ๎€ผ๎€ค๎€” ๎€•๎€–๎€— ูŠ๎€ค๎ž๎€œ ๎’ ๎€ธ๎€ž๎€ฐ๎€ค๎€›๎€Žุง ุช๎€Œ๎€ฐ๎€Œ๎€ป๎€ง๎€ก๎€ด๎คุง
Arab elections are not held with one candidate who gets a hundred per
cent of the vote, just as the [bloody] leader did before his fall.
๎€ูˆ๎€๎›ุฃ ๎€๎€™๎€š๎€Ž ๎€Œ๎€”ูˆ ุŒ๎€ณ๎€ž๎€ฉ๎€Žุง ๎“๎€” ุฉ๎€ค๎€š๎€ฉ๎€Žุง ุง๎€™๎€–๎€ผูˆ ๎€Œ๎€ด๎€™๎€ญ๎€ฎ๎€š๎€ผ.
[They expelled us] and [took] the cow from home. They took everything
they found.
As always translators should use available resources including the internet to find the
most appropriate translation for unfamiliar terms or phrases.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
12
5.11 Headlines and Titles
Newswire sources frequently contain a headline or title as the first sentence of the
document. Headlines are typically written in English a little differently than other kinds
of news text, and so headline translations should observe these special guidelines.
Content words in the English translation of headlines/titles should be capitalized.
Function words like the, and, of, is should not be capitalized. For example:
๎“๎€ž๎€ฏ๎‡ ู…ุง๎€ฎ๎€ฒ ุฑ๎€™๎€บ๎€‡ ๎€ ๎€ก๎€ญ ๎€’๎€ฅ๎€— ุถ๎€ค๎€— ๎—๎€ก๎€ฅ๎€‡ ๎€ ๎๎€ž๎ž๎€–๎€ฐ ๎€ธ๎€‡๎€ฎ๎€–๎€ฐ ๎‚๎€ž๎ุฑ
Belgian Mayor Bans Display of Artwork Depicting Saddam Hussein
When formatting titles, translators should follow style of the original source text: if two
titles or headings exist, the translation should reflect this; if no titles exist in the source,
the translation should not have titles either.
English headlines follow some special stylistic guidelines, summarized as follows:
โ€ข State or imply a complete sentence in the present tense.
โ€ข Avoid using passive voice.
โ€ข Limit use of punctuation within headlines. End-of-sentence punctuation is not
required.
โ€ข Omit most "helping" and "to be" verbs
Road Improvements Planned for Belvedere Avenue Southwest
instead of
Road Improvements are Planned for Belvedere Avenue Southwest
โ€ข Cut articles (a, an, the)
School District Schedules Open House on Proposed Curriculum Changes
instead of
School District has Scheduled an Open House on the Proposed Curriculum
Changes
โ€ข Use infinitive instead of future tense
City Council to Consider Budget Recommendation
instead of
The City Council will Consider the Budget Recommendation
5.12 Special Issues for Translation of Speech Sources
This section addresses issues related to translation of transcripts of speech data, such
as broadcast news and broadcast conversations (talk shows, call-in shows and the like).
5.12.1 Disfluent Speech
Speakers may stumble over their words, repeat themselves, utter partial words, restart
phrases or sentences, and use a lot of hesitation sounds. The sections below describe
how to deal with these phenomena in translation.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
13
5.12.2 Speaker Noise
Transcripts may sometimes include markup for speaker-produced noise like coughing,
sneezing and laughter. These markers should be copied over into the translation using
their original formatting, e.g. {cough}, {laugh}. Please refer to the Appendix for detailed
description and examples.
5.12.3 Filled Pauses
Filled pauses are hesitation sounds that speakers employ to indicate uncertainty or to
maintain control of a conversation while thinking of what to say next. Filled pauses do
not add any new information to the conversation (other than to indicate the speaker's
hesitation) and they do not alter the meaning of what is uttered, but they do provide
structural information and are an important part of spoken language so they should be
translated.
Arabic filled pauses include ๎€ุฃ, ุฃู… , ๎€†๎€‡ุฅ, ูˆูˆุฃ etc. They should be translated to their closest
counterpart in English, such as uh, um, eh or ah.
๎€ ๎€–๎€ž๎ุง๎€ค๎€ข๎คุง ๎€ ๎€ฐ๎€ค๎€›๎€Žุง ุนุง๎€ท๎€ก๎€Žุง ุฑ๎€Œ๎€ฏ๎€” ๎€ ๎€ญ ๎€ ๎€ฏ๎€ž๎ุฑูˆ ู…๎€Œู‡ ูŒุฑูˆุฏ ๎€Œ๎€๎€Ž ๎€ุฃ ุฉ๎€ฎ๎€๎€ง๎€ฅ๎€Žุง ุช๎€Œ๎€‡๎’๎€™๎€Žุง
.
The United States, uh, has an important and principal role in the Arab-
Israeli dispute.
๎€ุฃ ุฃู…ู…๎€ท๎€— ูƒ๎€Œ๎€กู‡ูˆ ุŒ๎€’๎€”ุฃ ูƒ๎€Œ๎€กู‡ ุŒ๎€พ๎€›๎€ด ุŒ๎€†๎€ดุฅ ๎€ฎ๎€š๎€ง๎€—ุฃ .
Uh, um, I think that, yes, there is hope, and there is determination.
It may sometimes be difficult to decide where to place filled pause in the translated text,
since languages vary in where filled pauses occur in speech. The translator is free to
shift the location of the filled pause in the translation to make the English more natural,
but the filled pause should not be deleted or ignored.
5.12.4 Translation of
๎€ุฃ
,
ุฃู… ,๎€†๎€‡ุฅ
In conversational speech, ๎€ุฃ, ุฃู… , ๎€†๎€‡ุฅ can be used in many ways. Translators should
differentiate the multiple uses of these words and translate accordingly.
๎€ุฃ, ุฃู… ,๎€†๎€‡ุฅ can function as a filled pause. See section 6.12.3 for discussion of how to
translate filled pauses.
๎€ุฃ, ุฃู… ,๎€†๎€‡ุฅ can also function as a backchannel, in order to provide positive feedback to the
speaker to encourage further talk or to confirm that the listener is listening, as in the
following example between speaker A and speaker B:
๎€ธ๎€ž๎€ด๎€Œ๎€“๎€Žุง ๎€ธ๎™๎€š๎€ก๎€Žุง ๎€•๎€Žุฅ ๎ฉู‡๎€๎€ด ู†ุฃ ๎€’๎€ฉ๎š ๎€ธ๎™๎€š๎€ก๎€Žุง ๎€๎€ู‡ ๎€ ๎€ญ ๎€ฎ๎€‡ุฑุฃ ๎€Œ๎€ดุฃ
A:
๎€ุฃ
B:
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
14
๎€ธ๎€ž๎€ฅ๎€Ž๎€Œ๎€— ๎€ธ๎€ž๎ž๎€ž๎€œุง๎€ค๎€ง๎€ขุฅ ูุง๎€ฎู‡ุฃ ๎€Œ๎€๎€Ž ๎€ธ๎€ž๎๎€‡๎€ค๎€”๎€ถุง ๎€ธ๎€”๎€™๎๎€๎€Žุง
A:
ู…ุฃ
B:
ู๎“๎€ž๎€ก๎€ฏ๎€Žุง ุชุง๎€ค๎ˆ๎€— ุŒ ๎€ ๎€ก๎€›๎€‡ ๎€Œ๎€๎€Ž
:
A
In backchannel cases, the appropriate English counterpart is something uh-huh or yes.
5.12.5 Repetition and Restarts
When a speaker repeats him/herself or restarts a sentence halfway through, the
repeated words should be translated into English:
ุง๎€๎€๎€Ž ุถ๎€ค๎€›๎€œ ูŠ๎€ู‘๎€Žุง ๎€†๎€ž๎€Žุฅ ู„ูˆ๎ญ๎€‡ ู๎€™๎€ข ๎€Œ๎€” ๎€ุฃ ุŒู‹๎Š๎€›๎€ญ ุŒุงูˆุฏ๎€ฎ๎€๎€‡ ๎“๎€ž๎€ง๎€ž๎€ข๎€Œ๎€ขุฃ ๎“๎€ž๎€ง๎€Ž๎ฎ๎€ฏ๎€” ูƒ๎€Œ๎€กู‡ู„ุง -๎—๎€ž๎€ฟ๎•๎€Žุง ๎€ค๎€”๎€ถุง ุง๎€๎€๎€Ž .
There are two main issues, defining actually, uh, what will happen to
those who were subjected to this %pw, to this horrible thing.
๎€Œ๎๎€‡๎€ค๎€”ุฃ ๎€ ๎€ญ ๎“๎€ž๎€ฅ๎€ž๎€š๎€ฅ๎€Žุง ๎“๎€ž๎€ž๎€Žุง๎€ค๎€ฉ๎€ž๎€–๎€Žุง ๎ฏ๎€›๎€ฐ ๎€Œ๎€๎€Ž ๎€’๎€ฉ๎™๎€‡ ๎€Œ๎€”๎€ฎ๎€ก๎€— ๎€’๎€ฉ๎™๎€‡ ๎€Œ๎€”๎€ฎ๎€ก๎€— ุŒ๎€Œ๎€”๎€ฎ๎€ก๎€— ุฉุฑุงุฏุฅ ๎€ณ๎€ฏ๎€ž๎€Ž ๎€ณ๎€ฏ๎€ž๎€Ž ุŒ๎€•๎€ก๎€›๎€‡
.
I mean, it is not, it is not an administration when some liberals
residing in America beat, beat the drum for it.
๎€Œ๎€ฐูˆุฑูˆุฃ ๎€ ๎€ญ ๎€ ๎€ญ ๎€ณ๎€ฏู‘๎€ข๎ฎ๎€œ ๎€ธ๎€ž๎€Žุง๎€ค๎€ฉ๎€ž๎€–๎€Žุง.
Liberalism was established in in Europe.
5.12.6 Partial Words
A speaker may stop in the middle of pronouncing a word, which results in a partial word.
Transcribers will sometimes use a dash โ€œ-โ€œ to indicate a partial word in the source text
and the point at which word was broken off. Translators should not attempt to translate
the partial word; their existence should be indicated by using the symbol %pw in the
English translation. For instance:
๎€ฐ ุจ๎€Œ๎€บ๎€ž๎€ฐ ๎€ค๎€ž๎€ฉุข ุก๎€ท๎€ฝ ุŒ๎€ค๎€—๎€Œ๎ˆ๎€ฅ๎€Žุง ู‰๎€™๎€ง๎€ฏ๎€” ๎€•๎€–๎€—๎€ฌ -๎€ฎ๎€‡๎€ฎ๎€ผ ๎€ฌ๎€–๎€š๎€ฐ .
At the emotional level, a large portion is inflicted with %pw severe
worry.
ุต๎€™๎€œ ุŒู‹ุง๎€ฎ๎€ฝ ๎€ธ๎ž๎€—๎€ท๎€” ู…๎Š๎‡ุฃ ๎€ุฃ ๎€’๎€บ๎€๎€‡ ุฃ๎€ฎ๎€ฉ๎€‡ -๎‚๎€ž๎€ฐุง๎€™๎๎€Žุง ๎€ฎ๎‡ ๎€ ๎€Žุฅ ๎€’๎€บ๎€œ ุŒ.
They start to have, uh, very disturbing dreams, %pw reaching the point
of nightmares.
๎€ ๎€ก๎”๎€™๎€Žุง.
ู„ุงูˆ ๎€ ๎€ก๎”๎€™๎€Žุง ๎€พู‡๎€Œ๎•๎€ง๎€Žุง ๎“๎€” ๎€ ๎€Ž๎€Œ๎€— ู‰๎€™๎€ง๎€ฏ๎€” ๎€•๎€–๎€— ู„๎€ฎ๎€ง๎€ข ๎ฑ๎๎€Œ๎€ง๎€ด -ู„ุง ๎€ฌ๎€ญุง๎€™๎€ง๎€Žุง -ู„ุง -๎€ุฃ
Results that will show a high degree of national mutual understanding
and %pw, accord, %pw, %pw, uh, national accord.
.
ูƒ ๎€Œ๎€ดุฃ ุง๎€๎ู‡ -๎—๎€ฅ๎€ง๎€ฏ๎€ฅุข
This is how I %pw, as a listener.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
15
ุณ ุงุฐุฅ ุŒ๎€ณ๎€๎€ฅ๎€ข ุงุฐุฅ-
If I may, if %pw
5.12.7 Semi-intelligible and Unintelligible Speech
Sometimes an audio file will contain a section of speech that is impossible to
understand. In these cases, transcribers use empty double parenthesis (( )) to mark
totally unintelligible speech. For example:
ู†ุฅ ๎€ธ๎€ž๎šุง๎€ค๎€›๎€Žุง ๎€ธ๎”๎€ค๎ˆ๎€Žุง ๎€ณ๎€Ž๎€Œ๎š37 (())ุง๎€™๎€–๎€ง๎š ๎€ฎ๎š.
If it is possible for the transcriber to guess the speakerโ€™s words, they transcribe what
they think they hear and surround the uncertain transcription/text with double
parenthesis. For example:
๎€†๎€œ๎€™๎š ๎€ณ๎ต๎€–๎€ฐ ูŠ๎€™๎š ู„ุง๎€ท๎€Žุฒ ุจ๎€ค๎ถ6,8 ุณ๎€Œ๎€ž๎€š๎€” ๎€•๎€–๎€— ))๎€ค๎€ง๎€ป๎€‡ุฑ (( ๎€ธ๎€—๎€Œ๎€ฏ๎€Žุง ๎€ ๎€ฐูˆ๎€ค๎€ž๎€ด12,19๎ท๎€ง๎€ก๎€‡๎€ค๎€น ๎€ณ๎€ž๎š๎€™๎€ง๎€ฐ.
Translators should transfer the double parenthesis over to the English translation, along
with the translated words (if there are words to translate). For instance:
ู†ุฅ ๎€ธ๎€ž๎šุง๎€ค๎€›๎€Žุง ๎€ธ๎”๎€ค๎ˆ๎€Žุง ๎€ณ๎€Ž๎€Œ๎š37 (())ุง๎€™๎€–๎€ง๎š ๎€ฎ๎š.
Iraqi police said that 37 (()) were killed.
๎€†๎€œ๎€™๎š ๎€ณ๎ต๎€–๎€ฐ ูŠ๎€™๎š ู„ุง๎€ท๎€Žุฒ ุจ๎€ค๎ถ6,8 ุณ๎€Œ๎€ž๎€š๎€” ๎€•๎€–๎€— ))๎€ค๎€ง๎€ป๎€‡ุฑ (( ๎€ธ๎€—๎€Œ๎€ฏ๎€Žุง ๎€ ๎€ฐูˆ๎€ค๎€ž๎€ด12,19๎ท๎€ง๎€ก๎€‡๎€ค๎€น ๎€ณ๎€ž๎š๎€™๎€ง๎€ฐ.
A strong earthquake, measuring 6.8 on the ((Richter)) scale, hit
Nairobi at 12:19 Greenwich Time.
5.12.8 Program Names
There are many ways to translate names of Arabic TV and radio programs into English.
Translators should use the translation that is in common use for these programs. The
following table provides common translations for some of the programs in the LDC
source data:
๎“๎™๎€ก๎€ผุงูˆ ๎“๎€”
From Washington
ุญ๎€™๎€ง๎•๎€” ุฑุง๎€™๎‡
Open Dialogue
๎‚ุข๎€Œ๎€›๎€ฅ๎€Žุง ๎€๎€Œ๎ž๎€œ๎’ุง
Opposite Direction
ุฏูˆ๎€ฎ๎‡ ๎Š๎€ฐ
Without Bounds
ูŠุฃุฑ ๎“๎€” ๎€ค๎€“ุขุฃ
More Than One Opinion
๎€ฎ๎€ž๎€›๎€ข ๎€พุขุฑ๎€Œ๎€๎€ด
Naharkum Saiid
ู…๎€™๎€ž๎€Žุง ู‚ุง๎€ค๎€›๎€Žุง
Iraq Today
ู‚ุง๎€ค๎€›๎€Žุง ๎“๎€”
From Iraq
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
16
๎€ธ๎€—๎€Œ๎€ฏ๎€Žุง ุซุง๎€ฎ๎‡ุฃ
Affairs of the Hour
ุฑุง๎€ค๎€š๎€Žุง ุน๎€Œ๎€ก๎€ฒ
Decision Makers
ู…๎€™๎€ž๎€Žุง ๎€พ๎€Ž๎€Œ๎€›๎€Žุง
The World Today
5.12.9 Transcription Mark-ups
Some mark-ups are used in transcripts to mark special speech phenomenon, such as
mispronounced words, dialect, foreign language etc.. Some signs need to be copied
over into the translation. Please refer to the appendix on how to handle the transcription
mark-ups.
5.13 Special Issues for Translation of Weblogs and Newsgroups
Weblogs and Newsgroups frequently contain emoticons and URLs. Please follow the
instructions below when translators come across these cases.
5.13.1 Emoticons (Emotion Icons)
An emoticon is an ASCII glyph used to indicate an emotional state in email, news or
online posting. Emoticons should be copied over to English translation.
The following is an incomplete list of popular emoticons you may see in weblogs and
newsgroup text:
:-) Standard Smiley (you are joking; satisfied)
:) Standard Smiley for lazy people
,-) Winking Smiley. You don't mean it, even if you are joking
;-) Winking Smiley. See above
:-> Follows a really sarcastic remark
5.13.2 URLs
URLs that appear in the source text should be copied directly into the translation.
5.13.3 Punctuation
Translators should not standardize punctuation in the translation when it is missing from
the source text. Non-standard punctuation (dashes, ellipses, missing end-of-sentence
punctuation and the like) should also be preserved in the translation for web text.
6 Quality Control at LDC
The quality of a translation is not determined by its style of prose or elegance in use of
English, but mostly in its accuracy. Our definition of โ€œqualityโ€ first and foremost requires
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
17
the translation delivery to be faithful to the source, preserving its original meaning and
style. This should be accomplished with the consideration that the translation should
also be comprehensible and in fluent English.
Each translation delivery received by LDC is reviewed for completeness, accuracy and
overall quality. Payment for completed translation is contingent upon successful
completion of the quality review.
Fluent bilinguals working at LDC select a portion of each delivery and grade it according
to several criteria. The amount of data selected for review varies depending on the
delivery size, but at minimum constitutes 1,200 words drawn from multiple documents.
The grading system used by all translation reviewers is outlined below:
Error Deduction
Syntactic 4 points
Lexical 2 points
Poor English usage 1 point
Significant spelling or punctuation error ยฝ point (to a maximum of 5 points)
Overlooked file or section 40 point
Below are the error categories:
o Syntactic: syntactic error
o Missed translation: lexical error, when certain word is missing from translation
o Added translation: lexical error, when certain word is inserted in translation
o Wrong translation: lexical error, when translation is wrong
o Poor English Usage: lexical error, when English is not awkward
o Punctuation: punctuation error, when punctuation is changed or missing in
translation
o Spelling error
o Format problems: spelling/punctuation error, when translation mark-up is
missing or incorrect
o Overlooked file or section: translation missing from a big portion or a file is
missing its translation
For each error found, the corresponding number of points will be deducted. For
instance, if the original text says Bush will address the General Assembly of the United
Nations tomorrow, and tomorrow is missing in the translation, 2 points would be
deducted.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
18
If more than 40 points are deducted from a 1200-word sample, the translation will be
considered unacceptable and the whole delivery will be sent back to the translation
team for improvement.
If a delivery is sent back to the translation team for further proofreading, the improved
version must be completed within 5 business days.
Upon completion of the QC review, the LDC translation team will receive a summary
report that includes the following components:
Part 1: Data Profile
Information about the data under review (volume, genre, etc.) and an overall rating for
this delivery as excellent, very good, good, fair, poor.
Excellent - 2 or less points deducted
v. good - 3-10 points
good - 11-20 points
fair - 21-40 points
poor - more than 40 points
Part 2: QC Report Summary
Number of words checked:
Error tally: __ points deducted overall
o Syntactic: __ points deducted
o Missed translation: __ points deducted
o Added translation: __ points deducted
o Wrong translation: __ points deducted
o Poor English Usage: __points deducted
o Punctuation: __ points deducted
o Spelling error: __ points deducted
o Format problems: __ points deducted
o Overlooked file or section: __ points deducted
o Other: __ points deducted
Part 3: QC Report Details
For each significant deduction above, at least one example is provided, along with the
following information:
โ€ข FileID: e.g. google.com._edfdkfjd_1223.txt
โ€ข Your translation
โ€ข Suggested translation
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
19
โ€ข Discussion: a description of what should be changed and why
7 Guidelines
In case these guidelines prove to be unclear, LDC reserves the right to modify them.
Agencies will always use the latest version.
GALE Program: Arabic to English Translation Guidelines
Version 2.7 9/7/2010
20
Appendix: Transcription and translation mark-ups