JGlossator

Contents

Overview

You may use JGlossator to create a gloss for Japanese text complete with de-inflected expressions, readings, audio pronunciation, example sentences, pitch accent, word frequency, kanji information, and grammar analysis.

JGlossator will automatically gloss any Japanese text that you copy to the clipboard. Setting aside more obvious usage, this makes it ideal for use with Capture2Text when reading manga or with AGTH/ITH when either playing visual novels or watching video with Japanese subtitles.

By right-clicking on a glossed entry you will be presented with a menu that allows you to view alternate entries, save the current entry to file, or listen to an audio pronunciation. You may hover over a Japanese word to see a Rikaichan-style popup.

JGlossator is highly configurable and allows you to modify many of the default behaviors and settings. For example, you can turn off the clipboard monitor, change themes, specify a new save format, remove pitch accent, etc. Just press the options button on the far right.

Type in an English word to search definitions instead. The resulting list will be sorted based on frequency. To search whole words only, add "w/" in front of the word. To use a regular expression, type "r/" followed by a regular expression.

Kanji search is supported as well. Just use one of these formats:

Search Based On Format
Meanings km/<comma-separated list of meanings> (Example: km/dragon)
RTK Primitives kp/<comma-separated list of RTK primitives> (Example: kp/rain,eel)
Radical meanings kr/<comma-separated list of radical meanings> (Example: kr/heart,moon,sword)
ON readings ko/<comma-separated list of ON readings> (Exampe: ko/ねん)
KUN readings kk/<comma-separated list of KUN readings> (Exampe: kk/こころ)

You can also perform a gloss using your favorite EPWING dictionaries. Just add them to the Dictionary Setup tab of the Options dialog.

Know basic HTML/CSS? Want to change a font, color, or maybe even the format of the kanji gloss? No problem, just create a new theme in the themes directory or modify an existing one.

Some useful shortcut keys:

ESC Place cursor in the input box
Up (when the input box has focus) Clear text in the input box
Backspace (when the input box doesn't have focus) Go back through the history
Ctrl-Up Go back through the history
Ctrl-Down Go forward through the history

Download

The latest version may be found on the JGlossator download page hosted by SourceForge. The source code is also available at this link.

Installation

  1. Make sure that you have .Net Framework Version 3.5 installed (you probably already do). If not, you can get it through Windows update or via the Microsoft website
  2. Unzip JGlossator. Make sure that there are no non-ASCII (ex. Japanese) characters in the JGlossator path. Also don't place JGlossator in Program Files due to write permission issues.
  3. In the unzipped directory, simply double-click JGlossator.exe to launch JGlossator.

Screenshots

Blacklist

You may add the dictionary form of words that you would rather not see appear in the gloss to blacklist.txt (in the same directory as JGlossator.exe).

Word Frequencies

Most words will have a frequency number attached to them. This is the number of words that are more frequent than the given word + 1. So if this number is 700, it means that there are 699 words that are more frequent. The lower the number, the more frequent the word. Colors of the frequency numbers: very common words are green-ish, common words are yellow-ish, uncommon words are orange-ish, rare words are pink-ish. Frequencies are based on analysis of 5000+ novels. Naturally, frequency based on other mediums (such as newspapers) might vary. Not all words have frequency information.

Abbreviations

Part of Speech Marking
adj-iadjective (keiyoushi)
adj-naadjectival nouns or quasi-adjectives (keiyodoshi)
adj-nonouns which may take the genitive case particle `no'
adj-pnpre-noun adjectival (rentaishi)
adj-t`taru' adjective
adj-fnoun or verb acting prenominally (other than the above)
adjformer adjective classification (being removed)
advadverb (fukushi)
adv-nadverbial noun
adv-toadverb taking the `to' particle
auxauxiliary
aux-vauxiliary verb
aux-adjauxiliary adjective
conjconjunction
ctrcounter
expExpressions (phrases, clauses, etc.)
intinterjection (kandoushi)
ivirregular verb
nnoun (common) (futsuumeishi)
n-advadverbial noun (fukushitekimeishi)
n-prefnoun, used as a prefix
n-sufnoun, used as a suffix
n-tnoun (temporal) (jisoumeishi)
numnumeric
pnpronoun
prefprefix
prtparticle
sufsuffix
v1Ichidan verb
v2a-sNidan verb with 'u' ending (archaic)
v4hYodan verb with `hu/fu' ending (archaic)
v4rYodan verb with `ru' ending (archaic)
v5Godan verb (not completely classified)
v5aruGodan verb - -aru special class
v5bGodan verb with `bu' ending
v5gGodan verb with `gu' ending
v5kGodan verb with `ku' ending
v5k-sGodan verb - iku/yuku special class
v5mGodan verb with `mu' ending
v5nGodan verb with `nu' ending
v5rGodan verb with `ru' ending
v5r-iGodan verb with `ru' ending (irregular verb)
v5sGodan verb with `su' ending
v5tGodan verb with `tsu' ending
v5uGodan verb with `u' ending
v5u-sGodan verb with `u' ending (special class)
v5uruGodan verb - uru old class verb (old form of Eru)
v5zGodan verb with `zu' ending
vzIchidan verb - zuru verb - (alternative form of -jiru verbs)
viintransitive verb
vkkuru verb - special class
vnirregular nu verb
vsnoun or participle which takes the aux. verb suru
vs-csu verb - precursor to the modern suru
vs-isuru verb - irregular
vs-ssuru verb - special class
vttransitive verb

Field of Application
BuddhBuddhist term
MAmartial arts term
compcomputer terminology
foodfood term
geomgeometry term
gramgrammatical term
linglinguistics terminology
mathmathematics
milmilitary
physicsphysics terminology

Miscellaneous Markings
Xrude or X-rated term
abbrabbreviation
archarchaism
atejiateji (phonetic) reading
chnchildren's language
colcolloquialism
derogderogatory term
eKexclusively kanji
ekexclusively kana
famfamiliar language
femfemale term or language
gikungikun (meaning) reading
honhonorific or respectful (sonkeigo) language
humhumble (kenjougo) language
ikword containing irregular kana usage
iKword containing irregular kanji usage
ididiomatic expression
ioirregular okurigana usage
m-slmanga slang
malemale term or language
male-slmale slang
oKword containing out-dated kanji
obsobsolete term
obscobscure term
okout-dated or obsolete kana usage
on-mimonomatopoeic or mimetic word
poetpoetical term
polpolite (teineigo) language
rarerare (now replaced by "obsc")
senssensitive word
slslang
uKword usually written using kanji alone
ukword usually written using kana alone
vulgvulgar expression or word

Pitch Accents

What are pitch accents?

The following was taken from Wikipedia.

In standard Japanese (標準語 hyōjungo), pitch accent has the following effect on words spoken in isolation:
  1. If the accent is on the first mora, then the pitch starts high, drops suddenly on the second mora, then levels out. The pitch may fall across both moras, or mostly on one or the other (depending on the sequence of sounds)—that is, the first mora may end with a high falling pitch, or the second may begin with a (low) falling pitch, but a native speaker will hear the first mora as accented regardless.
  2. If the accent is on a mora other than the first or the last, then the pitch has an initial rise from a low starting point, reaches a near-maximum at the accented mora, then drops suddenly on the next.
  3. If the word doesn't have an accent, the pitch rises from a low starting point on the first mora or two, and then levels out in the middle of the speaker's range, without ever reaching the high tone of an accented mora. Japanese describe the sound as "flat" (平板 heiban) or "accentless".
Japanese accent is presented with a two-pitch-level model. In this representation, each mora (syllable) is either high (H) or low (L) in pitch, with the shift from high to low of an accented mora transcribed H*L.
  1. If the accent is on the first mora, then the first syllable is high-pitched and the others are low: H*L, H*L-L, H*L-L-L, H*L-L-L-L, etc.
  2. If the accent is on a mora other than the first, then the first mora is low, the following moras up to and including the accented one are high, and the rest are low: L-H, L-H*L, L-H-H*L, L-H-H-H*L, etc.
  3. If the word is heiban (doesn't have an accent), the first mora is low and the others are high: L-H, L-H-H, L-H-H-H, L-H-H-H-H, etc. This high pitch spreads to unaccented grammatical particles that attach to the end of the word, whereas these would have a low pitch when attached to an accented word. Although only the terms "high" and "low" are used, the high of an unaccented mora is not as high as an accented mora.
Format of JGlossator's pitch accents:

<blank> - Example: 単眼鏡 たんがんきょう
No pitch accent information available for this word.

0 – Example: 洗う あらう 0
Zero means no accent. From Wikipedia: "Word doesn't have an accent, the pitch rises from a low starting point on the first mora or two, and then levels out in the middle of the speaker's range, without ever reaching the high tone of an accented mora. Japanese describe the sound as "flat" (平板 heiban) or "accentless". "

2 – Example: 願う ねがう 2
The "2" indicates that the accent is on the 2nd mora (the が).

32 – Example: 著作権 ちょさくけん 32
The "32" indicates that the accent can be on either the 3rd mora (く) or 2nd mora (さ). This is in frequency order, meaning that it is more common for the accent to be on the 3rd mora than the 2nd mora.

{11} – Example: 超越論的観念論 ちょうえつろんてきかんねんろん {11}
Curly braces are placed around pitch accents that are in the double digits. The "11" indicates that the accent is on the 11th mora.

21,0 – Example: 飛車 しゃ 21,0
For some words, the pitch accent dictionary contains multiple sub-definitions in an entry. Sometimes each sub-definition can have a different pitch. A comma separates the pitch accents for the multiple sub-definitions. The "21,0" means that in the 1st sub-definition of the word, the accent is on either the 2nd mora (しゃ) or 1st mora (ひ), and that in the 2nd sub-definition of the word, no accent is present.

1|Ø – Example: 朝日 あさひ 1|Ø
For some words, the pitch accent dictionary contains multiple entries that have identical expressions and readings. The "|" separates the pitch found in each entry. The "1" indicates that in the first entry, the pitch accent was on the first mora. The "Ø" symbol indicates that the other entry contained no pitch accent information.

1-2 – Example: 思案投げ首 しあんなげくび 1-2
I'm not sure what the "-" is supposed to represent. It is present in the pitch accent dictionary so I left it in.

3? – Example: 手投弾 てなげだん 3?
A trailing question mark is added to pitch accents that have a small chance of being inaccurate and have not yet been checked by a human.

(part-of-speech) – Example: 道道 みちみち (副)0,(名)2
Sometimes pitch accent changes depending on the word's part-of-speech. The part-of-speech is placed inside of parenthesis. The above example shows that the pitch accent is "0" when the word is used as an adverb and "2" when the word is used as a noun.

Valid part-of-speech options:

(名)名詞
(代)代名詞
(動五)動詞五段活用
(動五[四])動詞口語五段活用・文語四段活用
(動四)動詞四段活用
(動上一)動詞上一段活用
(動上二)動詞上二段活用
(動下一)動詞下一段活用
(動下二)動詞下二段活用
(動カ変)動詞カ行変格活用
(動サ変)動詞サ行変格活用
(動ナ変)動詞ナ行変格活用
(動ラ変)動詞ラ行変格活用
(動特活)動詞特別活用
(形)形容詞
(形ク)形容詞ク活用
(形シク)形容詞シク活用
(形動)形容動詞
(形動ナリ)形容動詞ナリ活用
(形動タリ)形容動詞タリ活用
(ト|タル)「~と」(副)「~たる」(連体詞)の形で用いられるもの
(連体)連体詞
(副)副詞
(接続)接続詞
(感)感動詞
(助動)助動詞
(格助)格助詞
(接助)接続助詞
(副助)副助詞
(係助)係助詞
(終助)終助詞
(間投助)間投助詞
(並立助)並立助詞
(準体助)準体助詞
(接頭)接頭語
(接尾)接尾語
(連語)連語
(枕詞)枕詞