In this article, an integrated model is derived that jointly identifies and aligns bilingual named entities (NEs) between Chinese and English. The model is motivated by the following observations: (1) whether an NE is translated semantically or phonetically depends greatly on its entity type, (2) entities within an aligned pair should share the same type, and (3) the initially detected NEs can act as anchors and provide further information while selecting NE candidates. Based on these observations, this article proposes a translation mode ratio feature (defined as the proportion of NE internal tokens that are semantically translated), enforces an entity type consistency constraint, and utilizes additional new NE likelihoods (based on the initially detected NE anchors).
Experiments show that this novel method significantly outperforms the baseline. The type-insensitive F-score of identified NE pairs increases from 78.4% to 88.0% (12.2% relative improvement) in our Chinese–English NE alignment task, and the type-sensitive F-score increases from 68.4% to 83.0% (21.3% relative improvement). Furthermore, the proposed model demonstrates its robustness when it is tested across different domains. Finally, when semi-supervised learning is conducted to train the adopted English NE recognition model, the proposed model also significantly boosts the English NE recognition type-sensitive F-score.