Abstract
This paper explores new methods for disambiguating the identity of individuals in classical Arabic citations (isnāds) using a network-based approach. After training a model to extract name mentions from classical Arabic, we embed these mentions in vector space using fine-tuned BERT representations and use community detection to infer clusters of coreferent mentions. The best-performing clustering approach reduces error on the CoNLL metric by 30%. As a case study, we then examine the problem of determining the number of direct transmitters to Ibn ʿAsākir (d. 1176) in a set of isnāds taken from the 12th-century historical text Taʾrīkh Madīnat Dimashq (TMD, History of Damascus), using our method to replicate human judgement.
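
The clustering idea in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the vectors are toy stand-ins for fine-tuned BERT mention embeddings, the similarity threshold is an arbitrary assumption, and connected components of a thresholded similarity graph are used as a simplified substitute for a real community detection algorithm. The names `cosine` and `cluster_mentions` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_mentions(vectors, threshold=0.9):
    """Group mention indices whose embeddings are linked by high similarity.

    Assumption: an edge joins two mentions when their cosine similarity is
    at least `threshold`; each connected component is treated as one person.
    This is a simplification of community detection, not the paper's method.
    """
    n = len(vectors)
    parent = list(range(n))  # union-find forest over mention indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy embeddings (assumed values): mentions 0 and 1 point in nearly the same
# direction, as do mentions 2 and 3.
toy_vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]
clusters = cluster_mentions(toy_vectors)
print(clusters)
```

With these toy vectors the sketch groups mentions 0 and 1 together and mentions 2 and 3 together. In a realistic setting the threshold and the choice of community detection algorithm would need to be tuned against annotated coreference data.
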
| Original language | English (UK) |
|---|---|
| Pages (from-to) | 1-20 |
| Number of pages | 20 |
| Journal | Journal of Historical Network Research |
| Volume | 8 |
| Issue number | 1 |
| Publication status | Published - 2023 |
Keywords
- hadith
- name disambiguation
- natural language processing
- network analysis