An Arabic Probabilistic Parser Based on a Property Grammar

Raja Bensalem, Kais Haddar, and Philippe Blache.
2023. ACM Transactions on Asian and Low-Resource Language Information Processing 22 (10): 1–25  —  @HAL
The specificities of Arabic parsing, such as agglutination, vocalization, and the relatively free word order of Arabic sentences, remain a major issue to consider. To be robust, an Arabic parser should define different types of constraints. The Property Grammar (PG) formalism verifies the satisfiability of constraints directly on the units of a structure by means of its properties (or relations). In this context, we propose a probabilistic parser with syntactic properties, built on a PG, and we measure the production rules in terms of different kinds of implicit information, in particular the syntactic properties. We evaluated our parser on the ATB treebank using the CYK parsing algorithm and obtained encouraging results. Our method is also automatic for the implementation of most property types. Generalizing it to other languages or corpus domains (using treebanks) is a promising perspective. Combining it with pre-trained BERT models may also make the parser faster.
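As a rough illustration of the CYK algorithm mentioned in the abstract, the following is a minimal sketch of probabilistic CYK over a grammar in Chomsky normal form. It is not the authors' parser: the grammar, the probabilities, the English toy vocabulary, and the names `LEXICAL`, `BINARY`, `cyk_parse`, and `build_tree` are all assumptions made for this example, and the property-based measurement of production rules described above is not modeled.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form. All rules and probabilities are
# invented for illustration; they do not come from the paper or the ATB.
LEXICAL = {  # word -> [(nonterminal, probability)]
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "sees": [("V", 1.0)],
}
BINARY = [  # (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("NP", "Det", "N", 1.0),
    ("VP", "V", "NP", 1.0),
]


def build_tree(best, i, j, nt):
    """Follow backpointers to rebuild the best tree for words[i:j]."""
    _, back = best[(i, j)][nt]
    if isinstance(back, str):  # lexical entry: the backpointer is the word
        return (nt, back)
    k, left, right = back
    return (nt, build_tree(best, i, k, left), build_tree(best, k, j, right))


def cyk_parse(words, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(words)
    # best[(i, j)][A] = (probability, backpointer) for the span words[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for nt, p in LEXICAL.get(word, []):
            best[(i, i + 1)][nt] = (p, word)
    for width in range(2, n + 1):          # span length
        for i in range(n - width + 1):     # span start
            j = i + width
            for k in range(i + 1, j):      # split point
                for parent, left, right, p in BINARY:
                    if left in best[(i, k)] and right in best[(k, j)]:
                        prob = p * best[(i, k)][left][0] * best[(k, j)][right][0]
                        if prob > best[(i, j)].get(parent, (0.0, None))[0]:
                            best[(i, j)][parent] = (prob, (k, left, right))
    if start not in best[(0, n)]:
        return None
    return best[(0, n)][start][0], build_tree(best, 0, n, start)


if __name__ == "__main__":
    print(cyk_parse("the dog sees the cat".split()))
```

In a property-based setting, one could imagine the rule probabilities in `BINARY` being replaced by scores that also reflect which syntactic properties a candidate constituent satisfies; how the paper actually does this is not specified in the abstract.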