
We present the Multilingual Word Aligner (MWA), a new open source Multilingual model for words timestamps boundaries alignment for given audio. We composed new embedding and developed architecture for accurate, multilingual word alignment reached state-of-the-art performances on Timit and Buckeye known benchmarks. We evaluate Transformer, Conformer and VGG as sequence boundary detection models in our experiments, ultimately selecting Conformer, based on its superior performance. Finally, dynamic programming was used to perform the final alignment, refining the boundaries and ensuring accurate word segmentation. We compared the proposed model to the best aligners model today surpassing all the leading models and reaches state-of-the-art performance. Furthermore, final model performed very good results on languages that were not seen during the training phase (Hebrew, Dutch and German).