MantaID - A Machine-Learning Based Tool to Automate the Identification of
Biological Database IDs
The number of biological databases is growing rapidly, but
different databases use different IDs to refer to the same
biological entity. The inconsistency in IDs impedes the
integration of various types of biological data. To resolve this
problem, we developed 'MantaID', a data-driven,
machine-learning-based approach that automates the identification of IDs
on a large scale. The 'MantaID' model reached a prediction
accuracy of 99% and predicted 100,000 ID entries within
two minutes. 'MantaID' supports the
discovery and exploitation of ID patterns from a large number
of databases (e.g., up to 542 biological databases). A freely
available, open-source R package, a user-friendly web
application, and an API were also developed for 'MantaID' to
improve its applicability. To our knowledge, 'MantaID'
is the first tool that enables automatic, quick, accurate,
and comprehensive identification of large quantities of IDs,
and can therefore be used as a starting point to facilitate the
complex assimilation and aggregation of biological data across
diverse databases.