While believers often turn to scripture for solace and spiritual guidance, researchers at Dartmouth College have found that the Bible can do more than that. The holy book can also help improve computer-based text translation algorithms.
The Bible contains an enormous amount of text, and algorithms tend to perform better when they have more data to train on. Using this data, the research team developed an algorithm trained on several versions of the sacred text. It can now convert written works into different styles for a wide range of audiences.
Although many digital tools translate text between languages such as English and Spanish, style translators, which change the style of a text while keeping it in the same language, have been slower to appear. This is where the researchers turned to the Bible.
According to a press statement released on 23 October, the researchers saw in the Bible a ‘large, previously untapped dataset of aligned parallel text’. Besides serving as a source of spiritual guidance for people around the world, each version of the Bible contains more than 31,000 verses. These data enabled the researchers to produce more than 1.5 million unique pairings of source and target verses for machine-learning training sets.
The study was published in a recent issue of the journal Royal Society Open Science.
The researchers said that this is not the first parallel dataset developed for style translation, though it is the first to use the Bible. Previously used texts, ranging from Shakespeare to Wikipedia entries, either provided smaller datasets or were not well suited to learning style translation.
According to lead author Keith Carlson, the Bible is a remarkable source text for style translation because it comes in many different written styles. In addition, it is already extensively indexed through the consistent use of book, chapter, and verse numbers, which greatly benefited the research team. Alignment errors, which are often caused by automatic methods for matching different versions of the same text, are eliminated by the predictable organization of the text across versions.
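To illustrate why verse numbering makes alignment straightforward, here is a minimal Python sketch of how source and target pairs might be built by matching on book, chapter, and verse. It is not the researchers' code: the version names, the sample wording, and the `build_pairs` helper are all hypothetical, and the sample verses are placeholder text rather than quotations from any particular edition.

```python
from itertools import permutations

# Hypothetical toy data: each version maps (book, chapter, verse) -> verse text.
# The wording is placeholder text, not quoted from any real edition.
versions = {
    "formal": {
        ("John", 11, 35): "He wept.",
        ("Genesis", 1, 3): "And there was light.",
        ("Psalms", 23, 1): "I shall not want.",
    },
    "plain": {
        ("John", 11, 35): "He began to cry.",
        ("Genesis", 1, 3): "Then the light appeared.",
    },
}


def build_pairs(versions):
    """Pair up verses that share the same (book, chapter, verse) key.

    Every ordered combination of two versions is treated as a
    (source style, target style) direction. Verses missing from either
    version are skipped, so no fuzzy text matching is required.
    """
    pairs = []
    for src_name, tgt_name in permutations(versions, 2):
        src, tgt = versions[src_name], versions[tgt_name]
        for key in sorted(src.keys() & tgt.keys()):
            pairs.append((src_name, tgt_name, key, src[key], tgt[key]))
    return pairs


if __name__ == "__main__":
    for src_name, tgt_name, key, src_text, tgt_text in build_pairs(versions):
        print(f"{src_name} -> {tgt_name} {key}: {src_text!r} => {tgt_text!r}")
```

Because the verse reference itself is the lookup key, the pairing step never has to guess which sentences correspond, which is the source of the alignment errors mentioned above.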
For the study, the researchers characterized ‘style’ in terms of sentence length, the use of passive and active voice, and word choices that give texts varying degrees of formality and simplicity. The study explained that different wording might convey different levels of familiarity with the reader, reflect the cultural values of the writer, or be easier to understand for specific populations.
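As a rough illustration of how surface features such as sentence length could be measured, the following Python sketch computes a few simple statistics for a passage. It is an assumption-laden example, not the method used in the study: the `style_features` function is hypothetical, sentences are split naively on end punctuation, and passive-voice detection is left out because it would need a grammatical parser.

```python
import re


def style_features(text):
    """Compute crude surface-style statistics for a passage of text.

    Sentences are split on '.', '!' and '?', and words are runs of letters
    or apostrophes. Both rules are simplifications chosen for illustration.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentences": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    print(style_features("He wept. Then he went home."))
```

Comparing these numbers across two renderings of the same verse gives a quick, if coarse, sense of which one uses shorter sentences or simpler words.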
In the study, two algorithms, a machine translation system named ‘Moses’ and a neural network termed ‘Seq2Seq’, were trained on 34 stylistically distinct Bible versions.
Although Bible versions were used to train the algorithms, the researchers reported that such systems could be developed to translate the style of any written text for a variety of audiences.