Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its audio recordings are freely available at http://www.audiotreasure.com/. The only problem with using them in speech-related tasks is that each file is too long. That's why I split each audio file so that each clip corresponds to one verse, and then aligned the clips with the text.
This dataset is composed of the following:
- README.md
- wav files sampled at 12,000 Hz (12 kHz)
- transcript.txt.
transcript.txt is in a tab-delimited format. The first column is the audio file path, the second is the script, and the rightmost column is the duration of the audio file.
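The column layout described above can be read with Python's standard csv module. This is only a minimal sketch: the `Clip` type, the `read_transcript` helper, the sample file names, and the assumption that durations are given in seconds are all illustrative and are not taken from the dataset itself.

```python
import csv
import io
from typing import NamedTuple


class Clip(NamedTuple):
    path: str        # first column: audio file path
    text: str        # second column: the script (verse text)
    duration: float  # rightmost column: duration (assumed to be in seconds)


def read_transcript(lines):
    """Parse tab-delimited transcript lines into Clip records."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    clips = []
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        # Use row[-1] so the duration is always taken from the rightmost column.
        clips.append(Clip(row[0], row[1], float(row[-1])))
    return clips


# Hypothetical sample rows; real file names and durations may differ.
sample = io.StringIO(
    "Genesis/Genesis_1-1.wav\tIn the beginning God created the heavens and the earth.\t4.5\n"
    "Genesis/Genesis_1-2.wav\tThe earth was formless and empty.\t3.25\n"
)
clips = read_transcript(sample)
total_seconds = sum(c.duration for c in clips)
```

For the real file, passing `open("transcript.txt", encoding="utf-8", newline="")` instead of the in-memory sample should work the same way, provided the file follows the layout described above.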
I would like to show my respect to Dave, the host of www.audiotreasure.com and the reader of the audio files.
You may want to check my project using this dataset at https://github.com/Kyubyong/tacotron.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides the full text of the King James Bible, a sacred book for Christians with a rich and varied history. The Old Testament, originally written in Hebrew, recounts the story of the Israelite people and includes religious law, poetry, and prophecy. The New Testament, originally in Greek, details the life of Jesus Christ and the early development of the Christian church. Commissioned in 1604 by King James I of England for the Church of England and first published in 1611, this translation has become the most popular English version of the Bible. It is an excellent resource for Natural Language Processing (NLP) techniques, offering opportunities to explore unique linguistic features such as Hebrew parallelism and chiasmus, or to uncover "riddles" referenced by King Solomon in the book of Proverbs.
The dataset is typically provided in a CSV format. It contains 30,833 unique verse values. Approximately 74% of the verses belong to the Old Testament, with the remaining 26% from the New Testament. The book of Psalms accounts for about 8% of the verses, while Genesis constitutes 5%, and other books make up 87%. The distribution of verse text length varies, with significant counts of verses falling into various character length ranges, from 1.00-4.25 characters (4,893 verses) up to longer ranges such as 40.00-43.25 characters (3,779 verses) and 17.25-20.50 characters (4,446 verses).
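Per-book shares like the ones quoted above can be recomputed from the CSV with a simple count. The following is a rough sketch, assuming the file has a header row with a column named `book`; that column name, the `book_shares` helper, and the four sample rows are hypothetical and may not match the actual file.

```python
import csv
import io
from collections import Counter


def book_shares(csv_file, book_column="book"):
    """Return (book, percent of verses) pairs, largest share first."""
    counts = Counter(row[book_column] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return [(book, 100.0 * n / total) for book, n in counts.most_common()]


# Hypothetical sample in an assumed layout: book, chapter, verse, text.
sample = io.StringIO(
    "book,chapter,verse,text\n"
    "Genesis,1,1,In the beginning God created the heaven and the earth.\n"
    'Genesis,1,3,"And God said, Let there be light: and there was light."\n'
    "Psalms,23,1,The LORD is my shepherd; I shall not want.\n"
    "John,11,35,Jesus wept.\n"
)
shares = book_shares(sample)
```

Grouping the books into Old and New Testament before counting would give the testament-level split in the same way.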
This dataset is ideal for various applications, especially those involving Natural Language Processing (NLP). Potential uses include identifying instances of Hebrew literary techniques like parallelism, detecting chiastic structures spanning chapters, and exploring the "riddles" mentioned in the book of Proverbs. It can also be used for linguistic analysis, text mining, and creating large language models.
The dataset has global relevance, providing a foundational text for users worldwide. The content spans the historical periods covered by the Old Testament (focusing on the Israelite people) and the New Testament (covering the life of Jesus Christ and the early Christian church). The translation itself was commissioned in 1604 and first published in 1611.
CC0
This dataset is suitable for:
- Researchers and academics: For studies in theology, linguistics, literary analysis, and digital humanities.
- Developers and data scientists: For building NLP models, text generation, and historical text analysis tools.
- Educators: For teaching about biblical texts, history, and language.
- Individuals interested in religious texts: For personal study or exploration of the King James Bible.
Original Data Source: The King James Bible
One of my favorite things to do is study the usage of Hebrew and Greek words throughout the Bible. I use study tools such as Strong's Concordance, Englishman's Hebrew and Greek Concordances, Thayer's Greek Lexicon, and Gesenius' Hebrew Lexicon to gather details about the word or words of study. I then analyze the usage across all passages in order to get a better understanding of the Word of God as well as derive any specific themes from the Bible.
This notebook explores the usage of two Greek word groups used to designate sound mindedness in the New Testament.
G3524/5 - lit. "abstaining from wine"; sober, sober-minded; by metaphor, self-control, aware, watchful, in possession of one's faculties
G4993/8 - being of sound mind, in one's right mind
The Excel spreadsheet associated with this notebook is a collection of data acquired from the study tools mentioned above. I decided to try to use Pandas to analyze the word usage and provide various breakdowns to spot anomalies and deviations.
Here is a summary of the Excel data contents.
Old and New Testament chapters from the King James Version (KingJamesBibleOnline.org).
The "chapters" folder holds 1,189 CSV files, one for each chapter of the Bible. Each chapter file contains one row per verse, and each word of a verse has its own column in that row.
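Because each word sits in its own column, a verse has to be rebuilt by joining the cells of its row. Here is a minimal sketch, assuming there is no header row and that shorter verses are padded with empty cells; both points are assumptions, and the `verses_from_chapter` helper and sample rows are illustrative only.

```python
import csv
import io


def verses_from_chapter(csv_file):
    """Join the word columns of each row back into one verse string."""
    verses = []
    for row in csv.reader(csv_file):
        # Drop empty padding cells left over from longer verses in the chapter.
        words = [cell.strip() for cell in row if cell.strip()]
        if words:
            verses.append(" ".join(words))
    return verses


# Hypothetical two-row sample; the first row is padded with empty cells.
sample = io.StringIO(
    "Jesus,wept.,,\n"
    "Then,said,the,Jews\n"
)
verses = verses_from_chapter(sample)
```

If the real files do include a header row, it would need to be skipped before joining.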