4 datasets found
  1. The World English Bible

    • kaggle.com
    Updated Feb 27, 2018
    Cite
    Kyubyong Park (2018). The World English Bible [Dataset]. https://www.kaggle.com/datasets/bryanpark/the-world-english-bible-speech-dataset/discussion?sortBy=hot
    Dataset updated
    Feb 27, 2018
    Dataset provided by
    Kaggle
    Authors
    Kyubyong Park
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its audio recordings are freely available at http://www.audiotreasure.com/. The only problem with using them in speech-related tasks is that each file is too long. I therefore split each audio file so that each clip corresponds to one verse, and then aligned the clips to the text.

    Content

    This dataset is composed of the following:
    - README.md
    - wav files sampled at 12,000 Hz (12 kHz)
    - transcript.txt.

    transcript.txt is in a tab-delimited format. The first column is the audio file path, the second is the script (the text of the verse), and the rightmost column is the duration of the audio file.
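
A minimal sketch of reading a file laid out this way, using only the Python standard library. The sample rows, the file paths in them, and the assumption that the duration is in seconds are all hypothetical; the description only states the column order.

```python
import csv
import io
from typing import NamedTuple


class Clip(NamedTuple):
    path: str        # first column: audio file path
    script: str      # second column: text of the verse
    duration: float  # rightmost column: clip duration (unit assumed to be seconds)


def read_transcript(handle) -> list[Clip]:
    """Parse tab-delimited transcript rows into Clip records."""
    # QUOTE_NONE keeps quotation marks inside the verse text as plain characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    clips = []
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        clips.append(Clip(row[0], row[1], float(row[-1])))
    return clips


# Hypothetical sample rows. The real file would be opened with
# open("transcript.txt", encoding="utf-8", newline="").
SAMPLE = (
    "Genesis/Genesis_1-1.wav\tIn the beginning God created the heavens and the earth.\t4.5\n"
    "Genesis/Genesis_1-2.wav\tThe earth was formless and empty.\t3.25\n"
)

clips = read_transcript(io.StringIO(SAMPLE))
total_duration = sum(clip.duration for clip in clips)
print(len(clips), total_duration)
```

Reading the duration from the last cell rather than a fixed index follows the description's wording ("the rightmost column").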

    Acknowledgements

    I would like to show my respect to Dave, the host of www.audiotreasure.com and the reader of the audio files.

    Reference

    You may want to check my project using this dataset at https://github.com/Kyubyong/tacotron.

  2. King James Bible Text Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). King James Bible Text Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/bed25800-59bc-493a-be2e-762b9fc891bf
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Knowledge Bundles
    Description

    This dataset provides the full text of the King James Bible, a sacred book for Christians with a rich and varied history. The Old Testament, originally written in Hebrew, recounts the story of the Israelite people and includes religious law, poetry, and prophecy. The New Testament, originally in Greek, details the life of Jesus Christ and the early development of the Christian church. Authorised in 1604 by King James I of England for the Church of England, this translation has become the most popular English version of the Bible. It is a useful resource for Natural Language Processing (NLP) work, offering opportunities to explore distinctive linguistic features such as Hebrew parallelism and chiasmus, or to uncover the "riddles" referenced by King Solomon in the book of Proverbs.

    Columns

    • version_name: The name of the Bible version.
    • version_abbr: The abbreviation for the Bible version.
    • testament_abbr: An abbreviation for the Bible section, either Old Testament (OT) or New Testament (NT).
    • testament_name: The full name of the Bible section, Old Testament or New Testament.
    • book_name: The name of the book within the Bible.
    • book_number: The numerical order of the book within the Bible.
    • chapter_number: The chapter number within a book.
    • verse_number: The verse number within a chapter.
    • verse_text: The actual text of the verse.
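
A minimal sketch of reading rows with these columns and looking up one verse. The column names come from the list above; the two sample rows and the helper names `load_verses` and `find_verse` are hypothetical.

```python
import csv
import io

# Column names are taken from the list above; the two rows are hypothetical
# samples, not copied from the real file.
SAMPLE_CSV = """version_name,version_abbr,testament_abbr,testament_name,book_name,book_number,chapter_number,verse_number,verse_text
King James Version,KJV,OT,Old Testament,Genesis,1,1,1,In the beginning God created the heaven and the earth.
King James Version,KJV,NT,New Testament,John,43,11,35,Jesus wept.
"""


def load_verses(handle):
    """Read rows as dicts and convert the numeric columns to int."""
    verses = []
    for row in csv.DictReader(handle):
        for key in ("book_number", "chapter_number", "verse_number"):
            row[key] = int(row[key])
        verses.append(row)
    return verses


def find_verse(verses, book, chapter, verse):
    """Return the text of one verse, or None if it is not present."""
    for row in verses:
        key = (row["book_name"], row["chapter_number"], row["verse_number"])
        if key == (book, chapter, verse):
            return row["verse_text"]
    return None


verses = load_verses(io.StringIO(SAMPLE_CSV))
print(find_verse(verses, "John", 11, 35))
```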

    Distribution

    The dataset is typically provided in CSV format. It contains 30,833 unique verse values. Approximately 74% of the verses belong to the Old Testament and the remaining 26% to the New Testament. The book of Psalms accounts for about 8% of the verses, Genesis for 5%, and all other books together for 87%. The listing also reports verse counts by character-length range, for example 1.00-4.25 characters (4,893 verses), 17.25-20.50 characters (4,446 verses), and 40.00-43.25 characters (3,779 verses).

    Usage

    This dataset is ideal for various applications, especially those involving Natural Language Processing (NLP). Potential uses include identifying instances of Hebrew literary techniques like parallelism, detecting chiastic structures spanning chapters, and exploring the "riddles" mentioned in the book of Proverbs. It can also be used for linguistic analysis, text mining, and creating large language models.

    Coverage

    The dataset has global relevance, providing a foundational text for users worldwide. The content spans the historical periods covered by the Old Testament (focusing on the Israelite people) and the New Testament (covering the life of Jesus Christ and the early Christian church). The translation itself was authorised in 1604.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • Researchers and academics: For studies in theology, linguistics, literary analysis, and digital humanities.
    • Developers and data scientists: For building NLP models, text generation, and historical text analysis tools.
    • Educators: For teaching about biblical texts, history, and language.
    • Individuals interested in religious texts: For personal study or exploration of the King James Bible.

    Dataset Name Suggestions

    • King James Bible Text Dataset
    • KJV Verses Collection
    • Biblical Text (King James Version)
    • Sacred Scripture Dataset

    Attributes

    Original Data Source: The King James Bible

  3. Sound Mind Bible Word Study

    • kaggle.com
    Updated Nov 20, 2022
    Cite
    JORGE GARCIA-INIGUEZ (2022). Sound Mind Bible Word Study [Dataset]. https://www.kaggle.com/datasets/jorgegarciainiguez/sound-mind-bible-word-study/code
    Dataset updated
    Nov 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    JORGE GARCIA-INIGUEZ
    Description

    One of my favorite things to do is study the usage of Hebrew and Greek words throughout the Bible. I use study tools such as Strong's Concordance, Englishman's Hebrew and Greek Concordances, Thayer's Greek Lexicon, and Gesenius' Hebrew Lexicon to gather details about the word or words of study. I then analyze the usage across all passages in order to get a better understanding of the Word of God as well as derive any specific themes from the Bible.

    This notebook explores the usage of two Greek word groups used to designate sound mindedness in the New Testament.

    1. G3524/5 - lit. "abstaining from wine"; sober, sober-minded; by metaphor, self-control, aware, watchful, in possession of one's faculties

    2. G4993/8 - being of sound mind, in one's right mind

    The Excel spreadsheet associated with this notebook is a collection of data acquired from the study tools mentioned above. I decided to try to use Pandas to analyze the word usage and provide various breakdowns to spot anomalies and deviations.

    Here is a summary of the Excel data contents.

    • Group - Word group for Greek word. Derived from first few characters of Strong's number. Will group related Greek words together.
    • Word - Strong's number for the Greek word.
    • Passage - The book of the Bible where the word is found.
    • Translation - The translation of the word in the specific Version.
    • Version - The translation version such as King James Version (KJV), New International Version (NIV), Diaglott (DIAG), etc.
    • Comments - Miscellaneous comments about the passage usage of the word. For example, any marginal reference from the translators about the word.
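
A minimal sketch of the kind of breakdown described above, written with the standard library so that it is self-contained (the notebook itself uses Pandas). The rows and translations are hypothetical samples, and the group prefix length of 4 is an assumption, since the description only says "first few characters".

```python
from collections import Counter, defaultdict

# Hypothetical rows in the shape described above:
# (Word, Passage, Translation, Version).
rows = [
    ("G3525", "1 Thessalonians", "be sober", "KJV"),
    ("G3525", "1 Peter", "be sober", "KJV"),
    ("G3525", "1 Peter", "be self-controlled", "NIV"),
    ("G4998", "Titus", "sober", "KJV"),
    ("G4998", "Titus", "self-controlled", "NIV"),
]


def word_group(strongs_number, prefix_len=4):
    """Derive the group from the leading characters of the Strong's number.

    The prefix length of 4 is an assumption made for this sketch.
    """
    return strongs_number[:prefix_len]


def translations_by_group(rows):
    """Count how each word group is rendered in each version."""
    table = defaultdict(Counter)
    for word, _passage, translation, version in rows:
        table[(word_group(word), version)][translation] += 1
    return table


table = translations_by_group(rows)
for (group, version), counts in sorted(table.items()):
    print(group, version, dict(counts))
```

With a four-character prefix, G3524 and G3525 fall into one group and G4993 through G4998 into another, which matches the two word groups listed in the description.
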
  4. King James Old/New Testament Chapters

    • kaggle.com
    zip
    Updated Jul 10, 2020
    Cite
    Talbott (2020). King James Old/New Testament Chapters [Dataset]. https://www.kaggle.com/ttalbitt/king-james-oldnew-testament-chapters
    Available download formats: zip (1856073 bytes)
    Dataset updated
    Jul 10, 2020
    Authors
    Talbott
    Description

    Old and New Testament chapters from the King James Version (KingJamesBibleOnline.org).

    The "chapters" archive holds 1,189 files, one for each chapter of the Bible. Each chapter CSV file contains one row per verse, and each word of the verse has its own column.
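
A minimal sketch of turning one such chapter file back into verse strings. The sample content is hypothetical, and two details are assumptions: that the file has no header row, and that shorter verses leave their trailing columns empty.

```python
import csv
import io

# Hypothetical chapter file: one row per verse, one word per column.
# Shorter verses are assumed to leave their trailing columns empty.
SAMPLE_CHAPTER = (
    "In,the,beginning,God,created,the,heaven,and,the,earth.\n"
    "Jesus,wept.,,,,,,,,\n"
)


def read_chapter(handle):
    """Rebuild each verse by joining its non-empty word cells with spaces."""
    verses = []
    for row in csv.reader(handle):
        words = [cell for cell in row if cell.strip()]
        if words:
            verses.append(" ".join(words))
    return verses


verses = read_chapter(io.StringIO(SAMPLE_CHAPTER))
print(verses)
```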

The World English Bible

A large, single-speaker speech dataset in English
