100+ datasets found
  1. consolidated-datasets

    • huggingface.co
    Updated Sep 24, 2024
    Shivendra S (2024). consolidated-datasets [Dataset]. https://huggingface.co/datasets/shivendrra/consolidated-datasets
    Dataset updated
    Sep 24, 2024
    Authors
    Shivendra S
    Description

    Dataset Card for YouTubeTranscriptData

    Dataset Details

    Dataset Description

    This dataset contains transcripts of around 167K YouTube videos, including coding lectures, podcasts, interviews, news videos, commentary, and song lyrics. It also includes multiple files generated by web scraping.

    Curated by: Shivendra Singh
    License: [none]

    Dataset Sources

    Repository: SmallLanguageModel
    Demo [optional]: [More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/shivendrra/consolidated-datasets.

  2. Consolidated Report of Condition and Income for Edge and Agreement...

    • catalog.data.gov
    • datasets.ai
    Updated Dec 18, 2024
    Board of Governors of the Federal Reserve System (2024). Consolidated Report of Condition and Income for Edge and Agreement Corporations [Dataset]. https://catalog.data.gov/dataset/consolidated-report-of-condition-and-income-for-edge-and-agreement-corporations
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Federal Reserve Board of Governors
    Federal Reserve System: http://www.federalreserve.gov/
    Description

    The Consolidated Report of Condition and Income for Edge and Agreement Corporations (FR 2886b) collects financial data from Edge and agreement corporations. It is filed quarterly or annually based on consolidated asset criteria.

  3. speech-emotion-dataset-consolidated

    • huggingface.co
    Updated Jun 1, 2025
    + more versions
    Nathan Roll (2025). speech-emotion-dataset-consolidated [Dataset]. https://huggingface.co/datasets/NathanRoll/speech-emotion-dataset-consolidated
    Dataset updated
    Jun 1, 2025
    Authors
    Nathan Roll
    Description

    NathanRoll/speech-emotion-dataset-consolidated dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. medical-intent-audio-dataset-consolidated

    • huggingface.co
    Updated Mar 25, 2024
    + more versions
    Shreyas Dikshit (2024). medical-intent-audio-dataset-consolidated [Dataset]. https://huggingface.co/datasets/shreyas1104/medical-intent-audio-dataset-consolidated
    Dataset updated
    Mar 25, 2024
    Authors
    Shreyas Dikshit
    Description

    shreyas1104/medical-intent-audio-dataset-consolidated dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. consolidated

    • opencontext.org
    Updated Dec 19, 2021
    + more versions
    Anthony Tuck (2021). consolidated [Dataset]. https://opencontext.org/types/c57371fd-36db-4038-bb22-2e46341848a1
    Dataset updated
    Dec 19, 2021
    Dataset provided by
    Open Context
    Authors
    Anthony Tuck
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An Open Context "types" dataset item. Open Context publishes structured data as granular, URL identified Web resources. This record is part of the "Murlo" data publication.

  6. [DEPRECATED] Consolidated list of persons, groups and entities subject to EU...

    • data.europa.eu
    csv, html, pdf +2
    Updated Dec 11, 2018
    + more versions
    Service for Foreign Policy Instruments (2018). [DEPRECATED] Consolidated list of persons, groups and entities subject to EU financial sanctions [Dataset]. https://data.europa.eu/euodp/ga/data/dataset/consolidated-list-of-persons-groups-and-entities-subject-to-eu-financial-sanctions
    Available download formats: xml, html, pdf, rss feed, csv
    Dataset updated
    Dec 11, 2018
    Dataset authored and provided by
    Service for Foreign Policy Instruments
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Area covered
    European Union
    Description

    In its policy, the European Union intervenes when necessary to prevent conflict or in response to emerging or actual crises. In certain cases, EU intervention can take the form of restrictive measures or 'sanctions'. The application of financial sanctions and more precisely the freezing of assets constitutes an obligation for both the public and private sector. In this regard, a particular responsibility falls on credit and financial institutions, since they are involved in the bulk of financial transfers.

    In order to facilitate the application of financial sanctions, the European Banking Federation, the European Savings Banks Group, the European Association of Co-operative Banks, the European Association of Public Banks ("the EU Credit Sector Federations") and the European Commission recognised the need for an EU consolidated list of persons, groups and entities subject to financial sanctions and more precisely the freezing of assets. The Credit Sector Federations set up an initial database containing the consolidated list. The European Commission subsequently took over this database and is responsible for its maintenance and for keeping the consolidated list of sanctions up-to-date. In this respect, the Service for Foreign Policy Instruments (FPI) of the European Commission launched a new Web page in June 2017, where the consolidated lists of financial sanctions consisting in freezing of assets are published in different formats (see link below).

    Disclaimer: While every effort is made to ensure that the database and the consolidated list correctly reproduce all relevant data of the officially adopted texts published in the Official Journal of the European Union, neither the Commission nor the EU Credit Sector Federations accepts any liability for possible omissions of relevant data or mistakes, nor for any use of the database or of the consolidated list. Only the information published in the Official Journal of the EU is deemed authentic.

  7. EDF group’s consolidated generation capacity

    • opendata.edf.fr
    • opendata-edf.opendatasoft.com
    csv, excel, json
    Updated Jun 13, 2025
    + more versions
    (2025). EDF group’s consolidated generation capacity [Dataset]. https://opendata.edf.fr/explore/dataset/capacites-de-production-consolidees-du-groupe-edf/
    Available download formats: csv, excel, json
    Dataset updated
    Jun 13, 2025
    License

    Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
    License information was derived automatically

    Description

    Abstract:

    Energy mix of the EDF Group's installed capacities worldwide (countries in which EDF is present). Units are expressed either in MWe, which corresponds to electrical power, or in MWth, which corresponds to heat power.

    Data consolidated according to EDF's shareholding in Group companies, including investments in associates and joint ventures.

    Detailed description:

    EDF is a Group comprising a number of companies and affiliates. To consult the simplified organization chart of the Group, click here.

    To get an overall view of the Group's energy production, for example, we have to consolidate the production of all our affiliates. Two consolidation methods are possible:

    Consolidation by full integration

    Only the affiliates over which EDF has control are consolidated. In this financial approach, subsidiaries are consolidated at 100%, regardless of their ownership rate. Entities over which EDF does not have control are therefore not consolidated at all.

    Net consolidation (sometimes called patrimonial consolidation)

    All affiliates are consolidated, provided that EDF holds a stake in them. They are then consolidated according to EDF's share of ownership.
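The two consolidation methods described above can be contrasted with a small calculation. The sketch below is only an illustration: the `Affiliate` class, the `controlled` flag, and all figures are hypothetical and are not taken from the EDF dataset. It assumes that control is known per entity rather than derived from the ownership share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affiliate:
    name: str             # hypothetical affiliate name
    capacity_mwe: float   # installed electrical capacity, in MWe
    ownership: float      # EDF's ownership share, between 0.0 and 1.0
    controlled: bool      # assumed flag: whether EDF controls the entity


def full_integration(affiliates):
    """Controlled affiliates count at 100%; all others are excluded."""
    return sum(a.capacity_mwe for a in affiliates if a.controlled)


def net_consolidation(affiliates):
    """Every affiliate counts in proportion to the ownership share."""
    return sum(a.capacity_mwe * a.ownership for a in affiliates)


# Made-up figures for illustration only.
portfolio = [
    Affiliate("A", 1000.0, 1.00, True),
    Affiliate("B", 400.0, 0.50, True),
    Affiliate("C", 200.0, 0.25, False),
]

print(full_integration(portfolio))   # 1400.0
print(net_consolidation(portfolio))  # 1250.0
```

With these made-up numbers, full integration counts A and B in full and ignores C, while net consolidation weights all three entities by ownership share, so the two totals differ.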

  8. Consolidated State Performance Report, 2012-13

    • catalog.data.gov
    • data.amerigeoss.org
    • +1more
    Updated Aug 12, 2023
    + more versions
    Office of Elementary and Secondary Education (OESE) (2023). Consolidated State Performance Report, 2012-13 [Dataset]. https://catalog.data.gov/dataset/consolidated-state-performance-report-2012-13-50b5a
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    Office of Elementary and Secondary Education
    Description

    The Consolidated State Performance Report, 2012-13 (CSPR 2012-13), is part of the Consolidated State Performance Report (CSPR) program: a required annual reporting tool for each State, the Bureau of Indian Education, the District of Columbia, and Puerto Rico; program data is available since 2005-06 at . CSPR 2012-13 is a cross-sectional report that measures each state's progress towards implementation of the No Child Left Behind Act (NCLB) and the reporting instrument for state formula grant programs authorized by the Elementary and Secondary Education Act (ESEA) as amended by NCLB. The reporting was conducted using state education agencies' (SEAs) reports in the EDFacts online submission system. CSPR 2012-13 is a universe survey. The study's response rate is expected to be 100%. Key statistics include information on adequate yearly progress, state performance assessments, highly qualified teachers, public school choice and supplemental education services options.

  9. Imnaviat Creek Alaska Fen Station IC 1523 Consolidated Dataset - Datasets -...

    • catalog.northslopescience.org
    Updated Feb 23, 2016
    (2016). Imnaviat Creek Alaska Fen Station IC 1523 Consolidated Dataset - Datasets - North Slope Science Catalog [Dataset]. https://catalog.northslopescience.org/dataset/490
    Dataset updated
    Feb 23, 2016
    Area covered
    Alaska, North Slope Borough
    Description

    Ridge and fen sites at Imnaviat Creek were monitored with identical sensor suites. Measurements include eddy covariance, carbon dioxide, water vapor, energy, wind speed, air temperature,

  10. US EPA The Consolidated Human Activity Database (CHAD)

    • catalog.data.gov
    • data.ca.gov
    • +1more
    Updated Nov 27, 2024
    + more versions
    California Environmental Protection Agency (2024). US EPA The Consolidated Human Activity Database (CHAD) [Dataset]. https://catalog.data.gov/dataset/us-epa-the-consolidated-human-activity-database-chad
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Environmental Protection Agency: https://calepa.ca.gov/
    Description

    The Consolidated Human Activity Database (CHAD) is a resource for learning about human exposure and health studies and predictive models.

  11. United States State Street Bank (STT): Consolidated Assets

    • ceicdata.com
    Updated Feb 15, 2025
    + more versions
    CEICdata.com (2025). United States State Street Bank (STT): Consolidated Assets [Dataset]. https://www.ceicdata.com/en/united-states/commercial-banks-consolidated-assets/state-street-bank-stt-consolidated-assets
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2017 - Dec 1, 2019
    Area covered
    United States
    Description

    United States State Street Bank (STT): Consolidated Assets data was reported at 242.148 USD bn in Dec 2019. This records an increase from the previous number of 241.364 USD bn for Sep 2019. United States State Street Bank (STT): Consolidated Assets data is updated quarterly, averaging 167.260 USD bn from Mar 2001 (Median) to Dec 2019, with 76 observations. The data reached an all-time high of 289.425 USD bn in Jun 2015 and a record low of 62.663 USD bn in Mar 2001. United States State Street Bank (STT): Consolidated Assets data remains in active status in CEIC and is reported by the Federal Reserve Board. The data is categorized under Global Database’s United States – Table US.KB006: Commercial Banks: Consolidated Assets.

  12. Consolidated file of recharging terminals for Electric Vehicles | gimi9.com

    • gimi9.com
    Consolidated file of recharging terminals for Electric Vehicles | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_5448d3e0c751df01f85d0572_1
    License

    Licence Ouverte / Open Licence 1.0: https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf
    License information was derived automatically

    Description

    Planners, communities, data producers: find here the complete documentation to reference your terminals.

    Context

    In order to establish a national directory of recharging infrastructure for electric vehicles (EVRI), open and accessible to all, local authorities carrying out a project to install EVRI must, as the stations are put into service, publish static data on the location and technical characteristics of these installations on the data.gouv.fr platform, as defined in the Order of 4 May 2021. Etalab consolidates all the datasets produced by the various territorial players into a single consolidated dataset. It aims to be as exhaustive as possible and to bring together all French IRVE terminals.

    Versions

    A new version of the data schema was published on 17 October 2022 (v2.1.0). It simplifies version 2.0.3 by making certain fields optional. Version v1.0.3 of the schema is no longer consolidated but remains historised in this dataset.

    Consolidation

    For your data to be integrated into the national consolidated database, you must first produce and publish your own dataset according to the data schema. If you don't find your data in the consolidated database, it is probably because it contains errors compared to the expected schema. To get the error report, you can use the Validata tool (https://validata.fr/table-schema?schema_name=schema-datagouvfr.etalab%2Fschema-irve). For more information on the process from data production to consolidation, see "From the production of your data to their consolidation in the national database".

  13. United States STT: Cumulative Assets as % of Consolidated Assets

    • ceicdata.com
    Updated May 15, 2020
    CEICdata.com (2020). United States STT: Cumulative Assets as % of Consolidated Assets [Dataset]. https://www.ceicdata.com/en/united-states/commercial-banks-consolidated-assets
    Dataset updated
    May 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2017 - Dec 1, 2019
    Area covered
    United States
    Description

    STT: Cumulative Assets as % of Consolidated Assets data was reported at 59.000 % in Dec 2019. This records an increase from the previous number of 56.000 % for Sep 2019. STT: Cumulative Assets as % of Consolidated Assets data is updated quarterly, averaging 57.000 % from Mar 2001 (Median) to Dec 2019, with 76 observations. The data reached an all-time high of 63.000 % in Mar 2010 and a record low of 46.000 % in Dec 2002. STT: Cumulative Assets as % of Consolidated Assets data remains in active status in CEIC and is reported by the Federal Reserve Board. The data is categorized under Global Database’s United States – Table US.KB006: Commercial Banks: Consolidated Assets.

  14. UN Security Council Consolidated Sanctions

    • opensanctions.org
    Updated Jul 12, 2025
    United Nations Security Council (2025). UN Security Council Consolidated Sanctions [Dataset]. https://www.opensanctions.org/datasets/un_sc_sanctions/
    Available download formats: json, application/json+senzing, txt, application/json+ftm, csv, xml
    Dataset updated
    Jul 12, 2025
    Dataset authored and provided by
    United Nations Security Council: http://un.org/sc
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The Security Council's set of sanctions serves as the foundation for most national sanctions lists.

  15. Consolidated Human Activities Database (CHAD)

    • catalog.data.gov
    • data.amerigeoss.org
    • +1more
    Updated Jun 19, 2021
    + more versions
    U.S. EPA Office of Research and Development (ORD) - National Exposure Research Laboratory (NERL) (2021). Consolidated Human Activities Database (CHAD) [Dataset]. https://catalog.data.gov/dataset/consolidated-human-activities-database-chad
    Dataset updated
    Jun 19, 2021
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    The Consolidated Human Activity Database (CHAD) contains data obtained from human activity studies that were collected at city, state, and national levels. CHAD is intended to be an input file for exposure/intake dose modeling and/or statistical analysis.

  16. Trends in Two or More Races Student Percentage (2013-2023): Manton...

    • publicschoolreview.com
    + more versions
    Public School Review, Trends in Two or More Races Student Percentage (2013-2023): Manton Consolidated High School vs. Michigan vs. Manton Consolidated Schools School District [Dataset]. https://www.publicschoolreview.com/manton-consolidated-high-school-profile
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Manton, Manton Consolidated Schools
    Description

    This dataset tracks annual two or more races student percentage from 2013 to 2023 for Manton Consolidated High School vs. Michigan and Manton Consolidated Schools School District

  17. CORD Dataset

    • paperswithcode.com
    Updated Mar 27, 2024
    Seunghyun Park; Seung Shin; Bado Lee; Junyeop Lee; Jaeheung Surh; Minjoon Seo; Hwalsuk Lee (2024). CORD Dataset [Dataset]. https://paperswithcode.com/dataset/cord
    Dataset updated
    Mar 27, 2024
    Authors
    Seunghyun Park; Seung Shin; Bado Lee; Junyeop Lee; Jaeheung Surh; Minjoon Seo; Hwalsuk Lee
    Description

    OCR is inevitably linked to NLP since its final output is in text. Advances in document intelligence are driving the need for a unified technology that integrates OCR with various NLP tasks, especially semantic parsing. Since OCR and semantic parsing have been studied as separate tasks so far, the datasets for each task on their own are rich, while those for the integrated post-OCR parsing tasks are relatively insufficient. In this study, we publish a consolidated dataset for receipt parsing as the first step towards post-OCR parsing tasks. The dataset consists of thousands of Indonesian receipts, which contains images and box/text annotations for OCR, and multi-level semantic labels for parsing. The proposed dataset can be used to address various OCR and parsing tasks.

  18. EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database of Ames...

    • data.europa.eu
    excel xlsx
    Updated Oct 29, 2020
    Joint Research Centre (2020). EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database of Ames Negative Chemicals [Dataset]. https://data.europa.eu/data/datasets/38701804-bc00-43c1-8af1-fe2d5265e8d7?locale=et
    Available download formats: excel xlsx
    Dataset updated
    Oct 29, 2020
    Dataset authored and provided by
    Joint Research Centre: https://joint-research-centre.ec.europa.eu/index_en
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Description

    The EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database is a structured master database that compiles available genotoxicity and carcinogenicity information, originating from different sources, for substances tested in the bacterial reverse mutation test (Ames test). The JRC presents here a new curated collection of 211 substances eliciting negative results in the Ames test. The collection adds to the previously published EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database of Ames positive substances (https://data.jrc.ec.europa.eu/dataset/jrc-eurl-ecvam-genotoxicity-carcinogenicity-ames) that has become, over the years, a reference for a number of activities in the area of genotoxicity testing across different product-type sectors, including regulatory initiatives, exploratory projects, development of testing strategies, and validation of new genotoxicity tests. Detailed information on the data and construction of the database are reported in Madia et al. 2020 https://doi.org/10.1016/j.mrgentox.2020.503199, recently published in Mutation Research - Genetic Toxicology and Environmental Mutagenesis journal.

    On October 29, 2020, the database was updated. The overall call for Benzoin, in vivo UDS, should be [-]. Thus, in the table: Column CC, row 31: change “[+] in rat hepatocytes#Glauert et al. 1985” to “[+] in vitro rat hepatocytes#Glauert et al. 1985” . Column CD, row 31: change “[+]” to “[-]”.

  19. Consolidated Sanctions

    • opensanctions.org
    Updated Jul 14, 2025
    + more versions
    OpenSanctions Datenbanken GmbH (2025). Consolidated Sanctions [Dataset]. https://www.opensanctions.org/datasets/sanctions/
    Available download formats: txt, json, csv, application/json+ftm, application/json+senzing
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    OpenSanctions Datenbanken GmbH
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Consolidated list of sanctioned entities designated by different countries and international organisations. This can include military, trade and travel restrictions.

  20. United States PNC: Domestic Assets as % of Consolidated Assets

    • ceicdata.com
    Updated May 15, 2020
    CEICdata.com (2020). United States PNC: Domestic Assets as % of Consolidated Assets [Dataset]. https://www.ceicdata.com/en/united-states/commercial-banks-consolidated-assets
    Dataset updated
    May 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2017 - Dec 1, 2019
    Area covered
    United States
    Description

    PNC: Domestic Assets as % of Consolidated Assets data was reported at 99.000 % in Dec 2019. This stayed constant from the previous number of 99.000 % for Sep 2019. PNC: Domestic Assets as % of Consolidated Assets data is updated quarterly, averaging 99.000 % from Mar 2001 (Median) to Dec 2019, with 76 observations. The data reached an all-time high of 99.000 % in Dec 2019 and a record low of 98.000 % in Jun 2007. PNC: Domestic Assets as % of Consolidated Assets data remains in active status in CEIC and is reported by the Federal Reserve Board. The data is categorized under Global Database’s United States – Table US.KB006: Commercial Banks: Consolidated Assets.

Shivendra S (2024). consolidated-datasets [Dataset]. https://huggingface.co/datasets/shivendrra/consolidated-datasets

consolidated-datasets

shivendrra/consolidated-datasets

348 scholarly articles cite this dataset (View in Google Scholar)