Dataset Card for YouTubeTranscriptData
Dataset Details
Dataset Description
This dataset contains transcripts of around 167K YouTube videos, including coding lectures, podcasts, interviews, news videos, commentary, and song lyrics. It also includes multiple files generated by web scraping.
Curated by: Shivendra Singh
License: [none]
Dataset Sources
Repository: SmallLanguageModel Demo [optional]: [More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/shivendrra/consolidated-datasets.
The Consolidated Report of Condition and Income for Edge and Agreement Corporations (FR 2886b) collects financial data from Edge and agreement corporations. It is filed quarterly or annually based on consolidated asset criteria.
NathanRoll/speech-emotion-dataset-consolidated dataset hosted on Hugging Face and contributed by the HF Datasets community
shreyas1104/medical-intent-audio-dataset-consolidated dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An Open Context "types" dataset item. Open Context publishes structured data as granular, URL identified Web resources. This record is part of the "Murlo" data publication.
http://data.europa.eu/eli/dec/2011/833/oj
In its policy, the European Union intervenes when necessary to prevent conflict or in response to emerging or actual crises. In certain cases, EU intervention can take the form of restrictive measures or 'sanctions'. The application of financial sanctions and more precisely the freezing of assets constitutes an obligation for both the public and private sector. In this regard, a particular responsibility falls on credit and financial institutions, since they are involved in the bulk of financial transfers.
In order to facilitate the application of financial sanctions, the European Banking Federation, the European Savings Banks Group, the European Association of Co-operative Banks, the European Association of Public Banks ("the EU Credit Sector Federations") and the European Commission recognised the need for an EU consolidated list of persons, groups and entities subject to financial sanctions and more precisely the freezing of assets. The Credit Sector Federations set up an initial database containing the consolidated list. The European Commission subsequently took over this database and is responsible for its maintenance and for keeping the consolidated list of sanctions up-to-date. In this respect, the Service for Foreign Policy Instruments (FPI) of the European Commission launched a new Web page in June 2017, where the consolidated lists of financial sanctions consisting in freezing of assets are published in different formats (see link below).
Disclaimer: While every effort is made to ensure that the database and the consolidated list correctly reproduce all relevant data of the officially adopted texts published in the Official Journal of the European Union, neither the Commission nor the EU Credit Sector Federations accepts any liability for possible omissions of relevant data or mistakes, nor for any use of the database or of the consolidated list. Only the information published in the Official Journal of the EU is deemed authentic.
Licence Ouverte / Open Licence 2.0 https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
Abstract:
Energy mix of the EDF Group's installed capacities worldwide (countries in which EDF is present). Units are expressed either in MWe, which corresponds to electrical power, or in MWth, which corresponds to heat power.
Data consolidated according to EDF's shareholding in Group companies, including investments in associates and joint ventures.
Detailed description:
EDF is a Group comprising a number of companies and affiliates. To consult the simplified organization chart of the Group, click here.
Also, when we want to get an overall view of the Group's energy production, for example, we have to carry out what is called a consolidation of all our affiliates’ production. For this purpose, two consolidation methods are possible:
Consolidation by full integration
Only the affiliates over which EDF has control are consolidated. In this financial approach, subsidiaries are consolidated at 100%, regardless of their ownership rate. Entities over which EDF does not have control are therefore not consolidated at all.
Net consolidation (sometimes called patrimonial consolidation)
All affiliates are consolidated, provided that EDF holds a stake in them. They are then consolidated according to EDF's share of ownership.
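The difference between the two methods can be illustrated with a small sketch. The affiliates, capacities, and ownership rates below are hypothetical values chosen for illustration only; they are not EDF data, and the `Affiliate` class and function names are assumptions, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class Affiliate:
    name: str            # hypothetical affiliate label
    capacity_mwe: float  # installed capacity of the affiliate, in MWe
    ownership: float     # share held by the parent group, between 0 and 1
    controlled: bool     # True if the parent group has control


def full_integration(affiliates):
    """Count controlled affiliates at 100%; ignore all others."""
    return sum(a.capacity_mwe for a in affiliates if a.controlled)


def net_consolidation(affiliates):
    """Count every affiliate in proportion to the ownership share."""
    return sum(a.capacity_mwe * a.ownership for a in affiliates if a.ownership > 0)


# Hypothetical figures, for illustration only.
affiliates = [
    Affiliate("A", 1000.0, 0.50, True),   # controlled, half owned
    Affiliate("B", 500.0, 0.25, False),   # minority stake, no control
    Affiliate("C", 200.0, 1.00, True),    # wholly owned
]

full_total = full_integration(affiliates)  # A and C at 100%: 1000 + 200
net_total = net_consolidation(affiliates)  # 500 + 125 + 200

print(f"Full integration: {full_total} MWe")
print(f"Net consolidation: {net_total} MWe")
```

In this sketch, affiliate B contributes nothing under full integration because it is not controlled, but contributes a quarter of its capacity under net consolidation.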
The Consolidated State Performance Report, 2012-13 (CSPR 2012-13), is part of the Consolidated State Performance Report (CSPR) program: a required annual reporting tool for each State, the Bureau of Indian Education, the District of Columbia, and Puerto Rico; program data is available since 2005-06 at . CSPR 2012-13 is a cross-sectional report that measures each state's progress towards implementation of the No Child Left Behind Act (NCLB) and the reporting instrument for state formula grant programs authorized by the Elementary and Secondary Education Act (ESEA) as amended by NCLB. The reporting was conducted using state education agencies' (SEAs) reports in the EDFacts online submission system. CSPR 2012-13 is a universe survey. The study's response rate is expected to be 100%. Key statistics include information on adequate yearly progress, state performance assessments, highly qualified teachers, public school choice and supplemental education services options.
Ridge and fen sites at Imnaviat Creek were monitored with identical sensor suites. Measurements include eddy covariance, carbon dioxide, water vapor, energy, wind speed, air temperature,
The Consolidated Human Activity Database (CHAD) is a resource for learning about human exposure and health studies and predictive models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States State Street Bank (STT): Consolidated Assets data was reported at 242.148 USD bn in Dec 2019. This records an increase from the previous number of 241.364 USD bn for Sep 2019. The data is updated quarterly, with a median of 167.260 USD bn from Mar 2001 to Dec 2019 over 76 observations. It reached an all-time high of 289.425 USD bn in Jun 2015 and a record low of 62.663 USD bn in Mar 2001. The series remains in active status in CEIC and is reported by the Federal Reserve Board. The data is categorized under Global Database’s United States – Table US.KB006: Commercial Banks: Consolidated Assets.
Licence Ouverte / Open Licence 1.0 https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf
License information was derived automatically
Planners, communities, data producers: find here the complete documentation to reference your terminals.
## Context
In order to establish a national recharging infrastructure directory for electric vehicles (EVRI), open and accessible to all, local authorities carrying out a project to install EVRI must, as the stations are put into service, publish static data on the location and technical characteristics of these installations on the data.gouv.fr platform, as defined in the Order of 4 May 2021. Etalab consolidates all the datasets produced by the various territorial players into a consolidated dataset. It aims to be as exhaustive as possible and to bring together all French IRVE terminals.
## Versions
A new version of the data schema was published on 17 October 2022 (v2.1.0). It simplifies version 2.0.3 by making certain fields optional. Version v1.0.3 of the schema is no longer consolidated but remains historised in this dataset.
## Consolidation
For your data to be integrated into the national consolidated database, you must first produce and publish your own dataset according to the data schema. If you don't find your data in the consolidated database, it is probably because it contains errors relative to the expected schema. To get the error report, you can use the [Validata tool](https://validata.fr/table-schema?schema_name=schema-datagouvfr.etalab%2Fschema-irve). For more information on the process from data production to consolidation, see "From the production of your data to their consolidation in the national database".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
STT: Cumulative Assets as % of Consolidated Assets data was reported at 59.000 % in Dec 2019. This records an increase from the previous number of 56.000 % for Sep 2019. The data is updated quarterly, with a median of 57.000 % from Mar 2001 to Dec 2019 over 76 observations. It reached an all-time high of 63.000 % in Mar 2010 and a record low of 46.000 % in Dec 2002. The series remains in active status in CEIC and is reported by the Federal Reserve Board. The data is categorized under Global Database’s United States – Table US.KB006: Commercial Banks: Consolidated Assets.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Security Council's set of sanctions serves as the foundation for most national sanctions lists.
The Consolidated Human Activity Database (CHAD) contains data obtained from human activity studies that were collected at city, state, and national levels. CHAD is intended to be an input file for exposure/intake dose modeling and/or statistical analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks the annual percentage of students of two or more races from 2013 to 2023 for Manton Consolidated High School, compared with Michigan and the Manton Consolidated Schools School District.
OCR is inevitably linked to NLP since its final output is text. Advances in document intelligence are driving the need for a unified technology that integrates OCR with various NLP tasks, especially semantic parsing. Since OCR and semantic parsing have so far been studied as separate tasks, the datasets for each task on its own are rich, while those for integrated post-OCR parsing tasks are relatively insufficient. In this study, we publish a consolidated dataset for receipt parsing as the first step towards post-OCR parsing tasks. The dataset consists of thousands of Indonesian receipts, with images and box/text annotations for OCR and multi-level semantic labels for parsing. The proposed dataset can be used to address various OCR and parsing tasks.
http://data.europa.eu/eli/dec/2011/833/oj
The EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database is a structured master database that compiles available genotoxicity and carcinogenicity information, originating from different sources, for substances tested in the bacterial reverse mutation test (Ames test). The JRC presents here a new curated collection of 211 substances eliciting negative results in the Ames test. The collection adds to the previously published EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database of Ames positive substances (https://data.jrc.ec.europa.eu/dataset/jrc-eurl-ecvam-genotoxicity-carcinogenicity-ames), which has become, over the years, a reference for a number of activities in the area of genotoxicity testing across different product-type sectors, including regulatory initiatives, exploratory projects, development of testing strategies, and validation of new genotoxicity tests. Detailed information on the data and the construction of the database is reported in Madia et al. 2020 (https://doi.org/10.1016/j.mrgentox.2020.503199), published in the journal Mutation Research - Genetic Toxicology and Environmental Mutagenesis.
On October 29, 2020, the database was updated. The overall call for Benzoin, in vivo UDS, should be [-]. Thus, in the table: Column CC, row 31: change “[+] in rat hepatocytes#Glauert et al. 1985” to “[+] in vitro rat hepatocytes#Glauert et al. 1985” . Column CD, row 31: change “[+]” to “[-]”.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Consolidated list of sanctioned entities designated by different countries and international organisations. This can include military, trade and travel restrictions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PNC: Domestic Assets as % of Consolidated Assets data was reported at 99.000 % in Dec 2019. This stayed constant from the previous number of 99.000 % for Sep 2019. The data is updated quarterly, with a median of 99.000 % from Mar 2001 to Dec 2019 over 76 observations. It reached an all-time high of 99.000 % in Dec 2019 and a record low of 98.000 % in Jun 2007. The series remains in active status in CEIC and is reported by the Federal Reserve Board. The data is categorized under Global Database’s United States – Table US.KB006: Commercial Banks: Consolidated Assets.