Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Academic article descriptive statistics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This study estimates the effect of data sharing on the citations of academic articles, using journal policies as a natural experiment. We begin by examining 17 high-impact journals that have adopted the requirement that data from published articles be publicly posted. We match these 17 journals to 13 journals without policy changes and find that empirical articles published just before their change in editorial policy have citation rates with no statistically significant difference from those published shortly after the shift. We then ask whether this null result stems from poor compliance with data sharing policies, and use the data sharing policy changes as instrumental variables to examine more closely two leading journals in economics and political science with relatively strong enforcement of new data policies. We find that articles that make their data available receive 97 additional citations (standard error: 34). We conclude that (a) authors who share data may eventually be rewarded with additional scholarly citations, and (b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.
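The instrumental-variables step can be written as a standard two-stage least squares setup consistent with the abstract's description; the variable names here are illustrative, not taken from the paper:

\text{First stage: } \quad Shared_i = \pi_0 + \pi_1\, Policy_{j(i)} + v_i
\text{Second stage: } \quad Citations_i = \beta_0 + \beta_1\, \widehat{Shared}_i + \varepsilon_i

where Policy_{j(i)} indicates whether article i's journal j had adopted a data-sharing policy (the instrument), Shared_i indicates whether the article's data are available, and \beta_1 corresponds to the reported 97 additional citations (standard error 34).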
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Last Version: 2
Author: Francisco Rubio, Universitat Politècnica de València.
Date of data collection: 2020/06/23
General description: Publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers and/or software papers can be published
- data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers and/or software papers can be published
Relationship between files: both files contain the same information; two different formats are offered to improve reuse.
Type of version of the dataset: final processed version
Versions of the files: 2nd version
- Information updated: number of journals, URLs, document types associated with each journal, normalization of publisher names, and simplification of document types
- Information added: whether the journal is listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS), and its quartile in the Scimago Journal and Country Rank (SJR)
Total size: 32 KB
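As a minimal reuse sketch, the CSV can be loaded with pandas; the column names below are assumptions inferred from the version notes above, not verified against the file:

import pandas as pd

# Load the journal list (column names are assumed; check the file header first).
journals = pd.read_csv("data_articles_journal_list_v2.csv")

# Example: DOAJ-listed journals that accept data papers.
# "DOAJ" and "document_types" are hypothetical column names.
mask = journals["DOAJ"].eq("Yes") & journals["document_types"].str.contains(
    "data paper", case=False, na=False
)
print(journals[mask].head())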
Version 1: Description
This dataset contains a list of journals that publish data articles, code, software articles and database articles.
The search strategy in DOAJ and Ulrichsweb was to search for the word “data” in journal titles.
Acknowledgements:
Xaquín Lores Torres for his invaluable help in preparing this dataset.
The attached dataset, including a data dictionary, provides all the EPA-generated data for the publication “Estimation of the Emission Characteristics of SVOCs from Household Articles Using Group Contribution Methods” by Cody K. Addington, Katherine A. Phillips, and Kristin K. Isaacs.
This dataset is associated with the following publication: Addington, C., K. Phillips, and K. Isaacs. Estimation of the Emission Characteristics of SVOCs from Household Articles Using Group Contribution Methods. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 54(1): 110-119, (2020).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains supplementary materials for the following journal paper:
Valdemar Švábenský, Jan Vykopal, Pavel Seda, Pavel Čeleda. Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training. In Elsevier Data in Brief. 2021. https://doi.org/10.1016/j.dib.2021.107398
How to cite
If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).
@article{Svabensky2021dataset,
  author    = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and Seda, Pavel and \v{C}eleda, Pavel},
  title     = {{Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training}},
  journal   = {Data in Brief},
  publisher = {Elsevier},
  volume    = {38},
  year      = {2021},
  issn      = {2352-3409},
  url       = {https://doi.org/10.1016/j.dib.2021.107398},
  doi       = {10.1016/j.dib.2021.107398},
}
The data were collected using a logging toolset referenced here.
Attached content
Dataset (data.zip). The collected data are attached here on Zenodo. A copy is also available in this repository.
Analytical tools (toolset.zip). To analyze the data, you can instantiate the toolset or this project for ELK.
Version history
Version 1 (https://zenodo.org/record/5137355) contains 13446 log records from 175 trainees. These data are precisely those that are described in the associated journal paper. Version 1 provides a snapshot of the state when the article was published.
Version 2 (https://zenodo.org/record/5517479) contains 13446 log records from 175 trainees. The data are unchanged from Version 1, but the analytical toolset includes a minor fix.
Version 3 (https://zenodo.org/record/6670113) contains 21762 log records from 275 trainees. It is a superset of Version 2, with newly collected data added to the dataset.
The current Version 4 (https://zenodo.org/record/8136017) contains 21459 log records from 275 trainees. Compared to Version 3, we cleaned 303 invalid/duplicate command records.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains data from the National Center for Education Statistics' Academic Library Survey, which was gathered every two years from 1996 to 2014, and annually in IPEDS starting in 2014 (this dataset continues to merge data only every two years, following the original schedule). The data were merged, transformed, and used for research by Starr Hoffman and Samantha Godbey. The data were merged using R; R scripts for this merge can be made available upon request. Some variables changed names or definitions during this period; a view of these variables over time is provided in the related Figshare Project. The Carnegie Classification changed several times during this period; all Carnegie classifications were crosswalked to the 2000 classification version, and that crosswalk is also provided in the related Figshare Project. This data was used for research published in several articles, conference papers, and posters starting in 2018 (some of this research used an older version of the dataset, which was deposited in the University of Nevada, Las Vegas's repository).
Sources
All data sources were downloaded from the National Center for Education Statistics website, https://nces.ed.gov/. Individual datasets and years accessed are listed below.
- [dataset] U.S. Department of Education, National Center for Education Statistics, Academic Libraries component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7
- [dataset] U.S. Department of Education, National Center for Education Statistics, Academic Libraries Survey (ALS) Public Use Data File, Library Statistics Program, (2012, 2010, 2008, 2006, 2004, 2002, 2000, 1998, 1996), https://nces.ed.gov/surveys/libraries/aca_data.asp
- [dataset] U.S. Department of Education, National Center for Education Statistics, Institutional Characteristics component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7
- [dataset] U.S. Department of Education, National Center for Education Statistics, Fall Enrollment component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014, 2012, 2010, 2008, 2006, 2004, 2002, 2000, 1998, 1996), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7
- [dataset] U.S. Department of Education, National Center for Education Statistics, Human Resources component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014, 2012, 2010, 2008, 2006), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7
- [dataset] U.S. Department of Education, National Center for Education Statistics, Employees Assigned by Position component, Integrated Postsecondary Education Data System (IPEDS), (2004, 2002), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7
- [dataset] U.S. Department of Education, National Center for Education Statistics, Fall Staff component, Integrated Postsecondary Education Data System (IPEDS), (1999, 1997, 1995), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7
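The merge itself was done in R (scripts available on request); purely as an illustration of the idea, a pandas sketch joining two IPEDS components on the institution identifier (IPEDS files share the UNITID key; the file names here are hypothetical):

import pandas as pd

# Hypothetical file names; IPEDS downloads are per-component and per-year.
libraries = pd.read_csv("ipeds_academic_libraries_2014.csv")
institutions = pd.read_csv("ipeds_institutional_characteristics_2014.csv")

# UNITID is the institution identifier shared across IPEDS components.
merged = libraries.merge(institutions, on="UNITID", how="inner")
merged["YEAR"] = 2014  # tag the survey year before stacking multiple years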
Data sets used to prepare illustrative figures for the overview article “Multiscale Modeling of Background Ozone”
Overview
The CMAQ model output datasets used to create illustrative figures for this overview article were generated by scientists in EPA/ORD/CEMM and EPA/OAR/OAQPS.
The EPA/ORD/CEMM-generated dataset consisted of hourly CMAQ output from two simulations. The first simulation was performed for July 1 – 31 over a 12 km modeling domain covering the Western U.S. The simulation was configured with the Integrated Source Apportionment Method (ISAM) to estimate the contributions from 9 source categories to modeled ozone. ISAM source contributions for July 17 – 31, averaged over all grid cells located in Colorado, were used to generate the illustrative pie chart in the overview article. The second simulation was performed for October 1, 2013 – August 31, 2014 over a 108 km modeling domain covering the northern hemisphere. This simulation was also configured with ISAM to estimate the contributions of non-US anthropogenic sources, natural sources, stratospheric ozone, and other sources to ozone concentrations. Ozone ISAM results from this simulation were extracted along a boundary curtain of the 12 km modeling domain specified over the Western U.S. for the period January 1, 2014 – July 31, 2014 and used to generate the illustrative time-height cross-sections in the overview article.
The EPA/OAR/OAQPS-generated dataset consisted of hourly gridded CMAQ output for surface ozone concentrations for the year 2016. The CMAQ simulations were performed over the northern hemisphere at a horizontal resolution of 108 km. NO2 and O3 data for July 2016 were extracted from these simulations to generate the vertically integrated column densities shown in the illustrative comparison to satellite-derived column densities.
CMAQ Model Data
The data from the CMAQ model simulations used in this research effort are very large (several terabytes) and cannot be uploaded to ScienceHub due to size restrictions. The model simulations are stored on the /asm archival system accessible through the atmos high-performance computing (HPC) system. Due to data management policies, files on /asm are subject to expiry depending on the template of the project; files not requested for extension are deleted permanently from the system after the expiry date. The format of the files used in this analysis is ioapi/netcdf. Documentation of this format, including definitions of the geographical projection attributes contained in the file headers, is available at https://www.cmascenter.org/ioapi/. Documentation on the CMAQ model, including a description of the output file format and output model species, can be found on the CMAQ GitHub site at https://github.com/USEPA/CMAQ.
This dataset is associated with the following publication: Hogrefe, C., B. Henderson, G. Tonnesen, R. Mathur, and R. Matichuk. Multiscale Modeling of Background Ozone: Research Needs to Inform and Improve Air Quality Management. EM Magazine. Air and Waste Management Association, Pittsburgh, PA, USA, 1-6, (2020).
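Because I/O API files are netCDF underneath, a first inspection is possible with the netCDF4 Python package; a minimal sketch with a hypothetical file name:

from netCDF4 import Dataset

# Open one CMAQ/ISAM output file (the name is hypothetical).
nc = Dataset("CMAQ_ISAM_12km_201407.nc")

# I/O API stores projection and grid attributes in the global header.
print(nc.ncattrs())        # projection parameters, grid origin, cell size, ...
print(list(nc.variables))  # model species and source-apportioned tracers
nc.close()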
The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It captures the contribution of the household sector to the Gross National Product (GNP). Household surveys also help determine the incidence of poverty and provide weighted data reflecting the relative importance of consumption items, which is used to set the benchmark for rates and prices of goods and services. The Household Expenditure and Consumption Survey is thus a fundamental cornerstone for studying the nutritional status in the Palestinian territory.
The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, in line with international standards for statistics on household living standards. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good in the interest of the region, and making micro data available is consistent with the Economic Research Forum's mandate, aiding regional research on this important topic.
The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.
1- Households/families. 2- Individuals.
The survey covered all Palestinian households whose usual place of residence is in the Palestinian Territory.
Sample survey data [ssd]
The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.
The sample is a stratified cluster systematic random sample drawn in two stages: First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. One person (18 years or older) was selected from each household in the second stage.
The population was stratified by: 1- Governorate 2- Type of locality (urban, rural, refugee camps)
The calculated sample size is 3,781 households.
The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.
Detailed information/formulas on the sampling design are available in the user manual.
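A sketch of the two-stage systematic selection described above (purely illustrative; the real design also stratifies by governorate and locality type, and the frame size here is invented):

import random

def systematic_sample(units, n):
    """Select n units with a random start and a fixed sampling interval."""
    step = len(units) / n
    start = random.uniform(0, step)
    return [units[int(start + i * step)] for i in range(n)]

# Stage 1: 299 enumeration areas (PSUs) from the frame.
frame = list(range(10_000))  # hypothetical number of enumeration areas
psus = systematic_sample(frame, 299)

# Stage 2: 12-18 households per selected PSU. The average "sample take"
# is consistent with the totals above: 3,781 / 299 ≈ 12.6 households.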
Face-to-face [f2f]
The PECS questionnaire consists of two main sections:
First section: Certain parts of the form are filled out at the beginning of the month, and the remainder at the end of the month. The questionnaire includes the following parts:
Cover sheet: Contains detailed particulars of the family, the date of visit, particulars of the field/office work team, and the number and sex of family members.
Statement of the family members: Contains social, economic and demographic particulars of the selected family.
Statement of long-lasting commodities and income-generating activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural land).
Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.
Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.
Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. Across all groups there are 667 commodity and service items. Groups 1-21 cover food, drink, and cigarettes. Group 22 covers homemade commodities. Groups 23-45 cover all items except food, drink, and cigarettes. Groups 50-54 cover the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.
Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire, and included various electronic checks: logical check, range checks, consistency checks and cross-validation. Complete manual inspection was made of results after data entry was performed, and questionnaires containing field-related errors were sent back to the field for corrections.
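Checks of the kinds listed (logical, range, consistency, cross-validation) are easy to express in code; a Python sketch of the idea only — the original system used ACCESS entry templates, and the field names below are invented:

def check_record(rec):
    """Illustrative entry checks for one questionnaire record."""
    errors = []
    if not 0 <= rec["age"] <= 110:  # range check
        errors.append("age out of range")
    if rec["marital_status"] == "married" and rec["age"] < 15:  # consistency check
        errors.append("age inconsistent with marital status")
    if sum(rec["group_expenditures"]) > rec["total_expenditure"]:  # cross-validation
        errors.append("group totals exceed reported total expenditure")
    return errors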
The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. In total, 3,098 households completed the interview, of which 2,060 were in the West Bank and 1,038 in the Gaza Strip. The response rate was 82% in the Palestinian Territory.
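The reported response rate follows directly from these counts: 3,098 completed interviews out of 3,781 sampled households gives 3,098 / 3,781 ≈ 0.819, i.e. the stated 82%.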
The calculations of standard errors for the main survey estimations enable the user to identify the accuracy of estimations and the survey reliability. Total errors of the survey can be divided into two kinds: statistical errors, and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as the failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, bad statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and conducting all various related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate numerically such errors due to absence of technical computation methods based on theoretical principles to tackle them. On the other hand, statistical errors can be measured. Frequently they are measured by the standard error, which is the positive square root of the variance. The variance of this survey has been computed by using the “programming package” CENVAR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains the results from the ODDPub text mining algorithm and the findings from manual analysis. Full-text PDFs of all articles parallel-published by Linköping University in 2022 were extracted from the university's repository, DiVA. These were analyzed using the ODDPub (https://github.com/quest-bih/oddpub) text mining algorithm to determine the extent of data sharing and identify the repositories where the data was shared. In addition to the ODDPub results, a manual analysis was conducted to confirm the presence of data sharing statements, assess data availability, and identify the repositories used.
https://datacatalog1.worldbank.org/public-licenses?fragment=cc
National statistical systems are facing significant challenges. These challenges arise from increasing demands for high quality and trustworthy data to guide decision making, coupled with the rapidly changing landscape of the data revolution. To help create a mechanism for learning amongst national statistical systems, the World Bank has developed improved Statistical Performance Indicators (SPI) to monitor the statistical performance of countries. The SPI focuses on five key dimensions of a country’s statistical performance: (i) data use, (ii) data services, (iii) data products, (iv) data sources, and (v) data infrastructure. This will replace the Statistical Capacity Index (SCI) that the World Bank has regularly published since 2004.
The SPI comprises more than 50 indicators and contains data for 186 countries, a set covering 99 percent of the world population. The data extend from 2016 to 2023, with some indicators going back to 2004.
For more information, consult the academic article published in the journal Scientific Data. https://www.nature.com/articles/s41597-023-01971-0.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data underlying the research article “Individualised ball velocity prediction in baseball pitching based on IMU data”.
The dataset contains kinematics and ball velocities of baseball pitchers from the national U18 baseball team as well as six baseball academies in the Netherlands.
This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
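Review files in this dataset family are distributed as gzipped JSON lines, one review per line; a hedged reading sketch (the file name is hypothetical, and field names such as "overall" and "reviewText" follow the conventions of earlier releases and should be verified against the download):

import gzip
import json

with gzip.open("reviews_Books.json.gz", "rt") as f:  # hypothetical file name
    for line in f:
        review = json.loads(line)  # one JSON object per line
        rating, text = review["overall"], review["reviewText"]  # assumed fields
        break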
If you publish articles based on this dataset, please cite the following paper:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset contains counts of articles published by Covenant University lecturers in Ota, Ogun State, Nigeria. It covers a sample of 126 lecturers, 99 from the College of Business and Social Sciences and 27 from the College of Leadership, and records the number of articles each lecturer published from 2013 to 2015. The response variable is the number of articles produced by a lecturer (NOP), obtained by counting. The predictors are:
- Gender (SEX): male coded 1, female coded 0
- Marital status (MS): married coded 1, single coded 0
- Number of children (CHD)
- Years of teaching/lecturing experience (EXP)
- Cadre: assistant lecturers and lecturers II are categorized as lower cadre (coded 0); lecturer I up to professor as higher cadre (coded 1)
- Number of undergraduate courses taught within the period of observation (UGC)
- Number of postgraduate courses taught within the period of observation (UPC)
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses, and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try to identify the individuals whose texts are included in this dataset. You may not try to identify the original entries on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics, or to share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of it, reproduce it, prepare derivative works, distribute copies, or perform or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that prevents access by others. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3,
  author    = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
  title     = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
  year      = {2022},
  booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
  series    = {CLEF~'2022},
  address   = {Bologna, Italy},
}
@article{shahi2021overview,
  title   = {Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author  = {Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal = {Working Notes of CLEF},
  year    = {2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A addresses fake news detection as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows (a minimal baseline sketch follows the definitions):
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
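A minimal four-class baseline of the kind this task invites (not the official baseline linked below): TF-IDF features with logistic regression, on illustrative data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# texts: article bodies; labels: one of "true", "false", "partially false", "other".
texts = ["the claim checks out ...", "fabricated story about ..."]  # illustrative
labels = ["true", "false"]

model = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["text of a new article to rate"]))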
Cross-Lingual Task (German)
Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide test data in German. The idea is to use the English data and transfer learning to build a classification model for German.
Input Data
The data will be provided with the columns ID, title, text, rating, and domain; the columns are described as follows:
ID - Unique identifier of the news article
Title - Title of the news article
text - Text of the news article
our rating - Class of the news article: false, partially false, true, or other
Output data format
public_id - Unique identifier of the news article
predicted_rating - Predicted class
Sample File
public_id, predicted_rating
1, false
2, true
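A submission file in this format can be produced with Python's csv module; a minimal sketch:

import csv

predictions = [(1, "false"), (2, "true")]  # (public_id, predicted_rating) pairs

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["public_id", "predicted_rating"])
    writer.writerows(predictions)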
IMPORTANT!
We have used data from 2010 to 2022, and the fake news content spans several topics, such as elections and COVID-19.
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
We indicate how likely a piece of content is computer generated or human written. Content: any text in English or Spanish, from a single sentence to articles thousands of words in length.
Data uniqueness: we use custom built and trained NLP algorithms to assess human effort metrics that are inherent in text content. We focus on what's in the text, not metadata such as publication or engagement. Our AI algorithms are co-created by NLP & journalism experts. Our datasets have all been human-reviewed and labeled.
Dataset: CSV containing URL and/or body text, with attributed scoring as an integer and model confidence as a percentage. We ignore metadata such as author, publication, date, word count, shares and so on, to provide a clean and maximally unbiased assessment of how much human effort has been invested in content. Our data is provided in CSV/RSS/JSON format. One row = one scored article.
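Reading a delivery in the CSV shape described above is routine; a sketch with assumed column names (the actual headers may differ):

import pandas as pd

df = pd.read_csv("scored_articles.csv")  # hypothetical file name

# "url", "score" (integer), and "confidence" (percentage) are assumed column names.
high_effort = df[(df["score"] >= 4) & (df["confidence"] >= 80)]
print(high_effort[["url", "score", "confidence"]].head())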
Integrity indicators provided as integers on a 1–5 scale. We also have custom models with 35 categories that can be added on request.
Data sourcing: public websites, crawlers, scrapers, and other partnerships where available. We can generally assess content behind paywalls as well as freely available content. We source from ~4,000 news outlets; examples include Bloomberg, CNN, and the BBC. Countries: all English-speaking markets worldwide. Includes English-language content from non-English-majority regions, such as Germany, Scandinavia, and Japan. Also available in Spanish on request.
Use-cases: assessing the implicit integrity and reliability of an article. There is a correlation between integrity and human value: we have shown that articles scoring highly on our scales show increased, sustained end-user engagement. Clients also use this to assess journalistic output and publication relevance, and to create datasets of 'quality' journalism.
Overtone provides a range of qualitative metrics for journalistic, newsworthy and long-form content. We find, highlight and synthesise content that shows added human effort and, by extension, added human value.
https://choosealicense.com/licenses/unknown/
Dataset Card for CC-News
Dataset Summary
CC-News dataset contains news articles from news sites all over the world. The data is available on AWS S3 in the Common Crawl bucket at /crawl-data/CC-NEWS/. This version of the dataset has been prepared using news-please, an integrated web crawler and information extractor for news. It contains 708,241 English-language news articles published between January 2017 and December 2019. It represents a small portion of the English… See the full description on the dataset page: https://huggingface.co/datasets/vblagoje/cc_news.
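The dataset card implies the standard datasets-library loading path; a sketch (streaming avoids downloading all 708,241 articles at once):

from datasets import load_dataset

# Stream the CC-News copy hosted at vblagoje/cc_news on the Hugging Face Hub.
ds = load_dataset("vblagoje/cc_news", split="train", streaming=True)
for article in ds:
    print(article["title"])  # field produced by the news-please extraction
    break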
This dataset supports the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" (DOI: 10.1016/j.rse.2020.112013). The data release allows users to replicate, test, or further explore results. The dataset consists of 4 separate items based on the analysis approaches used in the original publication:
1) The 'Phenocam' dataset uses images from a phenocam in a pinyon juniper ecosystem in Grand Canyon National Park to determine phenological patterns of multiple plant species. It consists of scripts and tabular data developed while performing analyses and includes the final NDVI values for all areas of interest (AOIs) described in the associated publication.
2) The 'SolarSensorAnalysis' dataset uses downloaded tabular MODIS data to explore relationships between NDVI and multiple solar and sensor angles. It consists of download and analysis scripts in Google Earth Engine and R. The source MODIS data used in the analysis are too large to include but are provided through MODIS providers and can be accessed through Google Earth Engine using the included script. A csv file includes solar and sensor angle information for the MODIS pixel closest to the phenocam as well as for a sample of 100 randomly selected MODIS pixels within the GRCA-PJ ecosystem.
3) The 'WinterPeakExtent' dataset includes final geotiffs showing the temporal frequency extent and associated vegetation physiognomic types experiencing winter NDVI peaks in the western US.
4) The 'SensorComparison' dataset contains the NDVI time series at the phenocam location from 4 other satellites as well as the code used to download these data.
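For reference, the index at the center of this data release is computed from near-infrared and red surface reflectance:

\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}}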
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
We created a dataset of clinical action items annotated over MIMIC-III. This dataset, which we call CLIP, is annotated by physicians and covers 718 discharge summaries, representing 107,494 sentences. Annotations were collected as character-level spans to discharge summaries after applying surrogate generation to fill in the anonymized templates from MIMIC-III text with faked data. We release these spans, their aggregation into sentence-level labels, and the sentence tokenizer used to aggregate the spans and label sentences. We also release the surrogate data generator, and the document IDs used for training, validation, and test splits, to enable reproduction. The spans are annotated with 0 or more labels of 7 different types, representing the different actions that may need to be taken: Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other. We encourage the community to use this dataset to develop methods for automatically extracting clinical action items from discharge summaries.
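The span-to-sentence aggregation described above can be sketched as follows; this is an illustration of the idea, not the released tokenizer or aggregation code:

def sentence_labels(sentence_offsets, spans):
    """Map character-level labeled spans onto per-sentence label sets.

    sentence_offsets: list of (start, end) character offsets, one per sentence
    spans: list of (start, end, label) annotations, e.g. (120, 168, "Medication")
    """
    labels = [set() for _ in sentence_offsets]
    for span_start, span_end, label in spans:
        for i, (start, end) in enumerate(sentence_offsets):
            if span_start < end and span_end > start:  # any character overlap
                labels[i].add(label)
    return labels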
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.
In this dataset:
We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided: for regular workdays (Mon – Thu), Saturdays, and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings, and a time use survey. The data were validated against population register data from Statistics Finland for night-time hours and against a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management, and planning.
Please cite this dataset as:
Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4
Organization of data
The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:
HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.
HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.
HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.
target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.
Column names
YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.
H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, there are 24 fields formatted as “Hx”, where x stands for the hour of the day (values ranging from 0 to 23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e., 100% of the total population for each one-hour period).
In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.
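The join and a sanity check on the proportions can be done with geopandas and pandas; a minimal sketch:

import geopandas as gpd
import pandas as pd

grid = gpd.read_file("target_zones_grid250m_EPSG3067.geojson")
pop = pd.read_csv("HMA_Dynamic_population_24H_workdays.csv")

# Join on the shared YKR_ID key, as described above.
joined = grid.merge(pop, on="YKR_ID")

# Each hour field holds the percentage of total population per cell,
# so every column H0...H23 should sum to approximately 100.
print(pop["H0"].sum())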
License: Creative Commons Attribution 4.0 International.
Related datasets
Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612
Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset presents annual trade statistics for wood, wood articles, and wood charcoal in the State of Qatar. It includes data on imports, exports, re-exports, net imports, and trade balance. The dataset supports market analysis, customs evaluation, and environmental or industrial policy development related to wood products.
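The derived fields follow the usual trade identities; this is our reading of the column names, not documented in the record:

\text{Net imports} = \text{Imports} - (\text{Exports} + \text{Re-exports}), \qquad \text{Trade balance} = (\text{Exports} + \text{Re-exports}) - \text{Imports}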