
Granada, CO annual income distribution by work experience and gender...
neilsberg.com
csv, json
Updated Feb 27, 2025
Cite
Neilsberg Research (2025). Granada, CO annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/granada-co-income-by-gender/
Available download formats: json, csv
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Granada
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
Measurement technique
The data presented in this dataset are derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the number of individuals of each gender (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimates, please contact us by email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents a detailed breakdown of the number of individuals within distinct income brackets, categorized by gender (men and women) and employment type (full-time (FT) and part-time (PT)), offering insights into the income landscape of Granada. It can be used to examine gender-based income distribution within the Granada population, supporting data analysis and decision-making.

Key observations

Employment patterns: Within Granada, among individuals aged 15 years and older with income, there were 165 men and 205 women in the workforce. Among them, 100 men were engaged in full-time, year-round employment, while 100 women were in full-time, year-round roles.

Annual income under $24,999: Of the male population working full-time, 1% fell within the income range of under $24,999, while 24% of the female population working full-time was represented in the same income bracket.

Annual income above $100,000: 10% of men in full-time roles earned incomes exceeding $100,000, while 4% of women in full-time positions earned within this income bracket.

Refer to the research insights for key observations on additional income brackets (annual income under $24,999, $25,000 to $49,999, $50,000 to $74,999, $75,000 to $99,999, and above $100,000) and employment types (full-time year-round and part-time).

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income brackets:

$1 to $2,499 or loss

$2,500 to $4,999

$5,000 to $7,499

$7,500 to $9,999

$10,000 to $12,499

$12,500 to $14,999

$15,000 to $17,499

$17,500 to $19,999

$20,000 to $22,499

$22,500 to $24,999

$25,000 to $29,999

$30,000 to $34,999

$35,000 to $39,999

$40,000 to $44,999

$45,000 to $49,999

$50,000 to $54,999

$55,000 to $64,999

$65,000 to $74,999

$75,000 to $99,999

$100,000 or more

Variables / Data Columns

Income Bracket: This column lists 20 income brackets, ranging from $1 to $100,000+.

Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket

Part-Time Males: The count of males employed part-time and earning within a specified income bracket

Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket

Part-Time Females: The count of females employed part-time and earning within a specified income bracket
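As a minimal sketch, the 20 detailed brackets can be rolled up into the five summary bands used under Key observations, with each band expressed as a percentage of a column total. The column names ("Income Bracket", "Full-Time Males", and so on) are taken from the descriptions above and are assumed, not confirmed, to match the headers in the downloadable csv/json files; the helper names and the toy counts are hypothetical.

```python
import re

# Summary bands used under "Key observations" (lower bound in dollars, label).
# Labels are assumptions based on the text, not headers from the real files.
BANDS = [
    (0, "Under $24,999"),
    (25_000, "$25,000 to $49,999"),
    (50_000, "$50,000 to $74,999"),
    (75_000, "$75,000 to $99,999"),
    (100_000, "$100,000 or more"),
]


def lower_bound(bracket: str) -> int:
    """First dollar amount in a bracket label, e.g. '$2,500 to $4,999' -> 2500."""
    match = re.search(r"\$([\d,]+)", bracket)
    if match is None:
        raise ValueError(f"unrecognized bracket label: {bracket!r}")
    return int(match.group(1).replace(",", ""))


def band_of(bracket: str) -> str:
    """Map one of the 20 detailed brackets to its summary band."""
    low = lower_bound(bracket)
    label = BANDS[0][1]
    for start, name in BANDS:
        if low >= start:
            label = name
    return label


def band_shares(rows: list[dict], column: str) -> dict[str, float]:
    """Percentage of a count column (e.g. 'Full-Time Males') in each band."""
    totals = {name: 0 for _, name in BANDS}
    for row in rows:
        totals[band_of(row["Income Bracket"])] += row[column]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError(f"column {column!r} has no counts")
    return {name: 100 * count / grand_total for name, count in totals.items()}


# Made-up counts for illustration only; these are NOT values from the dataset.
rows = [
    {"Income Bracket": "$1 to $2,499 or loss", "Full-Time Males": 1},
    {"Income Bracket": "$30,000 to $34,999", "Full-Time Males": 2},
    {"Income Bracket": "$100,000 or more", "Full-Time Males": 1},
]
print(band_shares(rows, "Full-Time Males"))
```

Keying bands on the first dollar amount in each label keeps the mapping robust to the "or loss" and "or more" suffixes on the first and last brackets.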

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Good to know

Margin of Error

The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Granada median household income by race.
De Soto, WI annual income distribution by work experience and gender...
neilsberg.com
csv, json
Updated Feb 27, 2025
Cite
Neilsberg Research (2025). De Soto, WI annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/de-soto-wi-income-by-gender/
Available download formats: csv, json
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Wisconsin, De Soto
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
Measurement technique
The data presented in this dataset are derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the number of individuals of each gender (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimates, please contact us by email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents a detailed breakdown of the number of individuals within distinct income brackets, categorized by gender (men and women) and employment type (full-time (FT) and part-time (PT)), offering insights into the income landscape of De Soto. It can be used to examine gender-based income distribution within the De Soto population, supporting data analysis and decision-making.

Key observations

Employment patterns: Within De Soto, among individuals aged 15 years and older with income, there were 259 men and 192 women in the workforce. Among them, 79 men were engaged in full-time, year-round employment, while 100 women were in full-time, year-round roles.

Annual income under $24,999: Of the male population working full-time, 2.53% fell within the income range of under $24,999, while 1% of the female population working full-time was represented in the same income bracket.

Annual income above $100,000: 13.92% of men in full-time roles earned incomes exceeding $100,000, while 2% of women in full-time positions earned within this income bracket.

Refer to the research insights for key observations on additional income brackets (annual income under $24,999, $25,000 to $49,999, $50,000 to $74,999, $75,000 to $99,999, and above $100,000) and employment types (full-time year-round and part-time).

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income brackets:

$1 to $2,499 or loss

$2,500 to $4,999

$5,000 to $7,499

$7,500 to $9,999

$10,000 to $12,499

$12,500 to $14,999

$15,000 to $17,499

$17,500 to $19,999

$20,000 to $22,499

$22,500 to $24,999

$25,000 to $29,999

$30,000 to $34,999

$35,000 to $39,999

$40,000 to $44,999

$45,000 to $49,999

$50,000 to $54,999

$55,000 to $64,999

$65,000 to $74,999

$75,000 to $99,999

$100,000 or more

Variables / Data Columns

Income Bracket: This column lists 20 income brackets, ranging from $1 to $100,000+.

Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket

Part-Time Males: The count of males employed part-time and earning within a specified income bracket

Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket

Part-Time Females: The count of females employed part-time and earning within a specified income bracket

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.
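The two definitions above can be written as a small classification rule. This is a hypothetical helper, not part of the dataset (which only publishes counts). Note that a person who worked 35 or more hours per week but fewer than 50 weeks matches neither definition as stated, so the sketch returns "other" for that case.

```python
from typing import Literal

EmploymentType = Literal["full-time, year-round", "part-time", "other"]


def employment_type(hours_per_week: float, weeks_worked: int) -> EmploymentType:
    """Classify a worker's previous calendar year using the thresholds quoted above.

    Hypothetical helper: the dataset does not include hours or weeks per person.
    """
    if hours_per_week < 35:
        # Part-time: fewer than 35 hours per week, regardless of weeks worked.
        return "part-time"
    if weeks_worked >= 50:
        # Full-time, year-round: 35+ hours per week and 50+ weeks.
        return "full-time, year-round"
    # 35+ hours per week but fewer than 50 weeks: neither definition applies.
    return "other"


print(employment_type(40, 52))
```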

Good to know

Margin of Error

The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for De Soto median household income by race.
Income of individuals by age group, sex and income source, Canada, provinces...
www150.statcan.gc.ca
open.canada.ca
Updated May 1, 2025
Cite
Government of Canada, Statistics Canada (2025). Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas [Dataset]. http://doi.org/10.25318/1110023901-eng
Unique identifier
https://doi.org/10.25318/1110023901-eng
Dataset updated
May 1, 2025
Dataset provided by
Statistics Canada, https://statcan.gc.ca/en
Area covered
Canada
Description
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
Survey of Consumer Finances
federalreserve.gov
Updated Oct 18, 2023
Cite
Board of Governors of the Federal Reserve Board (2023). Survey of Consumer Finances [Dataset]. http://doi.org/10.17016/8799
Unique identifier
https://doi.org/10.17016/8799
Dataset updated
Oct 18, 2023
Dataset provided by
Federal Reserve Board of Governors
Federal Reserve System, http://www.federalreserve.gov/
Authors
Board of Governors of the Federal Reserve Board
Time period covered
1962 - 2023
Description
The Survey of Consumer Finances (SCF) is normally a triennial cross-sectional survey of U.S. families. The survey data include information on families' balance sheets, pensions, income, and demographic characteristics.
Grove Hill, AL annual income distribution by work experience and gender...
neilsberg.com
csv, json
Updated Feb 27, 2025
Cite
Neilsberg Research (2025). Grove Hill, AL annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/grove-hill-al-income-by-gender/
Available download formats: json, csv
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Grove Hill, Alabama
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
Measurement technique
The data presented in this dataset are derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the number of individuals of each gender (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimates, please contact us by email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents a detailed breakdown of the number of individuals within distinct income brackets, categorized by gender (men and women) and employment type (full-time (FT) and part-time (PT)), offering insights into the income landscape of Grove Hill. It can be used to examine gender-based income distribution within the Grove Hill population, supporting data analysis and decision-making.

Key observations

Employment patterns: Within Grove Hill, among individuals aged 15 years and older with income, there were 861 men and 670 women in the workforce. Among them, 277 men were engaged in full-time, year-round employment, while 301 women were in full-time, year-round roles.

Annual income under $24,999: Of the male population working full-time, 22.38% fell within the income range of under $24,999, while 16.94% of the female population working full-time was represented in the same income bracket.

Annual income above $100,000: 26.35% of men in full-time roles earned incomes exceeding $100,000, while 1% of women in full-time positions earned within this income bracket.

Refer to the research insights for key observations on additional income brackets (annual income under $24,999, $25,000 to $49,999, $50,000 to $74,999, $75,000 to $99,999, and above $100,000) and employment types (full-time year-round and part-time).

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income brackets:

$1 to $2,499 or loss

$2,500 to $4,999

$5,000 to $7,499

$7,500 to $9,999

$10,000 to $12,499

$12,500 to $14,999

$15,000 to $17,499

$17,500 to $19,999

$20,000 to $22,499

$22,500 to $24,999

$25,000 to $29,999

$30,000 to $34,999

$35,000 to $39,999

$40,000 to $44,999

$45,000 to $49,999

$50,000 to $54,999

$55,000 to $64,999

$65,000 to $74,999

$75,000 to $99,999

$100,000 or more

Variables / Data Columns

Income Bracket: This column lists 20 income brackets, ranging from $1 to $100,000+.

Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket

Part-Time Males: The count of males employed part-time and earning within a specified income bracket

Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket

Part-Time Females: The count of females employed part-time and earning within a specified income bracket

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Good to know

Margin of Error

The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
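If you pair these counts with published ACS margins of error (the columns listed above do not include MOEs, so they would have to come from the corresponding Census Bureau tables), the usual arithmetic can be sketched as follows. The sketch assumes the Census Bureau conventions that ACS MOEs are published at the 90% confidence level (z = 1.645) and that the MOE of a sum of estimates is approximated by the square root of the sum of squared MOEs; the function names are hypothetical.

```python
import math

# Assumption: ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def standard_error(moe: float) -> float:
    """Convert a 90% ACS margin of error to a standard error."""
    return moe / Z_90


def moe_of_sum(moes: list[float]) -> float:
    """Approximate MOE of a sum of estimates (root of the summed squared MOEs)."""
    return math.sqrt(sum(m * m for m in moes))


def interval_90(estimate: float, moe: float) -> tuple[float, float]:
    """90% interval; the lower end is floored at zero because counts cannot be negative."""
    return max(0.0, estimate - moe), estimate + moe


print(moe_of_sum([3, 4]))
```

For example, combining two bracket estimates with MOEs of 3 and 4 gives an approximate MOE of 5 for their sum.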

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Grove Hill median household income by race.
Data from: Porpoise Observation Database (NRM)
gbif.org
researchdata.se
Updated Dec 18, 2024
Cite
Linnea Cervin (2024). Porpoise Observation Database (NRM) [Dataset]. http://doi.org/10.15468/yrxfxp
Unique identifier
https://doi.org/10.15468/yrxfxp
Dataset updated
Dec 18, 2024
Dataset provided by
Global Biodiversity Information Facility, https://www.gbif.org/
Swedish Museum of Natural History
Authors
Linnea Cervin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains observations of dead or live harbor porpoises made by the public, mostly around the Swedish coast. A few observations are from Norwegian, Danish, Finnish and German waters. Each observation of a harbor porpoise is verified at the Swedish Museum of Natural History before it is approved and published on the web. Verification consists of checking whether the number of animals sighted is plausible, whether the coordinates are correct, and, if pictures are attached, whether they really show a porpoise and not another species. If any of these three points seems unlikely, the reporter is contacted and asked more detailed questions, and the report is approved or rejected depending on the answers given. Pictures and videos that cannot be uploaded to the database because of their size are saved on the museum server and marked with the identification number assigned by the database. At the end of each year the data are submitted to HELCOM, which summarizes all member states' data from the Baltic Proper to the Kattegat basin.

The harbor porpoise is one of the smallest toothed whales in the world and the only whale species that breeds in Swedish waters. Porpoises are found in temperate waters of the Northern Hemisphere, where they live in small groups of 1-3 individuals. Females give birth to a calf in the summer months; the calf suckles for about 10 months before it is left on its own and the mother has a new calf. The porpoises around Sweden are divided into three populations that rarely mix. The North Sea population is found on the west coast, from Skagerrak down to the Falkenberg area. The Belt Sea population is found from a little north of Falkenberg down to the Blekinge archipelago in the Baltic. The Baltic Proper population is the smallest, consisting of only a few hundred animals, and is considered an endangered subspecies. It is most commonly found from the Blekinge archipelago up to the Åland Sea, with a hot spot south of Gotland at Hoburg’s bank and the Mid-Sea bank.
The Porpoise Observation Database was started in 2005 at the request of the Swedish Environmental Protection Agency to gain a better understanding of where porpoises can be found, using reports from the public to expand the “survey area”. In the first year, 26 sightings were reported, 4 of which were from the Baltic Sea. The museum is particularly interested in sightings from the Baltic Sea because of the low number of animals and the lack of data and knowledge about this population. Initially only live sightings were reported, but reports of animals found dead were later added. Some of the animals reported dead are collected. Depending on where an animal is found and its state of decay, it may be subsampled in the field; a piece of blubber and some teeth are then sent in by mail and stored in the Environmental Specimen Bank at the Swedish Museum of Natural History in Stockholm. If the whole animal is collected, an autopsy is performed at the National Veterinary Institute in Uppsala to try to determine the cause of death. Organs, teeth and parasites are sampled and saved in the Environmental Specimen Bank as well. Information about the animal (i.e., location, date found, sex, age, length, weight and blubber thickness), together with the type and amount of organ tissue sampled, is then added to the Specimen Bank database. Anyone interested in obtaining samples or data from the Specimen Bank must send an application to the Department of Environmental Research and Monitoring stating the purpose of the study and the number of samples needed.
Meta data and supporting documentation
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency, http://www.epa.gov/
Description
We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.

Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures for each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary):
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30 (2019).
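The weekly standardization described under Permissions (subtract each week's median exposure, then divide by that week's IQR) can be sketched in Python with NumPy. This assumes z is laid out as an n × m matrix with one row per individual and one column per exposure week; it is an illustration, not the authors' R code from the SpGPCW repository, and the function name is hypothetical.

```python
import numpy as np


def standardize_by_week(exposures: np.ndarray) -> np.ndarray:
    """Median/IQR standardization applied separately to each column (week).

    Assumes an n x m array: rows are individuals, columns are exposure weeks.
    A week whose IQR is zero would cause a division by zero.
    """
    median = np.median(exposures, axis=0)
    q75, q25 = np.percentile(exposures, [75, 25], axis=0)
    iqr = q75 - q25
    return (exposures - median) / iqr


# Toy exposures (made up): 5 individuals, 2 weeks.
x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
print(standardize_by_week(x))
```

Because the released exposures are already standardized and the medians and IQRs are withheld, this transformation cannot be inverted from the simulated data, which is the point of the design.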
Dataset for paper "Mitigating the effect of errors in source parameters on...
data.niaid.nih.gov
Updated Sep 28, 2022
Cite
Nienke Blom; Phil-Simon Hardalupas; Nicholas Rawlinson (2022). Dataset for paper "Mitigating the effect of errors in source parameters on seismic (waveform) inversion" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6969601
Dataset updated
Sep 28, 2022
Dataset provided by
University of Cambridge
Authors
Nienke Blom; Phil-Simon Hardalupas; Nicholas Rawlinson
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset corresponding to the journal article "Mitigating the effect of errors in source parameters on seismic (waveform) inversion" by Blom, Hardalupas and Rawlinson, accepted for publication in Geophysical Journal International. In this paper, we demonstrate the effect of errors in source parameters on seismic tomography, with a particular focus on (full) waveform tomography. We study the effect both on forward modelling (i.e., comparing waveforms and measurements resulting from a perturbed vs. unperturbed source) and on seismic inversion (i.e., using a source that contains an (erroneous) perturbation to invert for Earth structure). These data were obtained using Salvus, a state-of-the-art (though proprietary) 3-D solver for wave propagation simulations (Afanasiev et al., GJI 2018).

This dataset contains:

The entire Salvus project. This project was prepared using Salvus versions 0.11.x and 0.12.2 and should be fully compatible with the latter.

A number of Jupyter notebooks used to create all the figures, set up the project and do the data processing.

A number of Python scripts that are used in the above notebooks.

Two conda environment .yml files: one with the complete environment used to produce this dataset, and one with the environment as supplied by Mondaic (the Salvus developers), on top of which I installed basemap and cartopy.

An overview of the inversion configurations used for each inversion experiment and the names of the corresponding figures: inversion_runs_overview.ods / .csv.

Datasets corresponding to the different figures.

One dataset for Figure 1, showing the effect of a source perturbation in a real-world setting, as previously used by Blom et al., Solid Earth 2020

One dataset for Figure 2, showing how different methodologies and assumptions can lead to significantly different source parameters, notably including systematic shifts. This dataset was kindly supplied by Tim Craig (Craig, 2019).

A number of datasets (stored as pickled Pandas dataframes) derived from the Salvus project. We have computed:

travel-time arrival predictions from every source to all stations (df_stations...pkl)

misfits for different metrics, for both P-wave-centered and S-wave-centered windows, for all components at all stations, each time comparing waveforms from a reference source against waveforms from a perturbed source (df_misfits_cc.28s.pkl)

addition of synthetic waveforms for different (perturbed) moment tensors. All waveforms are stored in HDF5 (.h5) files of the ASDF (Adaptable Seismic Data Format) type.
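The "cc" in df_misfits_cc.28s.pkl suggests a cross-correlation-based misfit, but that reading is an assumption; the actual metrics are defined in the dataset's own scripts. As a generic sketch, a cross-correlation time shift between a reference trace and a perturbed trace can be computed with NumPy as follows (the function name is hypothetical, and this is not the authors' implementation).

```python
import numpy as np


def cc_time_shift(reference: np.ndarray, perturbed: np.ndarray, dt: float) -> float:
    """Lag (in seconds) that maximizes the cross-correlation of two traces.

    Positive values mean `perturbed` arrives later than `reference`.
    Both traces are assumed to be equally long and sampled every `dt` seconds.
    """
    corr = np.correlate(perturbed, reference, mode="full")
    # Index 0 of the "full" output corresponds to a lag of -(len(reference) - 1).
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag * dt


# Synthetic check: a Gaussian pulse delayed by 1 s.
dt = 0.1
t = np.arange(200) * dt
reference = np.exp(-(((t - 5.0) / 0.5) ** 2))
perturbed = np.exp(-(((t - 6.0) / 0.5) ** 2))
print(cc_time_shift(reference, perturbed, dt))
```

The pickled dataframes themselves can be opened with pandas.read_pickle, although pickles generally need a pandas version compatible with the one that wrote them.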

How to use this dataset:

To set up the conda environment:

Make sure you have Anaconda/Miniconda.

Make sure you have access to Salvus functionality. This is not strictly necessary, but most of the functionality within this dataset relies on Salvus. You can do the analyses and create the figures without it, but you will have to modify the scripts to build workarounds.

Set up Salvus and create a conda environment. This is best done by following the instructions on the Mondaic website. Check the changelog for breaking changes; if there are any, download an older Salvus version.

Additionally, in your conda environment, install basemap and cartopy:

conda env create -n salvus_0_12 -f environment.yml
conda install -c conda-forge basemap
conda install -c conda-forge cartopy

Install LASIF (https://github.com/dirkphilip/LASIF_2.0) and test. The project uses some lasif functionality.

To recreate the figures: this is straightforward. Every figure has a corresponding Jupyter notebook; it suffices to run the notebook in its entirety.

Figure 1: separate notebook, Fig1_event_98.py

Figure 2: separate notebook, Fig2_TimCraig_Andes_analysis.py

Figures 3-7: Figures_perturbation_study.py

Figures 8-10: Figures_toy_inversions.py

To recreate the dataframes in DATA: this can be done using the example notebooks Create_perturbed_thrust_data_by_MT_addition.py and Misfits_moment_tensor_components.M66_M12.py. The same approach can easily be extended to the position shift and other perturbations you might want to investigate.

To recreate the complete Salvus project: This can be done using:

the notebook Prepare_project_Phil_28s_absb_M66.py (setting up project and running simulations)

the notebooks Moment_tensor_perturbations.py and Moment_tensor_perturbation_for_NS_thrust.py

For the inversions: use the notebook Inversion_SS_dip.M66.28s.py as an example. See the overview table inversion_runs_overview.ods (or .csv) for the naming conventions.

References:

Michael Afanasiev, Christian Boehm, Martin van Driel, Lion Krischer, Max Rietmann, Dave A May, Matthew G Knepley, Andreas Fichtner, Modular and flexible spectral-element waveform modelling in two and three dimensions, Geophysical Journal International, Volume 216, Issue 3, March 2019, Pages 1675–1692, https://doi.org/10.1093/gji/ggy469

Nienke Blom, Alexey Gokhberg, and Andreas Fichtner, Seismic waveform tomography of the central and eastern Mediterranean upper mantle, Solid Earth, Volume 11, Issue 2, 2020, Pages 669–690, https://doi.org/10.5194/se-11-669-2020

Tim J. Craig, Accurate depth determination for moderate-magnitude earthquakes using global teleseismic data, Journal of Geophysical Research: Solid Earth, Volume 124, 2019, Pages 1759–1780, https://doi.org/10.1029/2018JB016902
Difficulty and Time Perceptions of Preparatory Activities for Quitting...
data.4tu.nl
zip
Cite
Nele Albers; Mark A. Neerincx; Willem-Paul Brinkman, Difficulty and Time Perceptions of Preparatory Activities for Quitting Smoking: Dataset [Dataset]. http://doi.org/10.4121/5198f299-9c7a-40f8-8206-c18df93ee2a0.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/5198f299-9c7a-40f8-8206-c18df93ee2a0.v1
Dataset provided by
4TU.ResearchData
Authors
Nele Albers; Mark A. Neerincx; Willem-Paul Brinkman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 6, 2022 - Nov 16, 2022
Description
This dataset contains data from 144 daily smokers, each rating 44 preparatory activities for quitting smoking (e.g., envisioning one's desired future self after quitting smoking, tracking one's smoking behavior, learning about progressive muscle relaxation) on their perceived ease/difficulty and required completion time. Since becoming more physically active can make it easier to quit smoking, some activities were also about becoming more physically active (e.g., tracking one's physical activity behavior, learning about what physical activity is recommended, envisioning one's desired future self after becoming more physically active). Moreover, participants provided a free-text response on what makes some activities more difficult than others.

Study
The data was gathered during a study on the online crowdsourcing platform Prolific between 6 September and 16 November 2022. The Human Research Ethics Committee of Delft University of Technology granted ethical approval for the research (Letter of Approval number: 2338).
In this study, daily smokers who were contemplating or preparing to quit smoking first filled in a prescreening questionnaire and were then invited to a repertory grid study if they passed the prescreening. In the repertory grid study, participants were asked to divide sets of 3 preparatory activities for quitting smoking into two subgroups. Afterward, they rated all preparatory activities on the perceived ease of doing them and the perceived required time to do them. Participants also provided a free-text response on what makes some activities more difficult than others.
The study was pre-registered in the Open Science Framework (OSF): https://osf.io/cax6f. This pre-registration describes the study setup, measures, etc. Note that this dataset contains only part of the collected data: the data related to studying the perceived difficulty of preparatory activities.
The file "Preparatory_Activity_Formulations.xlsx" contains the formulations of the 44 preparatory activities used in this study.

Data
This dataset contains three types of data:
- Data from participants' Prolific profiles. This includes, for example, the age, gender, weekly exercise amount, and smoking frequency.
- Data from a prescreening questionnaire. This includes, for example, the stage of change for quitting smoking and whether people previously tried to quit smoking.
- Data from the repertory grid study. This includes the ratings of the 44 activities on ease and required time as well as the free-text responses on what makes some activities more difficult than others.
For each data file, there is a file that explains each data column. For example, the file "prolific_profile_data_explanation.xlsx" contains the column explanations for the data gathered from participants' Prolific profiles.
Each data file contains a column called "rand_id" that can be used to link the data from the data files.
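Because every file carries rand_id, participant-level data (profile, prescreening) can be attached to each rating row. A minimal sketch using only the standard library, assuming the data files have been exported to CSV and that the participant-level tables contain one row per rand_id; the file names in the usage comment and the example column names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

Row = Dict[str, str]


def load_csv(path: str) -> List[Row]:
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def index_by_rand_id(rows: Iterable[Row]) -> Dict[str, Row]:
    """Map rand_id -> row, assuming one row per participant."""
    index: Dict[str, Row] = {}
    for row in rows:
        rid = row["rand_id"]
        if rid in index:
            raise ValueError(f"duplicate rand_id {rid!r}: expected one row per participant")
        index[rid] = row
    return index


def attach_participant_data(rows: Iterable[Row], participants: Iterable[Row]) -> List[Row]:
    """Inner join on rand_id: add participant-level columns to each row.

    Rows whose rand_id is missing from `participants` are dropped.
    """
    index = index_by_rand_id(participants)
    joined = []
    for row in rows:
        extra = index.get(row["rand_id"])
        if extra is not None:
            joined.append({**extra, **row})
    return joined


# Usage (hypothetical file names):
# ratings = load_csv("repertory_grid_data.csv")
# profiles = load_csv("prolific_profile_data.csv")
# combined = attach_participant_data(ratings, profiles)
```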

In the case of questions, please contact Nele Albers (n.albers@tudelft.nl) or Willem-Paul Brinkman (w.p.brinkman@tudelft.nl).
BIP! DB: A Dataset of Impact Measures for Research Products
data.europa.eu
unknown
Updated Sep 2, 2024
Cite
Zenodo (2024). BIP! DB: A Dataset of Impact Measures for Research Products [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-13642957?locale=en
Available download formats: unknown (1269246782)
Dataset updated
Sep 2, 2024
Dataset authored and provided by
Zenodo, http://zenodo.org/
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains citation-based impact indicators (a.k.a. "measures") for ~209M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). In particular, for each PID, we have calculated the following indicators, organized in categories based on the semantics of the impact aspect that they best capture:

Influence indicators (i.e., indicators of the "total" impact of each research product; how established it is in general):
- Citation Count: The total number of citations of the product, the most well-known influence indicator.
- PageRank score: An influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator (e.g., two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them is very different; the product receiving citations from more influential products will get a larger score).

Popularity indicators (i.e., indicators of the "current" impact of each research product; how popular the product is currently):
- RAM score: A popularity indicator based on the RAM [2] method. It is essentially a Citation Count where recent citations are considered more important. This type of "time awareness" alleviates problems of methods like PageRank, which are biased against recently published products (new products need time to receive a number of citations that is indicative of their impact).
- AttRank score: A popularity indicator based on the AttRank [3] method. AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to examine products that received a lot of attention recently.

Impulse indicators (i.e., indicators of the initial momentum that the research product received right after its publication):
- Incubation Citation Count (3-year CC): A time-restricted version of the Citation Count, where the time window has the same fixed length for all products and its position depends on each product's publication date, i.e., only citations received within 3 years of each product's publication are counted.

More details about the aforementioned impact indicators, the way they are calculated, and their interpretation can be found in the respective references (e.g., in [5]).

From version 5.1 onward, the impact indicators are calculated at two levels:
- The PID level (assuming that each PID corresponds to a distinct research product).
- The OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct article has its own OpenAIRE id).
Previous versions of the dataset only provided the scores at the PID level.

From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included).

From version 7 onward, for each product in our files we also offer an impact class, which informs the user about the percentile into which the product score falls compared to the impact scores of the rest of the products in the database. The impact classes are: C1 (in top 0.01%), C2 (in top 0.1%), C3 (in top 1%), C4 (in top 10%), and C5 (in bottom 90%).

Finally, before version 10, the calculation of the impact scores (and classes) was based on a citation network having one node for each product with a distinct PID that we could find in our input data sources. However, from version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm. This enabled a correction of the scores (more specifically, we avoid counting citation links multiple times when they are made by multiple versions of the same product). As a result, each node in the citation network we build is a deduplicated product having a distinct OpenAIRE id. We still report the scores at the PID level (i.e., we assign a score to each of the versions/instances of the product); however, these PID-level scores are just the scores of the respective deduplicated nodes propagated accordingly (i.e., all versions of the same deduplicated product receive the same scores). We have removed a small number of instances (having a PID) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph.

For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score provided) where each line follows the format "identifier
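Three of the definitions above can be made concrete in a few lines. This is a generic illustration, not BIP! DB's implementation: PageRank uses an assumed damping factor of 0.85 and a fixed number of power iterations, the impact class is derived from a product's 1-based rank with no special handling of ties, and the 3-year window is assumed to cover the publication year and the two following years. All function names are hypothetical.

```python
from typing import Dict, Iterable, List


def pagerank(citations: Dict[str, List[str]], alpha: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank on a citation graph (citing -> list of cited)."""
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Products that cite nothing spread their score uniformly over all nodes.
        dangling = sum(score[v] for v in nodes if not citations.get(v))
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for src, targets in citations.items():
            if targets:
                share = alpha * score[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        score = new
    return score


def impact_class(rank: int, total: int) -> str:
    """Map a 1-based rank (1 = highest score) among `total` products to C1..C5."""
    # Integer comparisons avoid floating-point issues at the percentile boundaries.
    if rank * 10_000 <= total:
        return "C1"  # top 0.01%
    if rank * 1_000 <= total:
        return "C2"  # top 0.1%
    if rank * 100 <= total:
        return "C3"  # top 1%
    if rank * 10 <= total:
        return "C4"  # top 10%
    return "C5"      # bottom 90%


def incubation_citation_count(pub_year: int, citing_years: Iterable[int],
                              window: int = 3) -> int:
    """Count citations made in the first `window` years, starting with the publication year."""
    return sum(1 for y in citing_years if pub_year <= y < pub_year + window)
```

For example, in a graph where products A and B both cite C, `pagerank` gives C the highest score, and the scores always sum to 1.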
YouTube RPM by Niche (2025)
learningrevolution.net
html
Updated Jan 2, 2025
Cite
Jawad Khan (2025). YouTube RPM by Niche (2025) [Dataset]. https://www.learningrevolution.net/how-much-money-does-youtube-pay-for-1-million-views/
Available download formats: html
Dataset updated
Jan 2, 2025
Dataset provided by
Learning Revolution
Authors
Jawad Khan
Area covered
YouTube
Variables measured
Gaming, Travel, Finance, Education, Technology, Memes/Vlogs
Description
This dataset provides estimated YouTube RPM (Revenue Per Mille) ranges for different niches in 2025, based on ad revenue earned per 1,000 monetized views.
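Since RPM is defined here as revenue per 1,000 monetized views, converting between RPM, views, and revenue is simple arithmetic. A minimal sketch; the RPM range in the example is made up for illustration and is not taken from the dataset.

```python
from typing import Tuple


def estimated_revenue(monetized_views: int, rpm: float) -> float:
    """Revenue implied by an RPM (revenue per 1,000 monetized views)."""
    return monetized_views / 1000 * rpm


def rpm_from_revenue(revenue: float, monetized_views: int) -> float:
    """Inverse: RPM implied by observed revenue and monetized views."""
    return revenue * 1000 / monetized_views


def revenue_range(monetized_views: int, rpm_range: Tuple[float, float]) -> Tuple[float, float]:
    """Apply a (low, high) RPM range, such as a per-niche range, to a view count."""
    low, high = rpm_range
    return estimated_revenue(monetized_views, low), estimated_revenue(monetized_views, high)


if __name__ == "__main__":
    # Hypothetical RPM range of $2 to $6 applied to 1 million monetized views.
    print(revenue_range(1_000_000, (2.0, 6.0)))  # (2000.0, 6000.0)
```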
Meta Kaggle Code
kaggle.com
zip
Updated Oct 23, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Available download formats: zip (162197891554 bytes)
Dataset updated
Oct 23, 2025
Dataset authored and provided by
Kaggle, http://kaggle.com/
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files due to private and interactive sessions.
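The folder layout described above is a pure function of the KernelVersions id, so the location of a version can be computed directly. This sketch is based only on the examples given (folder 123/456 for ids 123,456,000 to 123,456,999); whether the real folder names are zero-padded is an assumption to check against the downloaded data, so padding is left configurable. The function name is hypothetical.

```python
def version_folder(version_id: int, top_width: int = 0, sub_width: int = 0) -> str:
    """Relative folder holding a KernelVersions id, e.g. 123456789 -> '123/456'.

    top_width/sub_width zero-pad the folder names; the default (no padding)
    follows the examples in the description and is an assumption.
    """
    if version_id < 0:
        raise ValueError("version ids are non-negative")
    top = version_id // 1_000_000          # up to 1 million versions per top-level folder
    sub = (version_id // 1_000) % 1_000    # up to 1 thousand versions per subfolder
    return f"{str(top).zfill(top_width)}/{str(sub).zfill(sub_width)}"
```

Inside that folder, the file itself is named after the id, matching the Id column of KernelVersions.csv.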

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
NewsUnravel Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
Cite
anon (2024). NewsUnravel Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8344890
Dataset updated
Jul 11, 2024
Dataset authored and provided by
anon
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
About the NUDA Dataset

Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.

General

This dataset was created through user feedback on automatically generated bias highlights on news articles on the website NewsUnravel made by ANON. Its goal is to improve the detection of linguistic media bias for analysis and to indicate it to the public. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.

The dataset consists of text, namely biased sentences with binary bias labels (processed: biased or not biased), as well as metadata about the article. It includes all feedback that was given. The individual (unprocessed) ratings used to create the labels are also included, together with the corresponding user IDs.

For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering, as they were taken from biased or more extreme news sources. The dataset does not identify sub-populations and cannot be considered sensitive to them, nor is it possible to identify individuals.

Description of the Data Files

This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:

- NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
- Statistics.png: contains all Umami statistics for NewsUnravel's usage data
- Feedback.csv: holds the participant ID of each feedback item with the sentence ID (contentId), the bias rating, and the provided reasons
- Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
- Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
- Participant.csv: holds the participant IDs and data processing consent

Collection Process

Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
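The majority-vote step can be sketched as follows. This is an illustration only: how ties were broken and which spam-detection approach was used are not described here, so ties are left unlabeled (None) and spammers are passed in as a precomputed set of participant ids. Mapping the Feedback.csv columns (participant ID, contentId, bias rating) onto the tuple fields is likewise an assumption, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import AbstractSet, Dict, Iterable, Optional, Tuple

# (participant_id, sentence_id, label), e.g. ("p1", "s1", "biased")
Vote = Tuple[str, str, str]


def majority_labels(votes: Iterable[Vote],
                    excluded: AbstractSet[str] = frozenset()) -> Dict[str, Optional[str]]:
    """Majority label per sentence; None when the top labels are tied."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for participant, sentence, label in votes:
        if participant in excluded:  # drop votes from participants flagged as spammers
            continue
        counts[sentence][label] += 1
    labels: Dict[str, Optional[str]] = {}
    for sentence, counter in counts.items():
        ranked = counter.most_common()
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        labels[sentence] = None if tied else ranked[0][0]
    return labels
```

Sentences whose only votes come from excluded participants do not appear in the result at all.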

Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.

So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face. The dataset will be open source. On acceptance, a link with all details and contact information will be provided. No third parties are involved.

The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.
Data for: The Pandemic Journaling Project, Phase One (PJP-1)
data.qdr.syr.edu
3gp +22
Updated Feb 15, 2024
Cite
Sarah S. Willen; Katherine A. Mason (2024). Data for: The Pandemic Journaling Project, Phase One (PJP-1) [Dataset]. http://doi.org/10.5064/F6PXS9ZK
Available download formats: 3gp, audio/vnd.dlna.adts, bin, docx, gif, html, jpeg, mp4, mp4a, mpga, ogv, pdf, png, qt, tiff, tsv, txt, wav, webm, webp, wmv, xlsx, zip
Unique identifier
https://doi.org/10.5064/F6PXS9ZK
Dataset updated
Feb 15, 2024
Dataset provided by
Qualitative Data Repository
Authors
Sarah S. Willen; Katherine A. Mason
License
https://qdr.syr.edu/policies/qdr-restricted-access-conditions
Time period covered
May 29, 2020 - May 31, 2022
Area covered
Mexico, Central America, Europe, Canada, United States
Description
Project Summary

This dataset contains all qualitative and quantitative data collected in the first phase of the Pandemic Journaling Project (PJP). PJP is a combined journaling platform and interdisciplinary, mixed-methods research study developed by two anthropologists, with support from a team of colleagues and students across the social sciences, humanities, and health fields. PJP launched in Spring 2020 as the COVID-19 pandemic was emerging in the United States. PJP was created in order to “pre-design an archive” of COVID-19 narratives and experiences open to anyone around the world. The project is rooted in a commitment to democratizing knowledge production, in the spirit of “archival activism” and using methods of “grassroots collaborative ethnography” (Willen et al. 2022; Wurtz et al. 2022; Zhang et al. 2020; see also Carney 2021). The motto on the PJP website encapsulates these commitments: “Usually, history is written only by the powerful. When the history of COVID-19 is written, let’s make sure that doesn’t happen.” (A version of this Project Summary with links to the PJP website and other relevant sites is included in the public documentation of the project at QDR.)

In PJP’s first phase (PJP-1), the project provided a digital space where participants could create weekly journals of their COVID-19 experiences using a smartphone or computer. The platform was designed to be accessible to as wide a range of potential participants as possible. Anyone aged 15 or older, living anywhere in the world, could create journal entries using their choice of text, images, and/or audio recordings. The interface was available in English and Spanish, but participants could submit text and audio in any language. PJP-1 ran on a weekly basis from May 2020 to May 2022.

Data Overview

This Qualitative Data Repository (QDR) project contains all journal entries and closed-ended survey responses submitted during PJP-1, along with accompanying descriptive and explanatory materials. The dataset includes individual journal entries and accompanying quantitative survey responses from more than 1,800 participants in 55 countries. Of nearly 27,000 journal entries in total, over 2,700 included images and over 300 included audio files. All data were collected via the Qualtrics survey platform. PJP-1 was approved as a research study by the Institutional Review Board (IRB) at the University of Connecticut.

Participants were introduced to the project in a variety of ways, including through the PJP website as well as professional networks, PJP’s social media accounts (on Facebook, Instagram, and Twitter), and media coverage of the project. Participants provided a single piece of contact information (an email address or mobile phone number), which was used to distribute weekly invitations to participate. This contact information has been stripped from the dataset and will not be accessible to researchers.

PJP uses a mixed-methods research approach and a dynamic cohort design. After enrolling in PJP-1 via the project’s website, participants received weekly invitations to contribute to their journals via their choice of email or SMS (text message). Each weekly invitation included a link to that week’s journaling prompts and accompanying survey questions. Participants could join at any point, and they could stop participating at any point as well; they could also stop participating and later restart. Retention was encouraged with a monthly raffle of three $100 gift cards, for which all individuals who had contributed that month were eligible. Regardless of when they joined, all participants received the project’s narrative prompts and accompanying survey questions in the same order. In Week 1, before contributing their first journal entries, participants were presented with a baseline survey that collected demographic information, including political leanings, as well as self-reported data about COVID-19 exposure and physical and mental health status. Some of these survey questions were repeated at periodic intervals in subsequent weeks, providing quantitative measures of change over time that can be analyzed in conjunction with participants' qualitative entries. Surveys employed validated questions where possible.

The core of PJP-1 involved two weekly opportunities to create journal entries in the format of the participant's choice (text, image, and/or audio). Each week, journalers received a link with an invitation to create one entry in response to a recurring narrative prompt (“How has the COVID-19 pandemic affected your life in the past week?”) and a second journal entry in response to their choice of two more tightly focused prompts. Typically, the pair of prompts included one focusing on subjective experience (e.g., the impact of the pandemic on relationships, sense of social connectedness, or mental health) and another with an external focus (e.g., key sources of scientific information, trust in government, or COVID-19’s economic impact). Each week,...
Data from: A large-scale COVID-19 Twitter chatter dataset for open...
data.niaid.nih.gov
zenodo.org
Updated Apr 17, 2023
Cite
Banda, Juan M.; Tekumalla, Ramya; Wang, Guanyu; Yu, Jingyuan; Liu, Tuo; Ding, Yuning; Artemova, Katya; Tutubalina, Elena; Chowell, Gerardo (2023). A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3723939
Dataset updated
Apr 17, 2023
Dataset provided by
KFU
Georgia State University
NRU HSE
Universität Duisburg-Essen
University of Missouri
Universitat Autònoma de Barcelona
Carl von Ossietzky Universität Oldenburg
Authors
Banda, Juan M.; Tekumalla, Ramya; Wang, Guanyu; Yu, Jingyuan; Liu, Tuo; Ding, Yuning; Artemova, Katya; Tutubalina, Elena; Chowell, Gerardo
Description
Version 162 of the dataset. NOTES: Data for 3/15 - 3/18 was not extracted due to unexpected and unannounced downtime of our university infrastructure. We will try to backfill those days by the next release. FUTURE CHANGES: Due to the imminent paywalling of Twitter's API access, this might be the last full update of this dataset. If API access is not blocked, we will stop updating this dataset with release 165, a bit more than 3 years after our initial release. It's been a joy seeing all the work that uses this resource, and we are glad that so many found it useful.

The dataset files full_dataset.tsv.gz and full_dataset_clean.tsv.gz have been split into 1 GB parts using the Linux split utility, so make sure to join the parts before decompressing. We had to make this change because we had major issues uploading files larger than 2 GB (hence the delay in the dataset releases). The peer-reviewed publication for this dataset has now been published in Epidemiologia, an MDPI journal, and can be accessed here: https://doi.org/10.3390/epidemiologia2030024. Please cite it when using the dataset.
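A minimal Python sketch of the join step, equivalent to concatenating the parts with cat before decompressing. It assumes the parts use split's default alphabetical suffixes (aa, ab, ...), so a lexicographic sort restores their order. The part-file pattern in the usage line (full_dataset.tsv.gz.part*) and both function names are hypothetical; check the actual file names in the release.

```python
import glob
import gzip
import shutil


def join_parts(pattern, out_path):
    """Concatenate split parts, in lexicographic order, into one .gz file.

    Assumes split's default alphabetical suffixes, so sorting the names
    restores the original byte order. Returns the list of parts joined.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no parts match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-for-byte append
    return parts


def head(gz_path, n=3):
    """Read the first n lines of the joined gzip file as text."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for _, line in zip(range(n), f)]
```

Usage (hypothetical names): `join_parts("full_dataset.tsv.gz.part*", "full_dataset.tsv.gz")`, then stream the result with `gzip.open` rather than decompressing the whole file to disk.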

Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release, we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12, we have included daily hashtags, mentions, and emojis and their frequencies in the respective zip files. From version 14, we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20, we have included language and place location for all tweets.

The data collected from the stream cover all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (1,395,222,801 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (361,748,721 unique tweets). There are several practical reasons for keeping the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks, we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files. For more statistics and some visualizations, visit: http://www.panacealab.org/covid19/
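Counting or filtering by language can be done by streaming the TSV row by row, which matters at this scale. A minimal standard-library sketch, assuming the file has a header row; the column names tweet_id, date, time, lang, and country_code used here are assumptions, not confirmed by this description, and both function names are hypothetical.

```python
import csv
from collections import Counter


def count_by_language(lines, lang_column="lang"):
    """Count rows per language code from a TSV stream with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[lang_column] for row in reader)


def ids_for_language(lines, lang, id_column="tweet_id", lang_column="lang"):
    """Yield tweet IDs whose (assumed) language column equals `lang`."""
    for row in csv.DictReader(lines, delimiter="\t"):
        if row[lang_column] == lang:
            yield row[id_column]
```

Both functions accept any iterable of lines, so a decompressed stream such as `gzip.open(path, "rt", newline="")` can be passed in directly without loading the file into memory.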

More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our preprint about the dataset (https://arxiv.org/abs/2004.03688).

As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions for redistributing Twitter data; they are provided ONLY for research purposes. The identifiers need to be hydrated (i.e., the full tweets retrieved from Twitter by ID) before they can be used.
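Hydration means looking tweets up by ID through the Twitter API. Lookup endpoints have historically accepted at most 100 IDs per request; treat that limit as an assumption to re-check against current API terms, which have changed. A minimal sketch of the batching step only (the function name is hypothetical, and the API call itself is deliberately left out):

```python
from itertools import islice


def batched_ids(ids, size=100):
    """Yield lists of at most `size` tweet IDs from any iterable.

    The size check runs when iteration starts, since this is a generator.
    """
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch
```

Because it consumes an iterator lazily, this can be fed directly from a generator that streams IDs out of the TSV files.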
August 2024 data-update for "Updated science-wide author databases of...
elsevier.digitalcommonsdata.com
Updated Sep 16, 2024
Cite
John P.A. Ioannidis (2024). August 2024 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.7
Unique identifier
https://doi.org/10.17632/btchxktzyw.7
Dataset updated
Sep 16, 2024
Authors
John P.A. Ioannidis
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long impact and for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023, and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to end of citation year 2023. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached FREQUENTLY ASKED QUESTIONS file. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
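The inclusion rule stated above (top 100,000 by c-score, or a sub-field percentile rank of 2% or above) can be illustrated with a short sketch. It rests on simplifying assumptions: a single c-score per scientist (the published lists use variants with and without self-citations), a percentile computed as rank / n × 100 within the sub-field with rank 1 being the highest c-score, and no tie handling. The Scientist type and select function are hypothetical, not the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Scientist:  # hypothetical record: one c-score per scientist
    name: str
    subfield: str
    c_score: float


def select(scientists, top_n=100_000, top_pct=2.0):
    """Names in the overall top_n by c-score OR in the top `top_pct` % of their sub-field."""
    ranked = sorted(scientists, key=lambda s: s.c_score, reverse=True)
    selected = {s.name for s in ranked[:top_n]}

    by_subfield = defaultdict(list)
    for s in scientists:
        by_subfield[s.subfield].append(s)
    for members in by_subfield.values():
        members.sort(key=lambda s: s.c_score, reverse=True)
        n = len(members)
        for rank, s in enumerate(members, start=1):
            if 100.0 * rank / n > top_pct:
                break  # ranks only get worse from here
            selected.add(s.name)
    return selected
```

The two criteria are combined with a set union, so a scientist who qualifies under both is counted once.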
Data from: Aircraft Marshaling Signals Dataset of FMCW Radar and Event-Based...
data.niaid.nih.gov
Updated Dec 11, 2023
Cite
Leon Müller; Manolis Sifalakis; Sherif Eissa; Amirreza Yousefzadeh; Sander Stuijk; Federico Corradi; Paul Detterer (2023). Aircraft Marshaling Signals Dataset of FMCW Radar and Event-Based Camera for Sensor Fusion [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7656910
Dataset updated
Dec 11, 2023
Dataset provided by
Eindhoven University of Technology
IMEC
Authors
Leon Müller; Manolis Sifalakis; Sherif Eissa; Amirreza Yousefzadeh; Sander Stuijk; Federico Corradi; Paul Detterer
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Introduction

The advent of neural networks capable of learning salient features from variance in radar data has expanded the breadth of radar applications, often as an alternative sensor or a complementary modality to camera vision. Gesture recognition for command control is arguably the most commonly explored application. Nevertheless, more suitable benchmarking datasets than those currently available are needed to assess and compare the merits of the different proposed solutions and to explore a broader range of scenarios than simple hand gestures a few centimeters away from a radar transmitter/receiver. Most publicly available radar datasets used in gesture recognition provide limited diversity, do not provide access to raw ADC data, and are not significantly challenging. To address these shortcomings, we created and make available a new dataset that combines FMCW radar and dynamic vision camera recordings of 10 aircraft marshalling signals (whole body) at several distances and angles from the sensors, recorded from 13 people. The two modalities are hardware-synchronized using the radar's PRI signal. Moreover, in the supporting publication we propose a sparse encoding of the time-domain (ADC) signals that achieves a dramatic data-rate reduction (>76%) while retaining the efficacy of the downstream FFT processing (<2% accuracy loss on recognition tasks), and that can be used to create a sparse event-based representation of the radar data. In this way, the dataset can be used as a two-modality neuromorphic dataset.

Synchronization of the two modalities

The PRI pulses from the radar have been hard-wired to the event stream of the DVS sensor and timestamped using the DVS clock. Based on this signal, the DVS event stream has been segmented such that groups of DVS events (time bins) are mapped to individual radar pulses (chirps).
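The PRI-based mapping described above amounts to assigning each DVS event to the radar chirp whose pulse interval contains its timestamp. A minimal NumPy sketch, assuming both timestamp arrays are on the DVS clock and the PRI timestamps are sorted ascending; the function name is hypothetical and not part of the provided parser or loader.

```python
import numpy as np


def bin_events_by_chirp(event_ts, pri_ts):
    """Index of the chirp whose interval [pri_ts[i], pri_ts[i+1]) holds each event.

    Events before the first PRI pulse get -1; events after the last pulse are
    assigned to the last chirp. pri_ts must be sorted ascending.
    """
    event_ts = np.asarray(event_ts)
    pri_ts = np.asarray(pri_ts)
    # side="right" puts an event that coincides with a pulse into that pulse's bin
    return np.searchsorted(pri_ts, event_ts, side="right") - 1
```

A binary search per event keeps this O(n log m) for n events and m pulses, so it scales to long captures without an explicit Python loop.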
Data storage

DVS events (x, y coordinates and timestamps) are stored in structured arrays, and each such structured array object is associated with the data of one radar transmission (pulse/chirp). A radar transmission is a vector of 512 ADC levels that correspond to sampling points of the chirping signal (FMCW radar), which lasts about 1.3 ms. Every 192 radar transmissions are stacked in a matrix called a radar frame (each transmission is a row in that matrix). A data capture (recording) consisting of several thousand continuous radar transmissions is therefore segmented into a number of radar frames. Finally, radar frames and the corresponding DVS structured arrays are stored in separate containers in a custom multi-container file format (extension .rad). We provide a parser for extracting the data from these .rad files. There is one file per capture of continuous gesture recording of about 10 s. Note that the number of 192 transmissions per radar frame is an ad hoc segmentation chosen to obtain sufficient signal resolution in the 2D FFT typical of radar signal processing, given the range resolution of this specific radar. It also served the purpose of fast streaming storage of the data during capture. To extract individual data points for the dataset, however, one can pool together (concatenate) all the radar frames from a single capture file and re-segment them as desired. The data loader that we provide offers this, with a default of re-segmenting every 769 transmissions (about 1 s of gesturing).

Data captures directory organization (radar8Ghz-DVS-marshaling_signals_20220901_publication_anonymized.7z)

The dataset captures (recordings) are organized in a common directory structure that encodes additional metadata about the captures.

dataset_dir///--/ofxRadar8Ghz_yyyy-mm-dd_HH-MM-SS.rad

Identifiers

stage: [train, test].
room: [conference_room, foyer, open_space].
subject: [0-9]. Note that 0 stands for no person, and 1 for an unlabeled, random person (only present in test).
gesture: ['none', 'emergency_stop', 'move_ahead', 'move_back_v1', 'move_back_v2', 'slow_down', 'start_engines', 'stop_engines', 'straight_ahead', 'turn_left', 'turn_right'].
distance: ['xxx', '100', '150', '200', '250', '300', '350', '400', '450']. Note that 'xxx' is used for 'none' gestures when there is no person present in front of the radar (i.e., background samples), or when a person is walking in front of the radar at varying distances but performing no gesture.

The test data captures contain both subjects who appear in the train data and previously unseen subjects. Similarly, the test data contain captures from the spaces where the train data were recorded, as well as from a new, unseen open space.

Files List

radar8Ghz-DVS-marshaling_signals_20220901_publication_anonymized.7z: The actual archive bundle with the data captures (recordings).
rad_file_parser_2.py: Parser for individual .rad files, which contain capture data.
loader.py: A convenience PyTorch Dataset loader (partly Tonic compatible). In practice, this is all you need for a quick start if you don't want to delve into the code. Initializing a DvsRadarAircraftMarshallingSignals object automatically downloads the dataset archive and the .rad file parser, unpacks the archive, and imports the .rad parser to load the data. You can then request a training set, a validation set, and a test set from it as torch Datasets.
aircraft_marshalling_signals_howto.ipynb: Jupyter notebook demonstrating basic use of loader.py.

Contact

For further information or questions, try contacting M. Sifalakis or F. Corradi first.
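The pool-and-re-segment step described under Data storage (concatenate a capture's 192-transmission radar frames, then cut windows of 769 transmissions, roughly 769 × 1.3 ms ≈ 1 s) can be sketched with NumPy. This is an illustration, not the provided loader.py: the function name is hypothetical, and dropping trailing transmissions that do not fill a whole window is an assumption; the real loader may handle them differently.

```python
import numpy as np

TX_PER_FRAME = 192     # transmissions per stored radar frame (from the description)
SAMPLES_PER_TX = 512   # ADC samples per transmission (from the description)
DEFAULT_WINDOW = 769   # loader default: about 1 s at ~1.3 ms per chirp


def resegment(frames, window=DEFAULT_WINDOW):
    """Pool the radar frames of one capture and cut non-overlapping windows.

    frames: sequence of 2-D arrays shaped (n_tx, SAMPLES_PER_TX).
    Returns an array shaped (n_windows, window, SAMPLES_PER_TX); trailing
    transmissions that do not fill a whole window are dropped (assumption).
    """
    for f in frames:
        if f.ndim != 2 or f.shape[1] != SAMPLES_PER_TX:
            raise ValueError("each frame must be 2-D with SAMPLES_PER_TX columns")
    stream = np.concatenate(frames, axis=0)  # one row per transmission, in order
    n_windows = stream.shape[0] // window
    return stream[: n_windows * window].reshape(n_windows, window, stream.shape[1])
```

A 10 s capture of about 7,700 transmissions would yield about ten 1 s windows this way; overlapping windows would need a stride parameter instead of a plain reshape.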
State of Nature layers for Water Availability and Water Pollution to support...
zenodo.org
zip
Updated Jul 12, 2024
Cite
Rafael Camargo; Sara Walker; Elizabeth Saccoccia; Richard McDowell; Allen Townsend; Ariane Laporte-Bisquit; Samantha McCraine; Varsha Vijay (2024). State of Nature layers for Water Availability and Water Pollution to support SBTN Step 1: Assess and Step 2: Interpret & Prioritize [Dataset]. http://doi.org/10.5281/zenodo.7797979
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7797979
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rafael Camargo; Sara Walker; Elizabeth Saccoccia; Richard McDowell; Allen Townsend; Ariane Laporte-Bisquit; Samantha McCraine; Varsha Vijay
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There are multiple well-recognized and peer-reviewed global datasets that can be used to assess water availability and water pollution. Each of these datasets is based on different inputs, modeling approaches, and assumptions. Therefore, in SBTN Step 1: Assess and Step 2: Interpret & Prioritize, companies are required to consult different global datasets for a robust and comprehensive State of Nature (SoN) assessment for water availability and water pollution.

To streamline this process, WWF, the World Resources Institute (WRI), and SBTN worked together to develop two ready-to-use unified layers of SoN, one for water availability and one for water pollution, in line with the Technical Guidance for Step 1: Assess and Step 2: Interpret & Prioritize. The result is a single file (shapefile) containing the maximum value for both water availability and water pollution, as well as the datasets’ raw values (as references). This data is publicly available for download from this repository.

These unified layers will make it easier for companies to implement a robust approach, and they will lead to more aligned and comparable results between companies. A temporary App is available at https://arcg.is/0z9mOD0 to help companies assess the SoN for water availability and water pollution around their operations and supply chain locations. In the future, these layers will become available both in the WRI’s Aqueduct and in the WWF Risk Filter Suite.

For the SoN for water availability, the following datasets were considered:

Baseline water stress (Hofste et al. 2019), data available here

Water depletion (Brauman et al. 2016), data available here

Blue water scarcity (Mekonnen & Hoekstra 2016), data upon request to the authors

For the SoN for water pollution, the following datasets were considered:

Coastal Eutrophication Potential (Hofste et al. 2019), data available here

Nitrate-Nitrite Concentration (Damania et al. 2019), data available here

Periphyton Growth Potential (McDowell et al. 2020), data available here

In general, the same processing steps were performed for all datasets:

Compute the area-weighted median of each dataset at a common spatial resolution, i.e. HydroSHEDS HydroBasins Level 6 in this case.

Classify each dataset onto a common scale by reclassifying raw values to classes 1-5, with 0 (zero) used for cells or features with no data. See the documentation for more details.

Take the maximum value across the classified datasets, separately for Water Availability and for Water Pollution.
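The three processing steps above can be sketched in plain Python for a single basin. The class cut points are hypothetical placeholders (the real thresholds differ per dataset and are given in the documentation), and the weighted median shown (the smallest value whose cumulative area reaches half the total) is one common definition that may differ in detail from the code in the linked repository. All function names are hypothetical.

```python
import bisect


def area_weighted_median(values, areas):
    """Smallest value whose cumulative area reaches at least half the total area."""
    pairs = sorted(zip(values, areas))
    half = sum(areas) / 2
    cumulative = 0.0
    for value, area in pairs:
        cumulative += area
        if cumulative >= half:
            return value
    raise ValueError("no values to aggregate")


def classify(value, cut_points):
    """Map a raw value to class 1-5 using 4 ascending cut points; None (no data) -> 0."""
    if value is None:
        return 0
    return 1 + bisect.bisect_right(cut_points, value)


def state_of_nature(raw_by_dataset, cut_points_by_dataset):
    """Maximum class across the datasets of one issue (availability or pollution)."""
    return max(
        classify(raw_by_dataset.get(name), cuts)
        for name, cuts in cut_points_by_dataset.items()
    )
```

Taking the maximum means a basin is flagged by whichever dataset rates it worst, which matches the conservative "maximum value" combination described above.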

For transparency and reproducibility, the code is publicly available at https://github.com/rafaexx/sbtn-SoN-water
Data from: Lending Club loan dataset for granting models
zenodo.org
produccioncientifica.ucm.es
csv
Updated May 27, 2024
Cite
Miller Janny Ariza-Garzón; Mario Sanz-Guerrero; Javier Arroyo Gallardo (2024). Lending Club loan dataset for granting models [Dataset]. http://doi.org/10.5281/zenodo.11295916
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.11295916
Dataset updated
May 27, 2024
Dataset provided by
Universidad Complutense de Madrid
Authors
Miller Janny Ariza-Garzón; Mario Sanz-Guerrero; Javier Arroyo Gallardo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Lending Club offers peer-to-peer (P2P) loans through a technological platform for various personal finance purposes and is today one of the companies that dominate the US P2P lending market. The original dataset is publicly available on Kaggle and corresponds to all the loans issued by Lending Club between 2007 and 2018. The present version of the dataset is intended for constructing a granting model, that is, a model designed to decide whether to grant a loan based on information available at the time of the loan application. Consequently, our dataset only has a selection of variables from the original one: the variables known at the moment the loan request is made. Furthermore, the target variable of a granting model represents the final status of the loan, which is either "default" or "fully paid". Thus, we filtered out from the original dataset all the loans in transitory states. Our dataset comprises 1,347,681 records or obligations (approximately 60% of the original), and it was also cleaned for completeness and consistency (less than 1% of our dataset was filtered out).

TARGET VARIABLE

The dataset includes a target variable based on the final resolution of the credit: the default category corresponds to the "charged off" event and the non-default category to the "fully paid" event. Other values of the loan status variable are not considered, because this variable represents the state of the loan at the end of the considered time window; thus, there are no loans in transitory states. The original dataset includes the target variable “loan status”, which contains several categories ('Fully Paid', 'Current', 'Charged Off', 'In Grace Period', 'Late (31-120 days)', 'Late (16-30 days)', 'Default'). However, in our dataset, we only consider loans that are either “Fully Paid” or “Default” and transform this variable into a binary variable called “Default”, with 0 for fully paid loans and 1 for defaulted loans.
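The target construction described above can be sketched as a mapping from the original loan-status values to a 0/1 label, dropping transitory states. Which raw statuses count as default is an assumption here: the text ties default to "charged off" and also mentions a "Default" status, so both are mapped to 1 below. The function and set names are hypothetical.

```python
# Assumption: both 'Charged Off' and 'Default' are treated as the default event.
DEFAULT_STATUSES = {"Charged Off", "Default"}
NON_DEFAULT_STATUSES = {"Fully Paid"}


def to_binary_target(loans, status_key="loan_status"):
    """Yield (loan, target) pairs: 1 = default, 0 = fully paid.

    Loans in any other (transitory) status, e.g. 'Current',
    'In Grace Period', or 'Late (...)', are skipped.
    """
    for loan in loans:
        status = loan[status_key]
        if status in DEFAULT_STATUSES:
            yield loan, 1
        elif status in NON_DEFAULT_STATUSES:
            yield loan, 0
```

Keeping the two status sets explicit makes it easy to adjust the mapping if the published dataset's exact rule differs.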

EXPLANATORY VARIABLES

The explanatory variables that we use correspond only to the information available at the time of the application. Variables such as the interest rate, grade, or subgrade are generated by the company as a result of a credit risk assessment process, so they were removed from the dataset: they must not be used in risk models that predict default when granting credit.

FULL LIST OF VARIABLES

Loan identification variables:

id: Loan id (unique identifier).

issue_d: Month and year in which the loan was approved.

Quantitative variables:

revenue: Borrower's self-declared annual income during registration.

dti_n: Indebtedness ratio for obligations excluding mortgage. Monthly information. This ratio has been calculated considering the indebtedness of the whole group of applicants. It is estimated as the ratio calculated using the co-borrowers’ total payments on the total debt obligations divided by the co-borrowers’ combined monthly income.

loan_amnt: Amount of credit requested by the borrower.

fico_n: Defined between 300 and 850, reported by Fair Isaac Corporation as a risk measure based on historical credit information reported at the time of application. This value has been calculated as the average of the variables “fico_range_low” and “fico_range_high” in the original dataset.

experience_c: Binary variable that indicates whether the borrower is new to the entity. This variable is constructed from the credit date of the previous obligation in LC and the credit date of the current obligation; if the difference between dates is positive, it is not considered as a new experience with LC.

Categorical variables:

emp_length: Categorical variable with the employment length of the borrower (includes a "no information" category).

purpose: Credit purpose category for the loan request.

home_ownership_n: Homeownership status provided by the borrower in the registration process. Categories defined by LC: “mortgage”, “rent”, “own”, “other”, “any”, “none”. We merged the categories “other”, “any” and “none” as “other”.

addr_state: Borrower's residence state from the USA.

zip_code: Zip code of the borrower's residence.

Textual variables

title: Title of the credit request description provided by the borrower.

desc: Description of the credit request provided by the borrower.
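Two of the quantitative variables listed above are derived rather than copied: fico_n is the average of the original fico_range_low and fico_range_high, and dti_n divides the co-borrowers' total debt payments (excluding mortgage) by their combined monthly income. A minimal sketch of those two formulas; the function and argument names are hypothetical, and the inputs are assumed to be already aggregated per application.

```python
def fico_n(fico_range_low, fico_range_high):
    """Midpoint of the FICO range reported in the original dataset."""
    return (fico_range_low + fico_range_high) / 2


def dti_n(total_monthly_debt_payments, combined_monthly_income):
    """Joint debt-to-income ratio; payments are assumed to already exclude mortgage."""
    if combined_monthly_income <= 0:
        raise ValueError("combined monthly income must be positive")
    return total_monthly_debt_payments / combined_monthly_income
```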

We cleaned the textual variables. First, we removed all descriptions that contained the default description provided by Lending Club on its web form (“Tell your story. What is your loan for?”). Next, we removed the prefix “Borrower added on DD/MM/YYYY >” from the descriptions to avoid any temporal information in them. Finally, as these descriptions came from a web form, we replaced all HTML character entities with the characters they represent (e.g. “&amp;” was replaced by “&”, “&lt;” by “<”, etc.).
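The three cleaning steps can be approximated with Python's standard library, as sketched below. The regular expression for the "Borrower added on" prefix is an assumption: it accepts one- or two-digit day and month fields and two- or four-digit years, and it removes every occurrence rather than only a leading one. `html.unescape` replaces named and numeric character references such as `&amp;` and `&lt;`.

```python
import html
import re
from typing import Optional

DEFAULT_PROMPT = "Tell your story. What is your loan for?"
# Assumed pattern for the timestamp prefix, e.g. "Borrower added on 01/02/2013 >"
PREFIX_RE = re.compile(r"Borrower added on \d{1,2}/\d{1,2}/\d{2,4}\s*>\s*")


def clean_description(desc: str) -> Optional[str]:
    """Return the cleaned description, or None if it should be discarded."""
    # Step 1: drop descriptions that contain LC's default web-form text.
    if DEFAULT_PROMPT in desc:
        return None
    # Step 2: remove the timestamp prefix(es) to avoid temporal information.
    text = PREFIX_RE.sub("", desc)
    # Step 3: replace HTML character references with the characters they encode.
    return html.unescape(text).strip()


print(clean_description("Borrower added on 01/02/2013 > Pay off cards &amp; car"))
# Pay off cards & car
```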

RELATED WORKS

This dataset has been used in the following academic articles:

Sanz-Guerrero, M., Arroyo, J. (2024). Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending. arXiv preprint arXiv:2401.16458. https://doi.org/10.48550/arXiv.2401.16458

Ariza-Garzón, M.J., Arroyo, J., Caparrini, A., Segovia-Vargas, M.J. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 8, 64873 - 64890. https://doi.org/10.1109/ACCESS.2020.2984412
Person action detection dataset
kaggle.com
zip
Updated Sep 8, 2019
Cite
Iustina Ivanova (2019). Person action detection dataset [Dataset]. https://www.kaggle.com/datasets/yustiks/person-action-detection
Available download formats: zip (11937311 bytes)
Dataset updated
Sep 8, 2019
Authors
Iustina Ivanova
Description
Context

The dataset was created for the task of detecting patient activity in a clinic; specifically, the idea is to detect when a person stands up after sleeping. The motivation is that hospital patients need to be monitored so that they can receive more attentive medical care. Some patients are shy and do not want to ask for help, yet it is important for them to get assistance when they wake up; otherwise they may fall and break bones.

Content

The dataset contains videos of a person standing up, and each video file has a corresponding CSV file with annotations for every frame extracted from the video. The annotations were made as part of the task. In every frame, the system (the OpenPose pose estimation algorithm) can detect up to two persons. If only one person is detected, the fields for the second person are filled with 0.

Each video's corresponding CSV file contains the following columns:

frame_id: number of the generated frame
num_person: number of people detected
pose_1: whether person 1 is detected (1 if yes, 0 if not)
in_bed_1: whether person 1 is lying in bed (1 if yes, 0 if not)
pose_2: whether person 2 is detected (1 if yes, 0 if not)
in_bed_2: whether person 2 is in bed (1 if yes, 0 if not)
anomaly: whether the frame is an anomaly and assistance is needed (1 if yes, 0 if not)
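A per-frame annotation file with the columns listed above can be loaded into typed records as sketched below. This assumes the CSV header uses exactly these column names and that every value is numeric; any extra columns in the real files are ignored. The `FrameAnnotation` class and `read_annotations` function are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class FrameAnnotation:
    frame_id: int
    num_person: int
    pose_1: int
    in_bed_1: int
    pose_2: int
    in_bed_2: int
    anomaly: int


FIELD_NAMES = [f.name for f in fields(FrameAnnotation)]


def read_annotations(csv_text: str) -> list[FrameAnnotation]:
    """Parse one video's annotation CSV into a list of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # int(float(...)) tolerates values written as "1" or "1.0".
    return [
        FrameAnnotation(**{name: int(float(row[name])) for name in FIELD_NAMES})
        for row in reader
    ]
```

For a file on disk, pass the result of reading it as text (e.g. `open(path).read()`) to `read_annotations`.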

Assistance is needed when a patient is alone in bed and has been lying in bed for some time before the current moment. The moment the patient starts to stand up is considered an anomaly.
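The labeling rule above can be approximated with a simple per-frame heuristic, sketched below. It is not the procedure used to produce the `anomaly` column: the threshold `min_bed_frames`, which stands in for "some time", is a hypothetical parameter, and the input is assumed to be `(num_person, in_bed_1)` pairs in frame order (when only one person is detected, that person is person 1).

```python
def label_stand_up(frames: list[tuple[int, int]], min_bed_frames: int = 30) -> list[int]:
    """Return 1 for the first frame in which a lone patient leaves the bed
    after lying in it for at least `min_bed_frames` consecutive frames."""
    labels = []
    bed_run = 0  # consecutive frames with the patient alone and in bed
    for num_person, in_bed in frames:
        alone = num_person == 1
        if alone and in_bed == 1:
            bed_run += 1
            labels.append(0)
        elif alone and in_bed == 0 and bed_run >= min_bed_frames:
            labels.append(1)  # the moment the patient starts to stand up
            bed_run = 0
        else:
            # Someone else is present, nobody is detected, or the bed stay
            # was too short: no assistance needed, restart the count.
            labels.append(0)
            bed_run = 0
    return labels
```

Resetting the count whenever a second person appears reflects the condition that assistance is needed only when the patient is alone.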

The task is to detect, for each frame of a test video, whether that frame is an anomaly.

Acknowledgements

Dataplex.ai company.

Inspiration

We would like to help patients and make their life easier.

Cite

Neilsberg Research (2025). Granada, CO annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/granada-co-income-by-gender/

Granada, CO annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition

Available download formats: json, csv

Dataset updated

Feb 27, 2025

Dataset authored and provided by

Neilsberg Research

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

Granada

Variables measured

Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket

Measurement technique

The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the number of individuals of each gender (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com

Dataset funded by

Neilsberg Research

Description

About this dataset

Context

The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and employment type (full-time (FT) and part-time (PT)), offering insights into the diverse income landscape within Granada. The dataset can be used to study gender-based income distribution within the Granada population, aiding data analysis and decision-making.

Key observations

Employment patterns: Within Granada, among individuals aged 15 years and older with income, there were 165 men and 205 women in the workforce. Among them, 100 men were engaged in full-time, year-round employment, while 100 women were in full-time, year-round roles.
Annual income under $24,999: Of the male population working full-time, 1% fell within the income range of under $24,999, while 24% of the female population working full-time was represented in the same income bracket.
Annual income above $100,000: 10% of men in full-time roles earned incomes exceeding $100,000, while 4% of women in full-time positions earned within this income bracket.
Refer to the research insights for key observations on additional income brackets (annual income under $24,999, between $25,000 and $49,999, between $50,000 and $74,999, between $75,000 and $99,999, and above $100,000) and employment types (full-time year-round and part-time).

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income brackets:

$1 to $2,499 or loss
$2,500 to $4,999
$5,000 to $7,499
$7,500 to $9,999
$10,000 to $12,499
$12,500 to $14,999
$15,000 to $17,499
$17,500 to $19,999
$20,000 to $22,499
$22,500 to $24,999
$25,000 to $29,999
$30,000 to $34,999
$35,000 to $39,999
$40,000 to $44,999
$45,000 to $49,999
$50,000 to $54,999
$55,000 to $64,999
$65,000 to $74,999
$75,000 to $99,999
$100,000 or more
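Assigning an individual income to one of these 20 brackets needs only a lookup over the brackets' lower bounds. The sketch below is illustrative only, since the published dataset contains pre-aggregated counts rather than individual incomes; it assumes incomes are annual dollar amounts and that any value below $2,500, including a loss, falls into the first bracket.

```python
from bisect import bisect_right

# Lower bound of every bracket except the first; the first bracket
# ("$1 to $2,499 or loss") absorbs all incomes below $2,500, including losses.
LOWER_BOUNDS = [
    2_500, 5_000, 7_500, 10_000, 12_500, 15_000, 17_500, 20_000, 22_500,
    25_000, 30_000, 35_000, 40_000, 45_000, 50_000, 55_000, 65_000, 75_000,
    100_000,
]

LABELS = [
    "$1 to $2,499 or loss", "$2,500 to $4,999", "$5,000 to $7,499",
    "$7,500 to $9,999", "$10,000 to $12,499", "$12,500 to $14,999",
    "$15,000 to $17,499", "$17,500 to $19,999", "$20,000 to $22,499",
    "$22,500 to $24,999", "$25,000 to $29,999", "$30,000 to $34,999",
    "$35,000 to $39,999", "$40,000 to $44,999", "$45,000 to $49,999",
    "$50,000 to $54,999", "$55,000 to $64,999", "$65,000 to $74,999",
    "$75,000 to $99,999", "$100,000 or more",
]


def income_bracket(annual_income: float) -> str:
    """Map a nonzero annual income to one of the 20 bracket labels."""
    # bisect_right counts how many lower bounds are <= the income,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(LOWER_BOUNDS, annual_income)]


print(income_bracket(60_000))  # $55,000 to $64,999
```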

Variables / Data Columns

Income Bracket: This column lists the 20 income brackets, ranging from $1 to $100,000 or more.
Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket
Part-Time Males: The count of males employed part-time and earning within a specified income bracket
Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket
Part-Time Females: The count of females employed part-time and earning within a specified income bracket
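Percentages like those under "Key observations" can be derived from one count column by grouping the 20 fine brackets into the five coarse ranges. The sketch below is an assumption-laden illustration: it presumes the CSV header uses the column names listed above, that rows appear in bracket order, and that counts are integers. The helper names and the `COARSE_GROUPS` row slices are hypothetical.

```python
import csv
import io

# (label, first row index, one past last row index) over the 20 bracket rows.
COARSE_GROUPS = [
    ("Under $25,000", 0, 10),
    ("$25,000 to $49,999", 10, 15),
    ("$50,000 to $74,999", 15, 18),
    ("$75,000 to $99,999", 18, 19),
    ("$100,000 or more", 19, 20),
]


def read_column(csv_text: str, column: str) -> list[int]:
    """Read one count column (e.g. "Full-Time Males") in row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row[column]) for row in reader]


def coarse_shares(counts: list[int]) -> dict[str, float]:
    """Percentage of the column's total that falls in each coarse range."""
    if len(counts) != 20:
        raise ValueError("expected one count per income bracket (20 rows)")
    total = sum(counts)
    if total == 0:
        raise ValueError("column has no individuals")
    return {
        label: 100 * sum(counts[start:end]) / total
        for label, start, end in COARSE_GROUPS
    }
```

The published observations report whole percentages, so results from this sketch would need rounding before comparison.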

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.
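The two definitions above translate directly into a classification rule. In this sketch (the function name is hypothetical), a person who worked 35 or more hours per week but fewer than 50 weeks matches neither definition as stated, so the function reports that case separately rather than guessing a category.

```python
def employment_type(hours_per_week: float, weeks_worked: int) -> str:
    """Classify a worker using the full-time/part-time definitions above."""
    if hours_per_week >= 35 and weeks_worked >= 50:
        return "full-time, year-round"
    if hours_per_week < 35:
        return "part-time"
    # Full-time hours but fewer than 50 weeks: not covered by either definition.
    return "neither (full-time, part-year)"


print(employment_type(40, 52))  # full-time, year-round
```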

Good to know

Margin of Error

The data in the dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Granada median household income by race. You can refer to it here.

