16 datasets found
  1. MERGE Dataset

    • zenodo.org
    zip
    Updated Feb 7, 2025
    Cite
    Pedro Lima Louro; Hugo Redinho; Ricardo Santos; Ricardo Malheiro; Renato Panda; Rui Pedro Paiva (2025). MERGE Dataset [Dataset]. http://doi.org/10.5281/zenodo.13939205
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro Lima Louro; Hugo Redinho; Ricardo Santos; Ricardo Malheiro; Renato Panda; Rui Pedro Paiva
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The MERGE dataset is a collection of audio, lyrics, and bimodal datasets for conducting research on Music Emotion Recognition. A complete version is provided for each modality. The audio datasets provide 30-second excerpts for each sample, while full lyrics are provided in the relevant datasets. The number of available samples in each dataset is as follows:

    • MERGE Audio Complete: 3554
    • MERGE Audio Balanced: 3232
    • MERGE Lyrics Complete: 2568
    • MERGE Lyrics Balanced: 2400
    • MERGE Bimodal Complete: 2216
    • MERGE Bimodal Balanced: 2000

    Additional Contents

    Each dataset contains the following additional files:

    • av_values: File containing the arousal and valence values for each sample sorted by their identifier;
    • tvt_dataframes: Train, validate, and test splits for each dataset. Both a 70-15-15 and a 40-30-30 split are provided.

    Metadata

    A metadata spreadsheet is provided for each dataset with the following information for each sample, if available:

    • Song (Audio and Lyrics datasets) - Song identifiers. Identifiers starting with MT were extracted from the AllMusic platform, while those starting with A or L were collected from private collections;
    • Quadrant - Label corresponding to one of the four quadrants from Russell's Circumplex Model;
    • AllMusic Id - For samples starting with A or L, the matching AllMusic identifier is also provided. This was used to complement the available information for the samples originally obtained from the platform;
    • Artist - First performing artist or band;
    • Title - Song title;
    • Relevance - AllMusic metric representing the relevance of the song in relation to the query used;
    • Duration - Song length in seconds;
    • Moods - User-generated mood tags extracted from the AllMusic platform and available in Warriner's affective dictionary;
    • MoodsAll - User-generated mood tags extracted from the AllMusic platform;
    • Genres - User-generated genre tags extracted from the AllMusic platform;
    • Themes - User-generated theme tags extracted from the AllMusic platform;
    • Styles - User-generated style tags extracted from the AllMusic platform;
    • AppearancesTrackIDs - All AllMusic identifiers related with a sample;
    • Sample - Availability of the sample in the AllMusic platform;
    • SampleURL - URL to the 30-second excerpt in AllMusic;
    • ActualYear - Year of song release.
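
    As a quick orientation, the sketch below shows how the files described above might be loaded in R. The file and column names (av_values.csv, metadata.csv, the split folder, and the Song column) are assumptions for illustration, not the archive's documented layout; check the actual contents after unzipping.

      # Hedged sketch: load annotations, metadata, and one train/validate/test split
      av    <- read.csv("MERGE_Audio_Complete/av_values.csv")   # arousal/valence per sample (assumed name)
      meta  <- read.csv("MERGE_Audio_Complete/metadata.csv")    # metadata spreadsheet (assumed name)
      train <- read.csv("MERGE_Audio_Complete/tvt_dataframes/tvt_70_15_15/train.csv")  # assumed path

      merged <- merge(meta, av, by = "Song")   # join on the song identifier (column name assumed)
      table(merged$Quadrant)                   # distribution across Russell's quadrants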

    Citation

    If you use some part of the MERGE dataset in your research, please cite the following article:

    Louro, P. L. and Redinho, H. and Santos, R. and Malheiro, R. and Panda, R. and Paiva, R. P. (2024). MERGE - A Bimodal Dataset For Static Music Emotion Recognition. arXiv. URL: https://arxiv.org/abs/2407.06060.

    BibTeX:

    @misc{louro2024mergebimodaldataset,
    title={MERGE -- A Bimodal Dataset for Static Music Emotion Recognition},
    author={Pedro Lima Louro and Hugo Redinho and Ricardo Santos and Ricardo Malheiro and Renato Panda and Rui Pedro Paiva},
    year={2024},
    eprint={2407.06060},
    archivePrefix={arXiv},
    primaryClass={cs.SD},
    url={https://arxiv.org/abs/2407.06060},
    }

    Acknowledgements

    This work is funded by FCT - Foundation for Science and Technology, I.P., within the scope of the projects: MERGE - DOI: 10.54499/PTDC/CCI-COM/3171/2021 financed with national funds (PIDDAC) via the Portuguese State Budget; and project CISUC - UID/CEC/00326/2020 with funds from the European Social Fund, through the Regional Operational Program Centro 2020.

    Renato Panda was supported by Ci2 - FCT UIDP/05567/2020.

  2. NSF/NCAR GV HIAPER 1 Minute Data Merge

    • data.ucar.edu
    ascii
    Updated Dec 26, 2024
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2024). NSF/NCAR GV HIAPER 1 Minute Data Merge [Dataset]. http://doi.org/10.26023/R1RA-JHKZ-W913
    Explore at:
    Available download formats: ascii
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 18, 2012 - Jun 30, 2012
    Description

    This data set contains NSF/NCAR GV HIAPER 1 Minute Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 18 May 2012 through 30 June 2012. These are updated merges from the NASA DC3 archive that were made available 13 June 2014. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided, which includes data from all the individual merged flights throughout the mission. The grand merge follows the naming convention "dc3-mrg60-gV_merge_YYYYMMdd_R5_thruYYYYMMdd.ict" (with the suffix "_thruYYYYMMdd" indicating the last flight date included). This data set is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and data set comments.
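
    Because these merge files (and the other DC3 merge products below) are distributed in ICARTT (.ict) format, a small helper like the following can read them into R. This is a minimal sketch assuming the standard ffi-1001 layout, in which the first value on line one gives the number of header lines and the last header line holds the comma-separated variable names; the file name used here is a placeholder.

      # Minimal sketch for reading an ICARTT (.ict) merge file (ffi-1001 layout assumed)
      read_ict <- function(path) {
        first    <- readLines(path, n = 1)
        n_header <- as.integer(strsplit(first, "[ ,]+")[[1]][1])
        # The variable-name row is the last header line, so it becomes the header row here;
        # missing-data flags (often -9999) are declared in the header and left untouched.
        read.csv(path, skip = n_header - 1, header = TRUE, strip.white = TRUE)
      }

      gv_merge <- read_ict("dc3-mrg60-gV_merge_20120518_R5.ict")   # hypothetical file name
      head(gv_merge)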

  3. NASA DC-8 SAGAAERO Data Merge

    • data.ucar.edu
    ascii
    Updated Dec 26, 2024
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2024). NASA DC-8 SAGAAERO Data Merge [Dataset]. http://doi.org/10.26023/ANQE-HZRR-P30K
    Explore at:
    Available download formats: ascii
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 18, 2012 - Jun 22, 2012
    Description

    This data set contains NASA DC-8 SAGAAERO Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 18 May 2012 through 22 June 2012. These merge files were updated by NASA. The data have been merged to the SAGAAero file timeline. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided, which includes data from all the individual merged flights throughout the mission. The grand merge follows the naming convention "dc3-mrgSAGAAero-dc8_merge_YYYYMMdd_R*_thruYYYYMMdd.ict" (with the suffix "_thruYYYYMMdd" indicating the last flight date included). This data set is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and data set comments.

  4. Health and Retirement Study (HRS)

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research; if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

    this new github repository contains five scripts:

    • 1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party
    • import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
    • longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, and perform a mountain of analysis examples with wave weights from two different points in the panel
    • import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, and save the file as an R data file (.rda) for fast loading later
    • replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
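
    Since the scripts above build a database-backed, Taylor-series-linearized survey object, a hedged sketch of that step is shown here. The table and variable names (rand_hrs, raehsamp, raestrat, r10wtresp, r10shlt) are placeholders loosely modelled on RAND HRS naming conventions; take the real ones from the codebook and the repository's import script.

      # Hedged sketch: database-backed complex sample survey design (names are placeholders)
      library(survey)
      library(RSQLite)

      hrs_design <- svydesign(
        ids     = ~raehsamp,     # sampling-error computation unit (assumed name)
        strata  = ~raestrat,     # sampling-error stratum (assumed name)
        weights = ~r10wtresp,    # wave-specific respondent weight (assumed name)
        nest    = TRUE,
        data    = "rand_hrs",    # table created by the import script (assumed name)
        dbtype  = "SQLite",
        dbname  = "hrs.db"
      )

      svymean(~r10shlt, hrs_design, na.rm = TRUE)   # example: self-rated health in one wave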

  5. NASA DC-8 1 Minute Data Merge

    • data.ucar.edu
    ascii
    Updated Dec 26, 2024
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2024). NASA DC-8 1 Minute Data Merge [Dataset]. http://doi.org/10.26023/VM9C-1C16-H003
    Explore at:
    Available download formats: ascii
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 1, 2012 - Jun 30, 2012
    Description

    This dataset contains NASA DC-8 1 Minute Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 18 May 2012 through 22 June 2012. This dataset contains updated data provided by NASA. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided, which includes data from all the individual merged flights throughout the mission. The grand merge follows the naming convention "dc3-mrg60-dc8_merge_YYYYMMdd_R5_thruYYYYMMdd.ict" (with the suffix "_thruYYYYMMdd" indicating the last flight date included). This dataset is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and dataset comments. For more information on updates to this dataset, please see the readme file.

  6. Data from: A dataset to model Levantine landcover and land-use change...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 16, 2023
    Cite
    Kempf, Michael (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10396147
    Explore at:
    Dataset updated
    Dec 16, 2023
    Dataset authored and provided by
    Kempf, Michael
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and, currently, the escalation of the so-called Israeli-Palestinian conflict, which has strained neighbouring countries like Jordan through the influx of Syrian refugees and has increased population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the nine code chunks described below, used to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders containing the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R

    This is the first code chunk and refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from google chrome. Packages: terra. Download the MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatially) time series and merge them later. Note that the time series are temporally consistent.
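
    A hedged sketch of what this extraction step might look like with terra; the directory names and the NDVI layer-matching pattern are assumptions, not the repository's exact code.

      # Hypothetical sketch: pull the NDVI subdataset out of each MOD13Q1 .hdf file
      library(terra)

      hdf_files <- list.files("MODIS_raw", pattern = "\\.hdf$", full.names = TRUE)
      for (f in hdf_files) {
        sds  <- rast(f)                             # all subdatasets in the .hdf file
        ndvi <- sds[[grep("NDVI", names(sds))[1]]]  # keep the 16-day NDVI layer
        out  <- sub("\\.hdf$", "_NDVI.tif", basename(f))
        writeRaster(ndvi, file.path("MODIS_tif", out), overwrite = TRUE)
      }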

    2_MERGE_MODIS_tiles.R

    In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in numerical order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
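
    A minimal sketch of the merging step, assuming the per-tile NDVI .tif files group into triples per acquisition date; the file pattern and the triple grouping are assumptions for illustration.

      # Hypothetical sketch: merge the three tiles (h20v05, h21v05, h21v06) for each date
      library(terra)
      library(gtools)

      files <- mixedsort(list.files("MODIS_tif", pattern = "_NDVI\\.tif$", full.names = TRUE))
      dates <- split(files, ceiling(seq_along(files) / 3))   # three tiles per acquisition date

      for (i in seq_along(dates)) {
        tiles  <- sprc(lapply(dates[[i]], rast))             # collection holding the three tiles
        merged <- merge(tiles)
        writeRaster(merged, sprintf("merged/NDVI_final_%03d.tif", i), overwrite = TRUE)
      }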

    3_CROP_MODIS_merged_tiles.R

    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS. The repository provides the already clipped and merged NDVI datasets.
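
    A hedged sketch of the crop-and-mask step using the shapefile shipped with the repository; the directory names are placeholders.

      # Hypothetical sketch: clip each merged NDVI raster to the Levant outline
      library(terra)
      library(gtools)

      levant <- vect("mask/MERGED_LEVANT.shp")
      merged <- mixedsort(list.files("merged", pattern = "^NDVI_final_.*\\.tif$", full.names = TRUE))

      for (i in seq_along(merged)) {
        r <- rast(merged[i])
        r <- mask(crop(r, levant), levant)
        writeRaster(r, sprintf("cropped/NDVI_merged_clip_%03d.tif", i), overwrite = TRUE)
      }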

    4_TREND_analysis_NDVI.R

    Now, we want to perform a trend analysis on the derived data. The data we load are tricky, as they contain a 16-day return period across each year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with a high confidence level (0.05). Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3. To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
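
    A compact, hedged sketch of the per-pixel trend test and the z-score normalization described above; the input file name is a placeholder and the repository's own code should be preferred for exact results.

      # Hypothetical sketch: per-pixel slope and p-value for annual growing-season sums
      library(terra)

      gs_sum <- rast("growing_season_sums_MAM.tif")   # one layer per year (placeholder name)
      years  <- seq_len(nlyr(gs_sum))

      trend <- app(gs_sum, function(v) {
        if (sum(!is.na(v)) < 3) return(c(NA, NA))
        fit <- lm(v ~ years)
        c(coef(fit)[2], summary(fit)$coefficients[2, 4])   # slope and p-value
      })
      names(trend) <- c("slope", "p")
      sig_slope <- mask(trend$slope, trend$p < 0.05, maskvalues = FALSE)  # keep confident trends

      # z-scores of the spatially averaged annual series (same normalization used for GLDAS)
      annual_mean <- global(gs_sum, "mean", na.rm = TRUE)$mean
      z_scores    <- (annual_mean - mean(annual_mean)) / sd(annual_mean)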

    5_BUILT_UP_change_raster.R

    Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). Here, one can download the temporal coverage that is aimed for and reclassify it, using the code, after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.

    6_POPULATION_numbers_plot.R

    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.

    7_YIELD_plot.R

    In this section, we are using the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single-country yield datasets is plotted in a ggplot and combined using the patchwork package in R.

    8_GLDAS_read_extract_trend

    The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data come in .nc file format and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names(the spatraster collection). Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area. From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year). From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.
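
    A hedged sketch of reading one GLDAS variable and preparing the monthly series for the trend analysis; the variable name Rainf_tavg, the file locations, and the start date are assumptions taken from common GLDAS conventions, not from the repository's code.

      # Hypothetical sketch: extract one GLDAS variable, clip to the Levant, build a monthly series
      library(terra)

      levant   <- vect("mask/MERGED_LEVANT.shp")
      nc_files <- list.files("GLDAS", pattern = "\\.nc4?$", full.names = TRUE)

      rain <- rast(lapply(nc_files, function(f) rast(f, subds = "Rainf_tavg")))  # assumed variable name
      rain <- mask(crop(rain, levant), levant)

      rain_mean <- global(rain, "mean", na.rm = TRUE)$mean             # spatial mean per month
      rain_ts   <- ts(rain_mean, start = c(1975, 1), frequency = 12)   # frequency = 12 as noted above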

    9_workflow_diagramme

    This simple code can be used to plot a workflow diagram and is detached from the actual analysis.

    Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration, and Funding acquisition: Michael Kempf.

  7. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    • 2005-2012 asec - download all microdata.R: download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table
    • 2012 asec - analysis examples.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples
    • replicate census estimates - 2011.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file 2011 asec replicate weight sas output.png (statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document)

    click here to view these three scripts. for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page, the bureau of labor statistics' current population survey page, and the current population survey's wikipedia article. notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
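
    A hedged sketch of the replicate-weight survey object the analysis scripts describe. The weight names, the replicate-weight column pattern, and the Fay adjustment are illustrative assumptions; confirm them against the Census replicate-weight usage instructions and the repository's code before relying on estimates.

      # Hypothetical sketch: complex sample survey object from ASEC replicate weights
      library(survey)

      asec_design <- svrepdesign(
        weights          = ~marsupwt,       # main ASEC person weight (assumed name)
        repweights       = "pwwgt[1-9]",    # columns pwwgt1, pwwgt2, ... (assumed pattern)
        type             = "Fay",
        rho              = 0.5,             # illustrative Fay coefficient
        combined.weights = TRUE,
        data             = asec_2012        # person-level table built by the download script
      )

      svytotal(~one, asec_design)           # weighted population count, as in the analysis examples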

  8. Data from: KORUS-AQ Aircraft Merge Data Files

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jul 3, 2025
    Cite
    NASA/LARC/SD/ASDC (2025). KORUS-AQ Aircraft Merge Data Files [Dataset]. https://catalog.data.gov/dataset/korus-aq-aircraft-merge-data-files
    Explore at:
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    KORUSAQ_Merge_Data are pre-generated merge data files combining various products collected during the KORUS-AQ field campaign. This collection features pre-generated merge files for the DC-8 aircraft. Data collection for this product is complete. The KORUS-AQ field study was conducted in South Korea during May-June 2016. The study was jointly sponsored by NASA and Korea's National Institute of Environmental Research (NIER). The primary objectives were to investigate the factors controlling air quality in Korea (e.g., local emissions, chemical processes, and transboundary transport) and to assess future air quality observing strategies incorporating geostationary satellite observations. To achieve these science objectives, KORUS-AQ adopted a highly coordinated sampling strategy involving surface and airborne measurements, including both in-situ and remote sensing instruments. Surface observations provided details on ground-level air quality conditions, while airborne sampling provided an assessment of conditions aloft relevant to satellite observations and necessary to understand the role of emissions, chemistry, and dynamics in determining air quality outcomes. The sampling region covers the South Korean peninsula and surrounding waters, with a primary focus on the Seoul Metropolitan Area. Airborne sampling was primarily conducted from near surface to about 8 km, with extensive profiling to characterize the vertical distribution of pollutants and their precursors. The airborne observational data were collected from three aircraft platforms: the NASA DC-8, NASA B-200, and Hanseo King Air. Surface measurements were conducted from 16 ground sites and 2 ships: R/V Onnuri and R/V Jang Mok. The major data products collected from both the ground and air include in-situ measurements of trace gases (e.g., ozone, reactive nitrogen species, carbon monoxide and dioxide, methane, non-methane and oxygenated hydrocarbon species), aerosols (e.g., microphysical and optical properties and chemical composition), active remote sensing of ozone and aerosols, and passive remote sensing of NO2, CH2O, and O3 column densities. These data products support research focused on examining the impact of photochemistry and transport on ozone and aerosols, evaluating emissions inventories, and assessing the potential use of satellite observations in air quality studies.

  9. KORUS-AQ Aircraft Merge Data Files | gimi9.com

    • gimi9.com
    Updated Sep 30, 2018
    Cite
    (2018). KORUS-AQ Aircraft Merge Data Files | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_korus-aq-aircraft-merge-data-files/
    Explore at:
    Dataset updated
    Sep 30, 2018
    Description

    KORUSAQ_Merge_Data are pre-generated merge data files combining various products collected during the KORUS-AQ field campaign. This collection features pre-generated merge files for the DC-8 aircraft. Data collection for this product is complete. The KORUS-AQ field study was conducted in South Korea during May-June 2016. The study was jointly sponsored by NASA and Korea's National Institute of Environmental Research (NIER). The primary objectives were to investigate the factors controlling air quality in Korea (e.g., local emissions, chemical processes, and transboundary transport) and to assess future air quality observing strategies incorporating geostationary satellite observations. To achieve these science objectives, KORUS-AQ adopted a highly coordinated sampling strategy involving surface and airborne measurements, including both in-situ and remote sensing instruments. Surface observations provided details on ground-level air quality conditions, while airborne sampling provided an assessment of conditions aloft relevant to satellite observations and necessary to understand the role of emissions, chemistry, and dynamics in determining air quality outcomes. The sampling region covers the South Korean peninsula and surrounding waters, with a primary focus on the Seoul Metropolitan Area. Airborne sampling was primarily conducted from near surface to about 8 km, with extensive profiling to characterize the vertical distribution of pollutants and their precursors. The airborne observational data were collected from three aircraft platforms: the NASA DC-8, NASA B-200, and Hanseo King Air. Surface measurements were conducted from 16 ground sites and 2 ships: R/V Onnuri and R/V Jang Mok. The major data products collected from both the ground and air include in-situ measurements of trace gases (e.g., ozone, reactive nitrogen species, carbon monoxide and dioxide, methane, non-methane and oxygenated hydrocarbon species), aerosols (e.g., microphysical and optical properties and chemical composition), active remote sensing of ozone and aerosols, and passive remote sensing of NO2, CH2O, and O3 column densities. These data products support research focused on examining the impact of photochemistry and transport on ozone and aerosols, evaluating emissions inventories, and assessing the potential use of satellite observations in air quality studies.

  10. NASA DC-8 10 Second Data Merge

    • data.ucar.edu
    archive
    Updated Dec 26, 2024
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2024). NASA DC-8 10 Second Data Merge [Dataset]. http://doi.org/10.26023/CHJ0-RYQ4-GR10
    Explore at:
    Available download formats: archive
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 18, 2012 - Jun 22, 2012
    Description

    This data set contains NASA DC-8 10 Second Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 18 May 2012 through 22 June 2012. These merges are an updated version provided by NASA. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided, which includes data from all the individual merged flights throughout the mission. The grand merge follows the naming convention "dc3-mrg10-dc8_merge_YYYYMMdd_R*_thruYYYYMMdd.ict" (with the suffix "_thruYYYYMMdd" indicating the last flight date included). This data set is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and data set comments. For the latest information on the updates to this dataset, please see the readme file.

  11. Updated Australian bathymetry: merged 250m bathyTopo

    • data.csiro.au
    • researchdata.edu.au
    Updated Sep 15, 2021
    Cite
    Julian O'Grady; Claire Trenham; Ron Hoeke (2021). Updated Australian bathymetry: merged 250m bathyTopo [Dataset]. http://doi.org/10.25919/cm17-xc81
    Explore at:
    Dataset updated
    Sep 15, 2021
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Julian O'Grady; Claire Trenham; Ron Hoeke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2009 - Aug 31, 2021
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Description

    Accurate coastal wave and hydrodynamic modelling relies on quality bathymetric input. Many national-scale modelling studies, hindcast and forecast products have used, or are currently using, a 2009 digital elevation model (DEM), which does not include recently available bathymetric surveys and is now out of date. There are immediate needs for an updated national product, preceding the delivery of the AusSeabed program's Global Multi-Resolution Topography for Australian coastal and ocean models. There are also challenges in stitching coarse-resolution DEMs, which are often too shallow where they meet high-resolution information (e.g. LiDAR surveys) and require supervised/manual modifications (e.g. NSW, Perth, and Portland VIC bathymetries). This report updates the 2009 topography and bathymetry with a selection of nearshore surveys and demonstrates where the 2009 dataset and nearshore bathymetries do not match up. Lineage: All of the datasets listed in Table 1 (see supporting files) were used in previous CSIRO internal projects or downloaded from online data portals, and processed using QGIS and R's 'raster' package. The Perth LiDAR surveys were provided as points and gridded in R using raster::rasterFromXYZ(). The Macquarie Harbour contour lines were regridded in QGIS using the TIN interpolator. Each dataset was mapped with an accompanying Type Identifier (TID) following the conventions of the GEBCO dataset. The mapping went through several iterations; at each iteration the blending was checked for inconsistency, i.e., where the GA250m DEM was too shallow when it met the high-resolution LiDAR surveys. QGIS v3.16.4 was used to draw masks over inconsistent blending, and GA250 values falling within the mask and between two depths were assigned NA (no data). LiDAR datasets were projected to +proj=longlat +datum=WGS84 +no_defs using raster::projectRaster(), resampled to the GA250 grid using raster::resample() and then merged with raster::merge(). Nearest neighbour resampling was performed for all datasets except for the GEBCO ~500m product, which used the bilinear method. The order of the mapping overlay is sequential from TID = 1 being the base, through to 107, where 0 denotes gap-filled values.
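
    A hedged sketch of the gridding, reprojection, resampling, and merge chain described in the lineage; file names are placeholders (the actual surveys are listed in Table 1), and the real processing involved masking and iteration not shown here.

      # Hypothetical sketch: blend a LiDAR survey into the GA 250 m grid with the raster package
      library(raster)

      pts   <- read.csv("perth_lidar_points.csv")                 # columns x, y, z (assumed)
      lidar <- rasterFromXYZ(pts[, c("x", "y", "z")])             # grid the point soundings

      ga250 <- raster("ga250_bathytopo.tif")                      # 2009 base DEM (placeholder name)
      lidar <- projectRaster(lidar, crs = "+proj=longlat +datum=WGS84 +no_defs")
      lidar <- resample(lidar, ga250, method = "ngb")             # nearest neighbour, as described

      bathy <- merge(lidar, ga250)   # high-resolution survey takes precedence where layers overlap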

    Permissions are required for all code and internal datasets (contact Julian O'Grady).

  12. NSF/NCAR GV HIAPER 10 Second Data Merge

    • data.ucar.edu
    ascii
    Updated Dec 26, 2024
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2024). NSF/NCAR GV HIAPER 10 Second Data Merge [Dataset]. http://doi.org/10.26023/XJSD-81JE-6J0C
    Explore at:
    Available download formats: ascii
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 18, 2012 - Jun 30, 2012
    Description

    This data set contains NSF/NCAR GV HIAPER 10 Second Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 18 May 2012 through 30 June 2012. These are updated merges available through the NASA DC3 archive as of 13 June 2014. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided, which includes data from all the individual merged flights throughout the mission. The grand merge follows the naming convention "dc3-mrg10-gV_merge_YYYYMMdd_R5_thruYYYYMMdd.ict" (with the suffix "_thruYYYYMMdd" indicating the last flight date included). This data set is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and data set comments.

  13. DLR Falcon 1 Second Data Merge

    • data.ucar.edu
    ascii
    Updated Dec 26, 2024
    Cite
    Gao Chen; Jennifer R. Olson; Michael Shook (2024). DLR Falcon 1 Second Data Merge [Dataset]. http://doi.org/10.26023/PYCX-YRVR-AB0W
    Explore at:
    Available download formats: ascii
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Gao Chen; Jennifer R. Olson; Michael Shook
    Time period covered
    May 29, 2012 - Jun 14, 2012
    Description

    This data set contains DLR Falcon 1 Second Data Merge data collected during the Deep Convective Clouds and Chemistry Experiment (DC3) from 29 May 2012 through 14 June 2012. These merges were created using data in the NASA DC3 archive as of September 25, 2013. In most cases, variable names have been kept identical to those submitted in the raw data files. However, in some cases, names have been changed (e.g., to eliminate duplication). Units have been standardized throughout the merge. In addition, a "grand merge" has been provided, which includes data from all the individual merged flights throughout the mission. The grand merge follows the naming convention "dc3-mrg01-falcon_merge_YYYYMMdd_R1_thruYYYYMMdd.ict" (with the suffix "_thruYYYYMMdd" indicating the last flight date included). This data set is in ICARTT format. Please see the header portion of the data files for details on instruments, parameters, quality assurance, quality control, contact information, and data set comments.

  14. R code, data, and analysis documentation for Colour biases in learned...

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Wyatt Toure; Simon M. Reader (2023). R code, data, and analysis documentation for Colour biases in learned foraging preferences in Trinidadian guppies [Dataset]. http://doi.org/10.6084/m9.figshare.14404868.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wyatt Toure; Simon M. Reader
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    This is the repository containing the R code and data to produce the analyses and figures in the manuscript ‘Colour biases in learned foraging preferences in Trinidadian guppies’. R version 3.6.2 was used for this project. Here, we explain how to reproduce the results, provide the location of the metadata for the data sheets, and give descriptions of the root directory contents and folder contents. This material is adapted from the README file of the project, README.md, which is located in the root directory.

    How to reproduce the results

    This project uses the renv package from RStudio to manage package dependencies and ensure reproducibility through time. To ensure results are reproduced based on the versions of the packages used at the time this project was created, you will need to install renv using install.packages("renv") in R. If you want to reproduce the results it is best to download the entire repository onto your system. This can be done by clicking the Download button on the FigShare repository (DOI: 10.6084/m9.figshare.14404868). This will download a zip file of the entire repository. Unzip the zip file to get access to the project files.

    Once the repository is downloaded onto your system, navigate to the root directory and open guppy-colour-learning-project.Rproj. It is important to open the project using the .Rproj file to ensure the working directory is set correctly. Then install the package dependencies onto your system using renv::restore(). Running renv::restore() will install the correct versions of all the packages needed to reproduce our results. Packages are installed in a stand-alone library for this project and will not affect your installed R packages anywhere else.

    If you want to reproduce specific results from the analyses you can open either analysis-experiment-1.Rmd for results from experiment 1 or analysis-experiment-2.Rmd for results from experiment 2. Both are located in the root directory. You can select the Run All option under the Code option in the navbar of RStudio to execute all the code chunks. You can also run all chunks independently, though we advise that you do so sequentially, since variables necessary for the analysis are created as the script progresses.

    Metadata

    Data are available in the data/ directory.

    • colour-learning-experiment-1-data.csv are the data for experiment 1
    • colour-learning-experiment-2-full-data.csv are the data for experiment 2

    We provide the variable descriptions for the data sets in the file metadata.md located in the data/ directory. The packages required to conduct the analyses and construct the website, as well as their versions and citations, are provided in the file required-r-packages.md.

    Directory structure

    • data/ contains the raw data used to conduct the analyses
    • docs/ contains the reader-friendly html write-up of the analyses; the GitHub Pages site is built from this folder
    • R/ contains custom R functions used in the analysis
    • references/ contains reference information and formatting for citations used in the project
    • renv/ contains an activation script and configuration files for the renv package manager
    • figs/ contains the individual files for the figures and residual diagnostic plots produced by the analysis scripts. This directory is created and populated by running analysis-experiment-1.Rmd, analysis-experiment-2.Rmd and combined-figures.Rmd

    Root directory contents

    The root directory contains Rmd scripts used to conduct the analyses, create figures, and render the website pages. Below we describe the contents of these files as well as the additional files contained in the root directory.

    • analysis-experiment-1.Rmd is the R code and documentation for the experiment 1 data preparation and analysis. This script generates the Analysis 1 page of the website.
    • analysis-experiment-2.Rmd is the R code and documentation for the experiment 2 data preparation and analysis. This script generates the Analysis 2 page of the website.
    • protocols.Rmd contains the protocols used to conduct the experiments and generate the data. This script generates the Protocols page of the website.
    • index.Rmd creates the Homepage of the project site.
    • combined-figures.Rmd is the R code used to create figures that combine data from experiments 1 and 2. Not used in the project site.
    • treatment-object-side-assignment.Rmd is the R code used to assign treatments and object sides during trials for experiment 2. Not used in the project site.
    • renv.lock is a JSON-formatted plain text file which contains package information for the project. renv will install the packages listed in this file upon executing renv::restore().
    • required-r-packages.md is a plain text file containing the versions and sources of the packages required for the project.
    • styles.css contains the CSS formatting for the rendered html pages.
    • LICENSE.md contains the license indicating the conditions upon which the code can be reused.
    • guppy-colour-learning-project.Rproj is the R project file which sets the working directory of the R instance to the root directory of this repository. If trying to run the code in this repository to reproduce results, it is important to open R by clicking on this .Rproj file.
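
    In practice, the reproduction steps above boil down to a few lines of R; rmarkdown::render() is shown here as a non-interactive alternative to RStudio's Run All.

      # Sketch of the reproduction steps (run from the project root after opening the .Rproj file)
      install.packages("renv")                        # once, if renv is not already installed
      renv::restore()                                 # install the versions recorded in renv.lock
      rmarkdown::render("analysis-experiment-1.Rmd")  # experiment 1 results
      rmarkdown::render("analysis-experiment-2.Rmd")  # experiment 2 results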

  15. Short-range Early Phase COVID-19 Forecasting R-Project and Data

    • data.mendeley.com
    Updated Oct 13, 2023
    Cite
    Christopher Lynch (2023). Short-range Early Phase COVID-19 Forecasting R-Project and Data [Dataset]. http://doi.org/10.17632/cytrb8p42g.3
    Explore at:
    Dataset updated
    Oct 13, 2023
    Authors
    Christopher Lynch
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This R-Project and its data files are provided in support of ongoing research efforts for forecasting COVID-19 cumulative case growth at varied geographic levels. All code and data files are provided to facilitate reproducibility of current research findings. Seven forecasting methods are evaluated with respect to their effectiveness at forecasting one-, three-, and seven-day cumulative COVID-19 cases, including: (1) a Naïve approach; (2) Holt-Winters exponential smoothing; (3) growth rate; (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). This package is developed to be directly opened and run in RStudio through the provided RProject file. Code developed using R version 3.6.3.

    This software generates the findings of the article entitled "Short-range forecasting of coronavirus disease 2019 (COVID-19) during early onset at county, health district, and state geographic levels: Comparative forecasting approach using seven forecasting methods" using cumulative case counts reported by The New York Times up to April 22, 2020. This package provides two avenues for reproducing results: 1) regenerate the forecasts from scratch using the provided code and data files and then run the analyses; or 2) load the saved forecast data and run the analyses on the existing data.
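
    For orientation, the sketch below applies two of the seven listed methods (Holt-Winters exponential smoothing and ARIMA) to a toy cumulative series with base R; the series and the ARIMA order are illustrative and are not taken from the provided R-Project.

      # Hypothetical sketch: one- to seven-day-ahead forecasts from a cumulative case series
      cases <- cumsum(c(1, 2, 2, 4, 6, 9, 13, 18, 26, 35, 47, 60))   # toy cumulative counts

      hw <- HoltWinters(ts(cases), gamma = FALSE)    # exponential smoothing, no seasonal term
      predict(hw, n.ahead = 7)

      fit <- arima(cases, order = c(1, 1, 0))        # illustrative ARIMA order
      predict(fit, n.ahead = 7)$pred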

    Two related publications are listed in "Related Links".

    License info can be viewed in "License Info.txt". The "RProject" folder contains the RProject file which opens the project in RStudio with the desired working directory set. README files in each sub-folder provide additional detail on each folder's contents.

    Copyright (c) 2020 Christopher J. Lynch and Ross Gore Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

    Except as contained in this notice, the name(s) of the above copyright holders shall not be used in advertising or otherwise to promote the sale, use, or other dealings in this Software without prior written authorization.

  16. Designing Types for R, Empirically (Dataset)

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, pdf
    Updated Jul 19, 2024
    Cite
    Alexi Turcotte; Aviral Goel; Filip Krikava; Jan Vitek (2024). Designing Types for R, Empirically (Dataset) [Dataset]. http://doi.org/10.5281/zenodo.4091818
    Explore at:
    Available download formats: pdf, application/gzip
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexi Turcotte; Aviral Goel; Filip Krikava; Jan Vitek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is intended to accompany the paper "Designing Types for R, Empirically" (@ OOPSLA'20, link to paper). This data was obtained by running the Typetracer (aka propagatr) dynamic analysis tool (link to tool) on the test, example, and vignette code of a corpus of >400 extensively used R packages.

    Specifically, this dataset contains:

    1. function type traces for >400 R packages (raw-traces.tar.gz);
    2. trace data processed into a more readable/usable form (processed-traces.tar.gz), which was used in obtaining results in the paper;
    3. inferred type declarations for the >400 R packages using various strategies to merge the processed traces (see type-declarations-* directories), and finally;
    4. contract assertion data from running the reverse dependencies of these packages and checking function usage against the declared types (contract-assertion-reverse-dependencies.tar.gz).

    A preprint of the paper is also included, which summarizes our findings.

    Fair warning regarding data size: the raw traces, once uncompressed, take up nearly 600 GB. The already processed traces are in the tens of GB, which should be more manageable for a consumer-grade computer.
