62 datasets found
  1. Example of how to manually extract incubation bouts from interactive plots...

    • figshare.com
    txt
    Updated Jan 22, 2016
    Cite
    Martin Bulla (2016). Example of how to manually extract incubation bouts from interactive plots of raw data - R-CODE and DATA [Dataset]. http://doi.org/10.6084/m9.figshare.2066784.v1
    Explore at:
    txt
    Dataset updated
    Jan 22, 2016
    Dataset provided by
    figshare
    Figshare (http://figshare.com/)
    Authors
    Martin Bulla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General information

    The script runs with R (Version 3.1.1; 2014-07-10) and packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29). Questions can be directed to: Martin Bulla (bulla.mar@gmail.com).

    Data collection and how the individual variables were derived are described in:
    Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016.
    Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015.

    Data are available as an RData file. Missing values are NA. For better readability, the subsections of the script can be collapsed.

    Description of the method

    1 - Data are visualized in an interactive actogram with time of day on the x-axis and one panel for each day of data.
    2 - A red rectangle indicates the active field; clicking with the mouse on the depicted light signal within that field generates a data point that is automatically (via a custom-made function) saved in the csv file. For this data extraction it is recommended to always click on the bottom line of the red rectangle, as data are always available there thanks to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if a greenish vertical bar appears and a new line of data appears in the R console.
    3 - To extract incubation bouts, the first click in the new plot has to be the start of incubation, the next click marks the end of incubation, and a click on the same spot marks the start of incubation for the other sex. If the end and start of incubation are at different times, the data will still be extracted, but the sex, logger and bird_ID will be wrong; these need to be changed manually in the csv file. Similarly, the first bout for a given plot will always be assigned to the male (if no data are present in the csv file) or based on previous data. Hence, whenever data from a new plot are extracted, it is worth checking at the first mouse click whether the sex, logger and bird_ID information is correct and, if not, adjusting it manually.
    4 - Once all information from one day (panel) has been extracted, right-click on the plot and choose "stop". This activates the following day (panel) for extraction.
    5 - To end extraction before going through all the rectangles, press "escape".

    Annotations of the data files from turnstone_2009_Barrow_nest-t401_transmitter.RData

    dfr -- raw data on signal strength from radio tags attached to the rumps of the female and male, and information about when the birds were captured and the incubation stage of the nest:
    1. who: whether the recording refers to female, male, capture or start of hatching
    2. datetime_: date and time of each recording
    3. logger: unique identity of the radio tag
    4. signal_: signal strength of the radio tag
    5. sex: sex of the bird (f = female, m = male)
    6. nest: unique identity of the nest
    7. day: datetime_ variable truncated to year-month-day format
    8. time: time of day in hours
    9. datetime_utc: date and time of each recording, but in UTC time
    10. cols: colors assigned to "who"

    m -- metadata for a given nest:
    1. sp: identifies species (RUTU = Ruddy Turnstone)
    2. nest: unique identity of the nest
    3. year_: year of observation
    4. IDfemale: unique identity of the female
    5. IDmale: unique identity of the male
    6. lat: latitude coordinate of the nest
    7. lon: longitude coordinate of the nest
    8. hatch_start: date and time when the hatching of the eggs started
    9. scinam: scientific name of the species
    10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)
    11. logger: type of device used to record incubation (IT = radio tag)
    12. sampling: mean incubation sampling interval in seconds

    s -- metadata for the incubating parents:
    1. year_: year of capture
    2. species: identifies species (RUTU = Ruddy Turnstone)
    3. author: identifies the author who measured the bird
    4. nest: unique identity of the nest
    5. caught_date_time: date and time when the bird was captured
    6. recapture: was the bird captured before? (0 = no, 1 = yes)
    7. sex: sex of the bird (f = female, m = male)
    8. bird_ID: unique identity of the bird
    9. logger: unique identity of the radio tag
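    Since the data ship as an RData file, a minimal hedged sketch for loading them in R and inspecting the three objects described above (file name taken from the annotations; load() restores dfr, m and s into the workspace):

    load("turnstone_2009_Barrow_nest-t401_transmitter.RData")
    str(dfr)  ## raw signal-strength recordings per bird
    str(m)    ## metadata for the nest
    str(s)    ## metadata for the incubating parents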

  2. Data from: Browsing is a strong filter for savanna tree...

    • data.niaid.nih.gov
    Updated Oct 1, 2021
    Cite
    Wayne Twine (2021). Dataset from : Browsing is a strong filter for savanna tree seedlings in their first growing season [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4972083
    Explore at:
    Dataset updated
    Oct 1, 2021
    Dataset provided by
    Nicola Stevens
    Wayne Twine
    Archibald, Sally
    Craddock Mthabini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented here were used to produce the following paper:

    Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.

    The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588

    For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za

    Description of file(s):

    File 1: cleanedData_forAnalysis.csv (required to run the R code "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")

    The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.

    The data consist of one .csv file with the following column names:

    • treatment - Clipping treatment (1 - 5 months clip, plus unclipped control)
    • plot_rep - One of three randomised plots per treatment
    • matrix_no - Where in the plot the individual was placed
    • species_code - First three letters of the genus name and first three letters of the species name; uniquely identifies the species
    • species - Full species name
    • sample_period - Classification of sampling period into time since clip
    • status - Alive or Dead
    • standing.height - Vertical height above ground (in mm)
    • height.mm - Length of the longest branch (in mm)
    • total.branch.length - Total length of all the branches (in mm)
    • stemdiam.mm - Basal stem diameter (in mm)
    • maxSpineLength.mm - Length of the longest spine
    • postclipStemNo - Number of resprouting stems (only recorded AFTER clipping)
    • date.clipped - Date clipped
    • date.measured - Date measured
    • date.germinated - Date germinated
    • Age.of.plant - Date measured minus date germinated
    • newtreat - Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
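    A minimal hedged sketch for loading this file in R and cross-checking the columns listed above (assumes the csv sits in the working directory):

    dat <- read.csv("cleanedData_forAnalysis.csv", stringsAsFactors = FALSE)
    str(dat)                          ## column names and types, as listed above
    table(dat$treatment, dat$status)  ## survival counts per clipping treatment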

    File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")

    The data consist of one .csv file with the following column names:

    • treatment - Clipping treatment (1 - 5 months clip, plus unclipped control)
    • plot_rep - One of three randomised plots per treatment
    • matrix_no - Where in the plot the individual was placed
    • species_code - First three letters of the genus name and first three letters of the species name; uniquely identifies the species
    • species - Full species name
    • sample_period - Classification of sampling period into time since clip
    • status - Alive or Dead
    • standing.height - Vertical height above ground (in mm)
    • height.mm - Length of the longest branch (in mm)
    • total.branch.length - Total length of all the branches (in mm)
    • stemdiam.mm - Basal stem diameter (in mm)
    • maxSpineLength.mm - Length of the longest spine
    • postclipStemNo - Number of resprouting stems (only recorded AFTER clipping)
    • date.clipped - Date clipped
    • date.measured - Date measured
    • date.germinated - Date germinated
    • Age.of.plant - Date measured minus date germinated
    • newtreat - Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
    • genus - Genus
    • MAR - Mean Annual Rainfall for that species' distribution (mm)
    • rainclass - High/medium/low

    File 3: allModelParameters_byAge.csv (required to run the R code "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings:

    • Age.of.plant - Age in days
    • species_code - Species code
    • pred_SD_mm - Predicted stem diameter in mm
    • pred_SD_up - Top 75th quantile of stem diameter in mm
    • pred_SD_low - Bottom 25th quantile of stem diameter in mm
    • treatdate - Date when clipped
    • pred_surv - Predicted survival probability
    • pred_surv_low - Predicted 25th quantile survival probability
    • pred_surv_high - Predicted 75th quantile survival probability
    • species_code - Species code
    • Bite.probability - Daily probability of being eaten
    • max_bite_diam_duiker_mm - Maximum bite diameter of a duiker for this species
    • duiker_sd - Standard deviation of bite diameter for a duiker for this species
    • max_bite_diameter_kudu_mm - Maximum bite diameter of a kudu for this species
    • kudu_sd - Standard deviation of bite diameter for a kudu for this species
    • mean_bite_diam_duiker_mm - Mean bite diameter of a duiker for this species
    • duiker_mean_sd - Standard deviation of the mean bite diameter for a duiker
    • mean_bite_diameter_kudu_mm - Mean bite diameter of a kudu for this species
    • kudu_mean_sd - Standard deviation of the mean bite diameter for a kudu
    • genus - Genus
    • rainclass - Low/medium/high

    File 4: EatProbParameters_June2020.csv (required to run the R code "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings:

    • shtspec - Species name
    • species_code - Species code
    • genus - Genus
    • rainclass - Low/medium/high
    • seed mass - Mass of seed (g per 1000 seeds)
    • Surv_intercept - Intercept coefficient of the model predicting survival from age at clipping for this species
    • Surv_slope - Slope coefficient of the model predicting survival from age at clipping for this species
    • GR_intercept - Intercept coefficient of the model predicting stem diameter from seedling age for this species
    • GR_slope - Slope coefficient of the model predicting stem diameter from seedling age for this species
    • species_code - Species code
    • max_bite_diam_duiker_mm - Maximum bite diameter of a duiker for this species
    • duiker_sd - Standard deviation of bite diameter for a duiker for this species
    • max_bite_diameter_kudu_mm - Maximum bite diameter of a kudu for this species
    • kudu_sd - Standard deviation of bite diameter for a kudu for this species
    • mean_bite_diam_duiker_mm - Mean bite diameter of a duiker for this species
    • duiker_mean_sd - Standard deviation of the mean bite diameter for a duiker
    • mean_bite_diameter_kudu_mm - Mean bite diameter of a kudu for this species
    • kudu_mean_sd - Standard deviation of the mean bite diameter for a kudu
    • AgeAtEscape_duiker[t] - Age of plant when its stem diameter is larger than a mean duiker bite
    • AgeAtEscape_duiker_min[t] - Age of plant when its stem diameter is larger than a minimum duiker bite
    • AgeAtEscape_duiker_max[t] - Age of plant when its stem diameter is larger than a maximum duiker bite
    • AgeAtEscape_kudu[t] - Age of plant when its stem diameter is larger than a mean kudu bite
    • AgeAtEscape_kudu_min[t] - Age of plant when its stem diameter is larger than a minimum kudu bite
    • AgeAtEscape_kudu_max[t] - Age of plant when its stem diameter is larger than a maximum kudu bite

  3. RDS and Tab-Separated-Values Formats of Discover Life bee species guide and...

    • zenodo.org
    bin, tsv
    Updated Jan 12, 2024
    Cite
    Jorrit Poelen; Katja Seltmann; Katja Seltmann; Jorrit Poelen (2024). RDS and Tab-Separated-Values Formats of Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila) as derived from Dorey et al. 2023 hash://sha256/56e0b3d68f221ee79d61f6da7bfdfad927d63ab86700b856d0a42a133841779c hash://md5/3bb89ee91f9b9a834e8b2725c3729cf5 [Dataset]. http://doi.org/10.5281/zenodo.10463762
    Explore at:
    bin, tsv
    Dataset updated
    Jan 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jorrit Poelen; Katja Seltmann; Katja Seltmann; Jorrit Poelen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    World
    Description

    To help extend the life and reach of the bee taxonomy embedded in Dorey et al. 2023, this publication includes a versioned copy of the original data as obtained on 2023-01-05 via https://open.flinders.edu.au/ndownloader/files/43331472, as well as a tab-separated-values file converted using the R script `rds2tsv.R`, which contains:

    ## rds2tsv.R: read an RDS stream from stdin (xz-aware) and write tab-separated values to stdout
    write.table(readRDS(xzfile('/dev/stdin')), sep='\t', na='', row.names=F, quote=F)
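    The resulting TSV can be pulled back into R for downstream work; a minimal hedged sketch (file name from the content table below; quoting and comment handling disabled so scientific names containing quotes or '#' survive intact):

    bees <- read.delim("bee-taxonomy.tsv", sep = "\t", quote = "", comment.char = "", stringsAsFactors = FALSE)
    head(bees)  ## first rows, matching the content sample below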

    Citation

    Please cite the original work as well as DiscoverLife when using this derived and re-packaged dataset:

    Dorey, J.B., Fischer, E.E., Chesshire, P.R. et al. A globally synthesised and flagged bee occurrence dataset and cleaning workflow. Sci Data 10, 747 (2023). https://doi.org/10.1038/s41597-023-02626-w

    Ascher, J. S. and J. Pickering. 2022. Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). http://www.discoverlife.org/mp/20q?guide=Apoidea_species Draft-56, 21 August, 2022

    Content

    filename | content id
    bee-taxonomy.rds | hash://sha256/76d8bac0e8ba193afa3278108d1aed0e08d4de1497d27ff22e5aaee3195232b4
    bee-taxonomy.rds | hash://md5/9cd3653a3553202eb9a3fdd684a86b6e
    bee-taxonomy.tsv | hash://sha256/b512043ddf994537ae1ed8068e44bf3a5cb9ab5e44ddf257cc82e67fe034e0e6
    bee-taxonomy.tsv | hash://md5/ef136f270301830126deb3ced4da2383

    Content Sample

    Below are the first 10 rows of the bee-taxonomy.tsv tab-delimited file that can be downloaded below, shown here with '|' as the field separator for readability. The majority of the rows are derived from the Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila) (Ascher & Pickering 2022). Rows in the file include scientific name, taxonomic status, and higher taxonomy including subgenus.

    flags | taxonomic_status | source | accid | id | kingdom | phylum | class | order | family | subfamily | tribe | subtribe | validName | canonical | canonical_withFlags | genus | subgenus | species | infraspecies | authorship | taxon_rank | valid | notes
    | accepted | DiscoverLife | 0 | 4 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum argentinum (Friese, 1906) | Acamptopoeum argentinum | Acamptopoeum argentinum | Acamptopoeum | | argentinum | | (Friese, 1906) | Species | TRUE |
    | synonym | DiscoverLife | 4 | 5 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Panurgini | Perditina | Perdita argentina Friese, 1906 | Perdita argentina | Perdita argentina | Perdita | | argentina | | Friese, 1906 | Species | FALSE |
    | accepted | DiscoverLife | 0 | 6 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum calchaqui Compagnucci, 2004 | Acamptopoeum calchaqui | Acamptopoeum calchaqui | Acamptopoeum | | calchaqui | | Compagnucci, 2004 | Species | TRUE |
    | accepted | DiscoverLife | 0 | 7 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum colombiense Shinn, 1965 | Acamptopoeum colombiense | Acamptopoeum colombiense | Acamptopoeum | | colombiense | | Shinn, 1965 | Species | TRUE |
    | synonym | DiscoverLife | 7 | 8 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum colombiensis_sic Shinn, 1965 | Acamptopoeum colombiensis | Acamptopoeum colombiensis_sic | Acamptopoeum | | colombiensis | | Shinn, 1965 | Species | FALSE | species _sic
    | accepted | DiscoverLife | 0 | 9 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum fernandezi Gonzalez, 2004 | Acamptopoeum fernandezi | Acamptopoeum fernandezi | Acamptopoeum | | fernandezi | | Gonzalez, 2004 | Species | TRUE |
    | accepted | DiscoverLife | 0 | 10 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum inauratum (Cockerell, 1926) | Acamptopoeum inauratum | Acamptopoeum inauratum | Acamptopoeum | | inauratum | | (Cockerell, 1926) | Species | TRUE |
    | synonym | DiscoverLife | 10 | 11 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Panurgini | Camptopoeina | Camptopoeum (Acamptopoeum) inauratum Cockerell, 1926 | Camptopoeum (Acamptopoeum) inauratum | Camptopoeum (Acamptopoeum) inauratum | Camptopoeum | Acamptopoeum | inauratum | | Cockerell, 1926 | Species | FALSE |
    | accepted | DiscoverLife | 0 | 12 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum melanogaster Compagnucci, 2004 | Acamptopoeum melanogaster | Acamptopoeum melanogaster | Acamptopoeum | | melanogaster | | Compagnucci, 2004 | Species | TRUE |
    | accepted | DiscoverLife | 0 | 13 | Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Panurginae | Calliopsini | | Acamptopoeum nigritarse (Vachal, 1909) | Acamptopoeum nigritarse | Acamptopoeum nigritarse | Acamptopoeum | | nigritarse | | (Vachal, 1909) | Species | TRUE |

    Please also credit the original dataset Dorey et al. 2023 when using this derived product.

    Provenance

    The versioned workflow was captured by Preston (Elliott et al. 2020, 2023) with identifier hash://sha256/56e0b3d68f221ee79d61f6da7bfdfad927d63ab86700b856d0a42a133841779c and history:

    preston ls --anchor hash://sha256/56e0b3d68f221ee79d61f6da7bfdfad927d63ab86700b856d0a42a133841779c

    https://preston.guoda.bio http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/prov#SoftwareAgent urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    https://preston.guoda.bio http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/prov#Agent urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    https://preston.guoda.bio http://purl.org/dc/terms/description "Preston is a software program that finds, archives and provides access to biodiversity datasets."@en urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/prov#Activity urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc http://purl.org/dc/terms/description "An activity that assigns an alias to a content hash"@en urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc http://www.w3.org/ns/prov#startedAtTime "2024-01-06T00:02:37.973Z"^^http://www.w3.org/2001/XMLSchema#dateTime urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc http://www.w3.org/ns/prov#wasStartedBy https://preston.guoda.bio urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    https://doi.org/10.5281/zenodo.1410543 http://www.w3.org/ns/prov#usedBy urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc urn:uuid:96a86205-ef3f-4a0f-9c72-d8c1552c9fbc .
    https://doi.org/10.5281/zenodo.1410543 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://purl.org/dc/dcmitype/Software

  4. WoSIS snapshot - December 2023

    • search.dataone.org
    Updated Feb 5, 2025
    + more versions
    Cite
    ISRIC – World Soil Information (2025). WoSIS snapshot - December 2023 [Dataset]. https://search.dataone.org/view/sha256%3Aae94fefb74f928a3d482eee20abf33cf04d988555ef2beef2977eba7d5504bd7
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    International Soil Reference and Information Centre
    Time period covered
    Jan 1, 1918 - Dec 1, 2022
    Area covered
    Description

    ABSTRACT: The World Soil Information Service (WoSIS) provides quality-assessed and standardized soil profile data to support digital soil mapping and environmental applications at broad scale levels. Since the release of the 'WoSIS snapshot 2019', many new soil data were shared with us, registered in the ISRIC data repository, and subsequently standardized in accordance with the licenses specified by the data providers. The source data were contributed by a wide range of data providers; therefore, special attention was paid to the standardization of soil property definitions, soil analytical procedures and soil property values (and units of measurement). We presently consider the following soil chemical properties (organic carbon, total carbon, total carbonate equivalent, total nitrogen, phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity) and physical properties (soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention), grouped according to analytical procedures (aggregates) that are operationally comparable. For each profile we provide the original soil classification (FAO, WRB, USDA, and version) and horizon designations as far as these have been specified in the source databases. Three measures of 'fitness-for-intended-use' are provided: positional uncertainty (for site locations), time of sampling/description, and a first approximation of the uncertainty associated with the operationally defined analytical methods. These measures should be considered during digital soil mapping and subsequent earth system modelling that use the present set of soil data.

    DATA SET DESCRIPTION: The 'WoSIS 2023 snapshot' comprises data for 228k profiles from 217k geo-referenced sites that originate from 174 countries. The profiles represent over 900k soil layers (or horizons) and over 6 million records. The actual number of measurements for each property varies (greatly) between profiles and with depth, generally depending on the objectives of the initial soil sampling programmes. The data are provided in TSV (tab-separated values) format and as a GeoPackage. The zip-file (446 Mb) contains the following files:

    • Readme_WoSIS_202312_v2.pdf: Provides a short description of the dataset, file structure, column names, units and category values (this file is also available directly under 'online resources'). The pdf includes links to tutorials for downloading the TSV files into R and Excel, respectively. See also 'HOW TO READ TSV FILES INTO R AND PYTHON' in the next section.
    • wosis_202312_observations.tsv: Lists the four- to six-letter codes for each observation, whether the observation is for a site/profile or a layer (horizon), the unit of measurement, and the number of profiles and layers represented in the snapshot. It also provides an estimate of the inferred accuracy of the laboratory measurements.
    • wosis_202312_sites.tsv: Characterizes the site location where profiles were sampled.
    • wosis_202312_profiles.tsv: Presents the unique profile ID (i.e. primary key), site_id, source of the data, country ISO code and name, positional uncertainty, latitude and longitude (WGS 1984), maximum depth of soil described and sampled, as well as information on the soil classification system and edition. Depending on the soil classification system used, the number of fields will vary.
    • wosis_202312_layers.tsv: Characterizes the layers (or horizons) per profile, and lists their upper and lower depths (cm).
    • wosis_202312_xxxx.tsv: This type of file presents results for each observation (e.g. "xxxx" = "BDFIOD"), as defined under "code" in file wosis_202312_observations.tsv (e.g. wosis_202312_bdfiod.tsv).
    • wosis_202312.gpkg: Contains the above data files in GeoPackage format (which stores the files within an SQLite database).

    HOW TO READ TSV FILES INTO R AND PYTHON: A) To read the data in R, uncompress the ZIP file and set the working directory to the uncompressed folder:

    setwd("/YourFolder/WoSIS_2023_December/") ## for example: setwd('D:/WoSIS_2023_December/')

    Then use read_tsv to read the TSV files, specifying the data types for each column (c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time):

    observations = readr::read_tsv('wosis_202312_observations.tsv', col_types='cccciid')
    observations ## show columns and first 10 rows
    sites = readr::read_tsv('wosis_202312_sites.tsv', col_types='iddcccc')
    sites
    profiles = readr::read_tsv('wosis_202312_profiles.tsv', col_types='icciccddcccccciccccicccci')
    profiles
    layers = readr::read_tsv('wosis_202312_layers.tsv', col_types='iiciciiilcc')
    layers
    ## Do this for each observation 'XXXX', e.g. file 'wosis_202312_orgc.tsv':
    orgc = readr::read_tsv('wosis_202312_orgc.tsv', col_types='...

    Visit https://dataone.org/datasets/sha256%3Aae94fefb74f928a3d482eee20abf33cf04d988555ef2beef2977eba7d5504bd7 for complete metadata about this dataset.
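    For the GeoPackage alternative mentioned above, a minimal hedged sketch using the sf package (the layer name passed to st_read is an assumption based on the TSV file naming; check the st_layers() output for the actual names):

    library(sf)
    st_layers("wosis_202312.gpkg")  ## list the layers stored in the GeoPackage
    ## Layer name below is assumed to mirror the TSV naming; verify against st_layers() first.
    profiles <- st_read("wosis_202312.gpkg", layer = "wosis_202312_profiles")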

  5. FIRST Catalog of FR I Radio Galaxies - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). FIRST Catalog of FR I Radio Galaxies - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/first-catalog-of-fr-i-radio-galaxies
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The authors have built a catalog of 219 Fanaroff and Riley class I edge-darkened radio galaxies (FR Is), called FRICAT, that is selected from a published sample and obtained by combining observations from the NVSS, FIRST, and SDSS surveys. They included in the catalog the sources with an edge-darkened radio morphology, redshift <= 0.15, and extending (at the sensitivity of the FIRST images) to a radius r larger than 30 kpc from the center of the host. The authors also selected an additional sample (sFRICAT) of 14 smaller (10 < r < 30 kpc) FR Is, limiting to z < 0.05. The hosts of the FRICAT sources are all luminous (-21 >~ Mr >~ -24), red early-type galaxies with black hole masses in the range 10^8 <~ M_BH <~ 3 x 10^9 solar masses; the spectroscopic classification based on the optical emission line ratios indicates that they are all low-excitation galaxies. Sources in the FRICAT are then indistinguishable from the FR Is belonging to the Third Cambridge Catalogue of Radio Sources (3C) on the basis of their optical properties. Conversely, while the 3C-FR Is show a strong positive trend between radio and [O III] emission line luminosity, these two quantities are unrelated in the FRICAT sources; at a given line luminosity, they show radio luminosities spanning about two orders of magnitude and extending to much lower ratios between radio and line power than 3C-FR Is. The authors' main conclusion is that the 3C-FR Is represent just the tip of the iceberg of a much larger and diverse population of FR Is. This HEASARC table contains both the 219 radio galaxies in the main FRICAT sample listed in Table B.1 of the reference paper and the 14 radio galaxies in the additional sFRICAT sample listed in Table B.2 of the reference paper. To enable users to distinguish from which sample an entry has been taken, the HEASARC created a parameter galaxy_sample which is set to 'M' for galaxies from the main sample, and to 'S' for galaxies from the supplementary sFRICAT sample. Throughout the paper, the authors adopted a cosmology with H0 = 67.8 km s^-1 Mpc^-1, OmegaM = 0.308, and OmegaLambda = 0.692 (Planck Collaboration XIII 2016). This table was created by the HEASARC in February 2017 based on electronic versions of Tables B.1 and B.2 that were obtained from the Astronomy & Astrophysics website. This is a service provided by NASA HEASARC.

  6. Replication Data for: "A Topic-based Segmentation Model for Identifying...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
    Description

    We provide instructions, code and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package for researchers or practitioners to apply the topic-based segmentation model with unstructured texts (latent class regression with group variable selection) to their own datasets. First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide a set of codes and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note: due to the dataset terms of use by Yelp and the restriction on data size, we provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide a set of codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file.

    [A guide on how to use the code to reproduce each study in the paper]

    1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Please run from the beginning to the end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships, you will get dendrograms of selected groups of variables as in Figure 2. Computing time is approximately 20 to 30 minutes.

    3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IV matrices for the customer-level segmentation study.

    3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships. Computing time is approximately 3 to 4 hours.

    4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IV matrices for the restaurant-level segmentation study.

    4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships. Computing time is approximately 10 to 12 hours.

    [Guidelines for running benchmark models in Table 6]

    • Unsupervised topic model: 'topicmodels' package in R -- after determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors. Then conduct prediction with regression.
    • Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
    • Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction.
    • Aggregate regression: 'lm' default function in R.
    • Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, conduct prediction of the dependent variable per segment.
    • Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, conduct prediction of the dependent variable per segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

    5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the professor ratings reviews study. Computing time is approximately 10 hours.

    [A list of the versions of R, packages, and computer...
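    To make the benchmark descriptions above concrete, a minimal hedged sketch of the 'latent class regression without variable selection' benchmark via flexmix (the data frame and variable names here are hypothetical illustrations, not taken from the replication files):

    library(flexmix)
    set.seed(1)
    ## Hypothetical example data: a star rating driven by two review-based predictors.
    df <- data.frame(rating = rnorm(300), x1 = rnorm(300), x2 = rnorm(300))
    fit <- flexmix(rating ~ x1 + x2, data = df, k = 3)  ## 3 segments, as in the study
    parameters(fit)  ## per-segment regression coefficients
    clusters(fit)    ## segment membership per observation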

  7. First Look Survey: NOAO R-band Mosaic - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 28, 2004
    Cite
    (2004). First Look Survey: NOAO R-band Mosaic - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/59313378-cff9-5510-9437-2afb0f41fd66
    Explore at:
    Dataset updated
    Oct 28, 2004
    Description

    We present R-band images covering more than 11 square degrees of sky that were obtained in preparation for the Spitzer Space Telescope First-Look Survey (FLS). The FLS was designed to characterize the mid-infrared sky at depths 2 orders of magnitude deeper than previous surveys. The extragalactic component is the first cosmological survey done with Spitzer. Source catalogs extracted from the R-band images are also presented. The R-band images were obtained using the Mosaic-1 camera on the 4m Mayall Telescope of the Kitt Peak National Observatory. Two relatively large regions of the sky were observed to modest depth: the main FLS extragalactic field (17h18m00s, +59°30'00.0" [J2000]; l=88.3°, b=+34.9°) and the ELAIS-N1 field (16h10m01s, +54°30'36.0"; l=84.2°, b=+44.9°). While both these fields were in early plans for the FLS, only a single deep-pointing test observation was made at the ELAIS-N1 location. The larger Legacy program SWIRE will include this region among its surveyed areas. The data products of our KPNO imaging (images and object catalogs) are made available to the community through the World Wide Web (via the Spitzer Science Center and NOAO Science Archive, http://ssc.spitzer.caltech.edu/fls/). The overall quality of the images is high. The measured positions of sources detected in the images have rms uncertainties in their absolute positions on the order of 0.35" with possible systematic offsets on the order of 0.1", depending on the reference frame of comparison. The relative astrometric accuracy is much better than 1/10 of an arcsecond. Typical delivered image quality in the images is 1.1" full width at half-maximum. The images are relatively deep, since they reach a median 5σ depth limiting magnitude of R=25.5 (Vega) as measured within a 1.35 FWHM aperture, for which the signal-to-noise ratio (S/N) is maximal. Catalogs have been extracted with SExtractor, using thresholds in area and flux for which the number of false detections is below 1% at R=25. Only sources with S/N>3 have been retained in the final catalogs. Comparing the galaxy number counts from our images with those of deeper R-band surveys, we estimate that our observations are 50% complete at R=24.5. These limits in depth are sufficient to identify a substantial fraction of the infrared sources that will be detected by Spitzer. Use of the data: Use of these data must be accompanied by citation of the paper and acknowledgment: "The National Optical Astronomy Observatory (NOAO) is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under cooperative agreement with the National Science Foundation."

  8. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, as per their details listed on the official website of Amazon. The data were scraped in January 2023 from the official website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv
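    Because the positive and negative rows are stored in contiguous blocks, a minimal hedged sketch for loading and shuffling this file in R before any train/test split (column names follow the two-column description above):

    rt <- read.csv("data_rt.csv", stringsAsFactors = FALSE)
    set.seed(42)                  ## reproducible shuffle
    rt <- rt[sample(nrow(rt)), ]  ## break up the negative-then-positive block order
    head(rt$labels)               ## labels: 1 = fresh (good), 0 = rotten (bad)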

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9,930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machines for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw) and division (manually added; categorical label generated using the overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  9. English Longitudinal Study of Ageing: Waves 0-11, 1998-2024

    • beta.ukdataservice.ac.uk
    Updated 2025
    + more versions
    Cite
    J. Banks; G. David Batty; J. Breedvelt; K. Coughlin; Crawford, R., Institute for Fiscal Studies (IFS); M. Marmot; J. Nazroo; Oldfield, Z., Institute for Fiscal Studies (IFS); N. Steel; A. Steptoe; M. Wood; P. Zaninotto (2025). English Longitudinal Study of Ageing: Waves 0-11, 1998-2024 [Dataset]. http://doi.org/10.5255/ukda-sn-5050-34
    Explore at:
    Dataset updated
    2025
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    datacite
    Authors
    J. Banks; G. David Batty; J. Breedvelt; K. Coughlin; Crawford, R., Institute for Fiscal Studies (IFS); M. Marmot; J. Nazroo; Oldfield, Z., Institute for Fiscal Studies (IFS); N. Steel; A. Steptoe; M. Wood; P. Zaninotto
    Description

    The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement. The main objectives of ELSA are to:

    • construct waves of accessible and well-documented panel data;
    • provide these data in a convenient and timely fashion to the scientific and policy research community;
    • describe health trajectories, disability and healthy life expectancy in a representative sample of the English population aged 50 and over;
    • examine the relationship between economic position and health;
    • investigate the determinants of economic position in older age;
    • describe the timing of retirement and post-retirement labour market activity; and
    • understand the relationships between social support, household structure and the transfer of assets.

    Further information may be found on the ELSA project website (https://www.elsa-project.ac.uk/) or the NatCen Social Research ELSA web pages.

    Wave 11 data has been deposited - May 2025

    For the 45th edition (May 2025) ELSA Wave 11 core and pension grid data and documentation were deposited. Users should note this dataset version does not contain the survey weights. A version with the survey weights along with IFS and financial derived datasets will be deposited in due course. In the meantime, more information about the data collection or the data collected during this wave of ELSA can be found in the Wave 11 Technical Report or the User Guide.

    Wave 10 Accelerometry data has been deposited - August 2025

    For the 46th edition (August 2025), ELSA Wave 10 accelerometry data and documentation, along with a new version of the Wave 10 Technical Report, have been deposited. Between June 2021 and October 2022, approximately 75% of ELSA households (including core members and partners) were randomly selected and invited to wear an Axivity AX3 tri-axial accelerometer for eight days and nights. Accelerometry has been used to objectively measure movement behaviours for the first time in ELSA. Four datasets containing accelerometer data were deposited: output from the Biobank accelerometer analysis (bbaa) and 24-hour movement behaviours; the step count data; and overnight sleep and sleep stage data.

    Wave 10 HCAP2 End of Life data has been deposited - September 2025:

    For the 47th edition (September 2025), the HCAP2 (Wave 10) End of Life interview data and questionnaire documentation were deposited. The End of Life interview completes the information collected at previous waves of ELSA by interviewing a close friend or relative of the deceased ELSA sample member after their death. Previous End of Life interviews were carried out alongside Waves 2, 3, 4, and 6 of ELSA. The fieldwork for HCAP2 (Wave 10) End of Life took place between 2022-2024. For more information please refer to the questionnaire documentation. The End of Life User Guide will be updated at a later date.

    Health conditions research with ELSA - June 2021

    The ELSA Data team have found some issues with historical data measuring health conditions. If you are intending to do any analysis looking at the following health conditions, then please read the ELSA User Guide or if you still have questions contact elsadata@natcen.ac.uk for advice on how you should approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; Congestive Heart Failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's Disease; emotional, nervous or psychiatric problems; Alzheimer's Disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).

    For information on obtaining data from ELSA that are not held at the UKDS, see the ELSA Genetic data access and Accessing ELSA data webpages.

    Harmonized dataset:

    Users of the Harmonized dataset who prefer to use the Stata version will need access to Stata MP software, as the version G3 file contains 11,779 variables (the limit for the standard Stata 'Intercooled' version is 2,047).

    ELSA COVID-19 study:
    A separate ad-hoc study conducted with ELSA respondents, measuring the socio-economic effects/psychological impact of the lockdown on the aged 50+ population of England, is also available under SN 8688, English Longitudinal Study of Ageing COVID-19 Study.

  10. ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture...

    • researchdata.tuwien.ac.at
    • b2find.eudat.eu
    zip
    Updated Sep 5, 2025
    + more versions
    Cite
    Wolfgang Preimesberger; Wolfgang Preimesberger; Pietro Stradiotti; Pietro Stradiotti; Wouter Arnoud Dorigo; Wouter Arnoud Dorigo (2025). ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/3fcxr-cde10
    Explore at:
    zip
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    TU Wien
    Authors
    Wolfgang Preimesberger; Wolfgang Preimesberger; Pietro Stradiotti; Pietro Stradiotti; Wouter Arnoud Dorigo; Wouter Arnoud Dorigo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/

    This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.

    Dataset Paper (Open Access)

    A description of this dataset, including the methodology and validation results, is available at:

    Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.

    Abstract

    ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
    However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
    Since the requirement for a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such a gap-filling method is that it relies only on the original observational record, without the need for ancillary variables or model-based information. Due to this intrinsic challenge, no global, long-term, univariate gap-filled product was available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments in which satellite-like gaps were introduced into GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.

    Summary

    • Gap-filled global estimates of volumetric surface soil moisture from 1991-2023 at 0.25° sampling
    • Fields of application (partial): climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, and meteorology
    • Method: Modified version of DCT-PLS (Garcia, 2010) interpolation/smoothing algorithm, linear interpolation over periods of frozen soils. Uncertainty estimates are provided for all data points.
    • More information: See Preimesberger et al. (2025) and the ESA CCI SM Algorithm Theoretical Baseline Document [Chapter 7.2.9] (Dorigo et al., 2023): https://doi.org/10.5281/zenodo.8320869

    Programmatic Download

    You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.

    #!/bin/bash

    # Set download directory
    DOWNLOAD_DIR=~/Downloads

    base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"

    # Loop through years 1991 to 2023 and download & extract data
    for year in {1991..2023}; do
    echo "Downloading $year.zip..."
    wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
    unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
    rm "$DOWNLOAD_DIR/$year.zip"
    done

    Data details

    The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file for a specific day (DD) and month (MM), on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file name has the following convention:

    ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc

    Data Variables

    Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

    • sm: (float) The Soil Moisture variable reflects estimates of daily average volumetric soil moisture content (m3/m3) in the soil surface layer (~0-5 cm) over a whole grid cell (0.25 degree).
    • sm_uncertainty: (float) The Soil Moisture Uncertainty variable reflects the uncertainty (random error) of the original satellite observations and of the predictions used to fill observation data gaps.
    • sm_anomaly: Soil moisture anomalies (reference period 1991-2020) derived from the gap-filled values (`sm`)
    • sm_smoothed: Contains DCT-PLS predictions used to fill data gaps in the original soil moisture field. These values are also provided for cases where an observation was initially available (compare `gapmask`). In this case, they provide a smoothed version of the original data.
    • gapmask: (0 | 1) Indicates grid cells where a satellite observation is available (1), and where the interpolated (smoothed) values are used instead (0) in the 'sm' field.
    • frozenmask: (0 | 1) Indicates grid cells where ERA5 soil temperature is <0 °C. In this case, a linear interpolation over time is applied.

    Additional information for each variable is given in the netCDF attributes.

    Version Changelog

    Changes in v09.1r1 (previous version was v09.1):

    • This version uses a novel uncertainty estimation scheme as described in Preimesberger et al. (2025).

    Software to open netCDF files

    These data can be read by any software that supports the Climate and Forecast (CF) metadata conventions for netCDF files.
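
    As an illustration, the snippet below opens a single daily file with xarray (a common Python choice for CF-compliant netCDF; the file name follows the convention above, and the date is arbitrary) and masks the `sm` field to satellite observations only using `gapmask`:

    ```python
    import xarray as xr

    # File name follows the convention above (here: 1 July 2020, inside the 2020/ folder)
    path = "2020/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-20200701000000-fv09.1r1.nc"

    ds = xr.open_dataset(path)
    sm = ds["sm"]                          # gap-filled volumetric soil moisture (m3/m3)
    sm_obs = sm.where(ds["gapmask"] == 1)  # keep only cells with an actual satellite observation
    print(float(sm.mean()), float(sm_obs.mean()))
    ```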

    References

    • Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.
    • Dorigo, W., Preimesberger, W., Stradiotti, P., Kidd, R., van der Schalie, R., van der Vliet, M., Rodriguez-Fernandez, N., Madelon, R., & Baghdadi, N. (2023). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 08.1 (version 1.1). Zenodo. https://doi.org/10.5281/zenodo.8320869
    • Garcia, D., 2010. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), pp.1167-1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020
    • Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, Bulletin of the American Meteorological Society, 85, 381 – 394, https://doi.org/10.1175/BAMS-85-3-381, 2004.

    Related Records

    The following records are all part of the ESA CCI Soil Moisture science data record community:

    1. ESA CCI SM MODELFREE Surface Soil Moisture Record: https://doi.org/10.48436/svr1r-27j77

  11. Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for the original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format (PBMC7_AllCells.zip): zipped folder containing the PBMC counts matrix, gene names, and cell IDs: matrix of gene counts* (matrix.mtx.gz), gene names (features.tsv.gz), and cell IDs (barcodes.tsv.gz). *The 'raw' count matrix is actually gene counts obtained following ambient RNA removal; non-integer count estimations were specified during removal, so most gene counts are non-integer values, but they should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata (PBMC7_AllCells_meta.csv): metadata for cells included in the final dataset. Columns: nCount_RNA (number of transcripts detected in a cell), nFeature_RNA (number of genes detected in a cell), Loupe (cell barcodes, corresponding to the cell IDs in the .h5Seurat and 10X-formatted objects), prcntMito (percent mitochondrial reads in a cell), Scrublet (doublet probability score assigned to a cell), seurat_clusters (cluster ID assigned to a cell), PaperIDs (sample ID for a cell), celltypes (cell type ID assigned to a cell).
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates (PBMC7_AllCells_PCAcoord.csv): first 100 PCA coordinates for cells.
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates (PBMC7_AllCells_tSNEcoord.csv): t-SNE coordinates for all cells.
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates (PBMC7_AllCells_UMAPcoord.csv): UMAP coordinates for all cells.
    • Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates (PBMC7_CD4only_tSNEcoord.csv) and CD4 T Cells UMAP Coordinates (PBMC7_CD4only_UMAPcoord.csv): t-SNE and UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the coordinates used in the publication can be re-assigned using these files.
    • Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates (PBMC7_GDonly_tSNEcoord.csv) and Gamma Delta T Cells UMAP Coordinates (PBMC7_GDonly_UMAPcoord.csv): t-SNE and UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the coordinates used in the publication can be re-assigned using these files.
    • Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information (UnfilteredGeneInfo.txt): gene nomenclature information used to assign gene names in the dataset; the 'Name' column corresponds to the name assigned to a feature in the dataset.
    • Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat (PBMC7.tar): .h5Seurat object of all cells in the PBMC dataset. The file needs to be untarred, then read into R using the function LoadH5Seurat().
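
    The dataset documentation points to R (Read10X(), LoadH5Seurat()); for Python users, a roughly equivalent way to load the 10X-formatted counts is scanpy's read_10x_mtx, sketched below. The folder name comes from the resource list above, and the normalization choices are ours, not the authors':

    ```python
    import scanpy as sc

    # Read the unzipped 10X-format folder (matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz)
    adata = sc.read_10x_mtx("PBMC7_AllCells/")

    # The counts are ambient-RNA-corrected and non-integer, but still unnormalized,
    # so normalize and log-transform before downstream analysis.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    ```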

  12. CODE-15%: a large scale annotated dataset of 12-lead ECGs

    • zenodo.org
    csv, zip
    Updated Jan 8, 2025
    Cite
    Antônio H. Ribeiro; Gabriela M.M. Paixao; Emilly M. Lima; Manoel Horta Ribeiro; Marcelo M. Pinto Filho; Paulo R. Gomes; Derick M. Oliveira; Wagner Meira Jr; Thomas B. Schön; Antonio Luiz P. Ribeiro (2025). CODE-15%: a large scale annotated dataset of 12-lead ECGs [Dataset]. http://doi.org/10.5281/zenodo.4916206
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antônio H. Ribeiro; Gabriela M.M. Paixao; Emilly M. Lima; Manoel Horta Ribeiro; Marcelo M. Pinto Filho; Paulo R. Gomes; Derick M. Oliveira; Wagner Meira Jr; Thomas B. Schön; Antonio Luiz P. Ribeiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of 12-lead ECGs with annotations. The dataset contains 345,779 exams from 233,770 patients. It was obtained through stratified sampling from the CODE dataset (15% of the patients). The data were collected by the Telehealth Network of Minas Gerais between 2010 and 2016.

    This repository contains the file `exams.csv` and the files `exams_part{i}.zip` for i = 0, 1, 2, ..., 17.

    • "exams.csv": is a comma-separated values (csv) file containing the columns
      • "exam_id": id used for identifying the exam;
      • "age": patient age in years at the moment of the exam;
      • "is_male": true if the patient is male;
      • "nn_predicted_age": age predicted by a neural network to the patient. As described in the paper "Deep neural network estimated electrocardiographic-age as a mortality predictor" bellow.
      • "1dAVb": Whether or not the patient has 1st degree AV block;
      • "RBBB": Whether or not the patient has right bundle branch block;
      • "LBBB": Whether or not the patient has left bundle branch block;
      • "SB": Whether or not the patient has sinus bradycardia;
      • "AF": Whether or not the patient has atrial fibrillation;
      • "ST": Whether or not the patient has sinus tachycardia;
      • "patient_id": id used for identifying the patient;
      • "normal_ecg": True if automatic annotation system say it is a normal ECG;
      • "death": true if the patient dies in the follow-up time. This data is available only in the first exam of the patient. Other exams will have this as an empty field;
      • "timey": if the patient dies it is the time to the death of the patient. If not, it is the follow-up time. This data is available only in the first exam of the patient. Other exams will have this as an empty field;
      • "trace_file": identify in which hdf5 file the file corresponding to this patient is located.
    • "exams_part{i}.hdf5": The HDF5 file containing two datasets named `tracings` and other named `exam_id`. The `exam_id` is a tensor of dimension `(N,)` containing the exam id (the same as in the csv file) and the dataset `tracings` is a `(N, 4096, 12)` tensor containing the ECG tracings in the same order. The first dimension corresponds to the different exams; the second dimension corresponds to the 4096 signal samples; the third dimension to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`. The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples), we fill them with zeros on both sizes. For instance, for a 7 seconds ECG signal with 2800 samples we include 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset.

      In Python, one can read this file using h5py:
      ```python
      import h5py
      import numpy as np

      # Open one part of the dataset (read-only)
      f = h5py.File('exams_part0.hdf5', 'r')

      # Exam ids, aligned with the rows of the tracings dataset
      exam_ids = np.array(f['exam_id'])

      # ECG tracings: shape (N, 4096, 12)
      x = f['tracings']
      ```
      The `tracings` dataset is too large to fit in memory, so don't convert it to a numpy array all at once.
      A chunk of it can be accessed using ``x[start:end, :, :]``.

    The CODE dataset was collected by the Telehealth Network of Minas Gerais (TNMG) in the period between 2010 and 2016. TNMG is a public telehealth system assisting 811 of the 853 municipalities in the state of Minas Gerais, Brazil. The dataset is described in:

    Ribeiro, Antônio H., Manoel Horta Ribeiro, Gabriela M. M. Paixão, Derick M. Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton P. S. Ferreira, et al. “Automatic Diagnosis of the 12-Lead ECG Using a Deep Neural Network.” Nature Communications 11, no. 1 (2020): 1760. https://doi.org/10.1038/s41467-020-15432-4

    The CODE-15% dataset was obtained through stratified sampling from the CODE dataset. This subset is described in, and used for assessing model performance in:
    "Deep neural network estimated electrocardiographic-age as a mortality predictor"
    Emilly M Lima, Antônio H Ribeiro, Gabriela MM Paixão, Manoel Horta Ribeiro, Marcelo M Pinto Filho, Paulo R Gomes, Derick M Oliveira, Ester C Sabino, Bruce B Duncan, Luana Giatti, Sandhi M Barreto, Wagner Meira Jr, Thomas B Schön, Antonio Luiz P Ribeiro. medRxiv (2021) https://www.doi.org/10.1101/2021.02.19.21251232

    The companion code for reproducing the experiments in the two papers described above can be found, respectively, in:
    - https://github.com/antonior92/automatic-ecg-diagnosis; and in,
    - https://github.com/antonior92/ecg-age-prediction.

    Note about authorship: Antônio H. Ribeiro, Emilly M. Lima and Gabriela M.M. Paixão contributed equally to this work.

  13. Examples of CARE-related activities carried out by repositories, in sequences or groups

    • portal.edirepository.org
    csv, pdf
    Updated Mar 13, 2024
    + more versions
    Cite
    Ruth Duerr (2024). Examples of CARE-related activities carried out by repositories, in sequences or groups [Dataset]. http://doi.org/10.6073/pasta/aaedd84525ae922d3115fd05cf5bf4fe
    Explore at:
    Available download formats: csv (7273 bytes), pdf (63891 bytes)
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    EDI
    Authors
    Ruth Duerr
    Time period covered
    2020 - 2023
    Variables measured
    Trigger, Outreach, Technical, Repository Protocols, Situational Awareness
    Description

    This dataset is designed to accompany the paper submitted to the Data Science Journal: O'Brien et al., "Earth Science Data Repositories: Implementing the CARE Principles". It shows examples of activities that data repositories are likely to undertake as they implement the CARE principles. These examples were constructed as part of a discussion about the challenges faced by data repositories when acquiring, curating, and disseminating data and other information about Indigenous Peoples, communities, and lands. For clarity, individual repository activities were kept very specific. In practice, however, repository activities are not carried out singly, but are more likely to be performed in groups or in sequence. This dataset shows examples of how activities are likely to be combined in response to certain triggers. See the related dataset O'Brien, M., R. Duerr, R. Taitingfong, A. Martinez, L. Vera, L. Jennings, R. Downs, E. Antognoli, T. ten Brink, N. Halmai, S.R. Carroll, D. David-Chavez, M. Hudson, and P. Buttigieg. 2024. Alignment between CARE Principles and Data Repository Activities. Environmental Data Initiative. https://doi.org/10.6073/pasta/23e699ad00f74a178031904129e78e93 (Accessed 2024-03-13), and the paper, for more information about the development of the activities and their categorization, the raw data on relationships between specific activities, and a discussion of the implementation of the CARE Principles by data repositories.

       Data in this table are organized into groups delineated by a triggering event in the
      first column. For example, the first group consists of 9 rows, while the second group has 7.
      The first row of each group contains the event that triggers the set of actions
      described in the last 4 columns of the spreadsheet. Within each group, the associated rows
      in each column are given in numerical, not temporal, order, since activities will likely vary
      widely from repository to repository.
    
       For example, the first group of rows is about what likely needs to happen if a
      repository discovers that it holds Indigenous data (O6). Clearly, it will need to develop
      processes to identify communities to engage (R6) as well as processes for contacting those
      communities (R7) (if it doesn't already have them). It will also probably need to review and
      possibly update its data management policies to ensure that they are justifiable (R2). Based
      on these actions, it is likely that the repository's outreach group needs to prepare for
      working with more communities (O3) including ensuring that the repository's governance
      protocols are up-to-date and publicized (O5) and that the repository practices are
      transparent (O4). If initial contacts go well, it is likely that the repository will need
      ongoing engagement with the community or communities (S1). This may include adding
      representation to the repository's advisory board (O2); clarifying data usage with the
      communities (O9), facilitating relationships between data providers and communities (O1);
      working with the community to identify educational opportunities (O10); and sharing data
      with them (O8). It may also become necessary to liaise with whomever is maintaining the
      vocabularies in use at the repository (O7).
    
  14. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    + more versions
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only include ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.
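
    For readers who want to experiment with the same generative approach, here is a minimal sketch of fitting REaLTabFormer's tabular variant to a pandas DataFrame. The input file name is hypothetical, and the API calls follow the package's documentation, so check them against your installed version:

    ```python
    import pandas as pd
    from realtabformer import REaLTabFormer  # pip install realtabformer

    # Hypothetical input table; any flat DataFrame of census-like variables works
    df = pd.read_csv("real_data.csv")

    # Fit the tabular (non-relational) variant and sample synthetic rows
    rtf = REaLTabFormer(model_type="tabular")
    rtf.fit(df)
    synthetic = rtf.sample(n_samples=8000)
    ```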

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample size was set to 8,000 households, with a fixed number of 25 households to be selected from each enumeration area. In the first stage, the number of enumeration areas to be selected in each stratum was calculated proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
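
    To illustrate the two-stage design just described (enumeration areas allocated to strata proportional to size, then 25 households per selected EA), here is a Python sketch with a made-up EA frame; all names and sizes are illustrative, and the actual sample was drawn with the R script mentioned above:

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)

    # Illustrative EA frame: one row per enumeration area, with its stratum
    # (geo_1 x urban/rural) and household count; all values are made up.
    ea_frame = pd.DataFrame({
        "ea_id": range(1000),
        "stratum": rng.choice(["P1-urban", "P1-rural", "P2-urban", "P2-rural"], size=1000),
        "n_households": rng.integers(80, 200, size=1000),
    })

    TOTAL_HH, HH_PER_EA = 8000, 25
    n_eas_total = TOTAL_HH // HH_PER_EA  # 320 EAs

    # Stage 1: allocate EAs proportional to stratum size, then sample EAs per stratum
    # (simple rounding; a real design would fix the total after rounding)
    sizes = ea_frame.groupby("stratum")["n_households"].sum()
    alloc = (sizes / sizes.sum() * n_eas_total).round().astype(int)
    sampled_eas = pd.concat(
        ea_frame[ea_frame["stratum"] == s].sample(n=k, random_state=42)
        for s, k in alloc.items()
    )

    # Stage 2: select 25 households at random within each sampled EA
    sample = [
        (ea.ea_id, hh)
        for ea in sampled_eas.itertuples()
        for hh in rng.choice(ea.n_households, size=HH_PER_EA, replace=False)
    ]
    ```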

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are typical of sample surveys or population censuses, no questionnaire is available for this dataset. However, a "fake" questionnaire was created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks against which synthetic observations were assessed and rejected/replaced when needed). Some post-processing was also applied to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  15. Replication Data for: Consolidating Progress: The Selection of Female Ministers in Autocracies and Democracies

    • dataverse.harvard.edu
    Updated Jul 24, 2023
    Cite
    Stuart Bramwell; Hikaru Yamagishi (2023). Replication Data for: Consolidating Progress: The Selection of Female Ministers in Autocracies and Democracies [Dataset]. http://doi.org/10.7910/DVN/I9E6EX
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Stuart Bramwell; Hikaru Yamagishi
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/I9E6EX

    Description

    Enclosed are all the replication materials for "Consolidating Progress: The Selection of Female Ministers in Autocracies and Democracies". We use R to conduct the analyses in the paper. Four files are needed to replicate the analysis: one data file and three R scripts. The dataset is found in "df_consolidatingprogress.csv"; it is the only dataset used in the analysis. All variables are gathered from publicly available datasets, each discussed in detail in Appendix B. The script that merges these datasets is not public, but please contact the authors if you are interested in it. The first part of the analysis is found in "1_descriptive_analysis.R", which replicates Figure 1 and Figure 2. The second part is found in "2_analysis.R", which replicates Figure 3, Table 2, Figure 4, Figure 5, and Figure 6. The script "3_appendix.R" creates all figures and tables found in the appendix. In addition to these files, we have also uploaded "0_createdata", which cleans, transforms, merges, and otherwise processes the data prior to the analysis. The original datasets have also been uploaded. These are: 1_WhoGov_within_V2.0.xlsx, 2_WhoGov_crosssectional_V2.0.xlsx, 3_vdem_V12.xlsx, 4_bmr_V4_edited.xlsx, 5_bjornskovrode_V4.2.xlsx, 6_polityiv.rds, 7_autocraciesoftheworld.xlsx, 8_pwt_v10.0.xlsx, 9_qog_std_ts_jan22, 10_qog_std_cs_jan22.xlsx, 11_wb.xlsx.

  16. R scripts

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated May 10, 2018
    Cite
    Xueying Han (2018). R scripts [Dataset]. http://doi.org/10.6084/m9.figshare.5513170.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    May 10, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xueying Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts in this fileset are those used in the PLOS ONE publication "A snapshot of translational research funded by the National Institutes of Health (NIH): A case study using behavioral and social science research awards and Clinical and Translational Science Awards funded publications." The article can be accessed here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196545

    This fileset consists of all R scripts used for data cleaning, data manipulation, and statistical analysis in the publication. There are eleven files in total:

    1. "Step1a.bBSSR.format.grants.and.publications.data.R" combines all bBSSR 2008-2014 grant award data and associated publications downloaded from NIH RePORTER.
    2. "Step1b.BSSR.format.grants.and.publications.data.R" combines all BSSR-only 2008-2014 grant award data and associated publications downloaded from NIH RePORTER.
    3. "Step2a.bBSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated bBSSR publication data.
    4. "Step2b.BSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated BSSR-only publication data.
    5. "Step3.summary.stats.R" performs summary statistics.
    6. "Step4.time.to.first.publication.R" performs the time-to-first-publication analysis.
    7. "Step5.time.to.citation.analysis.R" performs the time-to-first-citation and time-to-overall-citation analyses.
    8. "Step6.combine.NIH.iCite.data.R" combines NIH iCite citation data.
    9. "Step7.iCite.data.analysis.R" performs citation analysis on the combined iCite data.
    10. "Step8.MeSH.descriptors.R" queries PubMed and pulls down all MeSH descriptors for all publications.
    11. "Step9.CTSA.publications.R" compares the percent of translational publications among bBSSR, BSSR-only, and CTSA publications.

  17. San Francisco Airport Runway Use

    • kaggle.com
    Updated Jan 20, 2023
    Cite
    The Devastator (2023). San Francisco Airport Runway Use [Dataset]. https://www.kaggle.com/datasets/thedevastator/san-francisco-airport-runway-use/versions/2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 20, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Area covered
    San Francisco
    Description

    San Francisco Airport Runway Use

    Late Night Departure Preferences

    By City of San Francisco [source]

    About this dataset

    This dataset explores the late-night departure runways used by aircraft at San Francisco International Airport (SFO). From 1:00 a.m. to 6:00 a.m., aircraft are directed to runways 10L/R, 01L/R, or 28L/R with an immediate right turn when safety and weather conditions permit. This reduces noise in the surrounding residential communities by following over-water departure procedures that direct aircraft over the bay. The data are broken down by runway, month, and year of departure, as well as the percentage of each month's total departures from each runway, giving a comprehensive look at SFO's preferential late-night runway use.



    How to use the dataset

    This dataset can be used to analyze late-night departures from San Francisco Airport and to study the impact of runway usage on air and noise pollution in the surrounding residential communities. It contains the number of departures from each runway (01L/R, 10L/R, 19L/R, and 28L/R) for each year and month, which shows how heavily each runway is used during late-night hours.

    To use this dataset, first become familiar with the column names, such as Year, Month, 01L/R (number of departures from runway 01L/R), and 01L/R Percent of Departures (percentage of departures from runway 01L/R). It also helps to be familiar with terms such as "departure" and "late night", which are used throughout this dataset.

    Once familiar with these details, you can explore the data for insights into how specific runways are used for late-night flight operations and note any patterns or trends across months or years. Additionally, by comparing percentages between runways, you can measure which runways are preferred during periods of heavier traffic, such as holidays or summer months, as in the sketch below.
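
    For example, a short pandas sketch that ranks runways by their average share of late-night departures; the column names are taken from the table in the Columns section below, and the file path is an assumption:

    ```python
    import pandas as pd

    # File name matches the "Columns" section below; adjust the path as needed
    df = pd.read_csv("late-night-preferential-runway-use-1.csv")

    # Mean share of late-night departures per runway across all months
    pct_cols = ["01L/R Percent of Departures", "10L/R Percent of Departures",
                "19L/R Percent of Departures", "28L/R Percent of Departures"]
    print(df[pct_cols].mean().sort_values(ascending=False))
    ```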

    Research Ideas

    • To identify areas of the San Francisco Airport prone to noise pollution from aircraft and develop ways to limit it.
    • To analyze the impacts of changing departure runway preferences on noise pollution levels over residential communities near the airport.
    • To monitor seasonal trends in late-night aircraft departures by runway, along with peak hours for each runway, in order to inform flight controllers and develop improved flight control regulations and procedures at San Francisco Airport.


    License

    See the dataset description for more information.

    Columns

    File: late-night-preferential-runway-use-1.csv

    | Column name                 | Description                                              |
    |:----------------------------|:---------------------------------------------------------|
    | Year                        | The year of the data. (Integer)                          |
    | Month                       | The month of the data. (String)                          |
    | 01L/R                       | The number of departures from runway 01L/R. (Integer)    |
    | 01L/R Percent of Departures | The percentage of departures from runway 01L/R. (Float)  |
    | 10L/R                       | The number of departures from runway 10L/R. (Integer)    |
    | 10L/R Percent of Departures | The percentage of departures from runway 10L/R. (Float)  |
    | 19L/R                       | The number of departures from runway 19L/R. (Integer)    |
    | 19L/R Percent of Departures | The percentage of departures from runway 19L/R. (Float)  |
    | 28L/R                       | The number of departures from runway 28L/R. (Integer)    |
    | 28L/R Percent of Departures | The percentage of departures from runway 28L/R. (Float)  |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors, the City of San Francisco.

  18. MeteoSerbia1km: the first daily gridded meteorological dataset at a 1-km spatial resolution across Serbia for the 2000–2019 period

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). MeteoSerbia1km: the first daily gridded meteorological dataset at a 1-km spatial resolution across Serbia for the 2000–2019 period [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4058167?locale=da
    Explore at:
    Available download formats: unknown (12533878)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MeteoSerbia1km is the first daily gridded meteorological dataset at a 1-km spatial resolution across Serbia for the 2000–2019 period. The dataset consists of five daily variables: maximum, minimum and mean temperature, mean sea-level pressure, and total precipitation. Besides daily summaries, it contains monthly and annual summaries, as well as daily, monthly, and annual long-term means (LTM). Daily gridded data were interpolated using the Random Forest Spatial Interpolation methodology, based on Random Forest using the nearest observations and distances to them as spatial covariates, together with environmental covariates. The complete R script and the datasets used for modelling, tuning, validation, and prediction of daily meteorological variables are available here. If you discover a bug, artifact, or inconsistency in the MeteoSerbia1km maps, or if you have a question, please use this channel.

    File naming convention of the .zip files and the MeteoSerbia1km files they contain:

    • Daily summaries per year: day_yyyy_proj.zip, containing var_day_yyyymmdd_proj.tif
    • Monthly summaries: mon_proj.zip, containing var_mon_yyyymm_proj.tif
    • Annual summaries: ann_proj.zip, containing var_ann_yyyy_proj.tif
    • Daily, monthly and annual LTM: ltm_proj.zip, containing daily LTM (var_ltm_day_mmdd_proj.tif), monthly LTM (var_ltm_mon_mm_proj.tif), and annual LTM (var_ltm_ann_proj.tif)

    where var is a daily meteorological variable name (tmax, tmin, tmean, slp, or prcp) and proj is the dataset projection (wgs84 or utm34).

    Units of the dataset values:

    • temperature (Tmean, Tmax, and Tmin): tenths of a degree on the Celsius scale (°C)
    • SLP: tenths of a mbar
    • PRCP: tenths of a mm

    All dataset values are stored as integers (INT32 data type) in order to reduce the size of the GeoTIFF files; i.e., temperature values should be divided by 10 to obtain degrees Celsius, and likewise for SLP and PRCP to obtain millibars and millimeters.
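
    Because values are stored as scaled integers, a reader has to divide by 10 to recover physical units. A minimal Python sketch using rasterio; the file name follows the naming convention above, and the date is arbitrary:

    ```python
    import rasterio

    # Mean temperature for 15 June 2010, WGS84 projection (per the naming convention)
    with rasterio.open("tmean_day_20100615_wgs84.tif") as src:
        raw = src.read(1)        # INT32 values in tenths of a degree Celsius

    tmean_degc = raw / 10.0      # convert to degrees Celsius
    ```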

  19. Data from: AgrImOnIA: Open Access dataset correlating livestock and air quality in the Lombardy region, Italy

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Feb 6, 2024
    Cite
    Cameletti, Michela (2024). AgrImOnIA: Open Access dataset correlating livestock and air quality in the Lombardy region, Italy [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6620529
    Explore at:
    Dataset updated
    Feb 6, 2024
    Dataset provided by
    Golini, Natalia
    Otto, Philipp
    Shaboviq, Qendrim
    Ignaccolo, Rosaria
    Rodeschini, Jacopo
    Finazzi, Francesco
    Vinciguerra, Marco
    Fassò, Alessandro
    Fusta Moro, Alessandro
    Maranzano, Paolo
    Cameletti, Michela
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy, Lombardy
    Description

    The AgrImOnIA dataset is a comprehensive dataset relating air quality and livestock (expressed as the density of bovines and swine bred) along with weather and other variables. The AgrImOnIA Dataset represents the first step of the AgrImOnIA project. The purpose of this dataset is to give the opportunity to assess the impact of agriculture on air quality in Lombardy through statistical techniques capable of highlighting the relationship between the livestock sector and air pollutants concentrations.

    The building process of the dataset is detailed in the companion paper:

    A. Fassò, J. Rodeschini, A. Fusta Moro, Q. Shaboviq, P. Maranzano, M. Cameletti, F. Finazzi, N. Golini, R. Ignaccolo, and P. Otto (2023). Agrimonia: a dataset on livestock, meteorology and air quality in the Lombardy region, Italy. SCIENTIFIC DATA, 1-19.

    available here.

    This dataset is a collection of estimated daily values for a range of measurements of different dimensions as: air quality, meteorology, emissions, livestock animals and land use. Data are related to Lombardy and the surrounding area for 2016-2021, inclusive. The surrounding area is obtained by applying a 0.3° buffer on Lombardy borders.

    The dataset uses several aggregation and interpolation methods to estimate the measurements for all days.

    The files in the record, renamed according to their version (e.g., ..._v_3_0_0), are:

    Agrimonia_Dataset.csv (.mat and .Rdata), which is built by joining the daily time series related to the AQ, WE, EM, LI and LA variables. To simplify access to variables in the Agrimonia dataset, each variable name starts with the dimension of the variable; i.e., names of variables related to the AQ dimension start with 'AQ_'. This file is also archived in formats for the MATLAB and R software.

    Metadata_Agrimonia.csv which provides further information about the Agrimonia variables: e.g. sources used, original names of the variables imported, transformations applied.

    Metadata_AQ_imputation_uncertainty.csv, which contains the daily uncertainty estimates of the imputed AQ observations, used to mitigate missing data in the hourly time series.

    Metadata_LA_CORINE_labels.csv which contains the label and the description associated with the CLC class.

    Metadata_monitoring_network_registry.csv, which contains all details about the AQ monitoring stations used to build the dataset. Information about air quality monitoring stations includes station type, municipality code, environment type, altitude, pollutants sampled, and more. Each row represents a single sensor.

    Metadata_LA_SIARL_labels.csv which contains the label and the description associated with the SIARL class.

    AGC_Dataset.csv (.mat and .Rdata), which includes daily data for almost all variables available in the Agrimonia Dataset (excluding AQ variables) on an equidistant grid covering the Lombardy region and its surrounding area.

    The Agrimonia dataset can be reproduced using the code available at the GitHub page: https://github.com/AgrImOnIA-project/AgrImOnIA_Data
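
    Given the dimension-prefix naming convention described above (e.g., 'AQ_' for air-quality variables), selecting one dimension from the CSV is straightforward in pandas. A small sketch, with the file name matching the v3.0.0 release below; adjust it to the release you use:

    ```python
    import pandas as pd

    agri = pd.read_csv("Agrimonia_Dataset_v_3_0_0.csv")

    # Select all air-quality variables via the documented 'AQ_' prefix
    aq_cols = [c for c in agri.columns if c.startswith("AQ_")]
    print(agri[aq_cols].describe())
    ```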

    UPDATE 31/05/2023 - NEW RELEASE - V 3.0.0

    A new version of the dataset is released: Agrimonia_Dataset_v_3_0_0.csv (.Rdata and .mat), where the variables WE_rh_min, WE_rh_mean and WE_rh_max have been recomputed due to some bugs.

    In addition, two new columns have been added, LI_pigs_v2 and LI_bovine_v2, representing the density of pigs and bovines (expressed as animals per square kilometer) within a square of ~10 x 10 km centered at the station location.

    A new dataset is released: the Agrimonia Grid Covariates (AGC), which includes daily information for the period from 2016 to 2020 for almost all variables within the Agrimonia Dataset on an equidistant grid covering the Lombardy region and its surrounding area. The AGC does not include AQ variables, as they come from monitoring stations that are irregularly spread over the area considered.

    UPDATE 11/03/2023 - NEW RELEASE - V 2.0.2

    A new version of the dataset is released: Agrimonia_Dataset_v_2_0_2.csv (.Rdata), where the variable WE_tot_precipitation has been recomputed due to some bugs.

    A new version of the metadata is available: Metadata_Agrimonia_v_2_0_2.csv where the spatial resolution of the variable WE_precipitation_t is corrected.

    UPDATE 24/01/2023 - NEW RELEASE - V 2.0.1

    Minor bug fixed.

    UPDATE 16/01/2023 - NEW RELEASE - V 2.0.0

    A new version of the dataset is released, Agrimonia_Dataset_v_2_0_0.csv (.Rdata) and Metadata_monitoring_network_registry_v_2_0_0.csv. Some minor points have been addressed:

    Added values for the LA_land_use variable for Switzerland stations (in Agrimonia_Dataset_v_2_0_0.csv)

    Deleted incorrect values for the LA_soil_use variable for stations outside the Lombardy region during 2018 (in Agrimonia_Dataset_v_2_0_0.csv)

    Fixed duplicate sensors corresponding to the same pollutant within the same station (in Metadata_monitoring_network_registry_v_2_0_0.csv)

  20. SHREC'10 Track: Non-rigid 3D Shape Retrieval

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). SHREC'10 Track: Non-rigid 3D Shape Retrieval [Dataset]. https://catalog.data.gov/dataset/shrec10-track-non-rigid-3d-shape-retrieval-62918
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Non-rigid 3D objects are commonly seen in our surroundings. However, previous efforts have been mainly devoted to the retrieval of rigid 3D models, so comparing non-rigid 3D shapes is still a challenging problem in content-based 3D object retrieval. We therefore organized this track to promote the development of non-rigid 3D shape retrieval. The objective is to evaluate the performance of 3D shape retrieval approaches on a subset of a publicly available non-rigid 3D model database, the McGill Articulated Shape Benchmark database.

    Task description: The task is to evaluate the dissimilarity between every two objects in the database and then output the dissimilarity matrix.

    Data set: The McGill Articulated Shape Benchmark database consists of 255 non-rigid 3D models classified into 10 categories. The maximum number of objects in a class is 31, while the minimum is 20. 200 models were selected (or modified) to generate our test database, ensuring that every class contains an equal number of models. The models are represented as watertight triangle meshes, and the file format is the ASCII Object File Format (*.off). The original database is publicly available at: http://www.cim.mcgill.ca/~shape/benchMark/

    Evaluation methodology: We employ the following evaluation measures: Precision-Recall curve; Average Precision (AP) and Mean Average Precision (MAP); E-Measure; Discounted Cumulative Gain; Nearest Neighbor, First-Tier (Tier1) and Second-Tier (Tier2).

    Please cite the paper: SHREC'10 Track: Non-rigid 3D Shape Retrieval. Z. Lian, A. Godil, T. Fabry, T. Furuya, J. Hermans, R. Ohbuchi, C. Shu, D. Smeets, P. Suetens, D. Vandermeulen, S. Wuhrer. In: M. Daoudi, T. Schreck, M. Spagnuolo, I. Pratikakis, R. Veltkamp (eds.), Proceedings of the Eurographics/ACM SIGGRAPH Symposium on 3D Object Retrieval, 2010.
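
    As a small illustration of one of these measures, the sketch below computes the Nearest Neighbor score from an N x N dissimilarity matrix of the kind submitted for this track (NumPy assumed; the function and variable names are our own):

    ```python
    import numpy as np

    def nearest_neighbor_score(D, labels):
        """Fraction of queries whose nearest neighbor (self excluded) shares the
        query's class, given an N x N dissimilarity matrix D and class labels."""
        D = np.asarray(D, dtype=float).copy()
        np.fill_diagonal(D, np.inf)   # exclude self-matches
        nn = D.argmin(axis=1)         # index of the nearest neighbor per query
        labels = np.asarray(labels)
        return float(np.mean(labels[nn] == labels))
    ```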
