48 datasets found
  1. Merger of BNV-D data (2008 to 2019) and enrichment

    • data.europa.eu
    zip
    Updated Jan 16, 2025
    Cite
    Patrick VINCOURT (2025). Merger of BNV-D data (2008 to 2019) and enrichment [Dataset]. https://data.europa.eu/data/datasets/5f1c3eca9d149439e50c740f?locale=en
    Explore at:
    Available download formats: zip(18530465)
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    Patrick VINCOURT
    Description

    Merging (in Table R) the data published on https://www.data.gouv.fr/fr/datasets/ventes-de-pesticides-par-departement/, and joining two other sources of information associated with marketing authorisations (MAs): uses, from https://www.data.gouv.fr/fr/datasets/usages-des-produits-phytosanitaires/, and the "Biocontrol" status of the product, from document DGAL/SDQSPV/2020-784 published on 18/12/2020 at https://agriculture.gouv.fr/quest-ce-que-le-biocontrole.

    All the initial files (.csv transformed into .txt), the R code used to merge the data, and the different output files are collected in a zip. NB: 1) "YASCUB" stands for {year, AMM, Substance_active, Classification, Usage, Statut_"BioControl"}; substances not on the DGAL/SDQSPV list are coded NA. 2) The biocontrol products file has been cleaned of the duplicates generated by marketing authorisations that lead to several trade names.
    3) The BNVD_BioC_DY3 table and the output file BNVD_BioC_DY3.txt contain the fields {Code_Region, Region, Dept, Code_Dept, Anne, Usage, Classification, Type_BioC, Quantite_substance}
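
    To illustrate the kind of join the zip's R code performs, here is a minimal dplyr sketch; the file names and column names below are placeholders for illustration, not the actual files in the archive.

      # Hypothetical sketch of the merge described above (file and column names are assumed).
      library(dplyr)
      library(readr)

      ventes      <- read_delim("BNVD_ventes.txt", delim = ";")       # sales by departement, keyed by AMM
      usages      <- read_delim("usages_produits.txt", delim = ";")   # uses associated with each AMM
      biocontrole <- read_delim("liste_biocontrole.txt", delim = ";") %>%
        distinct(AMM, .keep_all = TRUE)   # drop duplicates created by multiple trade names per AMM

      bnvd_bioc <- ventes %>%
        left_join(usages, by = "AMM") %>%
        left_join(select(biocontrole, AMM, Type_BioC), by = "AMM")
      # substances absent from the DGAL/SDQSPV list end up with Type_BioC = NA, as in the description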

  2. KORUS-AQ Aircraft Merge Data Files - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). KORUS-AQ Aircraft Merge Data Files - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/korus-aq-aircraft-merge-data-files-9bba5
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    KORUSAQ_Merge_Data are pre-generated merge data files combining various products collected during the KORUS-AQ field campaign. This collection features pre-generated merge files for the DC-8 aircraft. Data collection for this product is complete. The KORUS-AQ field study was conducted in South Korea during May-June 2016. The study was jointly sponsored by NASA and Korea's National Institute of Environmental Research (NIER). The primary objectives were to investigate the factors controlling air quality in Korea (e.g., local emissions, chemical processes, and transboundary transport) and to assess future air quality observing strategies incorporating geostationary satellite observations. To achieve these science objectives, KORUS-AQ adopted a highly coordinated sampling strategy involving surface and airborne measurements with both in-situ and remote sensing instruments. Surface observations provided details on ground-level air quality conditions, while airborne sampling provided an assessment of conditions aloft relevant to satellite observations and necessary to understand the role of emissions, chemistry, and dynamics in determining air quality outcomes. The sampling region covers the South Korean peninsula and surrounding waters, with a primary focus on the Seoul Metropolitan Area. Airborne sampling was primarily conducted from near the surface to about 8 km, with extensive profiling to characterize the vertical distribution of pollutants and their precursors. The airborne observational data were collected from three aircraft platforms: the NASA DC-8, NASA B-200, and Hanseo King Air. Surface measurements were conducted from 16 ground sites and 2 ships: R/V Onnuri and R/V Jang Mok. The major data products collected from both the ground and air include in-situ measurements of trace gases (e.g., ozone, reactive nitrogen species, carbon monoxide and dioxide, methane, non-methane and oxygenated hydrocarbon species), aerosols (e.g., microphysical and optical properties and chemical composition), active remote sensing of ozone and aerosols, and passive remote sensing of NO2, CH2O, and O3 column densities. These data products support research focused on examining the impact of photochemistry and transport on ozone and aerosols, evaluating emissions inventories, and assessing the potential use of satellite observations in air quality studies.

  3. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, these data must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated NHANES datasets consist of 20 .csv files, two for each module: an uncleaned version and a cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file with the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts and customized functions written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

    Example starter code: The set of starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together (see the sketch below). "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and for multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
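
    As a complement to example_0, here is a minimal R sketch of that kind of merge; the object names and the use of SEQN as the join key are assumptions for illustration, not taken from the dataset's documentation.

      # Hypothetical sketch: join cleaned NHANES modules on the respondent identifier (assumed to be SEQN).
      library(dplyr)

      load("w - nhanes_1988_2018.RData")   # loads the curated modules as R data objects
      # Object names below are assumptions; inspect ls() after load() for the real ones.

      merged <- demographics %>%
        left_join(mortality, by = "SEQN") %>%
        left_join(response,  by = "SEQN") %>%
        left_join(weights,   by = "SEQN")

      dim(merged)   # participants x combined variables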

  4. Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 21, 2024
    Cite
    Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katharina Zinke; Katharina Zinke
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dresden
    Description

    Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

    This ZIP-File contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

    The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication to follow soon).

    ## Data sources

    Folder 01_SourceData/

    - PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

    - ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

    - ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

    - Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

    ## Automatic classification

    Folder 02_AutomaticClassification/

    - (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

    - (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

    - PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

    - oddpub_results_wDOIs.csv (results file of the ODDPub classification)

    - PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

    ## Manual coding

    Folder 03_ManualCheck/

    - CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

    - ManualCheck_2023-06-08.csv (Manual coding results file)

    - PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

    ## Explorative analysis for the discoverability of open data

    Folder 04_FurtherAnalyses/

    Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

    ## R-Script

    Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
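
    As an illustration of the merge step behind PLOS_ODDPub.csv, here is a minimal dplyr sketch joining the ODDPub results to the PLOS-OSI dataset by DOI; the column name doi is an assumption, not taken from the files themselves.

      # Hypothetical sketch: merge ODDPub classification results with the PLOS-OSI dataset by DOI.
      library(dplyr)
      library(readr)

      oddpub <- read_csv("02_AutomaticClassification/oddpub_results_wDOIs.csv")
      plos   <- read_csv("01_SourceData/PLOS-Dataset_v2_Mar23.csv")

      # Normalize DOIs before joining (column names are assumed).
      plos_oddpub <- oddpub %>%
        mutate(doi = tolower(doi)) %>%
        inner_join(mutate(plos, doi = tolower(doi)), by = "doi")

      write_csv(plos_oddpub, "02_AutomaticClassification/PLOS_ODDPub.csv")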

  5. France Weekly Real Estate Listings 2022-2023

    • kaggle.com
    zip
    Updated Apr 3, 2024
    Cite
    Artur Dragunov (2024). France Weekly Real Estate Listings 2022-2023 [Dataset]. https://www.kaggle.com/datasets/arturdragunov/france-weekly-real-estate-listings-2022-2023
    Explore at:
    Available download formats: zip(2750497 bytes)
    Dataset updated
    Apr 3, 2024
    Authors
    Artur Dragunov
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    France
    Description

    These Kaggle datasets provide real estate listings downloaded from the French real estate market, capturing data from a leading platform in France (Seloger), reminiscent of the approach taken for the US dataset from Redfin and the UK dataset from Zoopla. They encompass detailed property listings, pricing, and market trends across France, stored in weekly CSV snapshots. The cleaned and merged version of all the snapshots is named France_clean_unique.csv.

    The cleaning process mirrored that of the US dataset, involving removing irrelevant features, normalizing variable names for dataset consistency with USA and UK, and adjusting variable value ranges to get rid of extreme outliers. To augment the dataset's depth, external factors like inflation rates, stock market volatility, and macroeconomic indicators have been integrated, offering a multifaceted perspective on France's real estate market drivers.

    For exact column descriptions, see columns for France_clean_unique.csv and my thesis.

    Table 2.5 and Section 2.2.1, which I refer to in the column descriptions, can be found in my thesis; see University Library. Click on Online Access->Hlavni prace.

    If you want to continue generating datasets yourself, see my Github Repository for code inspiration.

    Let me know if you want to see how I got from the raw data to France_clean_unique.csv. There are multiple steps, including cleaning in Tableau Prep and R, downloading and merging external variables into the dataset, removing duplicates, and renaming some columns.
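
    For orientation, a minimal R sketch of the general pattern described above (stacking the weekly snapshots, de-duplicating, and joining external indicators); the file pattern and column names are placeholders, not the actual schema.

      # Hypothetical sketch: combine weekly snapshot CSVs, de-duplicate listings, join external indicators.
      library(dplyr)
      library(readr)
      library(purrr)

      snapshots <- list.files("weekly_snapshots", pattern = "\\.csv$", full.names = TRUE)

      france <- snapshots %>%
        map(read_csv) %>%
        bind_rows() %>%
        distinct(listing_id, .keep_all = TRUE)   # one row per listing (column name assumed)

      macro <- read_csv("macro_indicators.csv")  # e.g. inflation, stock market volatility (assumed file)

      france_clean_unique <- france %>%
        left_join(macro, by = c("snapshot_week" = "week"))   # join keys assumed

      write_csv(france_clean_unique, "France_clean_unique.csv")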

  6. USA Weekly Real Estate Listings 2022-2023

    • kaggle.com
    zip
    Updated Apr 3, 2024
    Cite
    Artur Dragunov (2024). USA Weekly Real Estate Listings 2022-2023 [Dataset]. https://www.kaggle.com/datasets/arturdragunov/usa-weekly-real-estate-listings
    Explore at:
    Available download formats: zip(66961155 bytes)
    Dataset updated
    Apr 3, 2024
    Authors
    Artur Dragunov
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    United States
    Description

    These Kaggle datasets offer a comprehensive view of the US real estate market, leveraging data sourced from Redfin via an unofficial API. They contain weekly snapshots stored in CSV files, reflecting the dynamic nature of property listings, prices, and market trends across various states and cities, except for Wyoming, Montana, and North Dakota, and with specific data generation for Texas cities. Notably, the dataset includes a prepared version, USA_clean_unique, which has undergone the initial cleaning steps outlined in the thesis. These datasets were part of my thesis; the other two countries were France and the UK.

    These steps include:
    - Removal of irrelevant features for statistical analysis.
    - Renaming variables for consistency across international datasets.
    - Adjustment of variable value ranges for a more refined analysis.

    Unique aspects such as Redfin’s “hot” label algorithm, property search status, and detailed categorizations of property types (e.g., single-family residences, condominiums/co-ops, multi-family homes, townhouses) provide deep insights into the market. Additionally, external factors like interest rates, stock market volatility, unemployment rates, and crime rates have been integrated to enrich the dataset and offer a multifaceted view of the real estate market's drivers.

    The USA_clean_unique dataset represents a key step before data normalization/trimming, containing variables both in their raw form and categorized based on predefined criteria, such as property size, year of construction, and number of bathrooms/bedrooms. This structured approach aims to capture the non-linear relationships between various features and property prices, enhancing the dataset's utility for predictive modeling and market analysis.

    See columns from USA_clean_unique.csv and my Thesis (Table 2.8) for exact column descriptions.

    Table 2.4 and Section 2.2.3, which I refer to in the column descriptions, can be found in my thesis; see University Library. Click on Online Access->Hlavni prace.

    If you want to continue generating datasets yourself, see my Github Repository for code inspiration.

    Let me know if you want to see how I got from raw data to USA_clean_unique.csv. Multiple steps include cleaning in Tableau Prep and R, downloading and merging external variables to the dataset, removing duplicates, and renaming columns for consistency.

  7. SBI Cruise NBP03-04a merged bottle dataset

    • data.ucar.edu
    • arcticdata.io
    ascii
    Updated Aug 1, 2025
    Cite
    Dennis Hansell; Nick R. Bates; Service Group, Scripps Institution of Oceanography, University of California - San Diego; Steven Roberts (2025). SBI Cruise NBP03-04a merged bottle dataset [Dataset]. https://data.ucar.edu/dataset/sbi-cruise-nbp03-04a-merged-bottle-dataset
    Explore at:
    Available download formats: ascii
    Dataset updated
    Aug 1, 2025
    Authors
    Dennis Hansell; Nick R. Bates; Service Group, Scripps Institution of Oceanography, University of California - San Diego; Steven Roberts
    Time period covered
    Jul 5, 2003 - Aug 20, 2003
    Area covered
    Description

    This data set contains merged bottle data from the SBI cruise on the United States Coast Guard Cutter (USCGC) Nathaniel B. Palmer (NBP03-04a). During this cruise, rosette casts were conducted and a bottle data file was generated from these water samples by the Scripps Service group. Additional groups were funded to measure supplementary parameters from the same water samples. This data set is the first version of the merge of the Scripps Service group bottle data file with the data gathered by these additional groups.

  8. Multilevel modeling of time-series cross-sectional data reveals the dynamic...

    • data.niaid.nih.gov
    • dataone.org
    zip
    Updated Mar 6, 2020
    Cite
    Kodai Kusano (2020). Multilevel modeling of time-series cross-sectional data reveals the dynamic interaction between ecological threats and democratic development [Dataset]. http://doi.org/10.5061/dryad.547d7wm3x
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    University of Nevada, Reno
    Authors
    Kodai Kusano
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    What is the relationship between environment and democracy? The framework of cultural evolution suggests that societal development is an adaptation to ecological threats. Pertinent theories assume that democracy emerges as societies adapt to ecological factors such as higher economic wealth, lower pathogen threats, less demanding climates, and fewer natural disasters. However, previous research confused within-country processes with between-country processes and erroneously interpreted between-country findings as if they generalize to within-country mechanisms. In this article, we analyze a time-series cross-sectional dataset to study the dynamic relationship between environment and democracy (1949-2016), accounting for previous misconceptions in levels of analysis. By separating within-country processes from between-country processes, we find that the relationship between environment and democracy not only differs by countries but also depends on the level of analysis. Economic wealth predicts increasing levels of democracy in between-country comparisons, but within-country comparisons show that democracy declines as countries become wealthier over time. This relationship is only prevalent among historically wealthy countries but not among historically poor countries, whose wealth also increased over time. By contrast, pathogen prevalence predicts lower levels of democracy in both between-country and within-country comparisons. Our longitudinal analyses identifying temporal precedence reveal that not only reductions in pathogen prevalence drive future democracy, but also democracy reduces future pathogen prevalence and increases future wealth. These nuanced results contrast with previous analyses using narrow, cross-sectional data. As a whole, our findings illuminate the dynamic process by which environment and democracy shape each other.

    Methods. Our time-series cross-sectional data combine various online databases. Country names were first identified and matched using the R package "countrycode" (Arel-Bundock, Enevoldsen, & Yetman, 2018) before all datasets were merged. Occasionally, we modified unidentified country names to make them consistent across datasets. We then transformed "wide" data into "long" data and merged them using R's Tidyverse framework (Wickham, 2014). Our analysis begins with the year 1949 because one of the key time-variant level-1 variables, pathogen prevalence, was only available from 1949 onward. See our Supplemental Material for all data, Stata syntax, R Markdown for visualization, supplemental analyses, and detailed results (available at https://osf.io/drt8j/).
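
    To make the preparation steps concrete, here is a minimal R sketch under assumed object and column names (this is not the authors' code, which is available in the Supplemental Material):

      # Hypothetical sketch: harmonize country names, reshape wide yearly columns to long, and merge sources.
      library(dplyr)
      library(tidyr)
      library(countrycode)

      # Assume 'democracy_wide' has one row per country and one column per year,
      # and 'pathogens' is already long with columns country, year, pathogen_prevalence.
      democracy_long <- democracy_wide %>%
        mutate(iso3 = countrycode(country_name, origin = "country.name", destination = "iso3c")) %>%
        pivot_longer(cols = matches("^[0-9]{4}$"), names_to = "year", values_to = "democracy") %>%
        mutate(year = as.integer(year))

      panel <- democracy_long %>%
        left_join(
          pathogens %>% mutate(iso3 = countrycode(country, origin = "country.name", destination = "iso3c")),
          by = c("iso3", "year")
        ) %>%
        filter(year >= 1949)   # pathogen prevalence is only available from 1949 on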

  9. Health and Retirement Study (HRS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
    this new github repository contains five scripts:
    - 1992 - 2010 download HRS microdata.R - loop through every year and every file, download, then unzip everything in one big party
    - import longitudinal RAND contributed files.R - create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
    - longitudinal RAND - analysis examples.R - connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, and perform a mountain of analysis examples with wave weights from two different points in the panel
    - import example HRS file.R - load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, and save the file as an R data file (.rda) for fast loading later
    - replicate 2002 regression.R - connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.
    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
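
    for orientation, here is a minimal sketch of the kind of database-backed, taylor-series-linearized survey object those scripts construct; the design variable names (raehsamp, raestrat, r10wtresp) and the analysis variable are assumptions based on rand hrs naming conventions, so check the codebook before running.

      # hypothetical sketch: complex-sample survey object over the rand hrs table in a local sqlite database
      library(survey)

      hrs_design <- svydesign(
        ids     = ~raehsamp,     # primary sampling unit (name assumed)
        strata  = ~raestrat,     # stratum (name assumed)
        weights = ~r10wtresp,    # wave-10 respondent weight (name assumed)
        nest    = TRUE,
        dbtype  = "SQLite",
        dbname  = "hrs.db",
        data    = "rand_hrs"     # table name inside the database (assumed)
      )

      # taylor-series linearization is the default variance estimation for svydesign objects
      svymean(~r10shlt, design = hrs_design, na.rm = TRUE)   # e.g. wave-10 self-reported health (variable assumed)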

  10. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Cite
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sneddon Lab UCSF
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Scripts used for analysis of V1 and V2 datasets:
    - seurat_v1.R - initialize Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, and tSNE visualization. Used for v1 datasets.
    - merge_seurat.R - merge two or more Seurat objects into one Seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
    - subcluster_seurat_v1.R - subcluster clusters of interest from a Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
    - seurat_v2.R - initialize Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
    - clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
    - subcluster_seurat_v2.R - subcluster clusters of interest from a Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
    - seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for the Seurat object created by seurat_v1.R or seurat_v2.R.
    - merge_clusters.R - merge clusters that do not meet a gene threshold. Used for both v1 and v2 datasets.
    - prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into Monocle with monocle_seurat_input_v1.R.
    - monocle_seurat_input_v1.R - Monocle script using Seurat batch-corrected values as input for v1 merged timecourse datasets.
    - monocle_lineage_trace.R - Monocle script using nUMI as input for the v2 lineage-traced dataset.
    - monocle_object_analysis.R - downstream analysis for the Monocle object: BEAM and plotting.
    - CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
    - CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
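
    As context for merge_seurat.R, here is a minimal sketch of merging two Seurat objects and regressing out a batch covariate; it follows the standard Seurat API rather than the lab's exact scripts, and the paths are placeholders.

      # Minimal sketch (standard Seurat workflow; not the lab's exact script).
      library(Seurat)

      counts1 <- Read10X("cellranger_outs/sample1/filtered_feature_bc_matrix")
      counts2 <- Read10X("cellranger_outs/sample2/filtered_feature_bc_matrix")

      obj1 <- CreateSeuratObject(counts1, project = "sample1")
      obj2 <- CreateSeuratObject(counts2, project = "sample2")

      combined <- merge(obj1, y = obj2, add.cell.ids = c("s1", "s2"))
      combined$batch <- combined$orig.ident

      combined <- NormalizeData(combined)
      combined <- FindVariableFeatures(combined)
      combined <- ScaleData(combined, vars.to.regress = "batch")   # linear regression to remove batch effects
      combined <- RunPCA(combined)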

  11. UK Weekly Real Estate Listings 2022-2023

    • kaggle.com
    zip
    Updated Apr 3, 2024
    Cite
    Artur Dragunov (2024). UK Weekly Real Estate Listings 2022-2023 [Dataset]. https://www.kaggle.com/datasets/arturdragunov/uk-weekly-real-estate-listings-2022-2023
    Explore at:
    Available download formats: zip(29112488 bytes)
    Dataset updated
    Apr 3, 2024
    Authors
    Artur Dragunov
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    These Kaggle datasets provide real estate listings downloaded from the UK real estate market, capturing data from a leading platform in the UK (Zoopla), reminiscent of the approach taken for the US dataset from Redfin and the French dataset from Seloger. They encompass detailed property listings, pricing, and market trends across the UK, stored in weekly CSV snapshots. The cleaned and merged version of all the snapshots is named UK_clean_unique.csv.

    The cleaning process mirrored that of the US and French datasets, involving removing irrelevant features, normalizing variable names for dataset consistency with the USA and France, and adjusting variable value ranges to get rid of extreme outliers. To augment the dataset's depth, external factors like inflation rates, stock market volatility, and macroeconomic indicators have been integrated, offering a multifaceted perspective on the UK's real estate market drivers.

    For exact column descriptions, see columns for UK_clean_unique.csv and my thesis.

    Table 2.6 and Section 2.2.2, which I refer to in the column descriptions, can be found in my thesis; see University Library. Click on Online Access->Hlavni prace.

    If you want to continue generating datasets yourself, see my Github Repository for code inspiration.

    Let me know if you want to see how I got from the raw data to UK_clean_unique.csv. There are multiple steps, including cleaning in Tableau Prep and R, downloading and merging external variables into the dataset, removing duplicates, and renaming some columns.

  12. Data and code for: "Combining environmental DNA and remote sensing variables...

    • zenodo.org
    zip
    Updated Aug 6, 2025
    Cite
    Robin Bauknecht; Loïc Pellissier; Sebastien Brosse; Vincent Prié; Manuel Lopes-Lima; Pedro Beja; Monika Goralczyk; Andrea Polanco Fernandez; Jorge Moreno Tilano; Rafik Neme; Mailyn Gonzalez; Shuo Zong (2025). Data and code for: "Combining environmental DNA and remote sensing variables to model fish biodiversity in tropical river ecosystems" [Dataset]. http://doi.org/10.5281/zenodo.15869405
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robin Bauknecht; Loïc Pellissier; Sebastien Brosse; Vincent Prié; Manuel Lopes-Lima; Pedro Beja; Monika Goralczyk; Andrea Polanco Fernandez; Jorge Moreno Tilano; Rafik Neme; Mailyn Gonzalez; Shuo Zong
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Combining environmental DNA and remote sensing variables to model fish biodiversity in tropical river ecosystems

    General Information

    This repository contains the code used for the analysis presented in the paper:
    "Combining environmental DNA and remote sensing variables to model fish biodiversity in tropical river ecosystems."

    Authors:
    Robin Bauknecht, Loïc Pellissier, Sébastien Brosse, Vincent Prié, Manuel Lopes-Lima, Pedro Beja, Monika K. Goralczyk, Andrea Polanco Fernandez, Jorge A. Moreno-Tilano, Rafik Neme, Mailyn A. Gonzalez, Shuo Zong

    Correspondence:
    Robin Bauknecht – rbauknecht@ethz.ch
    Shuo Zong – shuo.zong@usys.ethz.ch

    Link to paper: https://doi.org/10.1016/j.ecoinf.2025.103251

    Repository Structure

    This repository is organized as an R Project. Open the `.Rproj` file in RStudio for streamlined access to the analysis workflow.

    data/ folder:
    - swarm_output_clean/: Cleaned output from the SWARM clustering algorithm
    - rs_variables/: Remote sensing variables for each sampling site
    - site_sample_mapping/: Links samples to sampling sites (typically two replicates per site)

    scripts/ folder:
    - global_model.Rmd: Global biodiversity modeling
    - local_model_maroni.Rmd: Modeling for the Maroni River
    - local_model_oyapock.Rmd: Modeling for the Oyapock River
    - plots.R: Code for generating plots
    - helper_functions.R: Reusable helper functions
    - calculating_per_sample_metrics.R: Script for calculating per-sample biodiversity metrics



    Raw eDNA Sequencing Data

    In addition to the cleaned SWARM output included in this repository, raw sequencing data is available from the following sources:

    - Magdalena River:

    - Casamance:

    - Kinabatangan
    [Link to be added soon]

    - African Rivers

    - Guiana Rivers
    Some raw reads are available via:
    This study additionally includes new sites. Correlation tags for these are provided in data/additional_tags_guiana.csv and can be used together with the raw read files in the above repository to process these additional samples.

    ---

    Please reach out to the corresponding authors for any questions or collaboration inquiries.

    Funding

    This work was supported by a China Scholarship Council grant awarded to S. Z. This project has also received support from the project NORTE-01-0145-FEDER-000046 under the Norte Portugal Regional Operational Programme (NORTE2020), through the European Regional Development Fund (ERDF) and the Portugal 2020 Partnership Agreement. Additionally, M. L.-L. was funded by FCT - Fundação para a Ciência e Tecnologia [contract 2020.03608.CEECIND].

  13. Data from: RAW data from Towards Holistic Environmental Policy Assessment:...

    • research.science.eus
    • data.europa.eu
    Updated 2024
    Cite
    Borges, Cruz E.; Ferrón, Leandro; Soimu, Oxana; Mugarra, Aitziber (2024). RAW data from Towards Holistic Environmental Policy Assessment: Multi-Criteria Frameworks and recommendations for modelers paper [Dataset]. https://research.science.eus/documentos/685699066364e456d3a65172
    Explore at:
    Dataset updated
    2024
    Authors
    Borges, Cruz E.; Ferrón, Leandro; Soimu, Oxana; Mugarra, Aitziber
    Description

    Name: Data used to rate the relevance of each dimension necessary for a Holistic Environmental Policy Assessment.

    Summary: This dataset contains answers from a panel of experts and the public rating the relevance of each dimension on a scale from 0 (Not relevant at all) to 100 (Extremely relevant).

    License: CC-BY-SA

    Acknowledge: These data have been collected in the framework of the DECIPHER project. This project has received funding from the European Union’s Horizon Europe programme under grant agreement No. 101056898.

    Disclaimer: Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

    Collection Date: 2024-1 / 2024-04

    Publication Date: 22/04/2025

    DOI: 10.5281/zenodo.13909413

    Other repositories: -

    Author: University of Deusto

    Objective of collection: This data was originally collected to prioritise the dimensions to be further used for Environmental Policy Assessment and IAMs enlarged scope.

    Description:

    Data Files (CSV)

    decipher-public.csv : Public participants' general survey results in the framework of the Decipher project, including socio demographic characteristics and overall perception of each dimension necessary for a Holistic Environmental Policy Assessment.

    decipher-risk.csv : Contains individual survey responses regarding prioritisation of dimensions in risk situations. Includes demographic and opinion data from a targeted sample.

    decipher-experts.csv : Experts’ opinions collected on risk topics through surveys in the framework of Decipher Project, targeting professionals in relevant fields.

    decipher-modelers.csv: Answers given by the developers of models about the characteristics of the models and dimensions covered by them.

    prolific_export_risk.csv : Exported survey data from Prolific, focusing specifically on ratings in risk situations. Includes response times, demographic details, and survey metadata.

    prolific_export_public_{1,2}.csv : Public survey exports from Prolific, gathering prioritisation of dimensions necessary for environmental policy assessment.

    curated.csv : Final cleaned and harmonized dataset combining multiple survey sources. Designed for direct statistical analysis with standardized variable names.

    Scripts files (R)

    decipher-modelers.R: Script to assess the answers given by modelers about the characteristics of the models.

    joint.R: Script to clean and join the raw answers from the different surveys to retrieve the overall perception of each dimension necessary for a Holistic Environmental Policy Assessment.

    Report Files

    decipher-modelers.pdf: Diagram with the result of the

    full-Country.html : Full interactive report showing dimension prioritisation broken down by participant country.

    full-Gender.html : Visualization report displaying differences in dimension prioritisation by gender.

    full-Education.html : Detailed breakdown of dimension prioritisation results based on education level.

    full-Work.html : Report focusing on participant occupational categories and associated dimension prioritisation.

    full-Income.html : Analysis report showing how income level correlates with dimension prioritisation.

    full-PS.html : Report analyzing Political Sensitivity scores across all participants.

    full-type.html : Visualization report comparing participant dimensions prioritisation (public vs experts) in normal and risk situations.

    full-joint-Country.html : Joint analysis report integrating multiple dimensions of country-based dimension prioritisation in normal and risk situations. Combines demographic and response patterns.

    full-joint-Gender.html : Combined gender-based analysis across datasets, exploring intersections of demographic factors and dimensions prioritisation in normal and risk situations.

    full-joint-Education.html : Education-focused report merging various datasets to show consistent or divergent patterns of dimensions prioritisation in normal and risk awareness.

    full-joint-Work.html : Cross-dataset analysis of occupational groups and their dimensions prioritisation in normal and risk situation

    full-joint-Income.html : Income-stratified joint analysis, merging public and expert datasets to find common trends and significant differences during dimensions prioritisation in normal and risks situations.

    full-joint-PS.html : Comprehensive Political Sensitivity score report from merged datasets, highlighting general patterns and subgroup variations in normal and risk situations.

    5 star: ⭐⭐⭐

    Preprocessing steps: The data has been re-coded and cleaned using the scripts provided.

    Reuse: NA

    Update policy: No more updates are planned.

    Ethics and legal aspects: Names of the persons involved have been removed.

    Technical aspects:

    Other:

  14. Cyclistic

    • kaggle.com
    zip
    Updated May 12, 2022
    Cite
    Salam Ibrahim (2022). Cyclistic [Dataset]. https://www.kaggle.com/datasets/salamibrahim/cyclistic
    Explore at:
    Available download formats: zip(209748131 bytes)
    Dataset updated
    May 12, 2022
    Authors
    Salam Ibrahim
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction: This case study is based on Cyclistic, a bike-sharing company in Chicago. I will perform the tasks of a junior data analyst to answer business questions, following a process that includes the phases ask, prepare, process, analyze, share, and act.

    Background: Cyclistic is a bike-sharing company that operates 5,828 bikes across 692 docking stations. The company has been around since 2016 and separates itself from the competition by offering a variety of bike services, including assistive options. Lily Moreno is the director of the marketing team and will receive the insights from this analysis.

    Case study and business task: Lily Moreno's view on how to generate more income by marketing Cyclistic's services correctly centers on converting casual riders (one-day passes and/or pay-per-ride customers) into annual riders with a membership. Annual riders are more profitable than casual riders according to the finance analysts. She would rather see a campaign targeting casual riders for conversion into annual riders than campaigns targeting new customers. Her strategy as the manager of the marketing team is therefore to maximize the number of annual riders by converting casual riders.

    In order to make a data driven decision, Moreno needs the following insights:
    - A better understanding of how casual riders and annual riders differ
    - Why would a casual rider become an annual one
    - How digital media can affect the marketing tactics

    Moreno has directed me to the first question - how do casual riders and annual riders differ?

    Stakeholders: Lily Moreno (manager of the marketing team), the Cyclistic marketing team, and the executive team.

    Data sources and organization: Data used in this report is made available and licensed by Motivate International Inc. Personal data is hidden to protect personal information. The data covers the past 12 months (01/04/2021 - 31/03/2022) of the bike share dataset.

    Merging all 12 monthly bike share files provided an extensive dataset of roughly 5,400,000 rows, all of which are included in this analysis.

    Data security and limitations: Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.

    Tools and documentation of the cleaning process: The tools used for data verification and data cleaning are Microsoft Excel and R programming. The original files made accessible by Motivate International Inc. are backed up in their original format and in separate files.

    Microsoft Excel is used to look through the dataset generally and get an overview of the content. I performed simple checks of the data by filtering, sorting, formatting, and standardizing it to make it easily mergeable. In Excel, I also changed data types to the right format, removed unnecessary data when it was incomplete or incorrect, created new columns to subtract and reformat existing columns, and deleted empty cells. These tasks are easily done in spreadsheets and provide an initial cleaning pass over the data.

    R will be used to perform queries of bigger datasets such as this one. R will also be used to create visualizations to answer the question at hand.

    Limitations: Microsoft Excel has a limit of 1,048,576 rows, while the 12 months of data combined exceed 5,500,000 rows. When combining the 12 months of data into one table/sheet, Excel is no longer efficient, so I switched over to R programming.
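
    A minimal R sketch of that combine-in-R step; the folder, file pattern, and column names are placeholders rather than the exact Cyclistic schema.

      # Hypothetical sketch: stack 12 monthly trip CSVs that Excel cannot hold in one sheet.
      library(dplyr)
      library(readr)
      library(purrr)

      files <- list.files("trip_data_2021_04_to_2022_03", pattern = "\\.csv$", full.names = TRUE)

      trips <- files %>%
        map(read_csv) %>%
        bind_rows() %>%
        mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "mins"))) %>%  # column names assumed
        filter(ride_length > 0)

      nrow(trips)   # roughly 5.4 million rows across the 12 months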

  15. Mixmix-LLaMAX

    • huggingface.co
    Updated Apr 3, 2025
    Cite
    Marcus Cedric R. Idia (2025). Mixmix-LLaMAX [Dataset]. https://huggingface.co/datasets/marcuscedricridia/Mixmix-LLaMAX
    Explore at:
    Dataset updated
    Apr 3, 2025
    Authors
    Marcus Cedric R. Idia
    Description

    Merged UI Dataset: Mixmix-LLaMAX

    This dataset was automatically generated by merging and processing the following sources: marcuscedricridia/s1K-claude-3-7-sonnet, marcuscedricridia/Creative_Writing-ShareGPT-deepclean-sharegpt, marcuscedricridia/Medical-R1-Distill-Data-deepclean-sharegpt, marcuscedricridia/Open-Critic-GPT-deepclean-sharegpt, marcuscedricridia/kalo-opus-instruct-22k-no-refusal-deepclean-sharegpt, marcuscedricridia/unAIthical-ShareGPT-deepclean-sharegpt… See the full description on the dataset page: https://huggingface.co/datasets/marcuscedricridia/Mixmix-LLaMAX.

  16. Replication Data for: Wake merging and turbulence transition downstream of...

    • dataverse.no
    • search.dataone.org
    Updated Nov 4, 2025
    Cite
    R. Jason Hearst; R. Jason Hearst; Fanny Olivia Johannessen Berstad; Ingrid Neunaber; Ingrid Neunaber; Fanny Olivia Johannessen Berstad (2025). Replication Data for: Wake merging and turbulence transition downstream of side-by-side porous discs [Dataset]. http://doi.org/10.18710/XAEWC5
    Explore at:
    application/x-rlang-transport(1417054263), application/x-rlang-transport(1363277492), application/x-rlang-transport(1400436794), txt(7883), application/x-rlang-transport(1355448278), application/x-rlang-transport(1069205992), application/x-rlang-transport(1389202797), application/x-rlang-transport(1434576877), application/x-rlang-transport(959411386), application/x-rlang-transport(1373148467), application/x-rlang-transport(1398098974), application/x-rlang-transport(1195049341), application/x-rlang-transport(1605897578), application/x-rlang-transport(1341687981), application/x-rlang-transport(1276474862), application/x-rlang-transport(1097556108), application/x-rlang-transport(1412349302), application/x-rlang-transport(1471679338), application/x-rlang-transport(1292190917), application/x-rlang-transport(1033022936), application/x-rlang-transport(1287168311), application/x-rlang-transport(1425403151), application/x-rlang-transport(1417989437), application/x-rlang-transport(1361195525), application/x-rlang-transport(1313472566)Available download formats
    Dataset updated
    Nov 4, 2025
    Dataset provided by
    DataverseNO
    Authors
    R. Jason Hearst; R. Jason Hearst; Fanny Olivia Johannessen Berstad; Ingrid Neunaber; Ingrid Neunaber; Fanny Olivia Johannessen Berstad
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These are the streamwise velocity time series measured in the wakes of two sets of porous discs in a side-by-side setting, as used in the manuscript "Wake merging and turbulence transition downstream of side-by-side porous discs", which has been accepted by the Journal of Fluid Mechanics. Data were obtained by means of hot-wire anemometry in the Large Scale Wind Tunnel at the Norwegian University of Science and Technology in near-laminar inflow (background turbulence intensity of approximately 0.3%) at an inflow velocity of 10 m/s (diameter-based Reynolds number 125000). Two types of porous discs with diameter D = 0.2 m, one with uniform blockage and one with radially changing blockage, were used. Three spacings, namely 1.5D, 2D and 3D, were investigated. Span-wise profiles were measured at 8D and 30D downstream for each case, and a streamwise profile along the centerline between the discs was additionally obtained. In addition, measurements downstream of both disc types (single-disc setting) are provided for comparison. The scope of these experiments was to study the merging mechanisms of the turbulence when the two wakes meet.

  17. A High Statistics Measurement of the Proton Structure Functions F(2) (x,...

    • service.tib.eu
    Updated Mar 1, 2003
    Cite
    (2003). A High Statistics Measurement of the Proton Structure Functions F(2) (x, Q**2) and R from Deep Inelastic Muon Scattering at High Q**2 - Vdataset - LDM in NFDI4Energy [Dataset]. https://service.tib.eu/ldm_nfdi4energy/ldmservice/dataset/inspirehep_6113ee8e-6d78-4f49-9f94-0d76772539d3
    Explore at:
    Dataset updated
    Mar 1, 2003
    Description

    CERN-SPS. NA4/BCDMS collaboration. Plab 100 - 280 GeV/c. These are data from the BCDMS collaboration on F2 and R = SIG(L)/SIG(T) with a hydrogen target. The statistics are very large (1.8 million events). The ranges of X and Q2 are 0.06 < X < 0.8 and 7 < Q2 < 260 GeV2. The F2 data show a distinct difference from the F2 proton data taken by the EMC. The publication lists values of F2 corresponding to R = 0 and R = R(QCD) at each of the four energies, 100, 120, 200 and 280 GeV. As well as the statistical errors, also given are 5 factors representing the effects of estimated systematic errors on F2, associated with (1) beam momentum calibration, (2) magnetic field calibration, (3) spectrometer resolution, (4) detector and trigger inefficiencies, and (5) relative normalisation uncertainty of data taken from external and internal targets.

    This record contains our attempt to merge these data at different energies using the statistical errors as weight factors. The final one-sigma systematic errors given here have been calculated using a prescription from the authors: new merged F2 values are calculated with each of the systematic errors applied individually, and the differences between these new merged F2 values and the original F2 are then combined in quadrature. The individual F2 values at each energy are given in separate database records (RED = 3021, http://durpdg.dur.ac.uk/scripts/reacsearch.csh/TESTREAC/red+3021).

    The same description applies to the individual records at PLAB = 100, 120, 200 and 280 GeV/c: in each, the systematic error shown in the tables is the quadratic sum of the 5 individual errors according to the authors' prescription, and the record RED = 3019 (http://durpdg.dur.ac.uk/scripts/reacsearch.csh/TESTREAC/red+3019) contains our attempt to merge the data at different energies using the statistical errors as weight factors.
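
    One common reading of "merge these data at different energies using the statistical errors as weight factors" is an inverse-variance (statistical-error-weighted) average; the small R sketch below shows that interpretation, which is an assumption about the prescription rather than a reproduction of it.

      # Hypothetical sketch: statistical-error-weighted merge of F2 measured at several beam energies,
      # assuming a data frame f2_points with columns x, q2, plab, f2, stat_err.
      library(dplyr)

      merged_f2 <- f2_points %>%
        group_by(x, q2) %>%
        summarise(
          f2_merged   = sum(f2 / stat_err^2) / sum(1 / stat_err^2),   # inverse-variance weighted mean
          stat_merged = sqrt(1 / sum(1 / stat_err^2)),                # statistical error of the merged value
          .groups = "drop"
        )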

  18. H

    data set for EC 30 mins for the two sites

    • dataverse.harvard.edu
    Updated Oct 22, 2025
    Juan Benavides (2025). data set for EC 30 mins for the two sites [Dataset]. http://doi.org/10.7910/DVN/BBKIOQ
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Juan Benavides
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Dataset Description

    This dataset and the accompanying R scripts support the intensive carbon dynamics observation platform operated in the tropical alpine peatlands of Guatavita, Colombia. The data include half-hourly and cumulative greenhouse-gas fluxes (CO₂, CH₄, N₂O), dissolved organic carbon (DOC) transport, and related hydrological and meteorological measurements, together with model outputs and analysis scripts. All analyses were performed in R (version ≥ 4.2). The repository is organized into two main components: a chamber and Bayesian analysis pipeline (root folder) and a tower flux gap-filling and uncertainty analysis (folder golden/).

    1. Chamber and Bayesian Workflow

    This component integrates chamber measurements, water-table data, and modeled fluxes for both conserved and degraded peatland plots. The scripts cover data preparation, prediction of half-hourly fluxes, Bayesian partitioning of net ecosystem exchange (NEE) into gross primary production (GPP) and ecosystem respiration (ER), and generation of publication-quality figures. Main steps:
    • Data preparation: cleaning and merging chamber and tower data (flux_chamber3.r, flux_wt_guatavita_jc.r, waterlevel.r).
    • Prediction dataset construction: builds the model input datasets (flux predict.R, flux predict2.R).
    • Bayesian flux partitioning: separates NEE into GPP and ER using hierarchical Bayesian models (bayesian models.r, bayesianflux.r). This step must be run separately for each station (ST1 and ST2) by modifying the station code inside the scripts.
    • Trace gas analyses: quantify N₂O and DOC fluxes (N2Oflux.r, DOC_flux.r).
    • Visualization and summaries: produce the cumulative and seasonal flux figures and summary tables (final plot.r).

    Primary outputs: modelled CO₂ and CH₄ fluxes (*_Model_EC_long.csv, _pred_30min_.csv); seasonal and cumulative carbon balance summaries (Final_Cumulative_CO2_CH4_CO2eq_2023_2024_bySeason_Method_Station.csv, Summary_CO2_CH4_CO2eq_byMethod_Station_Season_Year.csv); mean and confidence-interval tables for each gas (PerGas_CO2_CH4_with_CO2eq_Mg_ha_mean95CI.csv, Totals_CO2eq_across_gases_Mg_ha_mean95CI.csv); publication figures (figure.png, figure_transparent.png, figure.svg).

    2. Tower Flux (Eddy-Covariance) Workflow

    The folder golden/ contains the workflow used for tower-based fluxes, including gap-filling, uncertainty analysis, and manuscript-quality visualization. These scripts use the REddyProc R package and standard meteorological variables. Scripts:
    • REddyProc_Guatavita_Station1_Gold.R: gap-filling for Station 1
    • REddyProc_Guatavita_Station2_Gold.R: gap-filling for Station 2
    • Guatavita_gapfilling_uncertainty.R: quantifies gap-filling uncertainty
    • Guatavita_plot_manuscript.R: generates the final tower flux figures
    Each station's eddy-covariance data were processed independently following standard u-star filtering and uncertainty propagation routines (a minimal REddyProc sketch is given after this description).

    Data Files

    Input data include chamber fluxes (co2flux.csv, ch4flux.csv, db_gutavita_N2O_all.csv), water-table and hydrological measurements (WaterTable.csv, wtd_martos_21_25.csv), DOC transport (DOC transport.csv), and auxiliary meteorological variables (tower_var.csv). Intermediate model results are stored in .rds files, and cumulative or seasonal summaries are provided in .csv and .xlsx formats.

    Reproducibility Notes

    All scripts assume relative paths from the project root. To reproduce the complete analyses: (1) install the required R packages (tidyverse, ggplot2, rjags, coda, REddyProc, among others); (2) run the chamber workflow in the order listed above; (3) repeat the Bayesian modeling step for both stations; (4) execute the tower scripts in the golden/ folder for gap-filling and visualization. Large intermediate .rds files are retained for reproducibility and should not be deleted unless re-running the models from scratch.

    Citation and Contact

    Principal Investigator: Juan C. Benavides, Pontificia Universidad Javeriana, Bogotá, Colombia
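    The tower (eddy-covariance) part of the workflow relies on the REddyProc package for u-star filtering, gap-filling and flux partitioning. The following is a minimal sketch of that standard REddyProc sequence, not a copy of the repository scripts: the input file name, station ID, column names and coordinates are placeholders (the coordinates are only an approximate location for Guatavita).

        library(REddyProc)

        # half-hourly tower data with columns Year, DoY, Hour, NEE, Rg, Tair, VPD, Ustar
        raw  <- read.csv("station1_halfhourly.csv")              # illustrative file name
        eddy <- fConvertTimeToPosix(raw, "YDH", Year = "Year", Day = "DoY", Hour = "Hour")

        EProc <- sEddyProc$new("ST1", eddy, c("NEE", "Rg", "Tair", "VPD", "Ustar"))
        EProc$sSetLocationInfo(LatDeg = 4.9, LongDeg = -73.8, TimeZoneHour = -5)  # approximate

        # u-star threshold scenarios and gap-filling of NEE under each scenario
        EProc$sEstimateUstarScenarios(nSample = 100L, probs = c(0.05, 0.5, 0.95))
        EProc$sMDSGapFillUStarScens("NEE")

        # gap-fill the meteorological drivers needed for partitioning
        EProc$sMDSGapFill("Tair", FillAll = FALSE)
        EProc$sMDSGapFill("Rg",   FillAll = FALSE)

        # nighttime-based partitioning of NEE into GPP and Reco for each u-star scenario
        EProc$sMRFluxPartitionUStarScens()

        filled <- EProc$sExportResults()
        write.csv(cbind(eddy, filled), "ST1_gapfilled_30min.csv", row.names = FALSE)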

  19. h

    Data from: A High Statistics Measurement of the Deuteron Structure Functions...

    • hepdata.net
    + more versions
    A High Statistics Measurement of the Deuteron Structure Functions F2 (X, $Q^2$) and R From Deep Inelastic Muon Scattering at High $Q^2$ [Dataset]. http://doi.org/10.17182/hepdata.6191.v1
    Explore at:
    Description

    CERN-SPS. NA4/BCDMS Collaboration. Plab 120-280 GeV/c. These are data from the BCDMS Collaboration on F2 and R = SIG(L)/SIG(T) with a deuterium target. The ranges of X and Q**2 are 0.06 < X < 0.8 and 8 < Q**2 < 260 GeV**2. The publication lists values of F2 corresponding to R=0 and R=R(QCD) at each of the three energies, 120, 200 and 280 GeV. As well as the statistical errors, five factors are given representing the effects of estimated systematic errors on F2, associated with (1) beam momentum calibration, (2) magnetic field calibration, (3) spectrometer resolution, (4) detector and trigger inefficiencies, and (5) relative normalization uncertainty of data taken from external and internal targets. This record contains our attempt to merge the data taken at the different energies, using the statistical errors as weight factors. The final one-sigma systematic errors given here were calculated using a prescription from the authors: new merged F2 values are computed with each systematic error applied individually, and the differences between these new merged values and the original merged F2 are then combined in quadrature. The individual F2 values at each energy (Plab = 120, 200 and 280 GeV/c) are given in separate database records; for those per-energy records the systematic error shown in the tables is the quadratic sum of the five individual errors, following the same prescription from the authors.

  20. BRAINTEASER ALS and MS Datasets

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jul 3, 2025
    + more versions
    Zenodo (2025). BRAINTEASER ALS and MS Datasets [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14857741?locale=lv
    Explore at:
    unknown. Available download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    BRAINTEASER (Bringing Artificial Intelligence home for a better care of amyotrophic lateral sclerosis and multiple sclerosis) is a data science project that seeks to exploit the value of big data, including data related to health, lifestyle habits, and environment, to support patients with Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS) and their clinicians. Taking advantage of cost-efficient sensors and apps, BRAINTEASER will integrate large clinical datasets that host both patient-generated and environmental data. As part of its activities, BRAINTEASER organized three open evaluation challenges on Intelligent Disease Progression Prediction (iDPP): iDPP@CLEF 2022, iDPP@CLEF 2023, and iDPP@CLEF 2024, co-located with the Conference and Labs of the Evaluation Forum (CLEF). The goal of iDPP@CLEF is to design and develop an evaluation infrastructure for AI algorithms able to: better describe disease mechanisms; stratify patients according to their phenotype assessed over the whole disease evolution; and predict disease progression in a probabilistic, time-dependent fashion. The iDPP@CLEF challenges relied on retrospective and prospective ALS and MS patient data made available by the clinical partners of the BRAINTEASER consortium.

    Retrospective Dataset

    We release three retrospective datasets, one for ALS and two for MS; of the two MS datasets, one consists of clinical data only and one also includes environmental/pollution data. The retrospective datasets contain data about 2,204 ALS patients (static variables, ALSFRS-R questionnaires, spirometry tests, environmental/pollution data) and 1,792 MS patients (static variables, EDSS scores, evoked potentials, relapses, MRIs). A subset of 280 MS patients contains environmental and pollution data. In more detail, the BRAINTEASER retrospective datasets were derived by merging already existing datasets held by the clinical centres involved in the project. The ALS dataset was obtained by merging and homogenising the Piemonte and Valle d'Aosta Registry for Amyotrophic Lateral Sclerosis (PARALS, Chiò et al., 2017) and the Lisbon ALS clinic dataset (CENTRO ACADÉMICO DE MEDICINA DE LISBOA, Centro Hospitalar Universitário de Lisboa-Norte, Hospital de Santa Maria, Lisbon, Portugal). Both datasets were initiated in 1995 and are currently maintained by researchers of the ALS Regional Expert Centre (CRESLA), University of Turin, and of the CENTRO ACADÉMICO DE MEDICINA DE LISBOA-Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa; they include demographic and clinical data, comprising both static and dynamic variables. The MS dataset was obtained from the Pavia MS clinical dataset (started in 1990 and containing demographic and clinical information continuously updated by the researchers of the Institute) and the Turin MS clinic dataset (Department of Neurosciences and Mental Health, Neurology Unit 1, Città della Salute e della Scienza di Torino).

    Retrospective environmental data are accessible at various scales at the individual subject level. Environmental data have therefore been retrieved at different scales: macroscale air pollution data were taken from public monitoring stations covering the whole extent of the involved countries, namely via the European Air Quality Portal, and data from a network of air quality sensors (PurpleAir - Outdoor Air Quality Monitor / PurpleAir PA-II) installed at different points in the city of Pavia (Italy) were extracted as well. In both cases the environmental data were already publicly available. To merge environmental data with individual subject locations we leverage postcodes (the postcode of the station detecting the pollutant and the postcode of the subject's address); the data were merged following an anonymization procedure based on hash keys (a rough illustration of this kind of postcode-keyed join is sketched after this description). Environmental exposure trajectories have been pre-processed and aggregated to avoid fine temporal and spatial granularities, so individual exposure information cannot disclose personal addresses.

    The retrospective datasets are shared in two formats: RDF (serialized in Turtle), modeled according to the BRAINTEASER Ontology (BTO); and CSV, as shared during the iDPP@CLEF 2022 and 2023 challenges, split into training and test. Each format corresponds to a specific folder in the datasets, where a dedicated README file provides further details. Note that the ALS dataset is split into multiple ZIP files due to the size of the environmental data.

    Prospective Dataset

    For the iDPP@CLEF 2024 challenge, the datasets contain prospective data about 86 ALS patients (static variables, ALSFRS-R questionnaires compiled by clinicians or by patients using the BRAINTEASER mobile application, sensor data). The prospective datasets are shared in two formats: RDF (serialized in Turtle) modeled according to the BRAINTEASER Ontology (BTO); CSV, as shared durin
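    The description above notes that subject locations and pollutant monitoring stations were linked through postcodes after an anonymization step based on hash keys. As a rough illustration of that kind of postcode-keyed join (not the project's actual code: the column names, salt, and values below are invented), one could proceed as follows in R:

        library(digest)

        # toy subject and monitoring-station tables with invented postcodes and values
        subjects <- data.frame(subject_id = c("S1", "S2"), postcode = c("27100", "10126"))
        stations <- data.frame(postcode = c("27100", "10126"), pm25 = c(18.2, 22.5), no2 = c(31.0, 44.3))

        # replace postcodes on both sides with salted hash keys, then join on the key,
        # so the merged table never carries the raw location identifier
        salt <- "project-secret-salt"
        hash_key <- function(p) vapply(paste0(salt, p), digest::digest, character(1), algo = "sha256")
        subjects$key <- hash_key(subjects$postcode); subjects$postcode <- NULL
        stations$key <- hash_key(stations$postcode); stations$postcode <- NULL

        merged <- merge(subjects, stations, by = "key")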
