100+ datasets found
  1. Startup Performance Analysis

    • kaggle.com
    Updated Aug 11, 2025
    Cite
    SanskratiiGupta (2025). Startup Performance Analysis [Dataset]. https://www.kaggle.com/datasets/sanskratiigupta/startup-performance-analysis
    Dataset updated
    Aug 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SanskratiiGupta
    Description

    Dataset

    This dataset was created by SanskratiiGupta

    Contents

  2. Data from: Delta Food Outlets Study

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated May 8, 2025
    Cite
    Agricultural Research Service (2025). Delta Food Outlets Study [Dataset]. https://catalog.data.gov/dataset/delta-food-outlets-study-2786d
    Dataset updated
    May 8, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The Delta Food Outlets Study was an observational study designed to assess the nutritional environments of 5 towns in the Lower Mississippi Delta region of Mississippi. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns in which Delta Healthy Sprouts participants resided and that contained at least one convenience (corner) store, grocery store, or gas station. Data were collected via electronic surveys between March 2016 and September 2018 using the Nutrition Environment Measures Survey (NEMS) tools. Scores for the NEMS Corner Store, NEMS Grocery Store, and NEMS Restaurant surveys were computed in SAS using modified versions of the scoring algorithms provided for these tools. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one (NEMS-C) contains data collected with the NEMS Corner (convenience) Store tool, dataset two (NEMS-G) with the NEMS Grocery Store tool, and dataset three (NEMS-R) with the NEMS Restaurant tool.

    Resources in this dataset (recommended software for each: Microsoft Excel, https://products.office.com/en-us/excel):

    • Delta Food Outlets Data Dictionary (file: DFO_DataDictionary_Public.csv): the data dictionary for all 3 datasets that are part of the Delta Food Outlets Study.
    • Dataset One NEMS-C (file: NEMS-C Data.csv): data collected with the Nutrition Environment Measures Survey (NEMS) tool for convenience stores.
    • Dataset Two NEMS-G (file: NEMS-G Data.csv): data collected with the NEMS tool for grocery stores.
    • Dataset Three NEMS-R (file: NEMS-R Data.csv): data collected with the NEMS tool for restaurants.

  3. Trends in Diversity Score (1997-2007): The Excel Charter School vs. Colorado...

    • publicschoolreview.com
    Updated Oct 26, 2025
    Cite
    Public School Review (2025). Trends in Diversity Score (1997-2007): The Excel Charter School vs. Colorado vs. Durango School District No. 9-R [Dataset]. https://www.publicschoolreview.com/the-excel-charter-school-profile
    Dataset updated
    Oct 26, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Durango School District 9-R
    Description

    This dataset tracks the annual diversity score from 1997 to 2007 for The Excel Charter School, compared with Colorado and Durango School District No. 9-R.

  4. BlinkIT Grocery Sales Dataset (Excel)

    • kaggle.com
    Updated Apr 20, 2025
    Cite
    Lavudya Swamy (2025). BlinkIT Grocery Sales Dataset (Excel) [Dataset]. http://doi.org/10.34740/kaggle/dsv/11490905
    Dataset updated
    Apr 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lavudya Swamy
    License

    CC0 1.0 Universal (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains transactional grocery data from BlinkIT, a grocery delivery platform. It includes product names, categories, prices, units sold, and potentially order- or date-based features (depending on the columns in the file).

  5. Google Certificate BellaBeats Capstone Project

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
    Available download formats: zip (169,161 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    Jason Porzelius
    Description

    Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeat, Inc., using a locally installed program, Excel, for both my data analysis and my visualizations. I made this choice primarily because I live in a remote area and have limited bandwidth and inconsistent internet access, so completing a capstone project with web-based tools such as RStudio, SQL Workbench, or Google Sheets was not feasible. My choice was further limited because the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I act as a Junior Data Analyst supporting the Bellabeat, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in the hope that its findings will reveal insights to guide Bellabeat's marketing strategies for future growth. My task is to provide data-driven insights into the business tasks set by Bellabeat, Inc.'s executive and data analytics teams. To accomplish this, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the process into three sections to provide clarity and accountability: Guiding Questions, Key Tasks, and Deliverables. To save space and avoid repetition, I record the deliverables for each Key Task directly under the numbered Key Task, marked with an asterisk (*).

    Section 1 - Ask:

    A. Guiding Questions:
    1. Who are the key stakeholders, and what are their goals for the data analysis project?
    2. What business task is this data analysis project attempting to solve?

    B. Key Tasks:
    1. Identify the key stakeholders and their goals for the data analysis project.
    *The key stakeholders for this project are:
    - Urška Sršen and Sando Mur, co-founders of Bellabeat, Inc.
    - The Bellabeat marketing analytics team, of which I am a member.

    2. Identify the business task.
    *As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers use their non-Bellabeat smart devices, in order to guide upcoming marketing strategies that will help drive the company's future growth. Specifically, the researcher was tasked with applying insights from the data analysis process to one Bellabeat product and presenting those insights to Bellabeat stakeholders.

    Section 2 - Prepare:

    A. Guiding Questions:
    1. Where is the data stored, and how is it organized?
    2. Are there any problems with the data?
    3. How does the data help answer the business question?

    B. Key Tasks:

    1. Research and communicate to stakeholders the source of the data and how it is stored and organized.
      *The data source used for this case study is FitBit Fitness Tracker Data. The dataset is hosted on Kaggle and was made available by the user Mobius in an open-source format; the data is therefore public and may be copied, modified, and distributed without asking the user for permission. The data was reportedly (see the credibility discussion directly below) generated by respondents to a survey distributed via Amazon Mechanical Turk between 03/12/2016 and 05/12/2016.
      *Reportedly (see the credibility discussion directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute-, hour-, and day-level totals and is stored in 18 CSV files. I downloaded all 18 files to my laptop and chose 2 of them for this project, because they merge the activity and sleep data from the other files. All unused files were permanently deleted from the laptop. The 2 files used were:
      - sleepDay_merged.csv
      - dailyActivity_merged.csv

    2. Identify and communicate to stakeholders any problems with the data related to credibility and bias.
      *As presented in more detail in the Process section, the data appears to have credibility issues related to the reported time frame of collection. The metadata indicates that the data covers roughly 2 months of Fitbit tracking; however, my initial data processing found only 1 month of data.
      *As presented in more detail in the Process section, the data also has credibility issues related to the number of individuals who reported Fitbit data. Specifically, the metadata states that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
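
    The two credibility checks described above (counting distinct participants and measuring the covered date range) take only a few lines to reproduce. This is a minimal Python sketch, not the author's method (the author worked in Excel). It assumes that dailyActivity_merged.csv has an `Id` column and an `ActivityDate` column written as M/D/YYYY; both should be verified against the actual file.

```python
import csv
import io
from datetime import datetime


def credibility_summary(csv_text):
    """Count distinct participant IDs and the date span in a daily-activity CSV.

    Assumes (hypothetically) columns named "Id" and "ActivityDate", with dates
    written as M/D/YYYY; verify both against dailyActivity_merged.csv.
    """
    ids = set()
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids.add(row["Id"])
        dates.append(datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date())
    first, last = min(dates), max(dates)
    return {
        "distinct_ids": len(ids),
        "first_date": first,
        "last_date": last,
        "days_covered": (last - first).days + 1,  # inclusive count of calendar days
    }


# Made-up rows for illustration; in practice pass the file's contents, e.g.
# open("dailyActivity_merged.csv", newline="").read()
sample = """Id,ActivityDate,TotalSteps
1001,4/12/2016,8000
1001,4/13/2016,9500
1002,5/12/2016,4000
"""
print(credibility_summary(sample))
```

    Comparing `distinct_ids` against the 30 participants stated in the metadata, and `days_covered` against the stated collection window, reproduces both checks.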

  6. Data from: Student Academic Performance Dataset

    • kaggle.com
    Updated Oct 6, 2025
    Cite
    Hackathon data (2025). Student Academic Performance Dataset [Dataset]. https://www.kaggle.com/datasets/aryancodes12fyds/student-academic-performance-dataset
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hackathon data
    License

    CC0 1.0 Universal (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Student Academic Performance Dataset contains detailed academic and lifestyle information on 250 students. It was created to analyze how factors such as study hours, sleep, attendance, stress, and social media usage influence overall academic outcomes and GPA.

    This dataset is synthetic but realistic, carefully generated to reflect believable academic patterns and relationships. It’s perfect for learning data analysis, statistics, and visualization using Excel, Python, or R.

    The data includes 12 attributes, mostly numerical, which makes it suitable for a wide range of analytical tasks, from basic descriptive statistics (mean, median, SD) to correlation and regression analysis.

    Key Features

    🧮 250 rows and 12 columns

    Mostly numerical: great for Excel-based statistical functions

    No missing values: ready for direct use

    Balanced and realistic: ideal for clear visualizations and trend analysis

    Suitable for:

    Descriptive statistics

    Correlation & regression

    Data visualization projects

    Dashboard creation (Excel, Tableau, Power BI)

    Possible Insights to Explore

    How do study hours impact GPA?

    Is there a relationship between stress levels and performance?

    Does social media usage reduce study efficiency?

    Do students with higher attendance achieve better grades?
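
    The first question above (how study hours relate to GPA) is a correlation-and-regression exercise. Here is a minimal stdlib-only Python sketch. It assumes the two columns have already been read into equal-length lists; the dataset's actual column names are not given in this description, so none are assumed, and the numbers below are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Made-up values for illustration only
study_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
gpa = [2.1, 2.4, 2.9, 3.2, 3.6]
print(round(pearson_r(study_hours, gpa), 3))
print(least_squares_line(study_hours, gpa))
```

    In Excel, the same quantities come from the CORREL, SLOPE, and INTERCEPT worksheet functions.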

    Data Generation Details

    Each record represents a unique student.

    GPA is calculated using a weighted formula based on midterm and final scores.

    Relationships are designed to be realistic — for example:

    Higher study hours → higher scores and GPA

    Higher stress → slightly lower sleep hours

    Excessive social media time → reduced academic performance

    Disclaimer

    This dataset is synthetically generated using statistical modeling techniques and does not contain any real student data. It is intended purely for educational, analytical, and research purposes.

  7. Trends in Student-Teacher Ratio (1997-2007): The Excel Charter School vs....

    • publicschoolreview.com
    Updated Oct 26, 2025
    Cite
    Public School Review (2025). Trends in Student-Teacher Ratio (1997-2007): The Excel Charter School vs. Colorado vs. Durango School District No. 9-R [Dataset]. https://www.publicschoolreview.com/the-excel-charter-school-profile
    Dataset updated
    Oct 26, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Durango School District 9-R
    Description

    This dataset tracks the annual student-teacher ratio from 1997 to 2007 for The Excel Charter School, compared with Colorado and Durango School District No. 9-R.

  8. Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping...

    • orda.shef.ac.uk
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    Cite
    Mark Clowes; Anthea Sutton; Tony Stone; Matthew Franklin (2023). Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping Review Supplementary Excel S1 [Dataset]. http://doi.org/10.15131/shef.data.21222272.v1
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Mark Clowes; Anthea Sutton; Tony Stone; Matthew Franklin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Unlocking Data to Inform Public Health Policy and Practice: WP1 Mapping Review Supplementary Excel S1
    The Excel tab "S1 Case studies (extracted)" contains information extracted from 31 case studies as part of the Workpackage (WP) 1 Mapping Review of the "Unlocking Data to Inform Public Health Policy and Practice" project. Details of the WP1 mapping review are given in the project report, available via this DOI link: https://doi.org/10.15131/shef.data.21221606

  9. Google Ads sales dataset

    • kaggle.com
    Updated Jul 22, 2025
    Cite
    NayakGanesh007 (2025). Google Ads sales dataset [Dataset]. https://www.kaggle.com/datasets/nayakganesh007/google-ads-sales-dataset
    Dataset updated
    Jul 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NayakGanesh007
    License

    CC0 1.0 Universal (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Google Ads Sales Dataset for Data Analytics Campaigns (Raw & Uncleaned)

    Dataset Overview

    This dataset contains raw, uncleaned advertising data from a simulated Google Ads campaign promoting data analytics courses and services. It closely mimics what real digital marketers and analysts encounter when working with exported campaign data, including typos, formatting issues, missing values, and inconsistencies.

    It is ideal for practicing:

    Data cleaning

    Exploratory Data Analysis (EDA)

    Marketing analytics

    Campaign performance insights

    Dashboard creation using tools like Excel, Python, or Power BI

    Columns in the Dataset

    Ad_ID: unique ID of the ad campaign
    Campaign_Name: name of the campaign (with typos and variations)
    Clicks: number of clicks received
    Impressions: number of ad impressions
    Cost: total cost of the ad (in ₹ or $ format, with missing values)
    Leads: number of leads generated
    Conversions: number of actual conversions (signups, sales, etc.)
    Conversion Rate: calculated conversion rate (Conversions ÷ Clicks)
    Sale_Amount: revenue generated from the conversions
    Ad_Date: date of the ad activity (in inconsistent formats such as YYYY/MM/DD and DD-MM-YY)
    Location: city where the ad was served (includes spelling and case variations)
    Device: device type (Mobile, Desktop, Tablet, with mixed casing)
    Keyword: keyword that triggered the ad (with typos)

    Data Quality Issues (Intentional)

    This dataset was intentionally left raw and uncleaned to reflect real-world messiness, such as:

    Inconsistent date formats

    Spelling errors (e.g., "analitics", "anaytics")

    Duplicate rows

    Mixed units and symbols in cost/revenue columns

    Missing values

    Irregular casing in categorical fields (e.g., "mobile", "Mobile", "MOBILE")
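
    Three of the issues listed above (inconsistent dates, irregular casing, and currency symbols) can be handled with a minimal stdlib-only Python sketch like the one below. The date formats come from the Ad_Date description (YYYY/MM/DD and DD-MM-YY); the file may contain other formats, which is an assumption to check. A value such as 03-04-05 is ambiguous, so the order in which formats are tried matters. The cost format ("₹1,234.50", "$99") is likewise assumed.

```python
from datetime import datetime

# Formats named in the Ad_Date description; extend this tuple if the file contains others.
DATE_FORMATS = ("%Y/%m/%d", "%d-%m-%y")


def parse_ad_date(text):
    """Parse an Ad_Date string, trying each known format in order; None if none match."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def normalize_device(text):
    """Map 'mobile', 'MOBILE', ' Mobile ' and similar variants to 'Mobile'."""
    return text.strip().capitalize()


def parse_amount(text):
    """Strip currency symbols and thousands separators; None for an empty value.

    Assumes (hypothetically) amounts like '₹1,234.50' or '$99'. Mixed rupee and
    dollar values would still need a conversion rate, which this does not apply.
    """
    cleaned = text.strip().replace("₹", "").replace("$", "").replace(",", "")
    if cleaned == "":
        return None
    return float(cleaned)


print(parse_ad_date("2024/11/05"), parse_ad_date("05-11-24"))
print(normalize_device("MOBILE"), parse_amount("₹1,234.50"), parse_amount(""))
```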

    Use Cases

    Data cleaning exercises in Python (pandas), R, or Excel

    Data preprocessing for machine learning

    Campaign performance analysis

    Conversion optimization tracking

    Building dashboards in Power BI, Tableau, or Looker

    Sample Analysis Ideas

    Track campaign cost vs. return (ROI)

    Analyze click-through rates (CTR) by device or location

    Clean and standardize campaign names and keywords

    Investigate keyword performance vs. conversions
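
    The ratio metrics named above can be written out explicitly. Conversion rate follows the column definition (Conversions ÷ Clicks). CTR as Clicks ÷ Impressions and ROI as (revenue - cost) ÷ cost are the usual definitions but are assumed here, since the dataset does not define them. In this sketch, a zero or missing denominator yields None instead of raising an error.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


def campaign_metrics(clicks, impressions, conversions, cost, sale_amount):
    """CTR, conversion rate and ROI for one ad row or one aggregated group (e.g. per device)."""
    return {
        "ctr": safe_ratio(clicks, impressions),
        "conversion_rate": safe_ratio(conversions, clicks),
        "roi": safe_ratio(sale_amount - cost, cost) if cost is not None else None,
    }


# Illustrative numbers only
print(campaign_metrics(clicks=120, impressions=4000, conversions=6, cost=300.0, sale_amount=900.0))
```

    To compare devices or locations, sum the raw counts per group first and then call `campaign_metrics` on the totals. Averaging per-row ratios would weight small campaigns too heavily.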

    Tags: Digital Marketing · Google Ads · Marketing Analytics · Data Cleaning · Pandas Practice · Business Analytics · CRM Data

  10. HUN AWRA-R simulation nodes v01 | gimi9.com

    • gimi9.com
    Cite
    HUN AWRA-R simulation nodes v01 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_fda20928-d486-49d2-b362-e860c1918b06/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement, and the processes undertaken to produce this derived dataset are described in the History field. The dataset consists of an Excel spreadsheet and a shapefile representing the locations of the simulation nodes used in the AWRA-R model. Some nodes correspond to gauging station or dam locations, whereas others represent river confluences or catchment outlets that have no gauging; these are marked as "Dummy".

    Purpose

    The locations are used as pour points in order to define reach areas for river system modelling.

    Dataset History

    This is a subset of data for the Hunter, extracted from the Bureau of Meteorology's Hydstra system, and includes all gauges for which data has been received from the lead water agency of each jurisdiction. Simulation nodes were added at locations where the model will provide simulated streamflow.

    Three files were extracted from the Hydstra database to help identify the sites in each bioregion and the type of data collected at each one. These data were used to determine the simulation node locations where model outputs were generated. The 3 files in the source dataset used for this determination are:

    • Site: lists all sites available in Hydstra from data providers. The data provider is indicated in the #Station field as _xxx; for example, sites in NSW are _77 and sites in QLD are _66. Some sites have no location information and cannot be plotted.
    • Period: lists all the variables recorded at each site and their period of record.
    • Variable: shows the variable codes and names, which can be linked to the Period table.

    Relevant location information and other data were extracted to construct the spreadsheet and shapefile in this dataset.

    Dataset Citation

    Bioregional Assessment Programme (XXXX) HUN AWRA-R simulation nodes v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/fda20928-d486-49d2-b362-e860c1918b06.

    Dataset Ancestors

    • Derived from National Surface Water sites Hydstra
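
    The Site-table convention described above appends a data-provider suffix to the station identifier (_77 for NSW, _66 for QLD). A small lookup can decode it. This is a hypothetical sketch: the exact layout of the #Station field is assumed, the station numbers are invented, and only the two provider codes quoted in this record are mapped.

```python
# Only the two provider codes quoted in the metadata; any other code maps to "unknown".
PROVIDER_CODES = {"77": "NSW", "66": "QLD"}


def split_station(station_field):
    """Split a #Station value such as '210001_77' (hypothetical) into (site_id, jurisdiction)."""
    site_id, sep, code = station_field.rpartition("_")
    if not sep:  # no underscore: no provider suffix to decode
        return station_field, "unknown"
    return site_id, PROVIDER_CODES.get(code, "unknown")


print(split_station("210001_77"))
print(split_station("422394_66"))
```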

  11. R script that creates a wrapper function to automate the generation of...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 20, 2025
    Cite
    U.S. Geological Survey (2025). R script that creates a wrapper function to automate the generation of boxplots of change factors for all Florida HUC-8 basins (basin_boxplot.R) [Dataset]. https://catalog.data.gov/dataset/r-script-that-creates-a-wrapper-function-to-automate-the-generation-of-boxplots-of-change-
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The Florida Flood Hub for Applied Research and Innovation and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 242 National Oceanic and Atmospheric Administration (NOAA) Atlas 14 stations in Florida. The change factors were computed as the ratio of projected future to historical extreme-precipitation depths, fitted to extreme-precipitation data from downscaled climate datasets using a constrained maximum likelihood (CML) approach as described in https://doi.org/10.3133/sir20225093. The change factors correspond to the periods 2020-59 (centered on 2040) and 2050-89 (centered on 2070), compared with the 1966-2005 historical period.

    An R script (basin_boxplot.R) is provided as an example of how to create a wrapper function that automates the generation of boxplots of change factors for all Florida HUC-8 basins. The wrapper script sources the file create_boxplot.R and calls the function create_boxplot() one Florida basin at a time, creating a figure with boxplots of change factors for all durations (1, 3, and 7 days) and return periods (5, 10, 25, 50, 100, 200, and 500 years) evaluated in this project. The code also includes an example of how to generate a figure with boxplots of change factors for a single duration and return period. A Microsoft Word file documenting code usage (Documentation_R_script_create_boxplot.docx) is also provided in this data release.

    As described in the documentation, the R script relies on some of the Microsoft Excel spreadsheets published as part of this data release. The script uses the HUC-8 basins defined in the "Florida Hydrologic Unit Code (HUC) Basins (areas)" from the Florida Department of Environmental Protection (FDEP; https://geodata.dep.state.fl.us/datasets/FDEP::florida-hydrologic-unit-code-huc-basins-areas/explore), and their names are listed in the file basins_list.txt provided with the script. County names are listed in the file counties_list.txt, also provided with the script. The NOAA Atlas 14 stations located in each Florida basin or county are defined in the Microsoft Excel spreadsheet Datasets_station_information.xlsx, which is part of this data release. Instructions in the code documentation (see the highlighted text on page 7 of Documentation_R_script_create_boxplot.docx) explain how to modify the script to generate boxplots for basins other than the FDEP "Florida Hydrologic Unit Code (HUC) Basins (areas)."
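
    The change factor defined above is a plain ratio, and the wrapper pattern (one summary per basin) can be mirrored in a few lines. The Python sketch below is not the USGS R code. It computes change factors from paired projected and historical depths and, instead of drawing boxplots, returns the quartile statistics a boxplot would display for each basin. The basin names and depth values are hypothetical. Requires Python 3.8+ for `statistics.quantiles`.

```python
import statistics


def change_factor(projected_depth, historical_depth):
    """Ratio of projected future to historical extreme-precipitation depth."""
    return projected_depth / historical_depth


def boxplot_stats(values):
    """Min, quartiles, and max of a sample (needs at least two values).

    statistics.quantiles uses the 'exclusive' method by default, which can
    differ slightly from the hinges drawn by R's boxplot().
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def summarize_by_basin(records):
    """Wrapper pattern: group (basin, projected, historical) records, then summarize each basin."""
    factors = {}
    for basin, projected, historical in records:
        factors.setdefault(basin, []).append(change_factor(projected, historical))
    return {basin: boxplot_stats(values) for basin, values in sorted(factors.items())}


# Hypothetical depths (same units for projected and historical), one basin only
records = [("Basin A", p, 1) for p in (5, 1, 3, 2, 4)]
print(summarize_by_basin(records))
```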

  12. HLY-02-03 Mesozooplankton Grazing Rates (Excel) [Campbell, R. and C....

    • data.ucar.edu
    • search.dataone.org
    • +2 more
    excel
    Updated Oct 7, 2025
    Cite
    Carin J. Ashjian; Robert G. Campbell (2025). HLY-02-03 Mesozooplankton Grazing Rates (Excel) [Campbell, R. and C. Ashjian] [Dataset]. http://doi.org/10.5065/D6HH6H5G
    Available download formats: Excel
    Dataset updated
    Oct 7, 2025
    Authors
    Carin J. Ashjian; Robert G. Campbell
    Time period covered
    Jul 18, 2002 - Aug 21, 2002
    Description

    This data set contains mesozooplankton grazing rates measured in ship-board incubations conducted during the SBI U.S. Coast Guard Cutter (USCGC) Healy Process cruises. Each data set presents individual bottle measurements of clearance and ingestion rates for each species / stage for each experiment, as ml/individual/hr and ng chlorophyll a/individual/hr, respectively. Station number, station name, experiment number, date, position (latitude, longitude), bottom depth, and initial chlorophyll a concentration are presented. These data are in Excel format.
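
    The units of the two reported rates show how they are related. In the standard treatment of grazing incubations, ingestion equals clearance rate multiplied by food concentration. The sketch below assumes that relationship and a chlorophyll a concentration in µg/L (which equals ng/mL). Under those assumptions, a clearance rate in mL/individual/hr converts directly to ingestion in ng chlorophyll a/individual/hr. The record does not say which chlorophyll concentration (initial or mean over the incubation) the investigators used, and the values here are hypothetical.

```python
def ingestion_rate(clearance_ml_per_ind_hr, chl_a_ug_per_l):
    """Ingestion (ng chl a / individual / hr) = clearance (mL / individual / hr) x chl a (ng/mL).

    Assumes the standard clearance-times-concentration relationship; 1 µg/L of
    chlorophyll a equals 1 ng/mL, so the concentration needs no rescaling.
    """
    chl_a_ng_per_ml = chl_a_ug_per_l  # 1 µg/L == 1 ng/mL
    return clearance_ml_per_ind_hr * chl_a_ng_per_ml


# Hypothetical values: 4.0 mL/individual/hr clearance at 0.5 µg/L chlorophyll a
print(ingestion_rate(4.0, 0.5))  # 2.0
```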

  13. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, because NHANES data are plagued by multiple inconsistencies, the data must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:

    • demographics (281 variables)
    • dietary consumption (324 variables)
    • physiological functions (1,040 variables)
    • occupation (61 variables)
    • questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood)
    • medications (29 variables)
    • mortality information linked from the National Death Index (15 variables)
    • survey weights (857 variables)
    • environmental exposure biomarker measurements (598 variables)
    • chemical comments indicating which measurements are below or above the lower limit of detection (505 variables)

    csv Data Record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated NHANES datasets consist of 20 .csv files, two for each module: one uncleaned version and one cleaned version. The modules are: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.

    • "dictionary_nhanes.csv" lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, shortened chemical family, number of measurements, and cycles available for all 5,078 variables in NHANES.
    • "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
    • "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes.
    • "nhanes_inconsistencies_documentation.xlsx" is an Excel file containing the cleaning documentation, which records all inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries are provided, downloadable as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts with the customized functions written to curate the data; "m - nhanes_1988_2018.R" shows how we used these customized functions (i.e., our pipeline) to curate the original NHANES data.

    Example starter codes: The starter code to help users conduct exposome analysis consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order.

    • "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
    • "example_1 - account_for_nhanes_design.Rmd" demonstrates how to fit a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model.
    • "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and for multiple variables, with and without accounting for the NHANES sampling design.
    • "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
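
    Two ideas behind the starter code, joining modules on a participant identifier and computing a survey-weighted summary, can be shown in miniature. This Python sketch is not the authors' R pipeline. The join key is assumed to be NHANES's SEQN respondent number (check the actual column name in dictionary_nhanes.csv), and the other column names and values are placeholders. A full design-based analysis also needs the strata and PSU variables, which this sketch omits.

```python
def merge_modules(left, right, key="SEQN"):
    """Inner-join two lists of row dicts on a participant key (assumes one row per key in `right`)."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


def weighted_mean(rows, value_col, weight_col):
    """Survey-weighted mean: sum(w * x) / sum(w), skipping rows with a missing value."""
    pairs = [(r[value_col], r[weight_col]) for r in rows if r.get(value_col) is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(x * w for x, w in pairs) / total_weight


# Hypothetical rows; column names other than SEQN are placeholders
demographics = [{"SEQN": 1, "age": 30}, {"SEQN": 2, "age": 50}, {"SEQN": 3, "age": 70}]
weights = [{"SEQN": 1, "wt": 1000.0}, {"SEQN": 2, "wt": 3000.0}]
merged = merge_modules(demographics, weights)
print(len(merged), weighted_mean(merged, "age", "wt"))
```

    The unweighted mean age of the two merged rows would be 40. The weighted mean moves toward the participant who represents more of the population.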

  14. HLY-04-02 Mesozooplankton Grazing Rates (Excel) [Campbell, R. and C....

    • data.ucar.edu
    • search.dataone.org
    • +2 more
    excel
    Updated Oct 7, 2025
    Cite
    Carin J. Ashjian; Robert G. Campbell (2025). HLY-04-02 Mesozooplankton Grazing Rates (Excel) [Campbell, R. and C. Ashjian] [Dataset]. http://doi.org/10.5065/D6N877WS
    Available download formats: Excel
    Dataset updated
    Oct 7, 2025
    Authors
    Carin J. Ashjian; Robert G. Campbell
    Time period covered
    May 15, 2004 - Jun 23, 2004
    Description

    This data set contains mesozooplankton grazing rates measured in ship-board incubations conducted during the SBI process cruises. Each data set presents individual bottle measurements of clearance and ingestion rates for each species/stage for each experiment, as ml/individual/hr and ng chlorophyll a/individual/hr, respectively. Station number, station name, experiment number, date, position (latitude, longitude), bottom depth, and initial chlorophyll a concentration are presented. These data were collected aboard the U.S. Coast Guard Cutter (USCGC) Healy. These data are in Excel format.
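
    The two reported rates are linked through the food concentration in the bottle. The sketch below, a minimal illustration assuming that ingestion equals clearance times chlorophyll a concentration, shows the unit bookkeeping: a concentration in µg/L is numerically the same as ng/mL, so mL/individual/hr multiplied by ng/mL gives ng chlorophyll a/individual/hr. Whether the dataset used the initial or the mean incubation concentration is not stated here, so check the data documentation; the function name and inputs are hypothetical.

```python
# Minimal sketch, assuming ingestion = clearance x chlorophyll a concentration.
# Which concentration the dataset used (initial vs. incubation mean) is not
# stated here; the inputs below are hypothetical.

# 1 ug/L = 1000 ng / 1000 mL = 1 ng/mL, so the conversion factor is 1.
NG_PER_ML_PER_UG_PER_L = 1.0


def ingestion_rate(clearance_ml_per_ind_hr, chl_a_ug_per_l):
    """Return ingestion in ng chlorophyll a per individual per hour."""
    chl_a_ng_per_ml = chl_a_ug_per_l * NG_PER_ML_PER_UG_PER_L
    # (mL / individual / hr) * (ng / mL) = ng / individual / hr
    return clearance_ml_per_ind_hr * chl_a_ng_per_ml


# Hypothetical bottle: 2.5 mL/individual/hr cleared at 0.8 ug/L chlorophyll a.
print(ingestion_rate(2.5, 0.8))
```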

  15. Petre_Slide_CategoricalScatterplotFigShare.pptx

    • figshare.com
    pptx
    Updated Sep 19, 2016
    Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
    Available download formats: pptx
    Dataset updated
    Sep 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Benj Petre; Aurore Coince; Sophien Kamoun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorical scatterplots with R for biologists: a step-by-step guide

    Benjamin Petre¹, Aurore Coince², Sophien Kamoun¹

    ¹ The Sainsbury Laboratory, Norwich, UK; ² Earlham Institute, Norwich, UK

    Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting the data set as a .csv file, 2) executing the R script to generate the graph, and 3) exporting the graph as a .pdf file.

    Protocol

    • Step 1: format the data set as a .csv file. Store the data in a three-column Excel file, as shown in the PowerPoint slide. The first column, ‘Replicate’, indicates the biological replicates; in the example, it gives the month and year in which each replicate was performed. The second column, ‘Condition’, indicates the experimental conditions (in the example, a wild type and two mutants called A and B). The third column, ‘Value’, contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

    • Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide, paste it into the R console, and execute it. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample, and their colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

    • Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
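
    For readers who want to check the numbers behind the plot, the Python sketch below reproduces what ggplot2's geom_boxplot() summarizes by default: quartiles computed like R's quantile(type = 7), whiskers reaching the most extreme values within 1.5 × IQR of the box, and points outside that range flagged as outliers. It reads the three-column layout from Step 1; the function names and sample rows are invented for illustration, and the R script in the slide remains the authors' method.

```python
# Minimal sketch of the default boxplot statistics, grouped by Condition.
import csv
import io
import statistics
from collections import defaultdict


def box_stats(values):
    """Quartiles (R type 7 == Python 'inclusive'), 1.5 x IQR whiskers, and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


def read_by_condition(handle):
    """Group the Value column by Condition, as laid out in the Step 1 .csv file."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["Condition"]].append(float(row["Value"]))
    return dict(groups)


# Invented example rows in the Replicate / Condition / Value layout.
sample_csv = """Replicate,Condition,Value
Jan2016,WT,1.0
Jan2016,WT,2.0
Feb2016,WT,3.0
Feb2016,WT,4.0
Mar2016,WT,20.0
Jan2016,A,5.0
Feb2016,A,6.0
"""

for condition, values in read_by_condition(io.StringIO(sample_csv)).items():
    print(condition, box_stats(values))
```

    With these rows, the WT value 20.0 lies above Q3 + 1.5 × IQR and would be drawn as a black outlier dot.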

    Notes

    • Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field, and click ‘Get List’. Select ‘ggplot2’ in the Package column and click ‘Install Selected’. Install all dependencies as well.

    • Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

    # 7 Display the graph in a separate window. Dot colors indicate replicates
    graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

    References

    Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

    Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

    Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

    https://cran.r-project.org/

    http://ggplot2.org/

  16. Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022

    • catalog.data.gov
    • datasets.ai
    Updated Jul 18, 2025
    U.S. EPA Office of Research and Development (ORD) (2025). Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022 [Dataset]. https://catalog.data.gov/dataset/data-set-st-louis-river-watershed-mn-conductivity-assessment-march-2022
    Dataset updated
    Jul 18, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Saint Louis River, Minnesota
    Description

    Data used by the USEPA Office of Research and Development (ORD) to evaluate potential downstream impacts of the NorthMet Mine, provided for USEPA Region 5's use. The work includes a characterization of stream specific conductivity (SC) levels, least disturbed background SC, and SC levels that may exceed the Fond du Lac Band's water quality (WQ) standards and adversely affect aquatic life, including brook trout (Salvelinus fontinalis), lake sturgeon (Acipenser fulvescens), and benthic macroinvertebrates.

    Keywords: conductivity, St. Louis River, benthic invertebrates, mining

    The attached Excel pedigree includes:
    • _Datasets: data file uploaded to EPA Science Hub and/or Environmental Data Set Gateway
    • _R: clean R scripts used to generate the document's figures and tables
    • _Tables_Figures: files generated from the R scripts and used in the Region 5 memo
    • 20220325 R Code and Data: all additional files used for this project, including original files, intermediate files, extra output files, and extra functions

    The "_R" folder contains four subfolders. Each subfolder has several R scripts, input and output files, and an R project file. Users can run the R scripts directly from each subfolder after installing R, RStudio, and the associated R packages.

    Data Dictionary: see the DataDictionary tab in the Excel file.

    Datasets: simplified language is used in the text to identify parent data sets. Source and file names are retained in this pedigree in their original form so that the R scripts remain functional. The parent data sets are:
    • Thingvold et al. (1975-1977)
    • Griffith (1998-2009)
    • Predicted background (2000-2015)
    • Water Quality Portal (1996-2021)
    • Water Quality Portal Less Disturbed (1996-2021)
    • Minnesota Pollution Control Agency (MPCA) (1996-2013)
    • Mid-Atlantic Highlands (1990-2014)

    This dataset is associated with the following publication: Cormier, S., and Y. Wang. Appendix C: ORD Specific Conductance Memo, from Susan Cormier to Tera Fong. March 15, 2022. Assessment of effects of increased ion concentrations in the St. Louis River Watershed with special attention to potential mining influence and the jurisdiction of the Fond du Lac Band of Lake Superior Chippewa. U.S. Environmental Protection Agency, Washington, DC, USA, 2022.

  17. Dataset for social support paper in Excel format.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 30, 2024
    Walters, Renee; McKenzie, Joette A.; Ferguson, Trevor S.; Williams, David R.; Francis, Damian K.; Bennett, Nadia R.; Tulloch-Reid, Marshall K.; Govia, Ishtar; Wilks, Rainford J.; McFarlane, Shelly R.; Blake, Alphanso L.; Younger-Coleman, Novie O. (2024). Dataset for social support paper in Excel format. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001386095
    Dataset updated
    Jul 30, 2024
    Authors
    Walters, Renee; McKenzie, Joette A.; Ferguson, Trevor S.; Williams, David R.; Francis, Damian K.; Bennett, Nadia R.; Tulloch-Reid, Marshall K.; Govia, Ishtar; Wilks, Rainford J.; McFarlane, Shelly R.; Blake, Alphanso L.; Younger-Coleman, Novie O.
    Description

    Recent studies have suggested that high levels of social support can encourage better health behaviours and result in improved cardiovascular health. In this study we evaluated the association between social support and ideal cardiovascular health among urban Jamaicans. We conducted a cross-sectional study among urban residents in Jamaica’s south-east health region. Socio-demographic data and information on cigarette smoking, physical activity, dietary practices, blood pressure, body size, cholesterol, and glucose were collected by trained personnel. The outcome variable, ideal cardiovascular health, was defined as having optimal levels of ≥5 of these characteristics (ICH-5) according to the American Heart Association definitions. Social support exposure variables included number of friends (network size), number of friends willing to provide loans (instrumental support), and number of friends providing advice (informational support). Principal component analysis was used to create a social support score from these three variables. Survey-weighted logistic regression models were used to evaluate the association between ICH-5 and the social support score. Analyses included 841 participants (279 males, 562 females) with a mean age of 47.6 ± 18.42 years. ICH-5 prevalence was 26.6% (95% CI 22.3, 31.0), with no significant sex difference (male 27.5%, female 25.7%). In sex-specific multivariable logistic regression models adjusted for age and community SES, the social support score was inversely associated with ICH-5 among males (OR 0.67 [95% CI 0.51, 0.89], p = 0.006) but directly associated among females (OR 1.26 [95% CI 1.04, 1.53], p = 0.020). Living in poorer communities was also significantly associated with higher odds of ICH-5 among males, while living in communities with high property values was associated with higher odds of ICH among females.

    In this study, a higher level of social support was associated with better cardiovascular health among women but poorer cardiovascular health among men in urban Jamaica. Further research should explore these associations and identify appropriate interventions to promote cardiovascular health.
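
    The abstract does not give the details of the principal component step. A minimal sketch, assuming the three counts are standardized first (so the PCA runs on their correlation matrix) and the score is the projection onto the first component found by power iteration, could look like the following; the function names and counts are hypothetical, and the paper's exact preprocessing may differ.

```python
# Minimal sketch, assuming the three counts are z-scored first (PCA on the
# correlation matrix) and the score is the projection onto the first component.
import math
import statistics


def standardize(column):
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mu) / sd for x in column]


def first_component(rows, iterations=500):
    """Leading eigenvector of the sample covariance of centered `rows` (power iteration)."""
    n, k = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        u = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(t * t for t in u))
        v = [t / norm for t in u]
    # An eigenvector's sign is arbitrary; orient it so more support gives a higher score.
    if sum(v) < 0:
        v = [-t for t in v]
    return v


def support_scores(network, instrumental, informational):
    """Return (loadings, per-person scores) for three equally long count lists."""
    columns = [standardize(c) for c in (network, instrumental, informational)]
    rows = [list(r) for r in zip(*columns)]
    loadings = first_component(rows)
    scores = [sum(ld * z for ld, z in zip(loadings, r)) for r in rows]
    return loadings, scores


# Hypothetical counts for five participants.
friends = [2, 5, 1, 8, 4]    # network size
lenders = [1, 3, 0, 4, 2]    # instrumental support
advisers = [1, 4, 1, 6, 3]   # informational support
loadings, scores = support_scores(friends, lenders, advisers)
print([round(ld, 3) for ld in loadings])
print([round(s, 2) for s in scores])
```

    Because the standardized columns are centered, the scores average to zero; a participant with above-average counts on all three items gets a positive score.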

  18. Excel spreadsheet on Natural Gas Aquisition Program

    • figshare.com
    xlsx
    Updated Jan 20, 2016
    Xuan hong Ong (2016). Excel spreadsheet on Natural Gas Aquisition Program [Dataset]. http://doi.org/10.6084/m9.figshare.1477009.v1
    Available download formats: xlsx
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xuan hong Ong
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Download the Excel spreadsheet on the Natural Gas Acquisition Program here: https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx (original data source: http://catalog.data.gov/dataset/natural-gas-acquisition-program)

  19. Geographical Distribution and Climate Data of Mikania micrantha in China

    • scidb.cn
    Updated Jan 20, 2025
    谢昄平 (2025). Geographical Distribution and Climate Data of Mikania micrantha in China [Dataset]. http://doi.org/10.57760/sciencedb.19837
    Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Science Data Bank
    Authors
    谢昄平
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset contains geographical distribution and climate data for Mikania micrantha, focusing on its presence across regions of Fujian, Guangdong, and Hainan provinces in China. The dataset includes geographical coordinates (longitude and latitude), monthly climate data (minimum and maximum temperature, and precipitation), and bioclimatic variables based on the WorldClim dataset.

    Temporal and Spatial Information

    The data cover long-term climate information, with monthly values for each location over a 12-month period (January to December). Spatial data are given as longitude and latitude for the locations where Mikania micrantha populations are present. The spatial resolution is specific to each point location, and the temporal resolution reflects the monthly climate data for each year.

    Data Structure and Units

    The dataset consists of 205 records, each representing a unique location with corresponding climate and geographical data. The table includes the following columns:
    1. No.: unique identifier for each data record
    2. Longitude: geographic longitude in decimal degrees
    3. Latitude: geographic latitude in decimal degrees
    4. tmin1 to tmin12: minimum temperature (°C) for each month (January to December)
    5. tmax1 to tmax12: maximum temperature (°C) for each month (January to December)
    6. prec1 to prec12: precipitation (mm) for each month (January to December)
    7. bio1 to bio19: bioclimatic variables (e.g., annual mean temperature, temperature seasonality, precipitation) derived from WorldClim data (units vary by variable)

    The units for each measurement are as follows:
    • Temperature: degrees Celsius (°C)
    • Precipitation: millimeters (mm)
    • Bioclimatic variables: vary by variable (e.g., °C, mm)

    File Format and Software Compatibility

    The dataset is provided in CSV format for ease of use and compatibility with various data analysis tools. It can be opened and processed with software such as Microsoft Excel, R (https://cran.r-project.org/), Python (https://www.python.org/) with pandas, or any other software that supports CSV files. This dataset provides information for research on the geographical distribution and climate preferences of Mikania micrantha and can be used to inform invasive plant control strategies, ecological studies, and climate change modeling.
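
    Because the monthly columns are included, two of the bioclimatic variables can be cross-checked against them. The sketch below assumes the usual WorldClim definitions (bio1, annual mean temperature, as the mean of the monthly values (tmin + tmax) / 2, and bio12, annual precipitation, as the sum of monthly precipitation) and the units stated above. The helper names and the one-row CSV are synthetic; prefer the stored bio columns if the two disagree.

```python
# Minimal sketch: recompute bio1 and bio12 from the monthly columns,
# assuming temperatures in degrees C and precipitation in mm.
import csv
import io

MONTHS = range(1, 13)


def bio1(row):
    """Annual mean temperature: mean over months of (tmin + tmax) / 2."""
    monthly_means = [(float(row[f"tmin{m}"]) + float(row[f"tmax{m}"])) / 2 for m in MONTHS]
    return sum(monthly_means) / 12


def bio12(row):
    """Annual precipitation: sum of the twelve monthly totals."""
    return sum(float(row[f"prec{m}"]) for m in MONTHS)


# Synthetic row: tmin = 10 and tmax = 20 every month, 100 mm rain each month.
header = ["No.", "Longitude", "Latitude"]
header += [f"tmin{m}" for m in MONTHS] + [f"tmax{m}" for m in MONTHS] + [f"prec{m}" for m in MONTHS]
values = ["1", "113.3", "23.1"] + ["10"] * 12 + ["20"] * 12 + ["100"] * 12
text = ",".join(header) + "\n" + ",".join(values) + "\n"

for row in csv.DictReader(io.StringIO(text)):
    print(row["No."], bio1(row), bio12(row))  # prints: 1 15.0 1200.0
```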

  20. SCOAPE R/V Point Sur Data - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    nasa.gov (2025). SCOAPE R/V Point Sur Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/scoape-r-v-point-sur-data-8fcd2
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    SCOAPE_RVPointSur_Data is the data collected from instruments onboard the University of Southern Mississippi’s Research Vessel (R/V) Point Sur during the Satellite Coastal and Oceanic Atmospheric Pollution Experiment (SCOAPE). Data were collected by sun photometers, ceilometers, aethalometers, anemometers, and pyranometers. Data collection for this product is complete.

    The Outer Continental Shelf Lands Act (OCSLA) requires the US Department of the Interior Bureau of Ocean Energy Management (BOEM) to ensure compliance with the US National Ambient Air Quality Standard (NAAQS), so that Outer Continental Shelf (OCS) oil and natural gas (ONG) exploration, development, and production do not significantly impact the air quality of any US state. In 2017, BOEM and NASA entered into an interagency agreement to begin a study scoping the feasibility of BOEM personnel using a suite of NASA and non-NASA resources to assess how pollutants from ONG exploration, development, and production activities affect air quality. An important activity under this agreement was SCOAPE, a field deployment in May 2019 that aimed to assess the capability of satellite observations for monitoring offshore air quality. The outcomes of the study are documented in two BOEM reports (Duncan, 2020; Thompson, 2020).

    To address BOEM’s goals, the SCOAPE science team conducted surface-based remote sensing and in-situ measurements, which enabled a systematic assessment of the application of satellite observations, primarily NO2, for monitoring air quality. The SCOAPE field measurements consisted of onshore ground sites, including sites in the vicinity of the Louisiana Universities Marine Consortium (LUMCON; Cocodrie, LA), as well as measurements from the University of Southern Mississippi’s R/V Point Sur, which cruised in the Gulf of America from 10-18 May 2019. Based on the 2014 and 2017 BOEM emissions inventories as well as daily air quality and meteorological forecasts, the cruise track was designed to sample both areas with large oil drilling platforms and areas with dense small natural gas facilities.

    The R/V Point Sur was instrumented to carry out both remote sensing and in-situ measurements of NO2 and O3, along with in-situ CH4, CO2, CO, and VOC tracers, which allowed detailed characterization of airmass type and emissions. In addition, there were measurements of multi-wavelength aerosol optical depth (AOD) and black carbon, as well as planetary boundary layer structure and meteorological variables, including surface temperature, humidity, and winds. A ship-based spectrometer provided remotely sensed total column amounts of NO2 and O3 for direct comparison with satellite measurements. Ozonesondes and radiosondes were also launched 1-3 times daily from the R/V Point Sur to provide O3 and meteorological vertical profile observations. The ground-based observations, primarily at LUMCON, included spectrometer-measured column NO2 and O3, in-situ NO2, VOCs, and planetary boundary layer structure. A NO2sonde was also mounted on a vehicle to detect pollution onshore from offshore ONG activities during onshore flow; data were collected along coastal Louisiana from Burns Point Park to Grand Isle to the tip of the Mississippi River delta. The in-situ measurements were reported in ICARTT or Excel files. The remote sensing data are in either HDF or netCDF files.
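
    ICARTT is a comma-delimited text format. The sketch below is a simplified reader for format index 1001 files, assuming that the first line reads "NLHEAD, 1001", that the last of the NLHEAD header lines lists the column names, and that -9999 marks missing values. Real files declare their own missing-value flags and carry many more required header lines, so treat this as a starting point rather than a full implementation of the ICARTT standard; the function name and sample file are hypothetical.

```python
# Simplified ICARTT (FFI 1001) data reader; see the assumptions stated above.
import io

MISSING = -9999.0  # assumption: real files declare their missing-value flags in the header


def read_icartt_1001(handle):
    lines = handle.read().splitlines()
    nlhead, ffi = (int(x) for x in lines[0].split(","))
    if ffi != 1001:
        raise ValueError(f"unsupported file format index {ffi}")
    # The last header line holds the comma-separated column names.
    names = [name.strip() for name in lines[nlhead - 1].split(",")]
    records = []
    for line in lines[nlhead:]:
        if not line.strip():
            continue
        values = [float(x) for x in line.split(",")]
        records.append({n: (None if v == MISSING else v) for n, v in zip(names, values)})
    return records


# Synthetic file: 4 header lines (most required header content omitted).
sample = """4, 1001
Placeholder for the remaining required ICARTT header lines
Placeholder: missing-value flag is -9999
Time_Start, NO2_ppbv, O3_ppbv
52200, 1.25, 31.0
52260, -9999, 30.5
"""

for record in read_icartt_1001(io.StringIO(sample)):
    print(record)
```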

SanskratiiGupta (2025). Startup Performance Analysis [Dataset]. https://www.kaggle.com/datasets/sanskratiigupta/startup-performance-analysis

Startup Performance Analysis

R Programming and Excel based data analysis

11 scholarly articles cite this dataset (View in Google Scholar)
Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
Dataset updated
Aug 11, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SanskratiiGupta
Description

Dataset

This dataset was created by SanskratiiGupta

Contents
