100+ datasets found
  1. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity; Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds); Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate; DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate; and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, plus 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and rigorous statistical analysis of the results; nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering used the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when the majority reaches 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC).
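    A minimal Python sketch of this workflow, using scipy/scikit-learn as a stand-in for Orange (which the authors actually used) and random data in place of the 36×11 feature matrix; note that scikit-learn offers no gain-ratio criterion, so entropy is the closest available substitute:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(36, 11))   # stand-in for the standardized 36x11 feature matrix
y = np.repeat(np.arange(9), 4)  # 9 classes, 4 samples per class

# Hierarchical clustering with Euclidean distance and weighted linkage
Z = linkage(X, method="weighted", metric="euclidean")
clusters = fcluster(Z, t=9, criterion="maxclust")

# Decision tree approximating the Orange settings described above
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2,
                              min_samples_split=5, random_state=0)
scores = cross_val_score(tree, X, y,
                         cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0),
                         scoring="accuracy")
print(scores.mean())
```

    With the real spreadsheet, X and y would be read from the xlsx file instead of generated.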

  2. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the countries leading in COVID-19 incidence worldwide as of October 22, 2020 (on the eve of the second wave of the pandemic) and represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. Arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value, and number of employees. The arithmetic mean values of these indicators across all countries in the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020, on the eve of the second wave of the pandemic. The data are collected in a single Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics with entrepreneurship statistics, and it is flexible: it can be supplemented with data from other countries and with newer statistics on the COVID-19 pandemic. Because the dataset contains formulas rather than ready-made numbers, adding and/or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that provide data visualization, and it contains both actual and forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.
    The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical initial and predicted values of the studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.

  3. E-commerce Dataset analysis by pivot table

    • kaggle.com
    zip
    Updated May 26, 2023
    Cite
    Radha Gandhi (2023). E-commerce Dataset analysis by pivot table [Dataset]. https://www.kaggle.com/datasets/radhagandhi/e-commerce-dataset-analysis-by-pivot-table
    Explore at:
    Available download formats: zip (176329 bytes)
    Dataset updated
    May 26, 2023
    Authors
    Radha Gandhi
    Description

    I have been taking a data analysis course with Coding Invaders, and this module focuses on pivot table exercises. By completing this module, you will gain a good amount of confidence in using pivot tables.

    Let's grow together!

  4. 🌆 City Lifestyle Segmentation Dataset

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    UmutUygurr (2025). 🌆 City Lifestyle Segmentation Dataset [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/city-lifestyle-segmentation-dataset
    Explore at:
    Available download formats: zip (11274 bytes)
    Dataset updated
    Nov 15, 2025
    Authors
    UmutUygurr
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description


    🌆 About This Dataset

    This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.

    🎯 Perfect For:

    • 📊 K-Means, DBSCAN, Agglomerative Clustering
    • 🔬 PCA & t-SNE Dimensionality Reduction
    • 🗺️ Geospatial Visualization (Plotly, Folium)
    • 📈 Correlation Analysis & Feature Engineering
    • 🎓 Educational Projects (Beginner to Intermediate)

    📦 What's Inside?

    • 10 Features: economic, environmental & social indicators (realistically scaled)
    • 300 Cities: Europe, Asia, Americas, Africa, Oceania (diverse distributions)
    • Strong Correlations: Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) (ML-ready)
    • No Missing Values: clean, preprocessed data (ready for analysis)
    • 4-5 Natural Clusters: metropolitan hubs, eco-towns, developing centers (pre-validated)

    🔥 Key Features

    • Realistic Correlations: income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
    • Regional Diversity: each region has distinct economic and environmental characteristics
    • Clustering-Ready: naturally separable into 4-5 lifestyle archetypes
    • Beginner-Friendly: no data cleaning required, includes example code
    • Documented: comprehensive README with methodology and use cases

    🚀 Quick Start Example

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    
    # Load and prepare
    df = pd.read_csv('city_lifestyle_dataset.csv')
    X = df.drop(['city_name', 'country'], axis=1)
    X_scaled = StandardScaler().fit_transform(X)
    
    # Cluster
    kmeans = KMeans(n_clusters=5, random_state=42)
    df['cluster'] = kmeans.fit_predict(X_scaled)
    
    # Analyze
    print(df.groupby('cluster').mean())
    

    🎓 Learning Outcomes

    After working with this dataset, you will be able to:

    1. Apply K-Means, DBSCAN, and Hierarchical Clustering
    2. Use PCA for dimensionality reduction and visualization
    3. Interpret correlation matrices and feature relationships
    4. Create geographic visualizations with cluster assignments
    5. Profile and name discovered clusters based on characteristics
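    Building on the quick-start example above, PCA can project the clustered cities to 2-D for plotting. This sketch uses synthetic stand-in data with the dataset's 8 numeric columns, since the CSV itself isn't bundled here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))              # stand-in for the 8 numeric columns
X_scaled = StandardScaler().fit_transform(X)

labels = KMeans(n_clusters=5, random_state=42, n_init=10).fit_predict(X_scaled)

# Project to 2-D; explained_variance_ratio_ shows how much information is kept
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)
print(coords.shape, pca.explained_variance_ratio_.sum())
```

    The coords array can then be scattered with one color per label to inspect cluster separation.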

    📚 Ideal For These Projects

    • 🏆 Kaggle Competitions: Practice clustering techniques
    • 📝 Academic Projects: Urban planning, sociology, environmental science
    • 💼 Portfolio Work: Showcase ML skills to employers
    • 🎓 Learning: Hands-on practice with unsupervised learning
    • 🔬 Research: Urban lifestyle segmentation studies

    🌍 Expected Clusters

    • Metropolitan Tech Hubs: high income, density, rent (e.g., Silicon Valley, Singapore)
    • Eco-Friendly Towns: low density, clean air, high happiness (e.g., Nordic cities)
    • Developing Centers: mid income, high density, poor air (e.g., emerging markets)
    • Low-Income Suburban: low infrastructure and income (e.g., rural areas)
    • Industrial Mega-Cities: very high density and pollution (e.g., manufacturing hubs)

    🛠️ Technical Details

    • Format: CSV (UTF-8)
    • Size: ~300 rows × 10 columns
    • Missing Values: 0%
    • Data Types: 2 categorical, 8 numerical
    • Target Variable: None (unsupervised)
    • Correlation Strength: Pre-validated (r: 0.4 to 0.8)

    📖 What Makes This Dataset Special?

    Unlike random synthetic data, this dataset was carefully engineered with:

    • ✨ Realistic correlation structures based on urban research
    • 🌍 Regional characteristics matching real-world patterns
    • 🎯 Optimal cluster separability (validated via silhouette scores)
    • 📚 Comprehensive documentation and starter code

    🏅 Use This Dataset If You Want To:

    ✓ Learn clustering without data cleaning hassles
    ✓ Practice PCA and dimensionality reduction
    ✓ Create beautiful geographic visualizations
    ✓ Understand feature correlation in real-world contexts
    ✓ Build a portfolio project with clear business insights

    📊 Acknowledgments

    This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.

    Happy Clustering! 🎉

  5. Current Population Survey (CPS)

    • dataverse.harvard.edu
    • search.dataone.org
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R: download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table.

    2012 asec - analysis examples.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples.

    replicate census estimates - 2011.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the file 2011 asec replicate weight sas output.png - the statistic and standard error generated from the replicate-weighted example sas script contained in the census-provided person replicate weights usage instructions document.

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page, the bureau of labor statistics' current population survey page, and the current population survey's wikipedia article.

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
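    a rough pandas analog of the fixed-width import step (the column layout below is purely illustrative - the real cps-asec column positions come from the nber sas importation scripts):

```python
import io
import pandas as pd

# hypothetical three-column fixed-width layout; the real cps-asec column
# positions must be taken from the nber sas importation scripts
raw = io.StringIO("0011970M\n0021985F\n")
colspecs = [(0, 3), (3, 7), (7, 8)]
df = pd.read_fwf(raw, colspecs=colspecs,
                 names=["id", "birth_year", "sex"], dtype=str)
print(df)
```

    from there, the parsed frame could be written into sqlite just as the r scripts do with RSQLite.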

  6. PubTables-1M (Table Structure Recognition Subset)

    • kaggle.com
    zip
    Updated Nov 28, 2022
    Cite
    Brandon Smock (2022). PubTables-1M (Table Structure Recognition Subset) [Dataset]. https://www.kaggle.com/datasets/bsmock/pubtables-1m-structure/code
    Explore at:
    Available download formats: zip (32827710431 bytes)
    Dataset updated
    Nov 28, 2022
    Authors
    Brandon Smock
    License

    CDLA Permissive 1.0: https://cdla.io/permissive-1-0/

    Description

    This is a subset of the full PubTables-1M dataset, for just the table structure recognition and functional analysis tasks. The data for the table detection task is not included here.

    Code: https://github.com/microsoft/table-transformer Paper: PubTables-1M: Towards comprehensive table extraction from unstructured documents

    Folder Structure

    The dataset contains 5 top-level folders:

    • images: JPG images
    • train: object bounding box annotations in PASCAL VOC XML format
    • test: object bounding box annotations in PASCAL VOC XML format
    • val: object bounding box annotations in PASCAL VOC XML format
    • words: word bounding boxes and text in JSON format

    Downloading from the Command Line

    Using the Kaggle CLI to download PubTables-1M may not download the entire dataset. Alternatively, to download each top-level folder as a zip file from a command line (this example uses the 'words' directory and words.zip):

    1. From a web browser, view the dataset and click on the top-level folder 'words'.
    2. Click download for words.zip.
    3. Pause/stop the download and copy the download link.
    4. In a terminal, run wget -O words.zip 'download link', where the download link is pasted inside single quotes.
    5. Repeat steps 1-4 for each top-level folder ('train', 'test', 'val', and 'images').
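    The per-folder repetition in step 5 can be scripted; this sketch only builds and prints the wget commands, since each folder's download link (copied in steps 1-3) must be pasted in by hand:

```python
# build one wget command per top-level folder; the '<... download link>'
# placeholders stand for the per-folder links copied from the browser
folders = ["words", "train", "test", "val", "images"]
commands = [f"wget -O {name}.zip '<{name} download link>'" for name in folders]
for cmd in commands:
    print(cmd)
```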

  7. LDA tables per dataset (4)

    • figshare.com
    Updated May 4, 2025
    Cite
    Maria Hernandez (2025). LDA tables per dataset (4) [Dataset]. http://doi.org/10.6084/m9.figshare.28927160.v1
    Explore at:
    Dataset updated
    May 4, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maria Hernandez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tables for LDA analysis: sample by topic, metadata for sample, and topic membership with taxonomy. Open in R with load().

  8. Replication Data for: "A Topic-based Segmentation Model for Identifying...

    • search.dataone.org
    Updated Sep 25, 2024
    Cite
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
    Description

    We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package for researchers and practitioners to apply the topic-based segmentation model with unstructured texts (latent class regression with group variable selection) to their own datasets. First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide codes and instructions to replicate the empirical studies of customer-level and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note that, due to Yelp's dataset terms of use and data-size restrictions, we instead provide the link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file.

    [A guide on how to use the code to reproduce each study in the paper]

    1. Full codes for replicating Illustrative simulation study.txt [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Run it from beginning to end in R. In addition to estimated coefficients (posterior means), variable-selection indicators, and segment memberships, you will get the dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

    3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IV matrices for the customer-level segmentation study.

    3-b. Instruction for replicating Customer-level Segmentation analysis.txt [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means), variable-selection indicators, and segment memberships. Computing time is approximately 3 to 4 hours.

    4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IV matrices for the restaurant-level segmentation study.

    4-b. Instructions for replicating restaurant-level segmentation analysis.txt [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means), variable-selection indicators, and segment memberships. Computing time is approximately 10 to 12 hours.

    [Guidelines for running benchmark models in Table 6]

    Unsupervised topic model: 'topicmodels' package in R. After determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors, and conduct prediction with regression.

    Hierarchical topic model (HDP): 'gensimr' R package, using the 'model_hdp' function for identifying topics (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).

    Supervised topic model: 'lda' R package, using the 'slda.em' function for training and 'slda.predict' for prediction.

    Aggregate regression: the default 'lm' function in R.

    Latent class regression without variable selection: the 'flexmix' function in the 'flexmix' R package. Run flexmix with a given number of segments (e.g., 3 segments in this study); then, with the estimated coefficients and memberships, predict the dependent variable per segment.

    Latent class regression with variable selection: the 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the model with a given number of segments (e.g., 3 segments in this study); then, with the estimated coefficients and memberships, predict the dependent variable per segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

    5. Instructions for replicating Professor ratings review study.txt [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the professor ratings reviews study. Computing time is approximately 10 hours.

    [A list of the versions of R, packages, and computer...
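    As a rough Python analog of the unsupervised-topic-model benchmark described above (the repository itself uses the R 'topicmodels' package): fit LDA on document-term counts, take the per-document topic probabilities as predictors, and regress the ratings on them. The reviews and star ratings here are made up purely for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

# tiny made-up reviews and star ratings, standing in for the Yelp data
docs = ["great food friendly staff", "slow service cold food",
        "lovely ambience great wine list", "rude staff long wait"]
stars = np.array([5.0, 2.0, 4.0, 1.0])

counts = CountVectorizer().fit_transform(docs)
# per-document topic probabilities (rows sum to 1)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# topic probabilities become the predictors of the star rating
reg = LinearRegression().fit(theta, stars)
preds = reg.predict(theta)
```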

  9. Data from: Inventory of well-construction data, water-quality and quality...

    • catalog.data.gov
    • datasets.ai
    Updated Nov 21, 2025
    Cite
    U.S. Geological Survey (2025). Inventory of well-construction data, water-quality and quality control data, statistical data, and geochemical modeling data for wells in Atlantic and Gulf Coastal Plain aquifers, eastern United States, 2012 and 2013 [Dataset]. https://catalog.data.gov/dataset/inventory-of-well-construction-data-water-quality-and-quality-control-data-statistical-dat
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Gulf Coastal Plain, United States
    Description

    This dataset provides analytical and other data in support of an analysis of lead and manganese in untreated drinking water from Atlantic and Gulf Coastal Plain aquifers, eastern United States. The occurrence of dissolved lead and manganese in sampled groundwater, prior to its distribution or treatment, is related to the potential presence of source minerals and specific environmental factors, including hydrologic position along the flow path, water-rock interactions, and associated geochemical conditions such as pH and dissolved oxygen (DO) concentrations. A DO/pH framework is proposed as a screening tool for evaluating the risk of elevated lead or manganese, based on the occurrence of elevated lead and manganese concentrations and the corresponding distributions of DO and pH in 258 wells screened in the Atlantic and Gulf Coastal Plain aquifers. Included in this data release are the Supplementary Information Tables that also accompany the Applied Geochemistry journal article:

    • Table of details on construction and hydrologic position of wells (percent distance from outcrop and percent depth to well centroid) sampled in Atlantic and Gulf Coastal Plain aquifers, 2012 and 2013.
    • Table of general chemical characteristics and concentrations of major and trace elements and calculated parameters for groundwater samples from wells in Atlantic and Gulf Coastal Plain aquifers, 2012 and 2013.
    • Table of mineral saturation indices (SI) and partial pressures of CO2 (PCO2) and O2 (PO2) computed with PHREEQC (Parkhurst and Appelo, 2013) for 258 groundwater samples from wells in Atlantic and Gulf Coastal Plain aquifers, 2012 and 2013.
    • Table of Spearman's rank correlation coefficient (r) matrix of principal components (PC1-PC6) and chemical data for 258 groundwater samples from wells in Atlantic and Gulf Coastal Plain aquifers, 2012 and 2013.
    • Table of results of blank analysis for major and trace elements analyzed for 258 groundwater samples from wells in Atlantic and Gulf Coastal Plain aquifers, 2012 and 2013.
    • Table of criteria and threshold concentrations for identifying redox processes in groundwater (after McMahon and Chapelle, 2008).
    • Table of principal components analysis model of major factors affecting the chemistry of groundwater samples from wells in Atlantic and Gulf Coastal Plain aquifers, 2012 and 2013.
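    A sketch of how a DO/pH screening rule of this kind could be applied with pandas; the wells and threshold values below are placeholders, since the actual cutoffs are defined in the accompanying journal article:

```python
import pandas as pd

# hypothetical wells; placeholder DO (mg/L) and pH thresholds, not the
# article's actual screening cutoffs
wells = pd.DataFrame({
    "well": ["A", "B", "C"],
    "do_mg_l": [0.2, 4.1, 6.5],
    "ph": [5.2, 6.8, 7.9],
})

def screen(row):
    # oxic, low-pH waters tend to mobilize lead
    if row["ph"] < 6.0 and row["do_mg_l"] >= 1.0:
        return "elevated lead risk"
    # anoxic (low-DO) waters tend to mobilize manganese
    if row["do_mg_l"] < 1.0:
        return "elevated manganese risk"
    return "lower risk"

wells["screen"] = wells.apply(screen, axis=1)
print(wells)
```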

  10. Table Rock, NE Annual Population and Growth Analysis Dataset: A...

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Table Rock, NE Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Table Rock from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/table-rock-ne-population-by-year/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Table Rock, Nebraska
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset are derived from the 2000-2023 U.S. Census Bureau Population Estimates Program (PEP). To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we initially analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Table Rock population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, in both absolute and percentage terms. The dataset can be utilized to understand the population change of Table Rock across the last two decades: for example, we can identify whether the population is declining or increasing, when the population peaked, and whether it is still growing or has already passed its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2023, the population of Table Rock was 234, a 0.43% year-over-year decrease from 2022. Previously, in 2022, the Table Rock population was 235, a decline of 0.42% compared to a population of 236 in 2021. Over the last 20-plus years, between 2000 and 2023, the population of Table Rock decreased by 29. In this period, the peak population was 272, in the year 2011. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (measured annually, 2000 to 2023)
    • Population: The population of Table Rock in the given year.
    • Year on Year Change: The change in Table Rock's population compared to the previous year.
    • Change in Percent: The year on year change expressed as a percentage. Please note that values may not be exact due to rounding.
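The Year on Year Change and Change in Percent columns can be reproduced directly from the Population column. A minimal Python sketch, using the 2021-2023 Table Rock figures quoted in the Key observations above:

```python
# Table Rock population figures quoted in Key observations (2021-2023).
years = [2021, 2022, 2023]
population = [236, 235, 234]

rows = []
for i in range(1, len(years)):
    change = population[i] - population[i - 1]        # Year on Year Change
    pct = round(100 * change / population[i - 1], 2)  # Change in Percent
    rows.append((years[i], population[i], change, pct))

print(rows)  # [(2022, 235, -1, -0.42), (2023, 234, -1, -0.43)]
```

The -0.42% and -0.43% values match the figures quoted in the Key observations section.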

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Table Rock Population by Year. You can refer to it here.

  11. Samples not used in analysis

    • redivis.com
    Updated Oct 4, 2022
    Cite
    Environmental Impact Data Collaborative (2022). Samples not used in analysis [Dataset]. https://redivis.com/datasets/ws5a-390wrpycz/usage
    Explore at:
    Dataset updated
    Oct 4, 2022
    Dataset authored and provided by
    Environmental Impact Data Collaborative
    Description

    The table Samples not used in analysis is part of the dataset Lead Results From Tap Water Sampling in Flint, available at https://redivis.com/datasets/ws5a-390wrpycz. It contains 5 rows across 7 variables.

  12. Data from: (Table 2) Number of samples used for each type of analysis by...

    • doi.pangaea.de
    html, tsv
    Updated Jan 14, 2013
    Cite
    Patrick L Barnard; Li H Erikson; Peter W Swarzenski; Renee K Takesue; Amy C Foxgrover; Edwin Elias; James R Hein; Mary L McGann; Kira Mizell; Robert J Rosenbauer; Florence L Wong; Donald L Woodrow (2013). (Table 2) Number of samples used for each type of analysis by sample origin [Dataset]. http://doi.org/10.1594/PANGAEA.805150
    Explore at:
    html, tsvAvailable download formats
    Dataset updated
    Jan 14, 2013
    Dataset provided by
    PANGAEA
    Authors
    Patrick L Barnard; Li H Erikson; Peter W Swarzenski; Renee K Takesue; Amy C Foxgrover; Edwin Elias; James R Hein; Mary L McGann; Kira Mizell; Robert J Rosenbauer; Florence L Wong; Donald L Woodrow
    License

    Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Variables measured
    Analysis, Sample amount
    Description

    This dataset is about: (Table 2) Number of samples used for each type of analysis by sample origin. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.803904 for more information.

  13. Data from: Optimized SMRT-UMI protocol produces highly accurate sequence...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Dec 7, 2023
    + more versions
    Cite
    Dylan Westfall; Mullins James (2023). Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. http://doi.org/10.5061/dryad.w3r2280w0
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    National Institute of Allergy and Infectious Diseaseshttp://www.niaid.nih.gov/
    HIV Prevention Trials Networkhttp://www.hptn.org/
    HIV Vaccine Trials Networkhttp://www.hvtn.org/
    PEPFAR
    Authors
    Dylan Westfall; Mullins James
    License

    CC0 1.0 Universal https://spdx.org/licenses/CC0-1.0.html

    Description

    Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced by SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

    Methods

    This serves as an overview of the analysis performed on the PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".

    Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.

    For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample, and the pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.

    The demultiplexed read collections from the chunked_demux pipeline, or CCS read files from the datasets which were not indexed (M1567, M004, M005), were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz), as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.

    The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz).

    Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file of the same name inside, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.

    Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample passing various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, 2 fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that differ between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.

    To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.

    Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
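The compare_seqs.py script itself ships with the archive and is not reproduced here. Its core idea, matching consensus records by ID across the sUMI and dUMI fasta files and flagging pairs whose sequences differ, can be sketched as follows (the function names and the exact matching rule are illustrative assumptions, not the published code):

```python
def read_fasta(path):
    """Parse a FASTA file into {record_id: sequence}; the id is the first header token."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}

def discordant_pairs(sumi_seqs, dumi_seqs):
    """Return IDs present in both collections whose consensus sequences differ."""
    shared = sumi_seqs.keys() & dumi_seqs.keys()
    return sorted(k for k in shared if sumi_seqs[k] != dumi_seqs[k])
```

A list of discordant IDs like this is what then gets followed up by aligning the underlying UMI1-family reads, as described above.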

  14. Table Grove, IL Annual Population and Growth Analysis Dataset: A...

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Table Grove, IL Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Table Grove from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/table-grove-il-population-by-year/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Illinois, Table Grove
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset is derived from 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 - 2023. To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2023. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Table Grove population over the last 20 plus years. It lists the population for each year, along with the year on year change in population and the change in percentage terms. The dataset can be used to understand how the population of Table Grove has changed across the last two decades: for example, whether the population is declining or increasing, when it peaked, or whether it is still growing and has not yet reached its peak. The trend can also be compared with the overall trend of the United States population over the same period.

    Key observations

    In 2023, the population of Table Grove was 335, a 2.05% decrease year over year from 2022. Previously, in 2022, the Table Grove population was 342, a decline of 1.16% compared to a population of 346 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Table Grove decreased by 54. In this period, the peak population was 418, in the year 2010. The numbers suggest that the population has already peaked and is now declining. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (measured annually, 2000 to 2023)
    • Population: The population of Table Grove in the given year.
    • Year on Year Change: The change in Table Grove's population compared to the previous year.
    • Change in Percent: The year on year change expressed as a percentage. Please note that values may not be exact due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Table Grove Population by Year. You can refer to it here.

  15. Table Rock, NE Population Breakdown by Gender and Age

    • neilsberg.com
    csv, json
    Updated Sep 14, 2023
    + more versions
    Cite
    Neilsberg Research (2023). Table Rock, NE Population Breakdown by Gender and Age [Dataset]. https://www.neilsberg.com/research/datasets/67b02389-3d85-11ee-9abe-0aa64bf2eeb2/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Sep 14, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Table Rock, Nebraska
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Table Rock by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Table Rock. The dataset can be utilized to understand the population distribution of Table Rock by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Table Rock. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Table Rock.

    Key observations

    Largest age group (population): Male # 65-69 years (27) | Female # 35-39 years (32). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender:

    Please note that the American Community Survey asks about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender, and respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Table Rock population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): The male population of Table Rock in the given age group.
    • Population (Female): The female population of Table Rock in the given age group.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Table Rock for each age group.
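The Gender Ratio column follows the standard sex-ratio convention of males per 100 females. A minimal sketch of that calculation (the counts below are illustrative, not values from this dataset):

```python
def gender_ratio(males, females):
    """Males per 100 females, rounded to one decimal; None if no females are reported."""
    if females == 0:
        return None
    return round(100 * males / females, 1)

# Illustrative counts only (not taken from the Table Rock tables).
print(gender_ratio(27, 30))  # 90.0
```

A ratio above 100 means more males than females in that age group; below 100 means the reverse.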

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Table Rock Population by Gender. You can refer to it here.

  16. Data from: Data Tables Associated with an Analysis of the U.S. Geological...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data Tables Associated with an Analysis of the U.S. Geological Survey's Historical Water-use Data, 1985–2015 [Dataset]. https://catalog.data.gov/dataset/data-tables-associated-with-an-analysis-of-the-u-s-geological-surveys-historical-water-use
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    The datasets in this data release contain the results of an analysis of the U.S. Geological Survey's historical water-use data from 1985 to 2015. Data were assessed to determine the top category of water use by volume. Data from groundwater, surface water, and total water (groundwater plus surface water) use were parsed by water type, and the top category of use by county or the geographic region or local government equivalent to a county (for example, parishes in Louisiana) was determined. There are two sets of results provided, one for the "Priority" categories of water use and the second for all categories of water use. "Priority" categories are irrigation, public supply, and thermoelectric power and comprise 90 percent of all water use nationwide. In addition to the priority categories, the remaining categories of water use are as follows: aquaculture, domestic, industrial, livestock, and mining. Water-use data historically have been compiled at the county level every 5 years as part of the U.S. Geological Survey's National Water Use Science Project. In 2020 the U.S. Geological Survey began transitioning the collection of water-use data from every 5 years to an annual collection, from county level to hydrologic unit code (HUC) 12, and to a model-based approach. To assist in the transition, an assessment of the current (2022) historical water-use data was done by the Water-Use Gap Analysis Project.

  17. The dataset for Example 2 of Table 3.

    • plos.figshare.com
    txt
    Updated Nov 30, 2023
    + more versions
    Cite
    Razaw Al-Sarraj; Johannes Forkman (2023). The dataset for Example 2 of Table 3. [Dataset]. http://doi.org/10.1371/journal.pone.0295066.s004
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Razaw Al-Sarraj; Johannes Forkman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It is commonly believed that if a two-way analysis of variance (ANOVA) is carried out in R, then reported p-values are correct. This article shows that this is not always the case. Results can vary from non-significant to highly significant, depending on the choice of options. The user must know exactly which options result in correct p-values, and which options do not. Furthermore, it is commonly supposed that analyses in SAS and R of simple balanced experiments using mixed-effects models result in correct p-values. However, the simulation study of the current article indicates that frequency of Type I error deviates from the nominal value. The objective of this article is to compare SAS and R with respect to correctness of results when analyzing small experiments. It is concluded that modern functions and procedures for analysis of mixed-effects models are sometimes not as reliable as traditional ANOVA based on simple computations of sums of squares.

  18. Dataset for Privacy Exercises

    • kaggle.com
    zip
    Updated Apr 9, 2024
    Cite
    Shining (2024). Dataset for Privacy Exercises [Dataset]. https://www.kaggle.com/datasets/shiningana/dataset-for-privacy-exercises
    Explore at:
    zip(7327312 bytes)Available download formats
    Dataset updated
    Apr 9, 2024
    Authors
    Shining
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset gives some data of a hypothetical business that can be used to practice your privacy data transformation and analysis skills.

    The dataset contains the following files/tables:

    1. customer_orders_for_privacy_exercises.csv: data of a business about customer orders (columns separated by commas)
    2. users_web_browsing_for_privacy_exercises.csv: data collected by the business website about its users (columns separated by commas)
    3. iot_example.csv: data collected by a smart device on users' bio-metric data (columns separated by commas)
    4. members.csv: data collected by a library on its users (columns separated by commas)

  19. P-values obtained from the analysis of example data in Table 1 using several...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Leena Choi; Jeffrey D. Blume; William D. Dupont (2023). P-values obtained from the analysis of example data in Table 1 using several methods. [Dataset]. http://doi.org/10.1371/journal.pone.0121263.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Leena Choi; Jeffrey D. Blume; William D. Dupont
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • The likelihood ratio (LR) test based on Equation (9). P-values obtained from the analysis of example data in Table 1 using several methods.

  20. Complete DAX Practice Dataset

    • kaggle.com
    zip
    Updated Oct 29, 2025
    Cite
    Mohamed Mahmoud Ali (2025). Complete DAX Practice Dataset [Dataset]. https://www.kaggle.com/datasets/thesnak/complete-dax-practice-dataset
    Explore at:
    zip(2980320 bytes)Available download formats
    Dataset updated
    Oct 29, 2025
    Authors
    Mohamed Mahmoud Ali
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🧮 Complete DAX Practice Dataset — Power BI / DAX Learning Resource

    📘 Overview

    This synthetic dataset is designed specifically for Power BI and DAX (Data Analysis Expressions) learners and professionals. It provides a complete star schema for practicing DAX measures, relationships, filters, and time intelligence — just like in real-world business analytics projects.

    The dataset simulates a multi-year sales environment with customers, employees, products, geographies, and dates — allowing you to perform calculations across multiple business dimensions.

    🧩 Dataset Structure

    This dataset contains 6 CSV files, forming a clean star schema:

    • FactSales (Fact): Contains transactional sales data with quantities, amounts, profits, discounts, and references to all dimension keys.
    • DimDate (Dimension): A complete date table (2018–2024) including Year, Quarter, Month, DayOfWeek, Weekend/Holiday flags, etc.
    • DimProduct (Dimension): Product catalog with Category, SubCategory, Color, Size, StandardCost, and ListPrice.
    • DimCustomer (Dimension): Customer information including name, gender, signup date, loyalty tier, and geographic key.
    • DimEmployee (Dimension): Sales employee data including name, role, hire date, and region.
    • DimGeography (Dimension): Geographic data covering countries, regions, and cities.

    🗂️ Fact Table Fields

    • SalesKey: Unique identifier for each transaction
    • OrderDateKey, ShipDateKey: Foreign keys to DimDate
    • ProductKey, CustomerKey, EmployeeKey, GeographyKey: Foreign keys to the respective dimensions
    • Quantity: Number of units sold
    • UnitPrice: Price per unit
    • Discount: Discount applied to the sale
    • SalesAmount: Total sales value after discount
    • TotalCost: Total cost of goods sold
    • Profit: SalesAmount – TotalCost
    • Channel: Online, Retail, or Distributor
    • PaymentMethod: Credit, Cash, or Transfer
    • OrderPriority: Low, Medium, or High priority
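The money fields above are internally consistent: Profit is SalesAmount minus TotalCost, and SalesAmount is the post-discount line value. A minimal sketch of that arithmetic (treating Discount as a fractional rate on the gross line value is an assumption; verify the dataset's actual convention before relying on it):

```python
def fact_sales_money(quantity, unit_price, discount, unit_cost):
    """Derive SalesAmount, TotalCost, and Profit for one FactSales row.

    Assumes Discount is a fraction (e.g. 0.1 = 10%) applied to the gross
    line value; check this against the actual data.
    """
    sales_amount = round(quantity * unit_price * (1 - discount), 2)
    total_cost = round(quantity * unit_cost, 2)
    profit = round(sales_amount - total_cost, 2)
    return sales_amount, total_cost, profit

print(fact_sales_money(2, 10.0, 0.1, 6.0))  # (18.0, 12.0, 6.0)
```

Sanity checks like this are useful for validating DAX measures (e.g. a `Total Profit` measure) against row-level arithmetic.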

    📅 DimDate Fields

    Includes:

    • DateKey (YYYYMMDD)
    • Date
    • Year, Quarter, Month, MonthName, Day, DayOfWeek
    • IsWeekend, IsHoliday

    Perfect for DAX time intelligence functions like: TOTALYTD, SAMEPERIODLASTYEAR, DATESINPERIOD, and PARALLELPERIOD.
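The DateKey column encodes each calendar date as a YYYYMMDD integer, which is how FactSales joins to DimDate. A minimal sketch of the encoding (the helper name is mine, not part of the dataset):

```python
from datetime import date

def date_key(d: date) -> int:
    """Encode a calendar date as the integer YYYYMMDD surrogate key used by DimDate."""
    return d.year * 10000 + d.month * 100 + d.day

print(date_key(date(2024, 3, 7)))  # 20240307
```

Integer keys in this form sort chronologically, which is why they are a common choice for date-dimension surrogate keys.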

    🌍 Business Scenario

    Imagine a mid-sized electronics retailer operating across multiple regions and sales channels. The dataset captures 7 years of simulated performance — including seasonal patterns, regional sales variations, and customer loyalty effects.

    🧠 Learning Objectives

    This dataset is designed for:

    • Practicing Power BI data modeling
    • Learning and mastering DAX functions
    • Building interactive dashboards
    • Applying time intelligence and advanced calculations
    • Teaching data modeling concepts in analytics courses

    💡 Example DAX Practice Topics

    You can use this dataset to practice almost every DAX concept:

    Basic Aggregations

    Total Sales = SUM(FactSales[SalesAmount])
    Total Profit = SUM(FactSales[Profit])
    

    Context & Filters

    Online Sales = CALCULATE([Total Sales], FactSales[Channel] = "Online")
    

    Time Intelligence

    YTD Sales = TOTALYTD([Total Sales], DimDate[Date])
    Sales YoY % = DIVIDE([Total Sales] - [Previous Year Sales], [Previous Year Sales])
    

    Relationship Functions

    Shipped Sales = CALCULATE([Total Sales], USERELATIONSHIP(FactSales[ShipDateKey], DimDate[DateKey]))
    

    Ranki...
