100+ datasets found
  1. Iris dataset

    • kaggle.com
    zip
    Updated Jan 16, 2024
    Cite
    Ehsan Zafari (2024). Iris dataset [Dataset]. https://www.kaggle.com/datasets/ehsanzafari/iris-dataset
    Available download formats: zip (955 bytes)
    Dataset updated
    Jan 16, 2024
    Authors
    Ehsan Zafari
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The Iris dataset is a classic dataset in the field of machine learning and statistics. It's often used for demonstrating various data analysis, machine learning, and statistical techniques. Here are some key details about it:

    Background
    • Origin: The dataset was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper titled "The use of multiple measurements in taxonomic problems."
    • Purpose: Fisher developed the dataset as an example of linear discriminant analysis.

    Data Composition
    • Data Points: The dataset consists of 150 samples from three species of Iris flowers: Iris Setosa, Iris Versicolor, and Iris Virginica.
    • Features: There are four features measured in centimeters for each sample: (1) Sepal Length, (2) Sepal Width, (3) Petal Length, (4) Petal Width.
    • Classes: The dataset contains three classes, corresponding to the three species of Iris. Each class has 50 samples.

    Usage
    • Classification: The Iris dataset is widely used for classification tasks, especially to illustrate the principles of supervised machine learning algorithms.
    • Testing Algorithms: It's often used to test out algorithms for linear regression, classification, and clustering due to its simplicity and small size.
    • Educational Purpose: Because of its clarity and simplicity, it's frequently used in teaching data science and machine learning.

    Characteristics
    • Simple and Clean: The dataset is straightforward, with minimal preprocessing required, making it ideal for beginners.
    • Well-Behaved Classes: The species are relatively well separated, though there's some overlap between Versicolor and Virginica.
    • Multivariate Data: It involves understanding the relationship between multiple variables (the four features).

    Applications
    • Benchmarking: The Iris dataset serves as a benchmark for evaluating the performance of different algorithms.
    • Visualization: It's great for practicing data visualization, especially for exploring techniques like scatter plots, box plots, and pair plots to understand feature relationships.

    Despite its simplicity, the Iris dataset remains one of the most famous datasets in the world of data science and machine learning. It serves as an excellent starting point for anyone new to the field and remains a baseline for testing algorithms and teaching concepts.
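
    As a rough illustration of the classification use described above, the sketch below runs a nearest-centroid classifier over a six-row excerpt (two rows per species). The rows are an assumption: they are transcribed from the commonly distributed copy of the data and should be checked against the downloaded file. The function names are hypothetical, and nearest-centroid is a stand-in technique, not the method from Fisher's paper.

```python
import math

# Six rows (two per species) in the dataset's four-feature layout:
# sepal length, sepal width, petal length, petal width, all in cm.
# Assumption: values transcribed from the commonly distributed copy of the
# data; verify them against the downloaded file before relying on them.
SAMPLES = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def centroids(samples):
    """Average the feature vectors of each species."""
    grouped = {}
    for features, species in samples:
        grouped.setdefault(species, []).append(features)
    return {
        species: tuple(sum(column) / len(column) for column in zip(*rows))
        for species, rows in grouped.items()
    }


def classify(features, centers):
    """Return the species whose centroid is closest in Euclidean distance."""
    return min(centers, key=lambda species: math.dist(features, centers[species]))


CENTERS = centroids(SAMPLES)
print(classify((5.0, 3.4, 1.5, 0.2), CENTERS))  # short petals
print(classify((6.0, 2.9, 4.5, 1.5), CENTERS))  # mid-length petals
```

    With only two rows per class this says nothing about accuracy on the full 150 samples; it only shows how the four features and the species label fit together.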

  2. Dana Point, CA Annual Population and Growth Analysis Dataset: A...

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    Cite
    Neilsberg Research (2024). Dana Point, CA Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Dana Point from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/dana-point-ca-population-by-year/
    Available download formats: json, csv
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California, Dana Point
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data in this dataset are derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 to 2023. To measure the variables, namely (a) population and (b) population change (absolute and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2023. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Dana Point population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand the population change of Dana Point across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing. If there is a change, we can see when the population peaked, or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2023, the population of Dana Point was 32,567, a 0.25% decrease year-over-year from 2022. Previously, in 2022, the Dana Point population was 32,647, a decline of 0.51% compared to a population of 32,815 in 2021. Over the last 20-plus years, between 2000 and 2023, the population of Dana Point decreased by 2,634. In this period, the peak population was 35,992 in the year 2009. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: This column shows the population of Dana Point for the specific year.
    • Year on Year Change: This column displays the change in Dana Point population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
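
    The two derived columns can be reproduced from the population figures alone. The sketch below uses only the three populations quoted under "Key observations" and assumes the percentage is taken relative to the previous year's population; the listing does not state the formula, but this assumption matches the quoted 0.51% and 0.25% declines. The function name is hypothetical.

```python
# Populations quoted in the "Key observations" paragraph of this listing.
POPULATION = {2021: 32815, 2022: 32647, 2023: 32567}


def yearly_changes(population):
    """Return (year, population, change, percent change) rows.

    The first year has no predecessor, so its change fields are None.
    Assumption: percent change is relative to the previous year's population.
    """
    rows = []
    previous = None
    for year in sorted(population):
        current = population[year]
        if previous is None:
            rows.append((year, current, None, None))
        else:
            change = current - previous
            percent = round(100 * change / previous, 2)
            rows.append((year, current, change, percent))
        previous = current
    return rows


for row in yearly_changes(POPULATION):
    print(row)
# (2021, 32815, None, None)
# (2022, 32647, -168, -0.51)
# (2023, 32567, -80, -0.25)
```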

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Dana Point Population by Year.

  3. Example Stata syntax and data construction for negative binomial time series...

    • data.mendeley.com
    Updated Nov 2, 2022
    Cite
    Sarah Price (2022). Example Stata syntax and data construction for negative binomial time series regression [Dataset]. http://doi.org/10.17632/3mj526hgzx.2
    Dataset updated
    Nov 2, 2022
    Authors
    Sarah Price
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).

    The variables contained therein are defined as follows:

    case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).

    patid: a unique patient identifier.

    time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.

    ncons: number of consultations per month.

    period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.

    burden: binary variable denoting membership of one of two multimorbidity burden groups.

    We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).

    Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
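
    To make the long-format layout above concrete, here is a minimal Python sketch of a panel with one row per patient per month. It is not a translation of dummy_dataset_create.do: the consultation counts and group assignments are random placeholders, the function name and parameters are hypothetical, and the period0 to period9 variables are left out because their exact definition is not given in this description.

```python
import random


def build_panel(n_patients=4, n_periods=10, seed=0):
    """Build long-format rows: one row per patient per monthly time period.

    All values are placeholders generated for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_patients + 1):
        case = patid % 2            # alternate controls (0) and cases (1)
        burden = rng.randint(0, 1)  # arbitrary burden-group membership
        for time_period in range(n_periods):  # 0 .. 9, as in the example
            rows.append({
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": time_period,
                "ncons": rng.randint(0, 5),  # made-up consultation count
            })
    return rows


panel = build_panel()
print(len(panel))  # 4 patients x 10 periods = 40 rows
print(panel[0])
```

    Keeping one row per patient-month is what lets a count model treat ncons as the outcome and time_period as a covariate.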

  4. 1999 RoxAnn Data Points from Apalachicola Bay, Florida

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Oct 31, 2024
    Cite
    NOAA Office for Coastal Management (Point of Contact, Custodian) (2024). 1999 RoxAnn Data Points from Apalachicola Bay, Florida [Dataset]. https://catalog.data.gov/dataset/1999-roxann-data-points-from-apalachicola-bay-florida1
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Area covered
    Florida, Apalachicola Bay
    Description

    The Apalachicola Bay National Estuarine Research Reserve and the NOAA Office for Coastal Management worked together to map benthic habitats within Apalachicola Bay, Florida. The bay and the lower portions of four distributaries were surveyed on 11-22 October 1999 using three benthic sampling techniques. This data set represents the information gathered from a RoxAnn acoustic sensor. The instrument was used to characterize bottom type by extracting data on bottom roughness and bottom hardness from the primary and secondary sounder echoes. The data is classified on-the-fly, using the Sediment Profile Images and grab samples collected for field validation, and subject to a post-processing classification. The RoxAnn data points were exported into a geographic information system (GIS) and post-processed to remove unreliable data points and re-classified. This data set is comprised of the cleaned, attributed point data. The attributes include location, date, time, depth, field derived classification, and the classification derived from post-processing the data. Original contact information: Contact Org: NOAA Office for Coastal Management Phone: 843-740-1202 Email: coastal.info@noaa.gov

  5. Data from: Point of sales

    • kaggle.com
    zip
    Updated Jan 12, 2024
    Cite
    smmmmmmmmmmmm (2024). Point of sales [Dataset]. https://www.kaggle.com/datasets/smmmmmmmmmmmm/point-of-sales
    Available download formats: zip (34427 bytes)
    Dataset updated
    Jan 12, 2024
    Authors
    smmmmmmmmmmmm
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each row corresponds to a unique transaction identified by Transaction_ID and includes details such as Date, Product_ID, Product_Name, Quantity, Unit_Price, Total_Price, Customer_ID, Payment_Method, and Store_Location. The synthetic data simulates diverse transactions with random product information, quantities, prices, customer IDs, payment methods, and store locations. This dataset provides a foundation for analyzing and understanding patterns within a Point of Sale environment, facilitating research or development in related fields such as retail analytics, inventory management, and customer behavior analysis.

  6. Gulf of Maine - Control Points Used to Validate the Accuracies of the...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated May 22, 2025
    Cite
    (Point of Contact, Custodian) (2025). Gulf of Maine - Control Points Used to Validate the Accuracies of the Interpolated Water Density Rasters [Dataset]. https://catalog.data.gov/dataset/gulf-of-maine-control-points-used-to-validate-the-accuracies-of-the-interpolated-water-density-1
    Dataset updated
    May 22, 2025
    Dataset provided by
    (Point of Contact, Custodian)
    Area covered
    Gulf of Maine
    Description

    This feature dataset contains the control points used to validate the accuracies of the interpolated water density rasters for the Gulf of Maine. These control points were selected randomly from the water density data points, using Hawth's Create Random Selection Tool. Twenty-five percent of each seasonal bin (for each year and at each depth) were randomly selected and set aside for validation. For example, if there were 1,000 water density data points for the fall (September, October, November) 2003 at 0 meters, then 250 of those points were randomly selected, removed and set aside to assess the accuracy of interpolated surface. The naming convention of the validation point feature class includes the year (or years), the season, and the depth (in meters) it was selected from. So for example, the name: ValidationPoints_1997_2004_Fall_0m would indicate that this point feature class was randomly selected from water density points that were at 0 meters in the fall between 1997-2004. The seasons were defined using the same months as the remote sensing data--namely, Fall = September, October, November; Winter = December, January, February; Spring = March, April, May; and Summer = June, July, August.
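
    The hold-out procedure and naming convention described above can be sketched as follows. This is only an illustration: Python's random.sample stands in for Hawth's Create Random Selection Tool, the function names are hypothetical, and the 1,000 integers stand in for real water density points.

```python
import random

# Season definitions as given in the description.
SEASONS = {
    "Fall": (9, 10, 11),
    "Winter": (12, 1, 2),
    "Spring": (3, 4, 5),
    "Summer": (6, 7, 8),
}


def season_of(month):
    """Map a month number (1-12) to its season name."""
    for season, months in SEASONS.items():
        if month in months:
            return season
    raise ValueError(f"month must be 1-12, got {month}")


def feature_class_name(years, season, depth_m):
    """Build a name such as ValidationPoints_1997_2004_Fall_0m."""
    year_part = "_".join(str(year) for year in years)
    return f"ValidationPoints_{year_part}_{season}_{depth_m}m"


def split_bin(points, fraction=0.25, seed=None):
    """Randomly set aside `fraction` of one seasonal bin for validation."""
    rng = random.Random(seed)
    n_validation = round(len(points) * fraction)
    chosen = set(rng.sample(range(len(points)), n_validation))
    validation = [p for i, p in enumerate(points) if i in chosen]
    remaining = [p for i, p in enumerate(points) if i not in chosen]
    return validation, remaining


points = list(range(1000))  # placeholder for 1,000 fall points at 0 meters
validation, remaining = split_bin(points, seed=42)
print(len(validation), len(remaining))  # 250 750
print(feature_class_name((1997, 2004), season_of(10), 0))
# ValidationPoints_1997_2004_Fall_0m
```

    The split is applied per bin (season, year or year range, depth), so each bin contributes its own 25 percent to validation.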

  7. US Travel Check-Ins - Analysis

    • kaggle.com
    zip
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). US Travel Check-Ins - Analysis [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-travel-check-ins-analysis
    Available download formats: zip (2350764 bytes)
    Dataset updated
    Feb 11, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    US Travel Check-Ins - Analysis

    In-Depth Study of Location, Date, Temperature, USIndex, and Crime Rates

    About this dataset

    This comprehensive dataset offers an in-depth exploration into US travel check-ins from Instagram. It includes detailed data scraped from Instagram, such as the location of each check-in, the USIndex for each state, average temperature for each state per month, and crime rate per state. In addition to location and time information, this dataset also provides latitude and longitude coordinates for every entry. This extensive collection of data is invaluable for those interested in studying various aspects of movement within the United States. With detailed insights on factors like climate conditions and economic health of a region at a given point in time, this dataset can help uncover fascinating trends regarding how travelers choose their destinations and how they experience their journeys around the country

    How to use the dataset

    This Kaggle dataset, US Travel Check-Ins - Analysis, provides insights for travel researchers, marketers, and businesses in the travel industry. It contains the check-in location, USIndex rating (economic health of each state), average temperature, and crime rate per state. The latitude and longitude of each check-in are also provided, adding geographic context to help you visualize the data.

    This guide will show you how to use this dataset for your research or business venture.

    Step 1: Prepare your data. Clean the data before you analyze it. Depending on the analysis you plan to run (for example, a time series analysis), select the columns that match your needs and exclude the rest, such as date or season-related columns when they are not relevant. Formatting should also be consistent across all values in a column (elevation is a good example here). You can check that everything is formatted correctly by running a quick summary of selected columns, for example with df['var'].describe() in Python (pandas), which reports a column's statistical makeup, including the mean and quartiles.
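
    For readers without pandas installed, a similar per-column summary can be approximated with Python's standard library. This is a sketch, not the pandas implementation: the describe function here is a hypothetical helper, the temperature values are made up rather than taken from the dataset, and the "inclusive" quantile method is chosen on the assumption that it is the closest match to linear interpolation between data points.

```python
import statistics


def describe(values):
    """Summarize one numeric column: count, mean, std, min, quartiles, max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical average-temperature column; not taken from the dataset.
avg_temperature = [48.0, 55.5, 61.0, 70.5, 79.0]
for name, value in describe(avg_temperature).items():
    print(f"{name}: {value}")
```

    A value that fails to convert to a number before this step (for example a temperature stored as text) is exactly the kind of formatting inconsistency the summary is meant to surface.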

    Step 2: Explore and analyze your data graphically. Once the data has been prepared, you can start visualizing it to look for trends or patterns, including in comparison with other datasets or information sources such as weather forecasts or nationwide trend indicators. Grafana dashboards are a feasible option when multiple datasets need to be compared; depending on the type of graph or chart, Excel worksheets can offer flexible customization options along with various export file types (.csv, .jpeg, .pdf). Plotting markers on a map, for example with the Google Maps API, adds geographical awareness that can be useful when analyzing location-dependent variables, an advantage over manual inspection gained by combining existing software with publicly available APIs.

    Step 3: Interpretation & Hypothesis Testing
    After the exploratory visualizations, the next step is to test hypotheses based on the correlations observed between variables, for example whether distribution trends across regions favor geographical areas where certain logistical processes could yield higher success ratios, giving potential customers greater satisfaction than

    Research Ideas

    • Travel trends analysis: Using this dataset, researchers could track which areas of the US are popular destinations based on travel check-ins and spot any interesting trends or correlations in terms of geography, seasonal changes, economic health or crime rates.
    • Predictive Modeling: By using various features from this dataset such as average temperature, US Index and crime rate, predictors could be developed to suggest how safe an area would feel to a tourist based on their current location and other predetermined variables they choose to input into the model.
    • Trip Planning Tool: The dataset can also be used to develop a tool that quickly allows travelers to plan trips according to their preferences in terms of duration and budget as well a...

  8. Dana Point, CA Population Dataset: Yearly Figures, Population Change, and...

    • neilsberg.com
    csv, json
    Updated Sep 18, 2023
    Cite
    Neilsberg Research (2023). Dana Point, CA Population Dataset: Yearly Figures, Population Change, and Percent Change Analysis [Dataset]. https://www.neilsberg.com/research/datasets/6e47b262-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: json, csv
    Dataset updated
    Sep 18, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California, Dana Point
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2022, Annual Population Growth Rate Percent
    Measurement technique
    The data in this dataset are derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 to 2022. To measure the variables, namely (a) population and (b) population change (absolute and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2022. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Dana Point population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand the population change of Dana Point across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing. If there is a change, we can see when the population peaked, or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2022, the population of Dana Point was 32,465, a 1.00% decrease year-over-year from 2021. Previously, in 2021, the Dana Point population was 32,794, a decline of 0.82% compared to a population of 33,066 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of Dana Point decreased by 2,736. In this period, the peak population was 35,992 in the year 2009. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2022

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2022)
    • Population: This column shows the population of Dana Point for the specific year.
    • Year on Year Change: This column displays the change in Dana Point population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Dana Point Population by Year.

  9. Global Point of Interest (POI) Data | 230M+ Locations, 5000 Categories,...

    • datarade.ai
    .json
    Updated Sep 7, 2024
    Cite
    Xverum (2024). Global Point of Interest (POI) Data | 230M+ Locations, 5000 Categories, Geographic & Location Intelligence, Regular Updates [Dataset]. https://datarade.ai/data-products/global-point-of-interest-poi-data-230m-locations-5000-c-xverum
    Available download formats: .json
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Xverum LLC
    Authors
    Xverum
    Area covered
    French Polynesia, Mauritania, Andorra, Costa Rica, Kyrgyzstan, Vietnam, Antarctica, Guatemala, Northern Mariana Islands, Bahamas
    Description

    Xverum’s Point of Interest (POI) Data is a comprehensive dataset containing 230M+ verified locations across 5000 business categories. Our dataset delivers structured geographic data, business attributes, location intelligence, and mapping insights, making it an essential tool for GIS applications, market research, urban planning, and competitive analysis.

    With regular updates and continuous POI discovery, Xverum ensures accurate, up-to-date information on businesses, landmarks, retail stores, and more. Delivered in bulk to S3 Bucket and cloud storage, our dataset integrates seamlessly into mapping, geographic information systems, and analytics platforms.

    🔥 Key Features:

    Extensive POI Coverage: ✅ 230M+ Points of Interest worldwide, covering 5000 business categories. ✅ Includes retail stores, restaurants, corporate offices, landmarks, and service providers.

    Geographic & Location Intelligence Data: ✅ Latitude & longitude coordinates for mapping and navigation applications. ✅ Geographic classification, including country, state, city, and postal code. ✅ Business status tracking – Open, temporarily closed, or permanently closed.

    Continuous Discovery & Regular Updates: ✅ New POIs continuously added through discovery processes. ✅ Regular updates ensure data accuracy, reflecting new openings and closures.

    Rich Business Insights: ✅ Detailed business attributes, including company name, category, and subcategories. ✅ Contact details, including phone number and website (if available). ✅ Consumer review insights, including rating distribution and total number of reviews (additional feature). ✅ Operating hours where available.

    Ideal for Mapping & Location Analytics: ✅ Supports geospatial analysis & GIS applications. ✅ Enhances mapping & navigation solutions with structured POI data. ✅ Provides location intelligence for site selection & business expansion strategies.

    Bulk Data Delivery (NO API): ✅ Delivered in bulk via S3 Bucket or cloud storage. ✅ Available in structured format (.json) for seamless integration.

    🏆Primary Use Cases:

    Mapping & Geographic Analysis: 🔹 Power GIS platforms & navigation systems with precise POI data. 🔹 Enhance digital maps with accurate business locations & categories.

    Retail Expansion & Market Research: 🔹 Identify key business locations & competitors for market analysis. 🔹 Assess brand presence across different industries & geographies.

    Business Intelligence & Competitive Analysis: 🔹 Benchmark competitor locations & regional business density. 🔹 Analyze market trends through POI growth & closure tracking.

    Smart City & Urban Planning: 🔹 Support public infrastructure projects with accurate POI data. 🔹 Improve accessibility & zoning decisions for government & businesses.

    💡 Why Choose Xverum’s POI Data?

    • 230M+ Verified POI Records – One of the largest & most detailed location datasets available.
    • Global Coverage – POI data from 249+ countries, covering all major business sectors.
    • Regular Updates – Ensuring accurate tracking of business openings & closures.
    • Comprehensive Geographic & Business Data – Coordinates, addresses, categories, and more.
    • Bulk Dataset Delivery – S3 Bucket & cloud storage delivery for full dataset access.
    • 100% Compliant – Ethically sourced, privacy-compliant data.

    Access Xverum’s 230M+ POI dataset for mapping, geographic analysis, and location intelligence. Request a free sample or contact us to customize your dataset today!

  10. West Point, UT Population Dataset: Yearly Figures, Population Change, and...

    • neilsberg.com
    csv, json
    Updated Sep 18, 2023
    Cite
    Neilsberg Research (2023). West Point, UT Population Dataset: Yearly Figures, Population Change, and Percent Change Analysis [Dataset]. https://www.neilsberg.com/research/datasets/6fb2b328-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: json, csv
    Dataset updated
    Sep 18, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    West Point, Utah
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2022, Annual Population Growth Rate Percent
    Measurement technique
    The data in this dataset are derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 to 2022. To measure the variables, namely (a) population and (b) population change (absolute and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2022. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the West Point population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand the population change of West Point across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing. If there is a change, we can see when the population peaked, or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2022, the population of West Point was 11,892, a 3.81% increase year-over-year from 2021. Previously, in 2021, the West Point population was 11,456, an increase of 3.58% compared to a population of 11,060 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of West Point increased by 5,784. In this period, the peak population was 11,892 in the year 2022. The numbers suggest that the population has not reached its peak yet and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2022

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2022)
    • Population: This column shows the population of West Point for the specific year.
    • Year on Year Change: This column displays the change in West Point population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
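The two derived columns can be recomputed from the Population column. The sketch below is a minimal illustration, not the publisher's method: it uses only the three population figures quoted in the key observations (2020 to 2022), and the function name `year_on_year` is a hypothetical helper introduced here.

```python
# Population figures quoted in the key observations (2020-2022).
population = {2020: 11060, 2021: 11456, 2022: 11892}


def year_on_year(pop_by_year):
    """Return {year: (absolute_change, percent_change)} for every year after the first."""
    years = sorted(pop_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        delta = pop_by_year[curr] - pop_by_year[prev]
        # Percent change is relative to the previous year's population.
        pct = round(100 * delta / pop_by_year[prev], 2)
        changes[curr] = (delta, pct)
    return changes


changes = year_on_year(population)
for year, (delta, pct) in changes.items():
    print(year, delta, pct)
```

For these three years the percentages come out as 3.58 and 3.81, which agrees with the figures quoted in the key observations.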

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for West Point Population by Year.

  11. Energy Consumption of United States Over Time

    • kaggle.com
    zip
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). Energy Consumption of United States Over Time [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-the-energy-consumption-of-united-state
    Explore at:
    Available download formats: zip (222388 bytes)
    Dataset updated
    Dec 14, 2022
    Authors
    The Devastator
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Energy Consumption of United States Over Time

    Building Energy Data Book

    By Department of Energy [source]

    About this dataset

    The Building Energy Data Book (2011) is a resource for gaining insight into the state of energy consumption in the buildings sector. This dataset provides comprehensive data on residential, commercial and industrial building energy consumption, construction techniques, building technologies and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings, from single family homes to large office complexes, as well as its impact on the environment. The Building Technologies Office (BTO) within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset for researchers, policy makers, engineers and everyday observers who are interested in learning more about our built environment and its energy usage patterns.

    How to use the dataset

    This dataset provides comprehensive information regarding energy consumption in the buildings sector of the United States. It contains a number of key variables that can be used to analyze the relationships between energy consumption and building characteristics, technologies, and construction. The data is provided in both CSV and tabular formats, which is helpful for those who prefer programs like Excel or other statistical modeling software.

    In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.

    • Understand what's included: Before you start analyzing the data, you should read through the provided documentation so that you fully understand what is included in the datasets. You'll want to be aware of any potential limitations or requirements associated with each type of data point so that your results are valid and reliable when drawing conclusions from them.

    • Clean up any outliers: Investigate suspicious outliers before using the dataset in further analyses; otherwise they can skew results later. Outliers can also make complex statistical modeling more difficult, since extreme values can distort estimates (for example, a single outlier could affect an entire model's prior distributions). Missing values should be accounted for as well, since they are not always obvious when reviewing a table or chart, and accurate statistics still depend on handling them.

    • Exploratory data analysis: After cleaning the dataset, explore it by visualizing summaries such as boxplots, histograms, and scatter plots. This gives an initial view of the trends that might exist across demographic or geographic regions and variables, which can help inform later predictive models. This step will also highlight any clear discontinuous changes over time (for example, due to over-generalization, if applicable), and helps ensure that predictors contribute meaningful signal rather than noise to prediction accuracy.

    • Analyze key metrics & observations: Once exploratory analyses have been carried out on the raw samples, post-processing steps come next, such as analyzing correlations among explanatory variables, performing significance testing on regression models, imputing missing or outlier values, and more, depending on the specific project needs. Additionally, interpretation efforts based...
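The cleaning step described in the guide above can be sketched with the Python standard library. This is one possible approach under stated assumptions, not a procedure prescribed by the dataset: the readings are synthetic placeholder values, missing values are filled with the median, and outliers are flagged with the common 1.5 × IQR rule. The helper names `impute_missing` and `iqr_outliers` are hypothetical.

```python
import statistics

# Synthetic, illustrative values only (not taken from the dataset); None marks a missing value.
readings = [10.0, 12.0, 11.0, None, 13.0, 12.0, 95.0, 11.0]


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


filled = impute_missing(readings)
outliers = iqr_outliers(filled)
print(filled)
print(outliers)
```

Whether a flagged value should be dropped, capped, or kept is a judgment call that depends on the documentation for the specific table being analyzed.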

    Research Ideas

    • Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
    • Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
    • Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
  12. Human Resource Data Set (The Company)

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    Koluit (2025). Human Resource Data Set (The Company) [Dataset]. https://www.kaggle.com/datasets/koluit/human-resource-data-set-the-company
    Explore at:
    Available download formats: zip (401322 bytes)
    Dataset updated
    Nov 12, 2025
    Authors
    Koluit
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Similar to others who have created HR data sets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space. The only dataset most HR practitioners have is their real employee data, and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset with an ever-growing variation of data points, others can learn and grow their HR data analytics and systems knowledge.

    Some example test cases where someone might use this dataset:

    • HR Technology Testing and Mock-Ups
      • Engagement survey tools
      • HCM tools
      • BI Tools
    • Learning To Code For People Analytics
      • Python/R/SQL
    • HR Tech and People Analytics Educational Courses/Tools

    Content

    The core data CompanyData.txt has the basic demographic data about a worker. We treat this as the core data that you can join future data sets to.

    Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.

    Acknowledgements

    Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.

    Inspiration

    Our hope is this data is used in the HR or Research space to experiment and learn using HR data. Some examples that we hope this data will be used are listed above.

    Contact Us

    Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com

  13. Geospatial Data Pack for Visualization

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Cite
    Vega Datasets (2025). Geospatial Data Pack for Visualization [Dataset]. https://www.kaggle.com/datasets/vega-datasets/geospatial-data-pack
    Explore at:
    Available download formats: zip (1422109 bytes)
    Dataset updated
    Oct 21, 2025
    Dataset authored and provided by
    Vega Datasets
    Description

    Geospatial Data Pack for Visualization 🗺️

    Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets

    Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.

    Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.

    Why Use This Dataset? 🤔

    • Comprehensive Geospatial Types: Explore a variety of core geospatial data models:
      • Vector Data: Includes points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json).
      • Raster-like Data: Work with gridded datasets (like windvectors.csv, annual-precip.json).
    • Diverse Formats: Gain experience with standard and efficient geospatial formats like GeoJSON (see Table 1, 2, 4), compressed TopoJSON (see Table 1), and plain CSV/TSV (see Table 2, 3, 4) for point data and attribute tables ready for joining.
    • Multi-Scale Coverage: Practice visualization across different geographic scales, from global and national (Table 1, 4) down to the city level (Table 1).
    • Rich Thematic Mapping: Includes multiple datasets (Table 3) specifically designed for joining attributes to geographic boundaries (like states or counties from Table 1) to create insightful choropleth maps.
    • Ready-to-Use & Example-Driven: Cleaned datasets tightly integrated with 31+ official examples (see Appendix) from Altair, Vega-Lite, and Vega, allowing you to immediately practice techniques like projections, point maps, network maps, and interactive displays.
    • Python Friendly: Works seamlessly with essential Python libraries like Altair (which can directly read TopoJSON/GeoJSON), Pandas, and GeoPandas, fitting perfectly into the Kaggle notebook environment.

    Dataset Inventory 🗂️

    This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.

    1. BASE MAP BOUNDARIES (Topological Data)

    Dataset | File | Size | Format | License | Description | Key Fields / Join Info
    US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries
    World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries
    London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name)
    London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name
    London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color
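Joining attributes to boundaries for a choropleth comes down to matching rows on the geometry `id`. The sketch below shows that join with a tiny in-memory stand-in that only mimics the TopoJSON layout (`objects` → named object → `geometries`, each with an `id`). The ids, the `rate` field, and the `join_by_id` helper are hypothetical; in practice the topology would be loaded from a file such as us-10m.json with `json.load`.

```python
# A tiny stand-in with the same shape as a TopoJSON file. The ids and arcs are made up.
topo = {
    "type": "Topology",
    "objects": {
        "states": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "id": 1, "arcs": [[0]]},
                {"type": "Polygon", "id": 2, "arcs": [[1]]},
                {"type": "Polygon", "id": 4, "arcs": [[2]]},
            ],
        }
    },
}

# Hypothetical attribute table keyed by the same id (for example, rows read from a CSV).
rates = [{"id": 1, "rate": 0.05}, {"id": 2, "rate": 0.07}]


def join_by_id(topology, object_name, rows, value_field):
    """Map each geometry id to its attribute value, or None when no row matches."""
    lookup = {row["id"]: row[value_field] for row in rows}
    geometries = topology["objects"][object_name]["geometries"]
    return {geom["id"]: lookup.get(geom["id"]) for geom in geometries}


joined = join_by_id(topo, "states", rates, "rate")
print(joined)
```

Geometries without a matching row come back as None, which makes gaps in the attribute table easy to spot before mapping.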

    2. GEOGRAPHIC REFERENCE POINTS (Point Data) 📍

    Dataset | File | Size | Format | License | Description | Key Fields / Join Info
    US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l...
  14. InFORM Fire Occurrence Data Records

    • data-nifc.opendata.arcgis.com
    • nifc.hub.arcgis.com
    • +4 more
    Updated Feb 17, 2023
    Cite
    National Interagency Fire Center (2023). InFORM Fire Occurrence Data Records [Dataset]. https://data-nifc.opendata.arcgis.com/datasets/inform-fire-occurrence-data-records/explore
    Explore at:
    Dataset updated
    Feb 17, 2023
    Dataset authored and provided by
    National Interagency Fire Center https://www.nifc.gov/
    Area covered
    Description

    This data set is part of an ongoing project to consolidate interagency fire point data. The incorporation of all available historical data is in progress.

    The InFORM (Interagency Fire Occurrence Reporting Modules) FODR (Fire Occurrence Data Records) are the official record of fire events. Built on top of IRWIN (Integrated Reporting of Wildland Fire Information), the FODR starts with an IRWIN record and then captures the final incident information upon certification of the record by the appropriate local authority.

    This service contains all wildland fire incidents from the InFORM FODR incident service that meet the following criteria:

    • Categorized as a Wildfire (WF) or Prescribed Fire (RX) record
    • Is Valid and not "quarantined" due to potential conflicts with other records

    No "fall-off" rules are applied to this service. Service is a real time display of data.

    Warning: Please refrain from repeatedly querying the service using a relative date range. This includes using the “(not) in the last” operators in a Web Map filter and any reference to CURRENT_TIMESTAMP. This type of query puts undue load on the service and may render it temporarily unavailable.

    Attributes:

    • ABCDMisc: A FireCode used by USDA FS to track and compile cost information for emergency initial attack fire suppression expenditures for A, B, C & D size class fires on FS lands.
    • ADSPermissionState: Indicates the permission hierarchy that is currently being applied when a system utilizes the UpdateIncident operation.
    • CalculatedAcres: A measure of acres calculated (i.e., infrared) from a geospatial perimeter of a fire. More specifically, the number of acres within the current perimeter of a specific, individual incident, including unburned and unburnable islands. The minimum size must be 0.1.
    • ContainmentDateTime: The date and time a wildfire was declared contained.
    • ControlDateTime: The date and time a wildfire was declared under control.
    • CreatedBySystem: ArcGIS Server username of system that created the IRWIN Incident record.
    • CreatedOnDateTime: Date/time that the Incident record was created.
    • IncidentSize: Reported for a fire. The minimum size is 0.1.
    • DiscoveryAcres: An estimate of acres burning upon the discovery of the fire. More specifically when the fire is first reported by the first person that calls in the fire. The estimate should include number of acres within the current perimeter of a specific, individual incident, including unburned and unburnable islands.
    • DispatchCenterID: A unique identifier for a dispatch center responsible for supporting the incident.
    • EstimatedCostToDate: The total estimated cost of the incident to date.
    • FinalAcres: Reported final acreage of incident.
    • FinalFireReportApprovedByTitle: The title of the person that approved the final fire report for the incident.
    • FinalFireReportApprovedByUnit: NWCG Unit ID associated with the individual who approved the final report for the incident.
    • FinalFireReportApprovedDate: The date that the final fire report was approved for the incident.
    • FireBehaviorGeneral: A general category describing the manner in which the fire is currently reacting to the influences of fuel, weather, and topography.
    • FireCode: A code used within the interagency wildland fire community to track and compile cost information for emergency fire suppression expenditures for the incident.
    • FireDepartmentID: The U.S. Fire Administration (USFA) has created a national database of Fire Departments. Most Fire Departments do not have an NWCG Unit ID and so it is the intent of the IRWIN team to create a new field that includes this data element to assist the National Association of State Foresters (NASF) with data collection.
    • FireDiscoveryDateTime: The date and time a fire was reported as discovered or confirmed to exist. May also be the start date for reporting purposes.
    • FireMgmtComplexity: The highest management level utilized to manage a wildland fire event.
    • FireOutDateTime: The date and time when a fire is declared out.
    • FSJobCode: A code used to indicate the Forest Service job accounting code for the incident. This is specific to the Forest Service. Usually displayed as a 2 char prefix on FireCode.
    • FSOverrideCode: A code used to indicate the Forest Service override code for the incident. This is specific to the Forest Service. Usually displayed as a 4 char suffix on FireCode. For example, if the FS is assisting DOI, an override of 1502 will be used.
    • GACC: A code that identifies one of the wildland fire geographic area coordination centers at the point of origin for the incident. A geographic area coordination center is a facility that is used for the coordination of agency or jurisdictional resources in support of one or more incidents within a geographic coordination area.
    • IncidentName: The name assigned to an incident.
    • IncidentShortDescription: General descriptive location of the incident such as the number of miles from an identifiable town.
    • IncidentTypeCategory: The Event Category is a sub-group of the Event Kind code and description. The Event Category further breaks down the Event Kind into more specific event categories.
    • IncidentTypeKind: A general, high-level code and description of the types of incidents and planned events to which the interagency wildland fire community responds.
    • InitialLatitude: The latitude location of the initial reported point of origin specified in decimal degrees.
    • InitialLongitude: The longitude location of the initial reported point of origin specified in decimal degrees.
    • InitialResponseDateTime: The date/time of the initial response to the incident. More specifically when the IC arrives and performs initial size up.
    • IsFireCauseInvestigated: Indicates if an investigation is underway or was completed to determine the cause of a fire.
    • IsFSAssisted: Indicates if the Forest Service provided assistance on an incident outside their jurisdiction.
    • IsReimbursable: Indicates the cost of an incident may be another agency’s responsibility.
    • IsTrespass: Indicates if the incident is a trespass claim or if a bill will be pursued.
    • LocalIncidentIdentifier: A number or code that uniquely identifies an incident for a particular local fire management organization within a particular calendar year.
    • ModifiedBySystem: ArcGIS Server username of system that last modified the IRWIN Incident record.
    • ModifiedOnDateTime: Date/time that the Incident record was last modified.
    • PercentContained: Indicates the percent of incident area that is no longer active. Reference definition in fire line handbook when developing standard.
    • POOCity: The closest city to the incident point of origin.
    • POOCounty: The County Name identifying the county or equivalent entity at point of origin designated at the time of collection.
    • POODispatchCenterID: A unique identifier for the dispatch center that intersects with the incident point of origin.
    • POOFips: The code which uniquely identifies counties and county equivalents. The first two digits are the FIPS State code and the last three are the county code within the state.
    • POOJurisdictionalAgency: The agency having land and resource management responsibility for an incident as provided by federal, state or local law.
    • POOJurisdictionalUnit: NWCG Unit Identifier to identify the unit with jurisdiction for the land where the point of origin of a fire falls.
    • POOJurisdictionalUnitParentUnit: The unit ID for the parent entity, such as a BLM State Office or USFS Regional Office, that resides over the Jurisdictional Unit.
    • POOLandownerCategory: More specific classification of land ownership within land owner kinds identifying the deeded owner at the point of origin at the time of the incident.
    • POOLandownerKind: Broad classification of land ownership identifying the deeded owner at the point of origin at the time of the incident.
    • POOProtectingAgency: Indicates the agency that has protection responsibility at the point of origin.
    • POOProtectingUnit: NWCG Unit responsible for providing direct incident management and services to an incident pursuant to its jurisdictional responsibility or as specified by law, contract or agreement. Definition Extension: Protection can be re-assigned by agreement. The nature and extent of the incident determines protection (for example Wildfire vs. All Hazard).
    • POOState: The State alpha code identifying the state or equivalent entity at point of origin.
    • PredominantFuelGroup: The majority fuel model type that best represents fire behavior in the incident area, grouped into one of seven categories.
    • PredominantFuelModel: Describes the type of fuels found within the majority of the incident area.
    • UniqueFireIdentifier: Unique identifier assigned to each wildland fire. yyyy = calendar year, SSUUUU = POO protecting unit identifier (5 or 6 characters), xxxxxx = local incident identifier (6 to 10 characters).
    • FORID: Unique identifier assigned to each incident record in the FODR database.
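Two points from the description lend themselves to a short sketch: splitting POOFips into its state and county parts, and filtering on fixed dates rather than a relative range. The helper names below are hypothetical, and the `TIMESTAMP '...'` literal is an assumption about the SQL dialect the service accepts; check the service's own query documentation before relying on it.

```python
from datetime import date


def split_poo_fips(poo_fips):
    """Split a 5-digit county FIPS value into (state_code, county_code)."""
    # zfill restores leading zeros that are lost if the value was read as a number.
    code = str(poo_fips).zfill(5)
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit FIPS code, got {poo_fips!r}")
    return code[:2], code[2:]


def absolute_date_where(field, start, end):
    """Build a filter with fixed start and end dates instead of a relative range."""
    return (
        f"{field} >= TIMESTAMP '{start.isoformat()} 00:00:00' "
        f"AND {field} < TIMESTAMP '{end.isoformat()} 00:00:00'"
    )


state, county = split_poo_fips("49011")
where = absolute_date_where("FireDiscoveryDateTime", date(2022, 1, 1), date(2023, 1, 1))
print(state, county)
print(where)
```

Because the date bounds are computed once on the client and sent as literals, repeated requests carry the same filter text, which avoids the relative-range pattern the warning asks users not to use.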

  15. 96 wells fluorescence reading and R code statistic for analysis

    • zenodo.org
    bin, csv, doc, pdf
    Updated Aug 2, 2024
    Cite
    JVD Molino; JVD Molino (2024). 96 wells fluorescence reading and R code statistic for analysis [Dataset]. http://doi.org/10.5281/zenodo.1119285
    Explore at:
    Available download formats: doc, csv, pdf, bin
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    JVD Molino; JVD Molino
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    Data points present in this dataset were obtained through the following steps: To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker set to 150 rpm under constant illumination (50 μmol photons/m²s). Then 100 μL samples were transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland). Fluorescence was measured at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to the clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R Statistic version 3.3.3 was used to perform one-way ANOVA (with Tukey's test); for statistical hypothesis tests, the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
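The authors ran their analysis in R; the sketch below is not their code. It is a minimal Python illustration of the F statistic behind a one-way ANOVA, using synthetic readings for three hypothetical constructs. It does not compute a p-value and does not implement Tukey's test.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic fluorescence-like readings for three hypothetical constructs (not from the dataset).
constructs = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(constructs)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that at least one group mean differs; a post hoc test such as Tukey's is then used to identify which pairs differ.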

    Info

    ANOVA_Turkey_Sub.R -> code for ANOVA analysis in R statistic 3.3.3

    barplot_R.R -> code to generate bar plot in R statistic 3.3.3

    boxplotv2.R -> code to generate boxplot in R statistic 3.3.3

    pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.

    Anova_Output_Summary_Guide.pdf -> explains the content of the ANOVA files

    ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.

    Consider citing our work.

    Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433

  16. DEFRA Water Quality Data Archive - Dataset - Data Place Plymouth

    • plymouth.thedata.place
    Updated Mar 8, 2022
    Cite
    (2022). DEFRA Water Quality Data Archive - Dataset - Data Place Plymouth [Dataset]. https://plymouth.thedata.place/dataset/defra-water-quality-data-archive-api
    Explore at:
    Dataset updated
    Mar 8, 2022
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Plymouth
    Description

    Samples are taken from sampling points round the country and then analysed by laboratories to measure aspects of the water quality or the environment at the sampling point. The archive provides data on these measurements and samples dating from 2000 to present day. It contains 58 million measurements on nearly 4 million samples from 58 thousand sampling points.

  17. WiFi RTT RSS dataset for indoor positioning

    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Feng, Xu; Nguyen, Khuong An; Luo, Zhiyuan (2024). WiFi RTT RSS dataset for indoor positioning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11558191
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Royal Holloway University of London
    Authors
    Feng, Xu; Nguyen, Khuong An; Luo, Zhiyuan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the first batch of WiFi RSS RTT datasets with LOS conditions we published. Please see https://doi.org/10.5281/zenodo.11558792 for the second batch.

    Please do use version 2 for better quality.

    We provide publicly available datasets of three different indoor scenarios: building floor, office and apartment. The datasets contain both WiFi RSS and RTT signal measures with ground truth coordinate labels and LOS condition labels.

    1. Building Floor

    This is a detailed WiFi RTT and RSS dataset of a whole floor of a university building, of more than 92 x 15 square metres. The area of interest was divided into discrete grids, each labelled with the correct ground truth coordinates and the LoS APs visible from the grid. The dataset contains WiFi RTT and RSS signal measures recorded at 642 reference points over 3 days and is well separated so that training points and testing points do not overlap.

    2. Office

    The office scenario is of more than 4.5 x 5.5 square metres. 3 APs are set to cover the whole space. At least two LOS APs could be seen at any reference point (RP).

    3. Apartment

    The apartment scenario is of more than 7.7 x 9.4 square metres. Four APs were leveraged to generate WiFi signal measures for this testbed. Note that AP 1 in the apartment dataset was positioned so that it had an NLOS path to most of the testbed.

    Collection methodology

    The APs utilised were Google WiFi Router AC-1304; the smartphone used to collect the data was a Google Pixel 3 with Android 9.

    The ground truth coordinates were collected using fixed tile size on the floor and manual post-it note markers.

    Only RTT-enabled APs were included in the dataset.

    The features of the datasets

    The features of the building floor dataset are as follows:

    Testbed area: 92 × 15 m²

    Grid size: 0.6 × 0.6 m²

    Number of AP: 13

    Number of reference points: 642

    Samples per reference point: 120

    Number of all data samples: 77040

    Number of training samples: 57960

    Number of testing samples: 19080

    Signal measure: WiFi RTT, WiFi RSS

    Collection time interval: 3 days

    The features of the office dataset are as follows:

    Testbed area: 4.5 × 5.5 m²

    Grid size: 0.455 × 0.455 m²

    Number of AP: 3

    Reference points: 37

    Samples per reference point: 120

    Data samples: 4,440

    Training samples: 3,240

    Testing samples: 1,200

    Signal measure: WiFi RTT, WiFi RSS

    Other information: LOS condition of every AP

    Collection time: 1 day

    Notes: A LOS scenario

    The features of the apartment dataset are as follows:

    Testbed area: 7.7 × 9.4 m²

    Grid size: 0.48 × 0.48 m²

    Number of AP: 4

    Reference points: 110

    Samples per reference point: 120

    Data samples: 13,200

    Training samples: 9,720

    Testing samples: 3,480

    Signal measure: WiFi RTT, WiFi RSS

    Other information: LOS condition of every AP

    Collection time: 1 day

    Notes: Contains an AP with NLOS paths for most of the RPs

    Dataset explanation

    The columns of the dataset are as follows:

    Column 'X': the X coordinates of the sample.

    Column 'Y': the Y coordinates of the sample.

    Column 'AP1 RTT(mm)', 'AP2 RTT(mm)', ..., 'AP13 RTT(mm)': the RTT measure from corresponding AP at a reference point.

    Column 'AP1 RSS(dBm)', 'AP2 RSS(dBm)', ..., 'AP13 RSS(dBm)': the RSS measure from corresponding AP at a reference point.

    Column 'LOS APs': indicating which AP has a LOS to this reference point.

    Please note:

    The RSS value -200 dBm indicates that the AP is too far away from the current reference point and no signals could be heard from it.

    The RTT value 100,000 mm indicates that no signal is received from the specific AP.
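
    Because the two placeholder values above would distort any distance or signal-strength statistics, it may help to replace them with an explicit "missing" marker when loading the data. The following is a minimal sketch using only the Python standard library. The two-AP excerpt is invented for illustration, and both the way 100,000 is written in the file (assumed here to be 100000) and the format of the 'LOS APs' column are assumptions.

```python
import csv
import io

RSS_MISSING = -200.0    # dBm placeholder described in the note above
RTT_MISSING = 100000.0  # mm placeholder described in the note above

# Hypothetical two-AP excerpt; the measurement values are invented.
SAMPLE = """X,Y,AP1 RTT(mm),AP2 RTT(mm),AP1 RSS(dBm),AP2 RSS(dBm),LOS APs
0.0,0.6,3200,100000,-55,-200,1
0.6,0.6,2800,7400,-52,-71,1 2
"""


def clean_row(row):
    """Convert numeric columns to float and map placeholders to None."""
    cleaned = {}
    for name, value in row.items():
        if name.endswith("RTT(mm)"):
            number = float(value)
            cleaned[name] = None if number == RTT_MISSING else number
        elif name.endswith("RSS(dBm)"):
            number = float(value)
            cleaned[name] = None if number == RSS_MISSING else number
        elif name in ("X", "Y"):
            cleaned[name] = float(value)
        else:
            cleaned[name] = value  # 'LOS APs' is kept as raw text
    return cleaned


rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[0]["AP2 RTT(mm)"], rows[0]["AP2 RSS(dBm)"])
```

    For the real files, `io.StringIO(SAMPLE)` would be replaced by an opened CSV file; the column-name suffixes follow the column description above.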

    Citation request

    When using this dataset, please cite the following two items:

    Feng, X., Nguyen, K. A., & Luo, Z. (2024). WiFi RTT RSS dataset for indoor positioning [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11558192

    @article{feng2023wifi, title={WiFi round-trip time (RTT) fingerprinting: an analysis of the properties and the performance in non-line-of-sight environments}, author={Feng, Xu and Nguyen, Khuong an and Luo, Zhiyuan}, journal={Journal of Location Based Services}, volume={17}, number={4}, pages={307--339}, year={2023}, publisher={Taylor & Francis} }

  18. Marketing Insights for E-Commerce Company

    • kaggle.com
    zip
    Updated Oct 27, 2023
    Rishi Kumar (2023). Marketing Insights for E-Commerce Company [Dataset]. https://www.kaggle.com/datasets/rishikumarrajvansh/marketing-insights-for-e-commerce-company
    Available download formats: zip (628618 bytes)
    Dataset updated
    Oct 27, 2023
    Authors
    Rishi Kumar
    Description

    Inputs related to the analysis, for additional reference:

    1. Why do we need customer segmentation? Every customer is unique and can be targeted in different ways, so customer segmentation plays an important role. Segmentation helps to understand customer profiles and can help in defining cross-sell, upsell, activation, and acquisition strategies.

    2. What is RFM segmentation? RFM stands for recency, frequency, and monetary segmentation. Recency is the number of days since a customer made their last purchase. For a website or an app, this could be interpreted as the last visit day or the last login time. Frequency is the number of purchases in a given period, which could be 3 months, 6 months, or 1 year. It indicates how often customers use a company’s product: the bigger the value, the more engaged the customers are. Alternatively, it can be defined as the average duration between two transactions. Monetary is the total amount of money a customer spent in that period, which distinguishes big spenders (such as MVP or VIP customers) from other customers.

    3. What is LTV and how is it defined? Almost every retailer now promotes a subscription, which is also used to understand customer lifetime. Retailers can manage customers better if they know which ones have a high lifetime value. Customer lifetime value (LTV) can also be defined as the monetary value of a customer relationship, based on the present value of the projected future cash flows from that relationship. It is an important concept because it encourages firms to shift their focus from quarterly profits to the long-term health of their customer relationships. It is also an important metric because it represents an upper limit on spending to acquire new customers. For this reason it is an important element in calculating the payback of advertising spend in marketing mix modelling.

    4. Why do we need to predict customer lifetime value? The LTV is an important building block in campaign design and marketing mix management. Although targeting models can help to identify the right customers to target, LTV analysis can help to quantify the expected outcome of targeting in terms of revenues and profits. The LTV is also important because other major metrics and decision thresholds can be derived from it. For example, the LTV is naturally an upper limit on the spending to acquire a customer, and the sum of the LTVs for all of the customers of a brand, known as the customer equity, is a major metric for business valuations. As with many other problems in marketing analytics and algorithmic marketing, LTV modelling can be approached from descriptive, predictive, and prescriptive perspectives.

    5. How does next purchase day help retailers? The objective is to analyse when customers will purchase products in the future, so that strategies and marketing campaigns can be built accordingly.
       a. Group 1: Customers who will purchase in more than 60 days
       b. Group 2: Customers who will purchase in 30-60 days
       c. Group 3: Customers who will purchase in 0-30 days

    6. What is cohort analysis, and how is it helpful? A cohort is a group of users who share a common characteristic that is identified in this report by an Analytics dimension. For example, all users with the same Acquisition Date belong to the same cohort. The Cohort Analysis report lets you isolate and analyze cohort behaviour. Cohort analysis in e-commerce means monitoring your customers’ behaviour based on common traits they share (the first product they bought, when they became customers, etc.) to find patterns and tailor marketing activities for the group.

    Transaction data has been provided for the period from 1st Jan 2019 to 31st Dec 2019. The following data sets have been provided.

    Online_Sales.csv: This file contains actual orders data (point-of-sale data) at transaction level, with the variables below.
    • CustomerID: Customer unique ID
    • Transaction_ID: Transaction unique ID
    • Transaction_Date: Date of transaction
    • Product_SKU: SKU ID, the unique ID for a product
    • Product_Description: Product description
    • Product_Cateogry: Product category
    • Quantity: Number of items ordered
    • Avg_Price: Price per one quantity
    • Delivery_Charges: Charges for delivery
    • Coupon_Status: Any discount coupon applied

    Customers_Data.csv: This file contains customers’ demographics.
    • CustomerID: Customer unique ID
    • Gender: Gender of customer
    • Location: Location of customer
    • Tenure_Months: Tenure in months

    Discount_Coupon.csv: Discount coupons have been given for different categories in different months.
    • Month: Discount coupon applied in that month
    • Product_Category: Product categor...
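
    The RFM definitions above map directly onto the Online_Sales.csv columns. The sketch below is one possible way to compute the three values per customer with the Python standard library. The order rows are invented, the snapshot date (the day after the data period ends) is an assumption, and monetary value is simplified to Quantity × Avg_Price, ignoring delivery charges and coupons.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows in the order:
# (CustomerID, Transaction_ID, Transaction_Date, Quantity, Avg_Price)
orders = [
    (101, 1, date(2019, 1, 5), 2, 10.0),
    (101, 2, date(2019, 12, 1), 1, 50.0),
    (102, 3, date(2019, 6, 15), 3, 20.0),
]

# Assumed reference date: the day after the data period ends.
SNAPSHOT = date(2020, 1, 1)


def rfm(rows, snapshot):
    """Return recency (days), frequency (transactions), monetary (spend) per customer."""
    last_purchase = {}
    transactions = defaultdict(set)
    spend = defaultdict(float)
    for customer, txn, day, quantity, price in rows:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        transactions[customer].add(txn)      # count distinct transactions
        spend[customer] += quantity * price  # simplified monetary value
    return {
        customer: {
            "recency_days": (snapshot - last_purchase[customer]).days,
            "frequency": len(transactions[customer]),
            "monetary": spend[customer],
        }
        for customer in last_purchase
    }


table = rfm(orders, SNAPSHOT)
print(table)
```

    Customers can then be ranked or bucketed on each of the three values; how the buckets are drawn is a modelling choice that the description leaves open.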

  19. Center Point, IA Population Dataset: Yearly Figures, Population Change, and...

    • neilsberg.com
    csv, json
    Updated Sep 18, 2023
    + more versions
    Neilsberg Research (2023). Center Point, IA Population Dataset: Yearly Figures, Population Change, and Percent Change Analysis [Dataset]. https://www.neilsberg.com/research/datasets/6e2cc13c-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: json, csv
    Dataset updated
    Sep 18, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iowa, Center Point
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2022, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset is derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 - 2022. To measure the variables, namely (a) population and (b) population change (absolute and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2022. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Center Point population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand the population change of Center Point across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if it is changing, when it peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2022, the population of Center Point was 2,559, a 0.47% year-over-year decrease from 2021. Previously, in 2021, the population of Center Point was 2,571, a decline of 0.54% compared to a population of 2,585 in 2020. Over the last 20 plus years, between 2000 and 2022, the population of Center Point increased by 545. In this period, the peak population was 2,585 in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2022

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2022)
    • Population: This column shows the Center Point population for the given year.
    • Year on Year Change: This column displays the change in Center Point population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
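
    The two derived columns can be reproduced from the Population column alone. As a minimal sketch, the code below uses the three population figures quoted under "Key observations" (2020 to 2022) and computes the year-on-year change and the percentage change relative to the previous year; the function and key names are illustrative.

```python
# Population figures quoted under "Key observations" above.
population = {2020: 2585, 2021: 2571, 2022: 2559}


def yearly_change(pop):
    """Return one row per year (after the first) with absolute and percent change."""
    years = sorted(pop)
    rows = []
    for previous, current in zip(years, years[1:]):
        change = pop[current] - pop[previous]
        rows.append({
            "year": current,
            "population": pop[current],
            "change": change,
            # Percent change is measured against the previous year's population.
            "percent": round(100 * change / pop[previous], 2),
        })
    return rows


rows = yearly_change(population)
peak_year = max(population, key=population.get)

for row in rows:
    print(row)
print("peak year:", peak_year)
```

    The results agree with the published figures: declines of 0.54% in 2021 and 0.47% in 2022, with the peak in 2020.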

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Center Point Population by Year.

  20. West Point, UT Annual Population and Growth Analysis Dataset: A...

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Neilsberg Research (2024). West Point, UT Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in West Point from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/west-point-ut-population-by-year/
    Available download formats: json, csv
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    West Point, Utah
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset is derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 - 2023. To measure the variables, namely (a) population and (b) population change (absolute and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2023. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the West Point population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand the population change of West Point across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if it is changing, when it peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2023, the population of West Point was 12,479, a 5.01% year-over-year increase from 2022. Previously, in 2022, the population of West Point was 11,884, an increase of 3.75% compared to a population of 11,455 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of West Point increased by 6,371. In this period, the peak population was 12,479 in the year 2023. The numbers suggest that the population has not reached its peak yet and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: This column shows the West Point population for the given year.
    • Year on Year Change: This column displays the change in West Point population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for West Point Population by Year.

Ehsan Zafari (2024). Iris dataset [Dataset]. https://www.kaggle.com/datasets/ehsanzafari/iris-dataset

Iris dataset

Available download formats: zip (955 bytes)
Dataset updated
Jan 16, 2024
Authors
Ehsan Zafari
License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

The Iris dataset is a classic dataset in the field of machine learning and statistics. It's often used for demonstrating various data analysis, machine learning, and statistical techniques. Here are some key details about it:

Background
- Origin: The dataset was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper titled "The use of multiple measurements in taxonomic problems."
- Purpose: Fisher developed the dataset as an example of linear discriminant analysis.

Data Composition
- Data Points: The dataset consists of 150 samples from three species of Iris flowers: Iris Setosa, Iris Versicolour, and Iris Virginica.
- Features: There are four features measured in centimeters for each sample:
  1. Sepal Length
  2. Sepal Width
  3. Petal Length
  4. Petal Width
- Classes: The dataset contains three classes, corresponding to the three species of Iris. Each class has 50 samples.

Usage
- Classification: The Iris dataset is widely used for classification tasks, especially to illustrate the principles of supervised machine learning algorithms.
- Testing Algorithms: It's often used to test out algorithms for linear regression, classification, and clustering due to its simplicity and small size.
- Educational Purpose: Because of its clarity and simplicity, it's frequently used in teaching data science and machine learning.

Characteristics
- Simple and Clean: The dataset is straightforward, with minimal preprocessing required, making it ideal for beginners.
- Well-Behaved Classes: The species are relatively well separated, though there's some overlap between Versicolor and Virginica.
- Multivariate Data: It involves understanding the relationship between multiple variables (the four features).

Applications
- Benchmarking: The Iris dataset serves as a benchmark for evaluating the performance of different algorithms.
- Visualization: It's great for practicing data visualization, especially for exploring techniques like scatter plots, box plots, and pair plots to understand feature relationships.

Despite its simplicity, the Iris dataset remains one of the most famous datasets in the world of data science and machine learning. It serves as an excellent starting point for anyone new to the field and remains a baseline for testing algorithms and teaching concepts.
