Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Iris dataset is a classic dataset in the field of machine learning and statistics. It's often used for demonstrating various data analysis, machine learning, and statistical techniques. Here are some key details about it:
Background - Origin: The dataset was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper titled "The use of multiple measurements in taxonomic problems." - Purpose: Fisher used the dataset as an example of linear discriminant analysis.
Data Composition - Data Points: The dataset consists of 150 samples from three species of Iris flowers: Iris Setosa, Iris Versicolour, and Iris Virginica. - Features: There are four features measured in centimeters for each sample: 1. Sepal Length 2. Sepal Width 3. Petal Length 4. Petal Width - Classes: The dataset contains three classes, corresponding to the three species of Iris. Each class has 50 samples.
Usage - Classification: The Iris dataset is widely used for classification tasks, especially to illustrate the principles of supervised machine learning algorithms. - Testing Algorithms: It's often used to test out algorithms for linear regression, classification, and clustering due to its simplicity and small size. - Educational Purpose: Because of its clarity and simplicity, it's frequently used in teaching data science and machine learning.
Characteristics - Simple and Clean: The dataset is straightforward, with minimal preprocessing required, making it ideal for beginners. - Well-Behaved Classes: The species are relatively well separated, though there's some overlap between Versicolor and Virginica. - Multivariate Data: It involves understanding the relationship between multiple variables (the four features).
Applications - Benchmarking: The Iris dataset serves as a benchmark for evaluating the performance of different algorithms. - Visualization: It's great for practicing data visualization, especially for exploring techniques like scatter plots, box plots, and pair plots to understand feature relationships.
Despite its simplicity, the Iris dataset remains one of the most famous datasets in the world of data science and machine learning. It serves as an excellent starting point for anyone new to the field and remains a baseline for testing algorithms and teaching concepts.
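For anyone getting started, here is a minimal Python sketch (assuming scikit-learn is installed) that loads the dataset and fits a simple classifier:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the 150 samples (4 features, 3 classes of 50 each)
X, y = load_iris(return_X_y=True)

# Hold out a test set, then fit a simple supervised classifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```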
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Dana Point population over the last 20-plus years. It lists the population for each year, along with the year-over-year change in population, in both absolute and percentage terms. The dataset can be used to understand the population change of Dana Point across the last two decades: for example, we can identify whether the population is declining or increasing, when it peaked, and whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2023, the population of Dana Point was 32,567, a 0.25% decrease year-over-year from 2022. Previously, in 2022, the population of Dana Point was 32,647, a decline of 0.51% compared to a population of 32,815 in 2021. Over the last 20-plus years, between 2000 and 2023, the population of Dana Point decreased by 2,634. In this period, the peak population was 35,992, in the year 2009. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
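As a sketch of how these year-over-year figures can be recomputed from raw counts with pandas (the column layout below is a hypothetical stand-in, not the dataset's actual schema):

```python
import pandas as pd

# Hypothetical layout: one row per year with the population count
df = pd.DataFrame({
    "year": [2021, 2022, 2023],
    "population": [32815, 32647, 32567],
})

# Year-over-year change, in absolute and percentage terms
df["yoy_change"] = df["population"].diff()
df["yoy_change_pct"] = df["population"].pct_change() * 100
print(df)
```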
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Dana Point Population by Year. You can refer to the same here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).
The variables contained therein are defined as follows:
case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.
We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).
Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
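For readers who prefer Python to Stata, a minimal sketch of an analogous dummy panel follows; the variable names come from the definitions above, but the random-generation scheme and the encoding of the period indicators are illustrative assumptions, not the paper's method:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_patients, n_periods = 100, 10

# One row per patient per month: patid, time_period, case/control flag,
# consultation count, and burden group
rows = []
for patid in range(n_patients):
    case = patid < 50            # first half cases, second half controls (assumption)
    burden = rng.integers(0, 2)  # membership of one of two burden groups
    for t in range(n_periods):
        rows.append({"patid": patid, "case": int(case),
                     "time_period": t, "burden": int(burden),
                     "ncons": rng.poisson(2)})
df = pd.DataFrame(rows)

# period0..period9: indicator variables used to test candidate inflection points
# (this step-indicator encoding is one plausible choice, not the paper's exact one)
for k in range(n_periods):
    df[f"period{k}"] = (df["time_period"] >= k).astype(int)
print(df.head())
```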
The Apalachicola Bay National Estuarine Research Reserve and the NOAA Office for Coastal Management worked together to map benthic habitats within Apalachicola Bay, Florida. The bay and the lower portions of four distributaries were surveyed on 11-22 October 1999 using three benthic sampling techniques. This data set represents the information gathered from a RoxAnn acoustic sensor. The instrument was used to characterize bottom type by extracting data on bottom roughness and bottom hardness from the primary and secondary sounder echoes. The data were classified on the fly and then subjected to a post-processing classification, using the sediment profile images and grab samples collected for field validation. The RoxAnn data points were exported into a geographic information system (GIS) and post-processed to remove unreliable data points and re-classify the remainder. This data set comprises the cleaned, attributed point data. The attributes include location, date, time, depth, the field-derived classification, and the classification derived from post-processing the data. Original contact information: Contact Org: NOAA Office for Coastal Management Phone: 843-740-1202 Email: coastal.info@noaa.gov
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each record corresponds to a unique transaction identified by Transaction_ID and includes details such as Date, Product_ID, Product_Name, Quantity, Unit_Price, Total_Price, Customer_ID, Payment_Method, and Store_Location. The synthetic data simulates diverse transactions with random product information, quantities, prices, customer IDs, payment methods, and store locations. This dataset provides a foundation for analyzing and understanding patterns within a point-of-sale environment, facilitating research or development in related fields such as retail analytics, inventory management, and customer behavior analysis.
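As an illustration of how such synthetic transactions might be generated, here is a small Python sketch; the field names follow the description above, while the value ranges and category lists are assumptions:

```python
import random
import uuid
from datetime import date, timedelta

# Illustrative category lists (assumptions, not the dataset's actual values)
PAYMENT_METHODS = ["Cash", "Credit Card", "Debit Card", "Mobile"]
LOCATIONS = ["Store A", "Store B", "Store C"]

def make_transaction():
    qty = random.randint(1, 5)
    unit_price = round(random.uniform(1.0, 100.0), 2)
    return {
        "Transaction_ID": str(uuid.uuid4()),
        "Date": date(2023, 1, 1) + timedelta(days=random.randint(0, 364)),
        "Product_ID": f"P{random.randint(1, 500):04d}",
        "Quantity": qty,
        "Unit_Price": unit_price,
        "Total_Price": round(qty * unit_price, 2),
        "Customer_ID": f"C{random.randint(1, 1000):05d}",
        "Payment_Method": random.choice(PAYMENT_METHODS),
        "Store_Location": random.choice(LOCATIONS),
    }

print(make_transaction())
```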
This feature dataset contains the control points used to validate the accuracy of the interpolated water density rasters for the Gulf of Maine. These control points were selected randomly from the water density data points using Hawth's Create Random Selection Tool. Twenty-five percent of each seasonal bin (for each year and at each depth) was randomly selected and set aside for validation. For example, if there were 1,000 water density data points for the fall (September, October, November) of 2003 at 0 meters, then 250 of those points were randomly selected, removed, and set aside to assess the accuracy of the interpolated surface. The naming convention of the validation point feature class includes the year (or years), the season, and the depth (in meters) it was selected from. So, for example, the name ValidationPoints_1997_2004_Fall_0m would indicate that this point feature class was randomly selected from water density points that were at 0 meters in the fall between 1997 and 2004. The seasons were defined using the same months as the remote sensing data, namely: Fall = September, October, November; Winter = December, January, February; Spring = March, April, May; and Summer = June, July, August.
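A pandas sketch of the same 25%-per-bin holdout follows; the original selection was done with Hawth's Tools in a GIS, and the file and column names here are hypothetical:

```python
import pandas as pd

# Hypothetical water-density point table
points = pd.read_csv("water_density_points.csv")  # columns: year, season, depth_m, density, ...

# Randomly set aside 25% of each (year, season, depth) bin for validation
validation = (points
              .groupby(["year", "season", "depth_m"], group_keys=False)
              .sample(frac=0.25, random_state=1))
training = points.drop(validation.index)
print(len(training), "training points;", len(validation), "validation points")
```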
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This comprehensive dataset offers an in-depth exploration of US travel check-ins from Instagram. It includes detailed data scraped from Instagram, such as the location of each check-in, the USIndex for each state, the average temperature for each state per month, and the crime rate per state. In addition to location and time information, this dataset also provides latitude and longitude coordinates for every entry. This extensive collection of data is invaluable for those interested in studying various aspects of movement within the United States. With detailed insights into factors like climate conditions and the economic health of a region at a given point in time, this dataset can help uncover fascinating trends regarding how travelers choose their destinations and how they experience their journeys around the country.
This Kaggle dataset - US Travel Check-Ins Analysis - provides valuable insights for travel researchers, marketers, and businesses in the travel industry. It contains the check-in location, the USIndex rating (economic health of each state), the average temperature, and the crime rate per state. The latitude and longitude of each check-in are also provided, adding geographic context to help you visualize the data.
This guide will show you how to use this dataset for your research or business venture.
Step 1: Prepare your data. First and foremost, cleanse your data before analyzing it. Depending on the sort of analysis to be conducted (e.g., time series analysis), select the columns that best match your needs and exclude unnecessary ones, such as date or season-related data points if they are not relevant. Variable formatting should also be consistent across all instances in a column (elevation is a good example). You can double-check that everything is formatted correctly by running a quick summary on selected columns, for example with the df['var'].describe() command in Python, which returns descriptive statistics for an entire column, including mean values, quartile ranges, and more.
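A short sketch of that preparation step (the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("us_travel_checkins.csv")  # hypothetical file name

# Keep only the columns relevant to the analysis at hand
cols = ["state", "USIndex", "avg_temperature", "crime_rate", "latitude", "longitude"]
df = df[cols]

# Quick sanity check of a column's statistical makeup
print(df["avg_temperature"].describe())
```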
Step 2: Explore & Analyze Your Data Graphically. Once the data has been prepped properly, you can start visualizing it to gain better insight into any trends or patterns, optionally comparing it with other datasets or information sources such as weather forecasts or nationwide trend indicators. Grafana dashboards are a feasible solution when multiple datasets need to be compared, while Excel worksheets offer great customization flexibility along with various export file types (.csv, .jpeg, .pdf), depending on the type of graphs or charts being used. Plotting markers onto map applications like the Google Maps API adds geographical awareness, which is useful when analyzing location-dependent variables; leveraging existing software applications alongside publicly available APIs gives us an advantage over manual inspection.
Step 3: Interpretation & Hypothesis Testing
After generating informative exploratory visualizations, the next step is to test hypotheses based on the correlations observed between variables: for example, whether distribution trends concentrate in geographical regions where certain logistical processes yield higher success ratios and give potential customers greater satisfaction.
- Travel trends analysis: Using this dataset, researchers could track which areas of the US are popular destinations based on travel check-ins and spot any interesting trends or correlations in terms of geography, seasonal changes, economic health or crime rates.
- Predictive Modeling: By using various features from this dataset such as average temperature, US Index and crime rate, predictors could be developed to suggest how safe an area would feel to a tourist based on their current location and other predetermined variables they choose to input into the model.
- Trip Planning Tool: The dataset can also be used to develop a tool that quickly allows travelers to plan trips according to their preferences in terms of duration and budget as well a...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Dana Point population over the last 20-plus years. It lists the population for each year, along with the year-over-year change in population, in both absolute and percentage terms. The dataset can be used to understand the population change of Dana Point across the last two decades: for example, we can identify whether the population is declining or increasing, when it peaked, and whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2022, the population of Dana Point was 32,465, a 1.00% decrease year-over-year from 2021. Previously, in 2021, the population of Dana Point was 32,794, a decline of 0.82% compared to a population of 33,066 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of Dana Point decreased by 2,736. In this period, the peak population was 35,992, in the year 2009. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Dana Point Population by Year. You can refer to the same here.
Xverum’s Point of Interest (POI) Data is a comprehensive dataset containing 230M+ verified locations across 5000 business categories. Our dataset delivers structured geographic data, business attributes, location intelligence, and mapping insights, making it an essential tool for GIS applications, market research, urban planning, and competitive analysis.
With regular updates and continuous POI discovery, Xverum ensures accurate, up-to-date information on businesses, landmarks, retail stores, and more. Delivered in bulk to S3 Bucket and cloud storage, our dataset integrates seamlessly into mapping, geographic information systems, and analytics platforms.
🔥 Key Features:
Extensive POI Coverage: ✅ 230M+ Points of Interest worldwide, covering 5000 business categories. ✅ Includes retail stores, restaurants, corporate offices, landmarks, and service providers.
Geographic & Location Intelligence Data: ✅ Latitude & longitude coordinates for mapping and navigation applications. ✅ Geographic classification, including country, state, city, and postal code. ✅ Business status tracking – Open, temporarily closed, or permanently closed.
Continuous Discovery & Regular Updates: ✅ New POIs continuously added through discovery processes. ✅ Regular updates ensure data accuracy, reflecting new openings and closures.
Rich Business Insights: ✅ Detailed business attributes, including company name, category, and subcategories. ✅ Contact details, including phone number and website (if available). ✅ Consumer review insights, including rating distribution and total number of reviews (additional feature). ✅ Operating hours where available.
Ideal for Mapping & Location Analytics: ✅ Supports geospatial analysis & GIS applications. ✅ Enhances mapping & navigation solutions with structured POI data. ✅ Provides location intelligence for site selection & business expansion strategies.
Bulk Data Delivery (NO API): ✅ Delivered in bulk via S3 Bucket or cloud storage. ✅ Available in structured format (.json) for seamless integration.
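As a sketch of working with a bulk delivery, the snippet below loads one JSON file into pandas; the file name and field names are assumptions, since the actual schema is defined by the delivery:

```python
import pandas as pd

# Bulk deliveries arrive as structured JSON; read one file into a DataFrame
pois = pd.read_json("xverum_poi_sample.json")  # hypothetical file name

# Example: filter to open restaurants in a given city (field names assumed)
restaurants = pois[(pois["category"] == "Restaurant") &
                   (pois["business_status"] == "Open") &
                   (pois["city"] == "Berlin")]
print(len(restaurants), "open restaurants")
```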
🏆 Primary Use Cases:
Mapping & Geographic Analysis: 🔹 Power GIS platforms & navigation systems with precise POI data. 🔹 Enhance digital maps with accurate business locations & categories.
Retail Expansion & Market Research: 🔹 Identify key business locations & competitors for market analysis. 🔹 Assess brand presence across different industries & geographies.
Business Intelligence & Competitive Analysis: 🔹 Benchmark competitor locations & regional business density. 🔹 Analyze market trends through POI growth & closure tracking.
Smart City & Urban Planning: 🔹 Support public infrastructure projects with accurate POI data. 🔹 Improve accessibility & zoning decisions for government & businesses.
💡 Why Choose Xverum’s POI Data?
Access Xverum’s 230M+ POI dataset for mapping, geographic analysis, and location intelligence. Request a free sample or contact us to customize your dataset today!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the West Point population over the last 20-plus years. It lists the population for each year, along with the year-over-year change in population, in both absolute and percentage terms. The dataset can be used to understand the population change of West Point across the last two decades: for example, we can identify whether the population is declining or increasing, when it peaked, and whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2022, the population of West Point was 11,892, a 3.81% increase year-over-year from 2021. Previously, in 2021, the population of West Point was 11,456, an increase of 3.58% compared to a population of 11,060 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of West Point increased by 5,784. In this period, the peak population was 11,892, in the year 2022. The numbers suggest that the population has not yet reached its peak and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for West Point Population by Year. You can refer to the same here.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
By Department of Energy [source]
The Building Energy Data Book (2011) is an invaluable resource for gaining insight into the current state of energy consumption in the buildings sector. This dataset provides comprehensive data on residential, commercial, and industrial building energy consumption, construction techniques, building technologies, and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings, from single-family homes to large office complexes, as well as its impact on the environment. The BTO within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset to provide a wealth of knowledge for researchers, policy makers, engineers, and everyday observers who are interested in learning more about our built environment and its energy usage patterns.
This dataset provides comprehensive information regarding energy consumption in the buildings sector of the United States. It contains a number of key variables which can be used to analyze and explore the relationships between energy consumption and building characteristics, technologies, and construction. The data is provided in both CSV and tabular formats, which can be helpful for those who prefer to use programs like Excel or other statistical modeling software.
In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.
Understand what's included: Before you start analyzing the data, you should read through the provided documentation so that you fully understand what is included in the datasets. You'll want to be aware of any potential limitations or requirements associated with each type of data point so that your results are valid and reliable when drawing conclusions from them.
Clean up any outliers: You may need to spend some time upfront investigating suspicious outliers in your dataset before using it in further analyses; otherwise, they can skew results down the road. They can also make complex statistical modeling more difficult, since they artificially inflate values depending on their magnitude within each data point (e.g., one outlier could affect an entire model's prior distributions). Missing values should also be accounted for, since they may not be obvious at first glance when reviewing a table or graphical representation, but accurate statistics must still be obtained either way.
Exploratory data analysis: After cleaning your dataset, do some basic exploration by visualizing different types of summaries, such as box plots, histograms, and scatter plots. This gives an initial view of what trends might exist within certain demographic or geographic regions and variables, which can then inform future predictive models. This step will also highlight any clear discontinuous changes over time, helping to ensure that predictors contribute meaningful signal rather than noise to the overall predictions.
Analyze key metrics & observations: Once exploratory analyses have been carried out on the raw samples, post-processing steps follow, such as analyzing correlations among explanatory variables, performing significance tests on regression models, and imputing missing or outlier values, depending on the specific needs of the project at hand...
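As a concrete sketch of the outlier clean-up described above, the snippet below applies the common 1.5 × IQR rule in pandas; the file and column names are hypothetical:

```python
import pandas as pd

df = pd.read_csv("building_energy_data.csv")  # hypothetical file name

# Flag outliers in a numeric column using the 1.5 * IQR rule (one common choice)
q1, q3 = df["energy_consumption"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["energy_consumption"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask].dropna(subset=["energy_consumption"])

print(f"Dropped {len(df) - len(clean)} outlier/missing rows")
print(clean["energy_consumption"].describe())
```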
- Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
- Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
- Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Similar to others who have created HR datasets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space: the only dataset most HR practitioners have is their real employee data, and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset, with an ever-growing variety of data points, others can learn and grow their HR data analytics and systems knowledge.
Some example test cases where someone might use this dataset:
- HR technology testing and mock-ups: engagement survey tools, HCM tools, BI tools
- Learning to code for People Analytics: Python/R/SQL
- HR tech and People Analytics educational courses/tools
The core data CompanyData.txt has the basic demographic data about a worker. We treat this as the core data that you can join future data sets to.
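A minimal pandas sketch of joining a companion file onto the core data; the delimiter, the companion file, and the join key are assumptions (check the Readme.md for the actual details):

```python
import pandas as pd

# CompanyData.txt is the core demographic table; join other files to it
core = pd.read_csv("CompanyData.txt", sep="\t")       # delimiter is an assumption
comp = pd.read_csv("CompensationData.txt", sep="\t")  # hypothetical companion file

# Employee ID as the join key (column name is an assumption; see Readme.md)
merged = core.merge(comp, on="EmployeeID", how="left")
print(merged.head())
```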
Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.
Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.
Our hope is this data is used in the HR or Research space to experiment and learn using HR data. Some examples that we hope this data will be used are listed above.
Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com
Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets
Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.
Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.
The collection spans points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json), as well as geophysical data (windvectors.csv, annual-precip.json). This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l... |
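As a quick illustration of these files in use, here is a minimal Altair choropleth sketch built on us-10m.json; it joins county unemployment rates from the same vega-datasets repository, and assumes the altair and vega_datasets packages are installed:

```python
import altair as alt
from vega_datasets import data

# us-10m.json as a TopoJSON source; county geometries carry a FIPS id
counties = alt.topo_feature(data.us_10m.url, "counties")
unemployment = data.unemployment.url  # county-level rates, keyed by the same id

chart = alt.Chart(counties).mark_geoshape().encode(
    color="rate:Q"
).transform_lookup(
    lookup="id",
    from_=alt.LookupData(unemployment, "id", ["rate"])
).project(
    type="albersUsa"
).properties(width=500, height=300)

chart.save("choropleth.html")
```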
This data set is part of an ongoing project to consolidate interagency fire point data. The incorporation of all available historical data is in progress. The InFORM (Interagency Fire Occurrence Reporting Modules) FODR (Fire Occurrence Data Records) are the official record of fire events. Built on top of IRWIN (Integrated Reporting of Wildland Fire Information), the FODR starts with an IRWIN record and then captures the final incident information upon certification of the record by the appropriate local authority. This service contains all wildland fire incidents from the InFORM FODR incident service that meet the following criteria:
- Categorized as a Wildfire (WF) or Prescribed Fire (RX) record
- Is valid and not "quarantined" due to potential conflicts with other records
No "fall-off" rules are applied to this service. The service is a real-time display of data.
Warning: Please refrain from repeatedly querying the service using a relative date range. This includes using the "(not) in the last" operators in a Web Map filter and any reference to CURRENT_TIMESTAMP. This type of query puts undue load on the service and may render it temporarily unavailable.
Attributes:
- ABCDMisc: A FireCode used by USDA FS to track and compile cost information for emergency initial-attack fire suppression expenditures for A, B, C, and D size class fires on FS lands.
- ADSPermissionState: Indicates the permission hierarchy that is currently being applied when a system utilizes the UpdateIncident operation.
- CalculatedAcres: A measure of acres calculated (i.e., infrared) from a geospatial perimeter of a fire. More specifically, the number of acres within the current perimeter of a specific, individual incident, including unburned and unburnable islands. The minimum size must be 0.1.
- ContainmentDateTime: The date and time a wildfire was declared contained.
- ControlDateTime: The date and time a wildfire was declared under control.
- CreatedBySystem: ArcGIS Server username of the system that created the IRWIN Incident record.
- CreatedOnDateTime: Date/time that the Incident record was created.
- IncidentSize: The size reported for a fire. The minimum size is 0.1.
- DiscoveryAcres: An estimate of acres burning upon the discovery of the fire, i.e., when the fire is first reported by the first person who calls in the fire. The estimate should include the number of acres within the current perimeter of a specific, individual incident, including unburned and unburnable islands.
- DispatchCenterID: A unique identifier for a dispatch center responsible for supporting the incident.
- EstimatedCostToDate: The total estimated cost of the incident to date.
- FinalAcres: Reported final acreage of the incident.
- FinalFireReportApprovedByTitle: The title of the person who approved the final fire report for the incident.
- FinalFireReportApprovedByUnit: NWCG Unit ID associated with the individual who approved the final report for the incident.
- FinalFireReportApprovedDate: The date that the final fire report was approved for the incident.
- FireBehaviorGeneral: A general category describing the manner in which the fire is currently reacting to the influences of fuel, weather, and topography.
- FireCode: A code used within the interagency wildland fire community to track and compile cost information for emergency fire suppression expenditures for the incident.
- FireDepartmentID: The U.S. Fire Administration (USFA) has created a national database of fire departments. Most fire departments do not have an NWCG Unit ID, so it is the intent of the IRWIN team to create a new field that includes this data element to assist the National Association of State Foresters (NASF) with data collection.
- FireDiscoveryDateTime: The date and time a fire was reported as discovered or confirmed to exist. May also be the start date for reporting purposes.
- FireMgmtComplexity: The highest management level utilized to manage a wildland fire event.
- FireOutDateTime: The date and time when a fire is declared out.
- FSJobCode: A code used to indicate the Forest Service job accounting code for the incident. This is specific to the Forest Service. Usually displayed as a 2-character prefix on FireCode.
- FSOverrideCode: A code used to indicate the Forest Service override code for the incident. This is specific to the Forest Service. Usually displayed as a 4-character suffix on FireCode. For example, if the FS is assisting DOI, an override of 1502 will be used.
- GACC: A code that identifies the wildland fire geographic area coordination center at the point of origin for the incident. A geographic area coordination center is a facility that is used for the coordination of agency or jurisdictional resources in support of one or more incidents within a geographic coordination area.
- IncidentName: The name assigned to an incident.
- IncidentShortDescription: General descriptive location of the incident, such as the number of miles from an identifiable town.
- IncidentTypeCategory: The Event Category is a sub-group of the Event Kind code and description. The Event Category further breaks down the Event Kind into more specific event categories.
- IncidentTypeKind: A general, high-level code and description of the types of incidents and planned events to which the interagency wildland fire community responds.
- InitialLatitude: The latitude location of the initial reported point of origin, specified in decimal degrees.
- InitialLongitude: The longitude location of the initial reported point of origin, specified in decimal degrees.
- InitialResponseDateTime: The date/time of the initial response to the incident; more specifically, when the IC arrives and performs the initial size-up.
- IsFireCauseInvestigated: Indicates if an investigation is underway or was completed to determine the cause of a fire.
- IsFSAssisted: Indicates if the Forest Service provided assistance on an incident outside their jurisdiction.
- IsReimbursable: Indicates the cost of an incident may be another agency's responsibility.
- IsTrespass: Indicates if the incident is a trespass claim or if a bill will be pursued.
- LocalIncidentIdentifier: A number or code that uniquely identifies an incident for a particular local fire management organization within a particular calendar year.
- ModifiedBySystem: ArcGIS Server username of the system that last modified the IRWIN Incident record.
- ModifiedOnDateTime: Date/time that the Incident record was last modified.
- PercentContained: Indicates the percent of the incident area that is no longer active. Reference the definition in the fireline handbook when developing the standard.
- POOCity: The closest city to the incident point of origin.
- POOCounty: The county name identifying the county or equivalent entity at the point of origin, designated at the time of collection.
- POODispatchCenterID: A unique identifier for the dispatch center that intersects with the incident point of origin.
- POOFips: The code which uniquely identifies counties and county equivalents. The first two digits are the FIPS state code and the last three are the county code within the state.
- POOJurisdictionalAgency: The agency having land and resource management responsibility for an incident, as provided by federal, state, or local law.
- POOJurisdictionalUnit: NWCG Unit Identifier used to identify the unit with jurisdiction for the land where the point of origin of a fire falls.
- POOJurisdictionalUnitParentUnit: The unit ID for the parent entity, such as a BLM State Office or USFS Regional Office, that resides over the Jurisdictional Unit.
- POOLandownerCategory: More specific classification of land ownership within landowner kinds, identifying the deeded owner at the point of origin at the time of the incident.
- POOLandownerKind: Broad classification of land ownership identifying the deeded owner at the point of origin at the time of the incident.
- POOProtectingAgency: Indicates the agency that has protection responsibility at the point of origin.
- POOProtectingUnit: NWCG Unit responsible for providing direct incident management and services to an incident pursuant to its jurisdictional responsibility, or as specified by law, contract, or agreement. Definition extension: protection can be re-assigned by agreement; the nature and extent of the incident determines protection (for example, Wildfire vs. All Hazard).
- POOState: The state alpha code identifying the state or equivalent entity at the point of origin.
- PredominantFuelGroup: The majority fuel model type that best represents fire behavior in the incident area, grouped into one of seven categories.
- PredominantFuelModel: Describes the type of fuels found within the majority of the incident area.
- UniqueFireIdentifier: Unique identifier assigned to each wildland fire, where yyyy = calendar year, SSUUUU = POO protecting unit identifier (5 or 6 characters), and xxxxxx = local incident identifier (6 to 10 characters).
- FORID: Unique identifier assigned to each incident record in the FODR database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Data points present in this dataset were obtained following the subsequent steps: To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker, set to 150 rpm, under constant illumination (50 μmol photons/m2s). Then a 100 μL sample was transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland), at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to the clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R version 3.3.3 was used to perform one-way ANOVA (with Tukey's test); for testing statistical hypotheses, the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
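For readers without R, an analogous one-way ANOVA with Tukey's HSD can be sketched in Python; the column names below are assumptions about the CSV layout, and the original analysis was performed in R 3.3.3:

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("sup_raw.csv")  # supernatant fluorescence per construct

# One-way ANOVA across constructs (column names are assumptions)
groups = [g["fluorescence"].values for _, g in df.groupby("construct")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")

# Tukey's HSD post-hoc comparison at alpha = 0.05
print(pairwise_tukeyhsd(df["fluorescence"], df["construct"], alpha=0.05))
```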
Info
ANOVA_Turkey_Sub.R -> code for the ANOVA analysis in R 3.3.3
barplot_R.R -> code to generate the bar plot in R 3.3.3
boxplotv2.R -> code to generate the boxplot in R 3.3.3
pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.
who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.
who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.
Anova_Output_Summary_Guide.pdf -> explains the content of the ANOVA files
ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.
Consider citing our work.
Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Samples are taken from sampling points around the country and then analysed by laboratories to measure aspects of the water quality or the environment at the sampling point. The archive provides data on these measurements and samples dating from 2000 to the present day. It contains 58 million measurements on nearly 4 million samples from 58 thousand sampling points.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the first batch of WiFi RSS and RTT datasets with LOS conditions that we have published. Please see https://doi.org/10.5281/zenodo.11558792 for the second batch.
Please use version 2 for better quality.
We provide publicly available datasets of three different indoor scenarios: building floor, office, and apartment. The datasets contain both WiFi RSS and RTT signal measures, with ground-truth coordinate labels and LOS condition labels.
1. Building Floor
This is a detailed WiFi RTT and RSS dataset of a whole floor of a university building, covering more than 92 x 15 square metres. The area of interest was divided into discrete grids, and each grid was labelled with its correct ground-truth coordinates and the LoS APs visible from it. The dataset contains WiFi RTT and RSS signal measures recorded at 642 reference points over 3 days and is well separated so that training points and testing points do not overlap.
2. Office
The office scenario covers more than 4.5 x 5.5 square metres. 3 APs are set to cover the whole space. At least two LOS APs could be seen at any reference point (RP).
3. Apartment
The apartment scenario covers more than 7.7 x 9.4 square metres. Four APs were leveraged to generate WiFi signal measures for this testbed. Note that AP 1 in the apartment dataset was positioned so that it had an NLOS path to most of the testbed.
Collection methodology
The APs utilised were Google WiFi Router AC-1304 units; the smartphone used to collect the data was a Google Pixel 3 running Android 9.
The ground truth coordinates were collected using fixed tile size on the floor and manual post-it note markers.
Only RTT-enabled APs were included in the dataset.
The features of the datasets
The features of the building floor dataset are as follows:
Testbed area: 92 × 15 m2
Grid size: 0.6 × 0.6 m2
Number of AP: 13
Number of reference points: 642
Samples per reference point: 120
Number of all data samples: 77040
Number of training samples: 57960
Number of testing samples: 19080
Signal measure: WiFi RTT, WiFi RSS
Collection time interval: 3 days
The features of the office dataset are as follows:
Testbed area: 4.5 × 5.5 m2
Grid size: 0.455 × 0.455 m2
Number of AP: 3
Reference points: 37
Samples per reference point: 120
Data samples: 4,440
Training samples: 3,240
Testing samples: 1,200
Signal measure: WiFi RTT, WiFi RSS
Other information: LOS condition of every AP
Collection time: 1 day
Notes: A LOS scenario
The features of the apartment dataset are as follows:
Testbed area: 7.7 × 9.4 m2
Grid size: 0.48 × 0.48 m2
Number of AP: 4
Reference points: 110
Samples per reference point: 120
Data samples: 13,200
Training samples: 9,720
Testing samples: 3,480
Signal measure: WiFi RTT, WiFi RSS
Other information: LOS condition of every AP
Collection time: 1 day
Notes: Contains an AP with NLOS paths for most of the RPs
Dataset explanation
The columns of the dataset are as follows:
Column 'X': the X coordinates of the sample.
Column 'Y': the Y coordinates of the sample.
Column 'AP1 RTT(mm)', 'AP2 RTT(mm)', ..., 'AP13 RTT(mm)': the RTT measure from corresponding AP at a reference point.
Column 'AP1 RSS(dBm)', 'AP2 RSS(dBm)', ..., 'AP13 RSS(dBm)': the RSS measure from corresponding AP at a reference point.
Column 'LOS APs': indicating which AP has a LOS to this reference point.
Please note:
The RSS value -200 dBm indicates that the AP is too far away from the current reference point and no signals could be heard from it.
The RTT value 100,000 mm indicates that no signal is received from the specific AP.
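A short pandas sketch of loading the data and masking those sentinel values (the file name is hypothetical; the column names follow the explanation above):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("building_floor_train.csv")  # hypothetical file name

# Replace the "no signal" sentinels documented above with NaN:
# -200 dBm for RSS, 100,000 mm for RTT
rss_cols = [c for c in df.columns if "RSS" in c]
rtt_cols = [c for c in df.columns if "RTT" in c]
df[rss_cols] = df[rss_cols].replace(-200, np.nan)
df[rtt_cols] = df[rtt_cols].replace(100000, np.nan)

# X/Y columns hold the ground-truth coordinates for each sample
print(df[["X", "Y"]].describe())
```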
Citation request
When using this dataset, please cite the following two items:

Feng, X., Nguyen, K. A., & Luo, Z. (2024). WiFi RTT RSS dataset for indoor positioning [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11558192

@article{feng2023wifi, title={WiFi round-trip time (RTT) fingerprinting: an analysis of the properties and the performance in non-line-of-sight environments}, author={Feng, Xu and Nguyen, Khuong an and Luo, Zhiyuan}, journal={Journal of Location Based Services}, volume={17}, number={4}, pages={307--339}, year={2023}, publisher={Taylor & Francis} }
Inputs related to the analysis, for additional reference:

1. Why do we need customer segmentation? Every customer is unique and can be targeted in different ways, which is where customer segmentation plays an important role. Segmentation helps to understand the profiles of customers and can be helpful in defining cross-sell, upsell, activation, and acquisition strategies.

2. What is RFM segmentation? RFM segmentation is an acronym for recency, frequency, and monetary based segmentation. Recency refers to a customer's last order: the number of days since the customer made their last purchase. For a website or an app, this could be interpreted as the last visit day or the last login time. Frequency is the number of purchases in a given period (it could be 3 months, 6 months, or 1 year), which tells us how often a customer used the company's product; the bigger the value, the more engaged the customer. Alternatively, we can define it as the average duration between two transactions. Monetary is the total amount of money a customer spent in that given period; big spenders such as MVPs or VIPs are thereby differentiated from other customers.

3. What is LTV and how is it defined? In the current world, almost every retailer promotes its subscription, and this is further used to understand the customer lifetime. Retailers can manage these customers better if they know which customers have a high lifetime value. Customer lifetime value (LTV) can be defined as the monetary value of a customer relationship, based on the present value of the projected future cash flows from the customer relationship. Customer lifetime value is an important concept in that it encourages firms to shift their focus from quarterly profits to the long-term health of their customer relationships. It is also an important metric because it represents an upper limit on spending to acquire new customers, and for this reason it is an important element in calculating the payback of advertising spend in marketing mix modelling.

4. Why do we need to predict customer lifetime value? The LTV is an important building block in campaign design and marketing mix management. Although targeting models can help to identify the right customers to be targeted, LTV analysis can help to quantify the expected outcome of targeting in terms of revenues and profits. The LTV is also important because other major metrics and decision thresholds can be derived from it. For example, the LTV is naturally an upper limit on the spending to acquire a customer, and the sum of the LTVs for all of the customers of a brand, known as the customer equity, is a major metric for business valuations. Similarly to many other problems of marketing analytics and algorithmic marketing, LTV modelling can be approached from descriptive, predictive, and prescriptive perspectives.

5. How does Next Purchase Day help retailers? Our objective is to analyse when our customers will purchase products in the future, so that we can build strategies and marketing campaigns for them accordingly:
a. Group 1: customers who will purchase in more than 60 days
b. Group 2: customers who will purchase in 30-60 days
c. Group 3: customers who will purchase in 0-30 days

6. What is cohort analysis, and how is it helpful? A cohort is a group of users who share a common characteristic that is identified in this report by an Analytics dimension. For example, all users with the same acquisition date belong to the same cohort. The cohort analysis report lets you isolate and analyze cohort behaviour. Cohort analysis in e-commerce means monitoring your customers' behaviour based on common traits they share (the first product they bought, when they became customers, etc.) to find patterns and tailor marketing activities for the group.
Transaction data has been provided for the period of 1st Jan 2019 to 31st Dec 2019. The below data sets have been provided.

Online_Sales.csv: This file contains actual orders data (point-of-sale data) at the transaction level, with the below variables.
- CustomerID: Customer unique ID
- Transaction_ID: Transaction unique ID
- Transaction_Date: Date of transaction
- Product_SKU: SKU ID, a unique ID for the product
- Product_Description: Product description
- Product_Cateogry: Product category
- Quantity: Number of items ordered
- Avg_Price: Price per one quantity
- Delivery_Charges: Charges for delivery
- Coupon_Status: Any discount coupon applied

Customers_Data.csv: This file contains customer demographics.
- CustomerID: Customer unique ID
- Gender: Gender of customer
- Location: Location of customer
- Tenure_Months: Tenure in months

Discount_Coupon.csv: Discount coupons have been given for different categories in different months.
- Month: Discount coupon applied in that month
- Product_Category: Product categor...
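A minimal pandas sketch of computing the RFM features described above from Online_Sales.csv; monetary value is approximated here as Quantity × Avg_Price, which is an assumption since the true order value also depends on delivery charges and coupons:

```python
import pandas as pd

sales = pd.read_csv("Online_Sales.csv", parse_dates=["Transaction_Date"])
sales["Revenue"] = sales["Quantity"] * sales["Avg_Price"]  # approximation

# Recency is measured from the day after the last observed transaction
snapshot = sales["Transaction_Date"].max() + pd.Timedelta(days=1)
rfm = sales.groupby("CustomerID").agg(
    Recency=("Transaction_Date", lambda d: (snapshot - d.max()).days),
    Frequency=("Transaction_ID", "nunique"),
    Monetary=("Revenue", "sum"),
)
print(rfm.head())
```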
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Center Point population over the last 20-plus years. It lists the population for each year, along with the year-over-year change in population, in both absolute and percentage terms. The dataset can be used to understand the population change of Center Point across the last two decades: for example, we can identify whether the population is declining or increasing, when it peaked, and whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2022, the population of Center Point was 2,559, a 0.47% decrease year-over-year from 2021. Previously, in 2021, the population of Center Point was 2,571, a decline of 0.54% compared to a population of 2,585 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of Center Point increased by 545. In this period, the peak population was 2,585, in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Center Point Population by Year. You can refer to the same here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the West Point population over the last 20-plus years. It lists the population for each year, along with the year-over-year change in population, in both absolute and percentage terms. The dataset can be used to understand the population change of West Point across the last two decades: for example, we can identify whether the population is declining or increasing, when it peaked, and whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2023, the population of West Point was 12,479, a 5.01% increase year-over-year from 2022. Previously, in 2022, the population of West Point was 11,884, an increase of 3.75% compared to a population of 11,455 in 2021. Over the last 20-plus years, between 2000 and 2023, the population of West Point increased by 6,371. In this period, the peak population was 12,479, in the year 2023. The numbers suggest that the population has not yet reached its peak and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for West Point Population by Year. You can refer to the same here.